That worked, thanks to you and Lewis. One thing that came up: I first tried to delete the old /apache-solr-3.3.0/example/solr/data/index by renaming it and creating a new directory in its place, but Solr wouldn't start.
After restoring the folder, changing solr schema.xml to

<field name="content" type="text" stored="true" indexed="true"/>

and then re-running /bin/nutch solrindex... it was OK.

On Wed, Aug 3, 2011 at 2:42 PM, Way Cool <[email protected]> wrote:

> Potentially you need to make two changes:
> 1. As Lewis suggested, make sure to change the content field in
> solr/conf/schema.xml as below:
> <field name="content" type="text" stored="true" indexed="true"/>
> 2. Append the following as a part of search url:
> &hl=on&hl.fl=content site url title
> OR
> Add the following to solrconfig.xml as a part of browse search component if
> you are using solr/browse:
> <str name="hl">on</str>
> <str name="hl.fl">url site title content</str>
>
> You should be able to see something like this when you search in Solr:
> <lst name="highlighting">
> <lst name="http://thetechietutorials.blogspot.com/"><arr name="content"><str>, June 15, 2011 A Custom <em>Solr</em> Search Component example - RedirectSearchComponent Currently Apache <em>Solr</em></str></arr></lst>
> <lst name="http://thetechietutorials.blogspot.com/2011/06/working-example-of-java-annotations.html"><arr name="content"><str>) ▼ June (5) A working example of Java Annotations A Custom <em>Solr</em> Search Component example - Redirect</str></arr></lst>
> ...
> </lst>
>
> You can also look at my blog about a customized solr browser interface for
> Nutch data if you are interested. Here is the url:
>
> http://thetechietutorials.blogspot.com/2011/07/customized-solr-browser-interface-for.html
>
> Thanks.
>
> On Wed, Aug 3, 2011 at 12:31 AM, Kiks <[email protected]> wrote:
>
> > This question was posted on solr list and not answered because nutch
> > related...
> >
> > The indexed contents of 100 sites were imported to solr from nutch using:
> >
> > bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb crawl/linkdb crawl/segments/*
> >
> > now, a solr admin search for 'photography' includes these results:
> >
> > <doc>
> > <float name="score">0.12570743</float>
> > <float name="boost">1.0440307</float>
> > <str name="digest">94d97f2806240d18d67cafe9c34f94e1</str>
> > <str name="id">http://www.galleryhopper.org/</str>
> > <str name="segment">...</str>
> > <str name="title">Gallery Hopper: Todd Walker's photography ephemera. Read, enjoy, share, discard.</str>
> > <date name="tstamp">...</date>
> > <str name="url">http://www.galleryhopper.org/</str>
> > </doc>
> >
> > but highlighting options are on the title field not page text.
> >
> > My question: Where is the stored parsetext content of the pages? What is
> > the solr command to send it from nutch with url/id key? The information is
> > contained in the crawl segments with solr id field matching nutch url.
> >
> > Thanks.
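P.S. In case it helps anyone finding this thread later: the hl.fl value in the quoted suggestion contains spaces, so it needs URL-encoding when the query is built by a script rather than typed into a browser. Below is a rough Python sketch of building such a URL and reading the "highlighting" section of the XML response. The function names are my own, and the /select handler path and the sample values are assumptions for illustration, not something taken from the setup above.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def build_highlight_url(base_url, query, fields=("content", "title")):
    """Build a Solr search URL with highlighting switched on.

    Assumes the core answers on the standard /select handler; adjust the
    path if your solrconfig.xml maps the search handler elsewhere.
    """
    params = {
        "q": query,
        "hl": "on",
        # Solr accepts a space-separated field list; urlencode escapes the spaces.
        "hl.fl": " ".join(fields),
    }
    return base_url.rstrip("/") + "/select?" + urlencode(params)


def parse_highlighting(xml_text):
    """Return {document id: {field name: [snippet, ...]}} from a Solr XML response."""
    root = ET.fromstring(xml_text)
    result = {}
    for section in root.iter("lst"):
        if section.get("name") != "highlighting":
            continue
        for doc in section.findall("lst"):
            fields = {}
            for arr in doc.findall("arr"):
                # itertext() keeps escaped markup such as &lt;em&gt; as plain text
                # and drops the tags of any literal child elements.
                fields[arr.get("name")] = [
                    "".join(s.itertext()) for s in arr.findall("str")
                ]
            result[doc.get("name")] = fields
    return result
```

Calling build_highlight_url("http://127.0.0.1:8983/solr/", "photography") gives a URL that can be fetched with any HTTP client, and the response body can then be passed to parse_highlighting. If the content field is not stored, the content entries will simply be missing from the result.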

