That worked, thanks to you and Lewis.

One thing that came up: I first tried to get rid of the old index directory
/apache-solr-3.3.0/example/solr/data/index
by renaming it and creating a new directory in its place, but Solr wouldn't start.

After restoring the folder, changing the content field in Solr's schema.xml to
<field name="content" type="text" stored="true" indexed="true"/>

and then re-running bin/nutch solrindex..., it was OK.
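
For anyone who wants to check the result from a script, here is a rough Python sketch of the highlighting request suggested in the quoted message below. It only builds the search URL with hl=on and hl.fl, and reads the "highlighting" section of an XML response. The function names are made up for illustration, and the /select handler and the q parameter are my assumptions about a default Solr setup; the base URL is the one from the solrindex command in this thread.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def build_highlight_url(base_url, query,
                        fields=("content", "site", "url", "title")):
    """Build a Solr search URL with highlighting switched on.

    Assumption: the core answers on the default /select handler.
    """
    params = {
        "q": query,
        "hl": "on",                  # enable highlighting
        "hl.fl": " ".join(fields),   # space-separated fields to highlight
    }
    return base_url.rstrip("/") + "/select?" + urlencode(params)


def parse_highlighting(xml_text):
    """Return {document id: {field name: [snippet, ...]}}.

    Reads only the <lst name="highlighting"> section of the response.
    Nested elements such as <em> are flattened to their text.
    """
    root = ET.fromstring(xml_text)
    snippets = {}
    for section in root.iter("lst"):
        if section.get("name") != "highlighting":
            continue
        for doc in section.findall("lst"):
            for arr in doc.findall("arr"):
                texts = ["".join(s.itertext()) for s in arr.findall("str")]
                snippets.setdefault(doc.get("name"), {})[arr.get("name")] = texts
    return snippets


if __name__ == "__main__":
    print(build_highlight_url("http://127.0.0.1:8983/solr/", "photography"))
```

If the content field is not stored, I would expect no "content" entries to come back for a document, which is the symptom described in the original question.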



On Wed, Aug 3, 2011 at 2:42 PM, Way Cool <[email protected]> wrote:

> Potentially you need to make two changes:
> 1. As Lewis suggested, make sure to change the content field in
> solr/conf/schema.xml as below:
> <field name="content" type="text" stored="true" indexed="true"/>
> 2. Append the following as a part of search url:
> &hl=on&hl.fl=content site url title
> OR
> Add the following to solrconfig.xml as a part of browse search component if
> you are using solr/browse:
>  <str name="hl">on</str>
>  <str name="hl.fl">url site title content</str>
>
> You should be able to see something like this when you search in Solr:
> <lst name="highlighting">
> <lst name="http://thetechietutorials.blogspot.com/">
> <arr name="content"><str>, June 15, 2011 A Custom <em>Solr</em> Search Component example - RedirectSearchComponent Currently Apache <em>Solr</em></str></arr>
> </lst>
> <lst name="http://thetechietutorials.blogspot.com/2011/06/working-example-of-java-annotations.html">
> <arr name="content"><str>) ▼  June (5) A working example of Java Annotations A Custom <em>Solr</em> Search Component example - Redirect</str></arr>
> </lst>
> ...
> </lst>
>
> You can also look at my blog about a customized solr browser interface for
> Nutch data if you are interested. Here is the url:
>
> http://thetechietutorials.blogspot.com/2011/07/customized-solr-browser-interface-for.html
>
> Thanks.
>
> On Wed, Aug 3, 2011 at 12:31 AM, Kiks <[email protected]> wrote:
>
> > This question was posted on the Solr list but was not answered because it
> > is Nutch-related...
> >
> >
> > The indexed contents of 100 sites were imported to solr from nutch using:
> >
> > bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb crawl/linkdb crawl/segments/*
> >
> > now, a solr admin search for 'photography' includes these results:
> >
> >  <doc>
> >    <float name="score">0.12570743</float>
> >    <float name="boost">1.0440307</float>
> >    <str name="digest">94d97f2806240d18d67cafe9c34f94e1</str>
> >    <str name="id">http://www.galleryhopper.org/</str>
> >    <str name="segment">...</str>
> >    <str name="title">Gallery Hopper: Todd Walker's photography ephemera. Read, enjoy, share, discard.</str>
> >    <date name="tstamp">...</date>
> >    <str name="url">http://www.galleryhopper.org/</str>
> >  </doc>
> >
> > but highlighting is only available on the title field, not on the page text.
> >
> > My question: Where is the stored parsetext content of the pages? What is
> > the solr command to send it from nutch with url/id key? The information is
> > contained in the crawl segments with solr id field matching nutch url.
> >
> > Thanks.
> >
>
