My understanding is that I can use the index-static plugin if the
information remains static, but there will be a different crawl_name
every time I run a new crawl.  I'd like to take that new crawl name and
add it into Solr somehow.

Is my understanding correct?  Or is there a way to override that field on a
per-crawl basis?

Thanks

On Wed, Apr 8, 2015 at 9:54 AM, Iain Lopata <[email protected]> wrote:

> Katrina,
>
> If I am understanding you correctly, you could do this with the
> index-static plugin which is configured with the following property:
>
> <property>
>   <name>index.static</name>
>   <value> fieldname:fieldcontent </value>
>   <description>
>   A simple plugin called at indexing that adds fields with static data.
>   You can specify a list of fieldname:fieldcontent per nutch job.
>   It can be useful when collections can't be created by urlpatterns,
>   like in subcollection, but on a job-basis.
>   </description>
> </property>
>
> Use crawlname as your fieldname and use a different config directory for
> each of your crawls with an appropriate value for fieldcontent set in each.
>
> Iain
>
> -----Original Message-----
> From: Katrina Riehl [mailto:[email protected]]
> Sent: Wednesday, April 8, 2015 9:41 AM
> To: [email protected]
> Subject: Re: Adding field to Nutch / Solr
>
> Right, I can create multiple collections no problem... but, what I'd
> really love is to put them into the same collection, just adding a field
> like "crawl_name" to the index.
>
> Any way I can do that?
>
> Thanks!
>
>
> On Wed, Apr 8, 2015 at 9:15 AM, Iain Lopata <[email protected]> wrote:
>
> > Katrina,
> >
> > When you specify the Solr instance as the third parameter to bin/crawl,
> > try specifying the collection name in the path, e.g.
> > http://localhost:8080/solr/collection1
> >
> > Iain
> >
> > -----Original Message-----
> > From: Katrina Riehl [mailto:[email protected]]
> > Sent: Wednesday, April 8, 2015 8:51 AM
> > To: [email protected]
> > Subject: Adding field to Nutch / Solr
> >
> > Hello,
> >
> > I am new to using Nutch.  I'm developing an application that crawls
> > websites, and then indexes information about those websites into a
> > Solr instance.  The problem is, it's putting all the crawled documents
> > into the same Solr collection.
> >
> > Is there a way for me to add a field specifying which crawl the index
> > came from?  Is there a command line option I can add when I start the
> > crawl?
> >
> > Thank you so much for your help.
> >
> > --
> > Katrina Riehl
> > Continuum Analytics
> > [email protected]
> >
> >
>
>
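
For what it's worth, the "different config directory for each crawl" idea
from Iain's reply could be scripted so that the value changes on every run.
The following is only a rough sketch, not something taken from Nutch itself:
the helper name make_crawl_conf, the directory layout, and the default field
name crawl_name are all assumptions made for illustration. It copies a base
conf directory and sets index.static in the copy's nutch-site.xml.

```python
import os
import shutil
import xml.etree.ElementTree as ET


def make_crawl_conf(base_conf, crawl_conf, crawl_name, field="crawl_name"):
    """Copy base_conf to crawl_conf and set index.static to field:crawl_name.

    Hypothetical helper: the name and arguments are illustrative only.
    """
    # crawl_conf must not exist yet; copytree creates it.
    shutil.copytree(base_conf, crawl_conf)
    site_path = os.path.join(crawl_conf, "nutch-site.xml")
    tree = ET.parse(site_path)
    root = tree.getroot()  # the <configuration> element

    # Reuse an existing index.static property if there is one, else add it.
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == "index.static":
            break
    else:
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = "index.static"

    value = prop.find("value")
    if value is None:
        value = ET.SubElement(prop, "value")
    value.text = f"{field}:{crawl_name}"

    # Note: ElementTree drops XML comments (e.g. a license header) on rewrite.
    tree.write(site_path, encoding="utf-8", xml_declaration=True)
    return site_path
```

The generated directory would then be used as the conf directory for that
one crawl, for example by pointing NUTCH_CONF_DIR at it before starting the
crawl, assuming your bin/nutch honors that variable. This also assumes that
index-static is listed in plugin.includes and that the Solr schema accepts a
field with the chosen name.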
