My understanding is that I can use the index-static plugin if the information remains static, but there will be a different crawl_name every time I run a new crawl. I'd like to take that new crawl name and add it into Solr somehow.
Is my understanding correct? Or is there a way to override that field on a per-crawl basis?

Thanks

On Wed, Apr 8, 2015 at 9:54 AM, Iain Lopata <[email protected]> wrote:

> Katrina,
>
> If I am understanding you correctly, you could do this with the
> index-static plugin which is configured with the following property:
>
> <property>
> <name>index.static</name>
> <value> fieldname:fieldcontent </value>
> <description>
> A simple plugin called at indexing that adds fields with static data.
> You can specify a list of fieldname:fieldcontent per nutch job.
> It can be useful when collections can't be created by urlpatterns,
> like in subcollection, but on a job-basis.
> </description>
> </property>
>
> Use crawlname as your fieldname and use a different config directory for
> each of your crawls with an appropriate value for fieldcontent set in each.
>
> Iain
>
> -----Original Message-----
> From: Katrina Riehl [mailto:[email protected]]
> Sent: Wednesday, April 8, 2015 9:41 AM
> To: [email protected]
> Subject: Re: Adding field to Nutch / Solr
>
> Right, I can create multiple collections no problem... but, what I'd
> really love is to put them into the same collection, just adding a field
> like "crawl_name" to the index.
>
> Any way I can do that?
>
> Thanks!
>
>
> On Wed, Apr 8, 2015 at 9:15 AM, Iain Lopata <[email protected]> wrote:
>
> > Katrina,
> >
> > When you specify the solr instance as the third parameter to bin/crawl
> > try specifying the collection name in the path e.g.
> > http://localhost:8080/solr/collection1
> >
> > Iain
> >
> > -----Original Message-----
> > From: Katrina Riehl [mailto:[email protected]]
> > Sent: Wednesday, April 8, 2015 8:51 AM
> > To: [email protected]
> > Subject: Adding field to Nutch / Solr
> >
> > Hello,
> >
> > I am new to using Nutch. I'm developing an application that crawls
> > websites, and then indexes information about those websites into a
> > Solr instance. The problem is, it's putting all the crawled documents
> > into the same Solr collection.
> >
> > Is there a way for me to add a field specifying which crawl the index
> > came from? Is there a command line option I can add when I start the
> > crawl?
> >
> > Thank you so much for your help.
> >
> > --
> > Katrina Riehl
> > Continuum Analytics
> > [email protected]
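
The suggestion in the thread (one config directory per crawl, each with its own index.static value) could be scripted along these lines. This is only a rough sketch, not something from the thread: the helper name set_index_static and the sample crawl name are made up for illustration, and it assumes the index-static plugin is enabled in plugin.includes and that the Solr schema accepts a crawl_name field.

```python
import xml.etree.ElementTree as ET


def set_index_static(site_xml: str, field: str, value: str) -> str:
    """Return nutch-site.xml text with index.static set to 'field:value'.

    Hypothetical helper: it updates the existing index.static property,
    or appends a new one if the file does not define it yet.
    """
    root = ET.fromstring(site_xml)
    entry = f"{field}:{value}"
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == "index.static":
            val = prop.find("value")
            if val is None:
                val = ET.SubElement(prop, "value")
            val.text = entry
            break
    else:
        # No index.static property found, so add one.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = "index.static"
        ET.SubElement(prop, "value").text = entry
    return ET.tostring(root, encoding="unicode")


# Example input standing in for a real nutch-site.xml (assumed content).
base = (
    "<configuration><property><name>index.static</name>"
    "<value>crawl_name:old</value></property></configuration>"
)
updated = set_index_static(base, "crawl_name", "crawl_2015_04_08")
print(updated)
```

The result would be written to nutch-site.xml in a copy of the conf directory made for that crawl, with NUTCH_CONF_DIR pointing at the copy before the crawl is started. Note that ElementTree drops comments and the XML declaration when it re-serializes the file.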

