Katrina,

If I understand you correctly, you can do this with the index-static 
plugin, which is configured with the following property:

<property>
  <name>index.static</name>
  <value> fieldname:fieldcontent </value>
  <description>
  A simple plugin called at indexing that adds fields with static data.
  You can specify a list of fieldname:fieldcontent per nutch job.
  It can be useful when collections can't be created by urlpatterns,
  like in subcollection, but on a job-basis.
  </description>
</property>

Use crawlname as the fieldname, and run each of your crawls with its own 
config directory in which fieldcontent is set to that crawl's name.
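
As a rough sketch, the nutch-site.xml in the config directory for one crawl 
might contain something like the following. The value "site-a" and the 
directory name in the comment are only placeholders. I am also assuming that 
index-static is listed in plugin.includes and that a crawlname field is 
defined in your Solr schema.

```xml
<!-- Hypothetical nutch-site.xml for one crawl, e.g. kept in conf-site-a/ -->
<property>
  <name>index.static</name>
  <!-- "site-a" is a placeholder; set a different value in each crawl's config -->
  <value>crawlname:site-a</value>
</property>
```

A second crawl would use its own config directory with, say, 
crawlname:site-b. As far as I know, bin/nutch picks up an alternate config 
directory from the NUTCH_CONF_DIR environment variable, but check that 
against your version.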

Iain

-----Original Message-----
From: Katrina Riehl [mailto:[email protected]] 
Sent: Wednesday, April 8, 2015 9:41 AM
To: [email protected]
Subject: Re: Adding field to Nutch / Solr

Right, I can create multiple collections no problem... but what I'd really 
love is to put them all into the same collection and just add a field like 
"crawl_name" to the index.

Any way I can do that?

Thanks!


On Wed, Apr 8, 2015 at 9:15 AM, Iain Lopata <[email protected]> wrote:

> Katrina,
>
> When you specify the Solr instance as the third parameter to bin/crawl, 
> try specifying the collection name in the path, e.g.
> http://localhost:8080/solr/collection1
>
> Iain
>
> -----Original Message-----
> From: Katrina Riehl [mailto:[email protected]]
> Sent: Wednesday, April 8, 2015 8:51 AM
> To: [email protected]
> Subject: Adding field to Nutch / Solr
>
> Hello,
>
> I am new to using Nutch.  I'm developing an application that crawls 
> websites, and then indexes information about those websites into a 
> Solr instance.  The problem is, it's putting all the crawled documents 
> into the same Solr collection.
>
> Is there a way for me to add a field specifying which crawl the index 
> came from?  Is there a command line option I can add when I start the crawl?
>
> Thank you so much for your help.
>
> --
> Katrina Riehl
> Continuum Analytics
> [email protected]
>
>
