Re: Store seed-url in Solr

chethan Thu, 02 May 2013 04:58:31 -0700

Hi,

Do you mean you need to store the domain of a page in Solr? For example, if
http://www.xyz.com/intro.html is indexed, do you wish to store
http://www.xyz.com in the Solr document as well? If so, you can write a
Nutch plugin with a custom indexing filter that adds the field to the
document. Refer to
http://wiki.apache.org/nutch/WritingPluginExample


You need to change the Solr schema.xml and define this new field as well.
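
As a rough illustration of the value such a filter would add, here is a
minimal sketch in plain Java. The class name SiteRoot and its method are
hypothetical helpers for this example and not part of Nutch. It assumes the
field should hold the scheme and host of the page URL, as in the
http://www.xyz.com example above.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper (not a Nutch class): computes the value that a custom
// indexing filter could store in the extra Solr field.
final class SiteRoot {
    private SiteRoot() {}

    // Returns scheme://host[:port] for an absolute URL,
    // or null when the input cannot be parsed or has no host.
    static String of(String pageUrl) {
        try {
            URI uri = new URI(pageUrl.trim());
            if (uri.getScheme() == null || uri.getHost() == null) {
                return null;
            }
            StringBuilder root = new StringBuilder();
            root.append(uri.getScheme()).append("://").append(uri.getHost());
            if (uri.getPort() != -1) {
                // Keep an explicit port so different ports stay distinct.
                root.append(':').append(uri.getPort());
            }
            return root.toString();
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Example from this thread.
        System.out.println(SiteRoot.of("http://www.xyz.com/intro.html"));
    }
}
```

Inside the plugin, the filter method of your indexing filter would call
something like this with the page URL and, if the result is not null, add it
to the document under whatever field name you defined in schema.xml. Note
that two seed URLs on the same host give the same value here, so this only
distinguishes sources at the host level.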

Thanks
Chethan


On Thu, May 2, 2013 at 5:15 PM, Urs Hofer <[email protected]> wrote:

> Hi all
>
> I'm new to Nutch.
>
> I have a running system (Solr 4, Nutch 1.6), currently indexing about
> 360000 documents. In order to run a kind of source-specific search,
> I'd like to store the original seed URL in Solr as well.
>
> My crawl is limited to the domain: db.ignore.external.links=true
>
> Currently, I'm solving the problem by limiting the search to the same
> domain as the seed URL. That works (mostly) quite well.
>
> But I have several seed URLs starting in the same domain, which cannot
> be separated this way.
>
> Any suggestions?
> Thanks
> Hofer
>
>
>
