This is possible. Moreover, you can run more than one crawler at a time. You
can also look into Apache Oozie for scheduling the jobs.
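
One way to see documents in Solr while the crawl is still running is to drive
the individual Nutch 1.x steps yourself and call solrindex after every round
instead of once at the end. Below is a rough sketch (Python, just shelling out
to bin/nutch); the paths (crawl/crawldb, crawl/segments, urls), the Solr URL
and the round count are placeholders, and the exact command options can differ
between Nutch versions, so treat it as a starting point rather than a drop-in
script.

#!/usr/bin/env python
# Sketch: run the Nutch 1.x steps round by round and push each segment
# to Solr right after it is parsed, so documents show up in the Solr
# admin screen while later rounds are still crawling.
# Paths and the Solr URL are placeholders for this example.
import os
import subprocess

NUTCH = "bin/nutch"
SOLR_URL = "http://localhost:8983/solr"
CRAWLDB = "crawl/crawldb"
LINKDB = "crawl/linkdb"
SEGMENTS = "crawl/segments"
SEED_DIR = "urls"
ROUNDS = 100   # corresponds to -depth 100
TOPN = "100"   # corresponds to -topN 100

def run(*args):
    subprocess.check_call([NUTCH] + list(args))

def newest_segment():
    # generate creates a new timestamped directory under crawl/segments
    names = sorted(os.listdir(SEGMENTS))
    return os.path.join(SEGMENTS, names[-1])

run("inject", CRAWLDB, SEED_DIR)

for i in range(ROUNDS):
    run("generate", CRAWLDB, SEGMENTS, "-topN", TOPN)
    segment = newest_segment()
    run("fetch", segment)
    run("parse", segment)
    run("updatedb", CRAWLDB, segment)
    run("invertlinks", LINKDB, segment)
    # index this round's segment immediately instead of waiting
    # for all 100 rounds to finish
    run("solrindex", SOLR_URL, CRAWLDB, "-linkdb", LINKDB, segment)

The same per-round sequence can also be expressed as an Oozie workflow, so the
generate/fetch/parse/updatedb/solrindex actions run as scheduled Hadoop jobs
instead of from one long-running script.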
On 21 Mar 2014 13:44, "reddibabu" <[email protected]> wrote:

> My requirement is to crawl and index URLs based on -depth 100 and -topN
> 100. The Nutch crawl command crawls all the URLs first, then indexes them
> and sends the data to Solr all at once. As the depth and topN are 100 each,
> the whole process (crawling and indexing) takes around 4-5 hours.
>
> I would like to know if there is a way to run crawling and indexing in
> parallel so that some data can be seen in the Solr admin screen while the
> Nutch crawl job is still in progress.
>
>
>
