Hi Sebastian,

Here is the use case we discussed:
http://lucene.472066.n3.nabble.com/Dynamic-Crawling-URL-with-query-parameters-td4312316.html
  

I wanted to highlight this one from the above link, in case you choose not
to go through the whole thread ;)
*Every time a user searches the system, a crawl should be triggered; my
concern is about scale when there is a large number of users, say
1,000,000 ;)*

To add more detail: we expect HTML/JS to be crawled, and we need to push
the parsed content to Kafka. I implemented that last week and it seems to
be working; however, I think it would benefit everyone if it were packaged
with the Nutch distribution.
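Not the actual plugin code, just a minimal sketch of the idea: serialize the parsed document's fields to JSON, keyed by URL, and hand the payload to a producer. The class and field names are illustrative, and an in-memory list stands in for the Kafka producer so the snippet runs standalone; the real plugin would wrap `org.apache.kafka.clients.producer.KafkaProducer` instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of pushing parsed Nutch content to Kafka.
// A List stands in for the Kafka topic so the sketch runs without a broker.
public class KafkaIndexWriterSketch {

    // Serialize document fields to a flat JSON object (string values only).
    static String toJson(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) sb.append(",");
            sb.append("\"").append(e.getKey()).append("\":\"")
              .append(e.getValue().replace("\"", "\\\"")).append("\"");
            first = false;
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        List<String> topic = new ArrayList<>(); // stand-in for the Kafka topic

        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("url", "http://example.com/page");
        doc.put("title", "Example");
        doc.put("content", "parsed text ...");

        // Keying messages by URL lets re-crawls of the same page compact cleanly
        // if the topic uses log compaction.
        topic.add(toJson(doc));

        System.out.println(topic.get(0));
    }
}
```

Keying by URL is a design choice, not something the thread settled; it just makes re-crawled pages overwrite their previous message when the topic is compacted.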

Yes, I was aware of the multiple implementations; thanks for pointing them
out here.

>>That means a different configuration per document/page or per host/domain? 

As pointed out in the discussion thread, we may have a large number of
URLs, so we would mostly have different rules per domain, but yes, there
could be cases per page too. I don't know how it will shape up, but making
it generic leaves room for us.
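To make "rules per domain, with per-page overrides" concrete, here is a purely hypothetical sketch of what such a rules file could look like; none of these element or attribute names come from an existing plugin:

```xml
<!-- Illustrative only: a per-host rule set with an optional URL-pattern
     override, mapping field names to XPath expressions. -->
<xpath-rules>
  <rule host="example.com">
    <field name="title" xpath="//h1[@class='headline']/text()"/>
    <field name="body"  xpath="//div[@id='article']//p/text()"/>
  </rule>
  <!-- A narrower url-pattern rule could take precedence over the host rule. -->
  <rule host="example.com" url-pattern=".*/product/.*">
    <field name="price" xpath="//span[@class='price']/text()"/>
  </rule>
</xpath-rules>
```

Matching the most specific rule first (URL pattern before host) would cover both the per-domain and per-page cases the thread mentions.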

Regards,
Vicky


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Dymanic-Xpath-plugin-tp4314525p4315294.html
Sent from the Nutch - User mailing list archive at Nabble.com.
