nutch and solr centralization

codegigabyte Wed, 16 Nov 2011 17:19:03 -0800

Hey guys.

Over the past few weeks I have learned a lot about Nutch with Solr, and there is still a lot more to learn.

I am thinking of using Nutch as a pure web crawler that extracts only the raw HTML (maybe including headers) and the URL, and passes just those to Solr.
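To make the idea concrete, here is a rough sketch of the kind of document I would like to end up in Solr: just the URL and the untouched page source, wrapped in Solr's XML `<add>` update message. The field names `url` and `raw_html` and the helper `build_solr_add_xml` are assumptions for illustration only; they are not Nutch's or Solr's defaults and would have to be declared in the Solr schema.

```python
from xml.sax.saxutils import escape


def build_solr_add_xml(url, raw_html):
    """Build a Solr XML <add> message holding only the URL and raw HTML.

    The field names "url" and "raw_html" are hypothetical; they must
    match fields declared in the Solr schema being used.
    """
    fields = [("url", url), ("raw_html", raw_html)]
    # Escape &, < and > so the page source survives as plain field text.
    body = "".join(
        '<field name="{}">{}</field>'.format(name, escape(value))
        for name, value in fields
    )
    return "<add><doc>{}</doc></add>".format(body)


if __name__ == "__main__":
    print(build_solr_add_xml("http://example.com/", "<html><body>hi</body></html>"))
```

A message like this would be sent to Solr's XML update handler, and all parsing and field extraction would then happen on the Solr side.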

I know I can modify Nutch's index-basic filter. But I am wondering if there is an easier and cleaner way to do this, maybe by modifying the schema etc., without changing any Nutch source code?

The reason I want to do it this way is that it is cleaner: I just need to focus on Solr plugin customization rather than trying to modify Nutch and Solr at the same time. Indexing will be done at the Solr level. Anyone, any ideas?


Thanks in advance. =)
