Hey guys.
Over the past few weeks I have learned a lot about Nutch with Solr, and
there is a lot more to learn.
I am thinking of using Nutch purely as a web crawler, to extract the raw
HTML (maybe including headers) and the URL, solely to pass them to Solr.
I know I can modify Nutch's index-basic filter. But I am wondering if
there is an easier and cleaner way to do it, maybe by modifying the
schema etc., without modifying any of the Nutch source code?
The reason I want to do it this way is that it is cleaner: I would only
need to focus on Solr plugin customization rather than trying to modify
Nutch and Solr at the same time. Indexing would be done at the Solr level.
Anyone, any ideas?
Thanks in advance. =)
- nutch and solr centralization codegigabyte