Hi,
I need to modify the Nutch Indexer class because it would be very
useful for me to add some fields to the generated Lucene index. While
experimenting, I found out that it is possible to add fields to the Document
with doc.addField() in the reduce function. The problem is that to compute
those fields I need to process the HTML content of the web page, but the
content does not seem to be present in the Document yet: calling
getField("content") leads to a NullPointerException. Maybe that is not the
correct way to access it, or not the correct place. So, how and where can I
access the HTML content of a document, so that I can add a new field to the
Lucene Document and from there to the generated index?
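To illustrate the failure mode described above, here is a minimal, self-contained sketch. It assumes a hypothetical map-backed SketchDocument class standing in for the Lucene Document. This is NOT the real Lucene or Nutch API, and it does not show where Nutch makes the HTML available. As with Lucene's Document.getField, looking up a field that was never added returns null, so dereferencing the result throws a NullPointerException. Checking for null first avoids the crash but does not supply the missing content.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Lucene Document; NOT the real Lucene/Nutch API.
class SketchDocument {
    private final Map<String, String> fields = new HashMap<>();

    void add(String name, String value) {
        fields.put(name, value);
    }

    // Like Lucene's Document.getField, returns null when no such field was added.
    String getField(String name) {
        return fields.get(name);
    }
}

public class ContentFieldSketch {
    // Adds a hypothetical derived field only when "content" is present,
    // instead of dereferencing null and throwing a NullPointerException.
    static boolean addDerivedField(SketchDocument doc) {
        String content = doc.getField("content");
        if (content == null) {
            return false; // content is not available in the document at this point
        }
        doc.add("contentLength", Integer.toString(content.length()));
        return true;
    }

    public static void main(String[] args) {
        SketchDocument withoutContent = new SketchDocument();
        withoutContent.add("url", "http://example.com/");
        System.out.println(addDerivedField(withoutContent)); // false

        SketchDocument withContent = new SketchDocument();
        withContent.add("content", "<html>hi</html>");
        System.out.println(addDerivedField(withContent)); // true
        System.out.println(withContent.getField("contentLength")); // 15
    }
}
```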
Any advice will be very helpful,
Thanks in advance.
Javier.
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers