On Jun 18, 2007, at 7:36 PM, Micah Vivion wrote:
>
> I am trying to change how Nutch indexes web page content. I would
> like the content of intranet web pages to be stored in the index.
> Based on information I found in the mailing list archives, the
> recommended way to achieve this is to modify
> BasicIndexingFilter.java on line 72:
>
> change
> doc.add(new Field("content", parse.getText(), Field.Store.NO,
> Field.Index.TOKENIZED));
>
> to
>
> doc.add(new Field("content", parse.getText(), Field.Store.YES,
> Field.Index.TOKENIZED));
>
> After making this change and rebuilding Nutch, the content field
> is still not stored in the index.

Just to be clear, are you re-crawling after making this change? You
need to delete the index and re-crawl before the change takes effect.
If you are, make sure bin/nutch is picking up the right .jar. The
simplest way to test this is to log or print a debug string right
before the doc.add() line you edited.
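
If a plain debug string is not enough, it may also help to print where
the class was actually loaded from. The following is only a minimal
sketch using standard JDK calls. JarLocator and locationOf are
hypothetical names made up for illustration and are not part of Nutch;
the assumption is that you would call locationOf(BasicIndexingFilter.class)
from inside the filter, next to your debug line.

```java
import java.security.CodeSource;

// Hypothetical helper (not part of Nutch): reports where a class was loaded from.
final class JarLocator {

    // Returns the jar or directory URL the class came from, or a
    // placeholder when the JVM does not expose a code source
    // (for example, for core JDK classes).
    static String locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            return "(bootstrap or unknown)";
        }
        return src.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a fully qualified class name as the first argument;
        // with no argument, this helper reports on itself.
        Class<?> cls = args.length > 0 ? Class.forName(args[0]) : JarLocator.class;
        System.out.println(cls.getName() + " loaded from " + locationOf(cls));
    }
}
```

If the printed location is not the jar you just rebuilt, the old
class is still on the classpath and your edit is not being run.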
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general