What I want to do is add some header info in the parse filter, which will
then be used by the index filter to add my own new FIELD.
Rgds
Prabhu
I would recommend doing it at the index phase if possible. If the end
goal is to have it searchable from the index, ask if you really need to have
the information at the parsing stage. If you decide you want to
tweak your keywords, it's easy to re-index. If you do it at the parsing
stage, it will take twice as long since you have to re-parse and then
re-index. Also, re-parsing is not complicated, but it involves something of
a hack: renaming a bunch of directories.
One reason to do your analysis at parse time is that it's easier to
get at the entire page contents, including the HTML tags, in case you
need them for categorization. If you don't need this stuff, you probably
don't need to categorize at the parsing phase.
If you really want to do it at parse time, it's not difficult. Take a
look at parse-html. You can use the metadata object to store
your category. Look in HtmlParseFilter.java in getParse. Just do:
metadata.put("myfield", "sports");
In your index filter, you can then do a metadata.get to get your
category and then index it.
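
To make the hand-off concrete, here is a rough sketch of the idea. It does not use the real Nutch classes: the class MetadataHandoffSketch, the methods parseFilter and indexFilter, and the title-based rule are all made up for illustration, and a plain Map stands in for the metadata object and for the index document. Only the "myfield" key and the "sports" value come from the example above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in classes and methods; these are NOT the Nutch APIs.
public class MetadataHandoffSketch {

    // Stand-in for the parse filter: the raw HTML is available here,
    // so the category can be derived from the markup itself.
    public static void parseFilter(String html, Map<String, String> metadata) {
        // Made-up rule for illustration only: look at the start of the title.
        boolean sports = html.toLowerCase().contains("<title>sports");
        metadata.put("myfield", sports ? "sports" : "general");
    }

    // Stand-in for the index filter: reads the category back out of the
    // metadata and copies it into the document that gets indexed.
    public static Map<String, String> indexFilter(Map<String, String> metadata) {
        Map<String, String> doc = new HashMap<>();
        String category = metadata.get("myfield");
        if (category != null) {
            doc.put("myfield", category);
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        parseFilter("<html><title>Sports news</title></html>", metadata);
        System.out.println(indexFilter(metadata));
    }
}
```

The point is only that the parse step writes a key into the metadata and the index step reads the same key back; in a real plugin the two halves would live in your parse filter and index filter respectively.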
Howie
_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general