You need to write your own indexing filter plugin. Take a look
at index-basic. In BasicIndexingFilter.java there are a whole
bunch of lines that do something like:

doc.add(Field.Text("myfield", myFieldValue));

Just add your own field. You have access to title, anchor,
and page text in this function. Search the text for your
keywords and add whatever field you want.
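
The keyword search itself could be sketched roughly like this. It is plain Java with no Nutch dependencies; the class name KeywordCategorizer and its categorize method are made up for illustration, and the keywords are the ones from the question below. Inside your indexing filter you would pass the returned value to the doc.add(Field.Text("myfield", ...)) call shown above.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not part of Nutch): maps keywords found in page
// text to a category label such as "sports" or "technology".
public class KeywordCategorizer {
    private final Map<String, String> keywordToCategory = new LinkedHashMap<>();

    public KeywordCategorizer() {
        // Hard-coded example keywords; all keys are lower case.
        keywordToCategory.put("football", "sports");
        keywordToCategory.put("tennis", "sports");
        keywordToCategory.put("web", "technology");
        keywordToCategory.put("internet", "technology");
    }

    /**
     * Returns the category of the first word in the text that is a known
     * keyword, or null if no keyword occurs. Matching is case-insensitive
     * and on whole words only.
     */
    public String categorize(String text) {
        if (text == null) {
            return null;
        }
        // Split on anything that is not a letter or digit.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            String category = keywordToCategory.get(token);
            if (category != null) {
                return category;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        KeywordCategorizer categorizer = new KeywordCategorizer();
        System.out.println(categorizer.categorize("Federer wins the tennis final"));
    }
}
```

This stops at the first match, so each page gets one category. If a page should carry several, collect the matches into a set and add one field value per category instead.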

To search on this field, you'll have to create a query filter plugin also
so that you can search for "myfield:sports".  See query-site for an
example. You'll only have to change a couple of lines of code:

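import org.apache.nutch.searcher.RawFieldQueryFilter;

// Lets queries of the form "myfield:value" match the indexed field.
public class MyQueryFilter extends RawFieldQueryFilter {
  public MyQueryFilter() {
    super("myfield");
  }
}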

Don't forget to enable your new plugins in nutch-site.xml by adding
them to the plugin.includes property.

By the way, I would recommend writing some extra code to
allow yourself to read in keywords from a file and map them
to your category. It's kind of a pain to edit the code every
time you think of a new keyword.
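
Something along these lines would do for reading the keywords. This is only a sketch: the "keyword=category" line format, the KeywordFileLoader class and its parse method are my own assumptions, not anything Nutch provides. It takes a Reader so you can hand it a FileReader for a real file or a StringReader for testing.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper (not part of Nutch): reads "keyword=category" lines.
public class KeywordFileLoader {

    /**
     * Parses one "keyword=category" pair per line. Blank lines, lines
     * starting with '#', and lines without both a keyword and a category
     * are skipped. Keywords are lower-cased. The caller closes the reader.
     */
    public static Map<String, String> parse(Reader source) {
        List<String> lines = new BufferedReader(source).lines()
                .collect(Collectors.toList());
        Map<String, String> keywordToCategory = new HashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int eq = line.indexOf('=');
            if (eq <= 0) {
                continue; // no '=' or empty keyword: ignore the line
            }
            String keyword = line.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            String category = line.substring(eq + 1).trim();
            if (!keyword.isEmpty() && !category.isEmpty()) {
                keywordToCategory.put(keyword, category);
            }
        }
        return keywordToCategory;
    }

    public static void main(String[] args) {
        String sample = "# keyword=category\n"
                + "football=sports\n"
                + "tennis=sports\n"
                + "web=technology\n"
                + "internet=technology\n";
        Map<String, String> map = parse(new StringReader(sample));
        System.out.println(map.get("tennis"));
    }
}
```

Load the map once when the plugin starts rather than per document, so adding a keyword only means editing the file and restarting the indexer.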

Howie

Hi guys

Sorry for the follow-up mail.

My requirement, as I mentioned previously, is to stamp documents
with some kind of type.

How do I do it?

For example, add "sports" to a field TYPEFIELD on seeing "football" or
"tennis" in the extracted text.

For example, add "technology" to the same field TYPEFIELD on seeing
"web" or "internet".

Where do I add this?

Rgds

Prabhu




_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
