To do what I mentioned, you basically have to write two plugins:
an IndexingFilter plugin and a QueryFilter plugin. This page has
some info on writing plugins:

http://wiki.apache.org/nutch/WritingPlugins

It will probably be easiest if you copy the src/plugin/index-basic
directory, and just change all the build files and filenames as needed.
If you look at the BasicIndexingFilter.java file, you'll see that the
modifications needed aren't bad at all. There are a whole bunch of lines
that do something like:

   doc.add(Field.Text("myfield", "somevalue"));

You should figure out whether the URL is from a movie page and then
add your field:

   if (isFromMovieSite(url)) {
       doc.add(Field.Text("type", "movies"));
   } else if (isFromMusicSite(url)) {
       doc.add(Field.Text("type", "music"));
   } else {
       // Need to make sure all docs have the field,
       // otherwise it will crash when you search
       doc.add(Field.Text("type", "miscellaneous"));
   }
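The isFromMovieSite and isFromMusicSite checks above are not part of Nutch; you have to write them yourself. Here is a minimal sketch of one way to do it, assuming you keep a fixed list of host names per category. The class name SiteTypeClassifier, the typeFor helper, and the example.com hosts are all made up for illustration, so swap in your own list of sites.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Nutch: maps a URL to a "type" value
// by looking up its host name in hand-maintained sets.
public class SiteTypeClassifier {

    // Assumed host lists; replace with the sites you actually fetched.
    private static final Set<String> MOVIE_HOSTS =
        new HashSet<>(Arrays.asList("movies.example.com", "films.example.org"));
    private static final Set<String> MUSIC_HOSTS =
        new HashSet<>(Arrays.asList("music.example.com"));

    // Returns the lower-cased host of the URL, or "" if it cannot be parsed.
    static String hostOf(String url) {
        try {
            String host = URI.create(url).getHost();
            return host == null ? "" : host.toLowerCase(Locale.ROOT);
        } catch (IllegalArgumentException e) {
            return "";
        }
    }

    public static boolean isFromMovieSite(String url) {
        return MOVIE_HOSTS.contains(hostOf(url));
    }

    public static boolean isFromMusicSite(String url) {
        return MUSIC_HOSTS.contains(hostOf(url));
    }

    // Always returns a value, so every document can get the field.
    public static String typeFor(String url) {
        if (isFromMovieSite(url)) {
            return "movies";
        } else if (isFromMusicSite(url)) {
            return "music";
        }
        return "miscellaneous";
    }
}
```

With something like this in place, the if/else chain in the indexing filter could collapse to a single doc.add call that uses typeFor(url) as the field value.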

Doing the query filter is even easier: just copy the src/plugin/query-site
directory and change filenames and build files as needed. Then change the
line that says:

   super("site");

to:

   super("type");

That's pretty much it. You'll have to edit your conf/nutch-*.xml files to
include your new plugins.


Can you explain what exactly you have in mind?

Say that I have fetched sites under a movie category (a list of websites
which I have) and have also fetched sites for songs.
How do I specifically add a field to the first set of pages (i.e. the movies)
and a separate field to the second (i.e. the songs)?

And for field search, how can I search by this field?

How will Nutch understand this query:
newfield:uniquename

I thought you needed to create a query plugin for each field you create
(like query-url).

I still did not get what you meant. If you can explain it clearly, it will be
helpful.

Thanks.
Raghavendra Prabhu R



