I have been looking through the API docs and I can't
figure this out. Here is my question:
Is there a way to search based on meta-information,
such as content type, or even the value of header
fields? For example, let's say I would like to find
only PDFs, or perhaps put higher weight on PDFs vs.
other kinds of documents. Can this be done?
I looked at the query interface. It looks like
NutchBean allows me to specify a Query, and a Query is
basically made up of strings that are matched against the
content. I can't find any way to specify the
meta-information I'm looking for.
Any ideas on this?
Thanks