Hello everyone,

I'm considering using Nutch for a new website project.
My aim is to index files (HTML, TXT, PDF ...) stored on a filesystem (which
Nutch can do), but some of the files may have meta-information stored in a
separate file.
Web users would then search the index built from those files.

For example, the file "technical_documentation.pdf" may have a
"technical_documentation.xml" linked to it (for example, in the same folder),
with the XML containing information such as "<type>documentation</type>" and
so on.
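
To make this concrete, here is a rough sketch of the merge I have in mind. It
is plain Python, not Nutch code, and it only pairs each document with a
same-named XML file in the same folder. The <meta> root element, the function
names and the extension list are my own assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Extensions treated as indexable documents (an assumption for this sketch).
DOC_EXTENSIONS = {".pdf", ".html", ".txt"}


def read_sidecar(xml_path):
    """Return the sidecar's child elements as a flat {tag: text} dict.

    Assumes a hypothetical layout such as:
        <meta><type>documentation</type></meta>
    """
    root = ET.parse(xml_path).getroot()
    return {child.tag: (child.text or "").strip() for child in root}


def merged_records(folder):
    """Yield one record per document, with sidecar fields attached if present."""
    for doc in sorted(Path(folder).iterdir()):
        if not doc.is_file() or doc.suffix.lower() not in DOC_EXTENSIONS:
            continue
        # The sidecar is the file with the same name but an .xml extension.
        sidecar = doc.with_suffix(".xml")
        meta = read_sidecar(sidecar) if sidecar.is_file() else {}
        yield {"path": str(doc), "meta": meta}
```

Each record would then become a single searchable item: the document's own
content plus the fields from its XML file, if one exists.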

Is there any way to achieve this using Nutch? Can it combine
information/content from two files into a single searchable item? Or maybe
I'm not choosing the right tool for this?

Thanks in advance.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Merging-Searching-both-file-and-meta-information-file-tp2574567p2574567.html
Sent from the Nutch - User mailing list archive at Nabble.com.
