To use these plugins you have to edit your conf/nutch-site.xml configuration file and include something like this:

<property>
  <name>plugin.includes</name>
  <value>nutch-extensionpoints|protocol-http|language-identifier|urlfilter-regex|parse-(text|html|pdf|msword)|index-(basic|more)|query-(basic|site|url|more)</value>
  <description>Regular expression naming plugin directory names to include. Any plugin not matching this expression is excluded.</description>
</property>

This has to be done in both the backend application and the web application.

When you crawl a site, Nutch will parse the file types listed in parse-(text|html|pdf|msword). The index-more plugin adds more fields to your Lucene index (date, file type, etc.), and the query-more plugin makes those fields searchable in your web application.

That's all for now. I hope this helps :)

For more explanations about Nutch, see:

http://lucene.apache.org/nutch/
http://today.java.net/pub/a/today/2006/01/10/introduction-to-nutch-1.html
http://today.java.net/pub/a/today/2006/02/16/introduction-to-nutch-2.html
http://wiki.media-style.com/display/nutchDocu/Home
http://wiki.apache.org/nutch/Nutch_-_The_Java_Search_Engine

Regards,

Lourival Júnior

On 4/24/07, ekoje ekoje <[EMAIL PROTECTED]> wrote:

I'm not sure I understand everything. I'm still a novice.
How can I use index-more and query-more?
Would you mind helping me?

Thanks
E

> You can use the plugins index-more and query-more to create a field in
> your index indicating the file type of the document. So, in your search
> you can use "type:pdf" or "type:msword" to filter these files. I used
> Nutch 0.7.2 to make it work...
>
> Regards,
>
> Lourival Júnior
>
> On 4/24/07, ekoje ekoje <[EMAIL PROTECTED]> wrote:
>>
>> Hi Guys,
>>
>> I would like to add a new button on my webpage to make an advanced search
>> using the keywords.
>> Once the user clicks on it, it will search for keywords only in the
>> different PDF/WORD or Excel documents indexed.
>>
>> Do you know how I can filter/limit my search on PDF/WORD/EXCEL documents?
>>
>> Thanks for your help.
>> E
>
> --
> Lourival Junior
> Universidade Federal do Pará
> Curso de Bacharelado em Sistemas de Informação
> http://www.ufpa.br/cbsi
> Msn: [EMAIL PROTECTED]

--
Lourival Junior
Universidade Federal do Pará
Curso de Bacharelado em Sistemas de Informação
http://www.ufpa.br/cbsi
Msn: [EMAIL PROTECTED]
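
P.S. To see which plugins a given plugin.includes value would select, here is a minimal sketch in Java. It assumes that Nutch treats the value as a regular expression that must match the whole plugin directory name, which is what the property description suggests. The class name PluginIncludesDemo, the method isIncluded, and the list of directory names being tested are made up for illustration; they are not part of Nutch.

```java
import java.util.regex.Pattern;

public class PluginIncludesDemo {
    // The plugin.includes value from the configuration in this message.
    static final Pattern INCLUDES = Pattern.compile(
        "nutch-extensionpoints|protocol-http|language-identifier|urlfilter-regex"
        + "|parse-(text|html|pdf|msword)|index-(basic|more)|query-(basic|site|url|more)");

    // Assumption: a plugin is included only if its entire directory name
    // matches the expression (full match, not a substring search).
    static boolean isIncluded(String pluginDir) {
        return INCLUDES.matcher(pluginDir).matches();
    }

    public static void main(String[] args) {
        // Example directory names, chosen only for illustration.
        String[] names = {"parse-pdf", "parse-msword", "index-more", "query-more", "parse-rtf"};
        for (String name : names) {
            System.out.println(name + " -> " + (isIncluded(name) ? "included" : "excluded"));
        }
    }
}
```

Under that assumption, a directory name that is not listed in the expression (such as the parse-rtf example) is reported as excluded, so any extra parser you want has to be added to the parse-(...) group.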
