how can I search Nutch crawl results from my own webapp?

2005-11-15 Thread Kasper Hansen
Hi, how can I search the results of my Nutch crawl? Which Lucene fields do I need to use? I cannot find documentation on this. I can search my own Lucene indexes, but not indexes made by Nutch. Thanks, Kasper

ATB: Nutch through proxy

2005-11-15 Thread Aled Jones
What settings can I set up in the XML config file for our proxy? Our proxy requires authentication. Our browsers use a script to configure their internet access through the proxy, e.g. http://ourproxyserver:8080/array.dll?Get.Routing.Script Is there any setting I can add to the file to use

Re: PDF indexing support?

2005-11-15 Thread Stefan Groschupf
Verify that you have the very latest PDFBox from their website. A lot of people have noticed that PDFBox is a little bit buggy. Stefan On 15.11.2005 at 22:27, Håvard W. Kongsgård wrote: Nutch won't index some of my PDF files. I get this error: reason: failed(2,202): Content truncated at 66608

Re: PDF indexing support?

2005-11-15 Thread Håvard W. Kongsgård
conf/nutch-default Jérôme Charron wrote: http.content.limit=542256565536 and file.content.limit=4541165536 still the same error: Where do you specify these values? In nutch-default or nutch-site? Jérôme -- http://motrech.free.fr/ http://www.frutch.org/

Re: PDF indexing support?

2005-11-15 Thread Jérôme Charron
conf/nutch-default Check that they are not overridden in conf/nutch-site. If not, sorry, no more ideas for now :-( Jérôme -- http://motrech.free.fr/ http://www.frutch.org/

Re: PDF indexing support?

2005-11-15 Thread Håvard W. Kongsgård
Don't have a conf/nutch-site.xml Jérôme Charron wrote: conf/nutch-default Check that they are not overridden in conf/nutch-site. If not, sorry, no more ideas for now :-( Jérôme -- http://motrech.free.fr/ http://www.frutch.org/
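
For anyone following the override check suggested in this thread, here is a rough Python sketch of how the effective value of a property such as http.content.limit could be worked out from the two config files. It is only an illustration and not part of Nutch: the helper names (read_properties, effective_properties, overridden_names), the inline XML samples, the root element name and the sample values are all assumptions. The only thing it relies on is that each setting is a property element with name and value children, and that a value in the site file takes precedence over the default.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for conf/nutch-default.xml and conf/nutch-site.xml.
# The root element name and all values here are illustrative assumptions.
DEFAULT_XML = """
<nutch-conf>
  <property><name>http.content.limit</name><value>65536</value></property>
  <property><name>file.content.limit</name><value>65536</value></property>
</nutch-conf>
"""

SITE_XML = """
<nutch-conf>
  <property><name>http.content.limit</name><value>1048576</value></property>
</nutch-conf>
"""


def read_properties(xml_text):
    """Return {name: value} for every <property> element in the text."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def effective_properties(default_xml, site_xml=None):
    """Merge both files; a value set in the site file wins over the default."""
    merged = read_properties(default_xml)
    if site_xml is not None:  # the site file may not exist at all
        merged.update(read_properties(site_xml))
    return merged


def overridden_names(default_xml, site_xml):
    """List the property names whose site value differs from the default."""
    defaults = read_properties(default_xml)
    site = read_properties(site_xml)
    return sorted(n for n in site if n in defaults and site[n] != defaults[n])


if __name__ == "__main__":
    print(effective_properties(DEFAULT_XML, SITE_XML))
    print(overridden_names(DEFAULT_XML, SITE_XML))
```

With real files, the two strings would be replaced by the contents of the actual config files. When there is no site file, as reported above, passing only the default text returns the defaults unchanged.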