Hi,
How can I search the results of my Nutch crawl?
Which Lucene fields do I need to use? I cannot find any documentation on this. I
can search my own Lucene indexes, but not the indexes made by Nutch.
Thanks
Kasper
What settings can I set in the XML config file for our proxy?
Our proxy requires authentication.
Our browsers use a script to configure their internet access through the
proxy, e.g.
http://ourproxyserver:8080/array.dll?Get.Routing.Script
Is there any setting I can add to the file to use
Verify that you have the very latest PDFBox from their website.
A lot of people have noticed that PDFBox is a little bit buggy.
Stefan
On 15.11.2005 at 22:27, Håvard W. Kongsgård wrote:
Nutch won't index some of my PDF files. I get this error:
reason: failed(2,202): Content truncated at 66608
conf/nutch-default
Jérôme Charron wrote:
I set http.content.limit=542256565536 and file.content.limit=4541165536, but
I still get the same error:
Where do you specify these values? In nutch-default or nutch-site?
Jérôme
--
http://motrech.free.fr/
http://www.frutch.org/
conf/nutch-default
Check that they are not overridden in conf/nutch-site.
If not, sorry, I have no more ideas for now :-(
Jérôme
I don't have a conf/nutch-site.xml.
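One more thing that may be worth checking on the limits quoted in this thread, offered as a guess rather than a confirmed diagnosis: both 542256565536 and 4541165536 are larger than 2147483647, the largest value a 32-bit Java int can hold. If Nutch reads http.content.limit and file.content.limit as ints and silently falls back to its default when a value does not parse (an assumption, not verified against the Nutch source), those settings would have no effect and the content would still be truncated. The sketch below only illustrates that Java behaviour. ContentLimitCheck, parseLimit and the 65536 default are hypothetical names and values chosen for illustration, not Nutch code.

```java
class ContentLimitCheck {

    // Hypothetical helper (not part of Nutch): parse a property value as a
    // 32-bit int and return the supplied default when it does not parse.
    static int parseLimit(String value, int defaultValue) {
        try {
            return Integer.parseInt(value.trim());
        } catch (NumberFormatException e) {
            // Values above Integer.MAX_VALUE (2147483647) end up here.
            return defaultValue;
        }
    }

    public static void main(String[] args) {
        int assumedDefault = 65536; // assumed default limit, for illustration only

        // The two values from the thread, plus one that fits in an int (10 MB).
        String[] values = {"542256565536", "4541165536", "10485760"};

        for (String v : values) {
            System.out.println(v + " -> " + parseLimit(v, assumedDefault));
        }
    }
}
```

If that is what happens, a value that fits in an int, such as 10485760 (10 MB), may be worth trying in place of the larger numbers.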