I want to point out that I had all of the content size limits set to 0, which I thought meant no truncation, yet I was still getting lots of truncation in PDF files. When I reread my config I noticed an easily missed detail: for file and ftp, 0 means no truncation, but for http the value needs to be -1.
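To make the difference concrete, here is a minimal Java sketch of the two conventions described above. It assumes each limit is a byte cap compared against the fetched content length. The class and method names are hypothetical and this is not Nutch's actual source; it only shows why 0 disables truncation under one check but becomes a literal zero-byte cap under the other.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Nutch source) contrasting the two
 * content-limit conventions reported for file/ftp versus http.
 */
public class ContentLimitSketch {

    // Convention reported for file.content.limit and ftp.content.limit:
    // any value <= 0 disables truncation.
    static byte[] truncateZeroMeansUnlimited(byte[] content, int limit) {
        if (limit > 0 && content.length > limit) {
            return Arrays.copyOf(content, limit);
        }
        return content;
    }

    // Convention reported for http.content.limit:
    // only a negative value (e.g. -1) disables truncation,
    // so 0 is treated as a real cap of zero bytes.
    static byte[] truncateNegativeMeansUnlimited(byte[] content, int limit) {
        if (limit >= 0 && content.length > limit) {
            return Arrays.copyOf(content, limit);
        }
        return content;
    }

    public static void main(String[] args) {
        byte[] pdf = new byte[100_000]; // stand-in for a large PDF body
        System.out.println(truncateZeroMeansUnlimited(pdf, 0).length);      // 100000
        System.out.println(truncateNegativeMeansUnlimited(pdf, 0).length);  // 0
        System.out.println(truncateNegativeMeansUnlimited(pdf, -1).length); // 100000
    }
}
```

With the same value of 0, the first check leaves the content alone while the second cuts it off entirely, which is why the same setting behaves so differently across the protocols.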
Here is the help I got on nutch-user from Jérôme, who I noticed is a developer:

> Edit your nutch-site.xml (or nutch-default.xml) and change the
> http.content.limit (set it to 0 if you don't want no content truncation at
> all).
>
> Jérôme

This is very inconsistent, and unless there's a reason for it, I think it should be changed for the next version. Otherwise it becomes a support problem: it is so easy to miss that even one of you developers missed it. This sounds like a bug, so I think it is suitable for nutch-dev.

Site config that works for no truncation:

<property>
  <name>file.content.limit</name>
  <value>0</value>
</property>
<property>
  <name>ftp.content.limit</name>
  <value>0</value>
</property>
<property>
  <name>http.content.limit</name>
  <value>-1</value>
</property>

_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
