Hello,

Is it possible for Nutch (0.7.1) to stop looking for URLs in an HTML file because of an error in the file? I have that impression, but I don't know how to test it to be sure.
Here is what I have done:
- the file is 34 kB, so the content-length limit is not the issue
- there are approximately 100 links in it
- only the first 54 are identified, and none of the following ones
- however, no error is reported by Nutch
- the regexp-urlfilter file contains only this line: +.

I wondered whether it was the structure of the links themselves, but when I put them in another file they were identified fine. The file has quite a lot of JavaScript in it.

If Nutch does indeed stop parsing, does it report the error somewhere?

Thanks,
Fr.

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
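One way to narrow this down outside of Nutch is to run the same file through an independent, lenient HTML parser and compare the link count against what Nutch reports. The sketch below is illustrative only (it uses Python's stdlib parser, not Nutch's own HTML parsing plugin, and the inline HTML is a made-up stand-in for the real file): it shows that a parser which handles script content correctly will not be thrown off by tag-like strings inside JavaScript, which is one plausible culprit given how much JavaScript the file contains.

```python
# Count <a href=...> links with Python's lenient stdlib HTML parser,
# as an independent cross-check against Nutch's extracted outlinks.
from html.parser import HTMLParser

class LinkCounter(HTMLParser):
    """Collects href values from every <a> start tag."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Stand-in document: a tag-like string hidden inside a <script> block,
# which can confuse extractors that are not aware of script content.
html = ('<a href="http://a.example/">one</a>'
        '<script>var x = "</a>";</script>'
        '<a href="http://b.example/">two</a>')

counter = LinkCounter()
counter.feed(html)
counter.close()
print(len(counter.links), counter.links)
# A script-aware parser finds both links; if Nutch finds fewer links in
# the real file than a pass like this does, the parse is stopping early.
```

In practice I would replace the inline string with the real 34 kB file (`counter.feed(open("page.html", encoding="utf-8").read())`) and compare the count against the 54 outlinks Nutch reports.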
