Hello,

Is it possible that Nutch (0.7.1) stops looking for URLs in an HTML file because of an error in the file? I have this impression, but I don't know how to test it to be sure.
Here is what I have done:
- the file is 34 KB (so there is no content-length limit)
- there are approximately 100 links in it
- only the first 54 are identified, and none of the following ones
- however, no error is reported by Nutch
- the regexp-urlfilter file only contains this line: +.

I was wondering if it was the structure of the links themselves, but when I put the same links in another file they were identified fine. The file contains quite a lot of JavaScript. If Nutch does indeed stop parsing, does it report the error somewhere?
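To rule out the markup of the links themselves, one quick baseline I could try is to count the <a href> tags in the raw file outside Nutch and compare that number with the 54 links Nutch reports. A minimal sketch (the CountLinks class name and the regular expression are just illustrative; it assumes the file is local and the links are ordinary <a href="..."> anchors, so it will not see anything built by the JavaScript):

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Counts <a href="..."> anchors in a raw HTML file so the total can be
// compared with the number of links Nutch extracts. Sanity check only:
// it ignores malformed tags and links generated by JavaScript.
public class CountLinks {
    public static void main(String[] args) throws Exception {
        String html = new String(Files.readAllBytes(Paths.get(args[0])));
        Pattern anchor = Pattern.compile(
                "<a\\s[^>]*href\\s*=\\s*[\"']([^\"']+)[\"']",
                Pattern.CASE_INSENSITIVE);
        Matcher m = anchor.matcher(html);
        int count = 0;
        while (m.find()) {
            count++;
            System.out.println(count + ": " + m.group(1));
        }
        System.out.println("total <a href> anchors found: " + count);
    }
}

If that count comes out around 100 while Nutch still stops at 54, I suppose that would point at the HTML parser rather than at the URL filter, but I may be missing something.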
Thanks,
Fr.