When I was researching this issue, I also first suspected the
deflateBytes method, but the problem persisted through my code changes
until I changed the regex filter. Maybe your problem
really is in the deflateBytes method. The forum I mentioned
earlier was nutch-user, but if you don't have that regex, those posts
won't help you. Here is the text of a previous conversation I had
about this with Stefan.
------
I have a suspicion that the Inflater class in Java 1.5 is causing
some of the spinning problems, but I can't prove it. We are using about
the same Java and Linux versions.
The problem I think is line 343 of the SequenceFile.Reader:
  while (!inflater.finished()) {
    try {
      int count = inflater.inflate(inflateIn);
      inflateOut.write(inflateIn, 0, count);
    } catch (DataFormatException e) {
      throw new IOException(e.toString());
    }
  }
inflate() can sometimes return a count of 0, and I am wondering whether
inflater.finished() can still return false when it does. If so,
I think this can drop into an infinite loop, and some processes will sit
and spin until they time out. It would explain some things: it
probably would happen while inflating a strange byte combination, and
since this drops into a native method, it would probably affect one platform
(in this case Linux) more than another. What do you think?
Dennis
------
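To make the suspected failure mode concrete, here is a minimal standalone
sketch of the same loop with a guard added. It is not the Nutch code: the
class and method names (GuardedInflate, inflateAll, deflateAll) are made up
for illustration, and whether this is the real cause of the spinning is still
unproven. It relies only on the documented java.util.zip.Inflater behaviour
that a return value of 0 from inflate() means needsInput() or
needsDictionary() should be checked.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper class, not part of Nutch.
class GuardedInflate {

    // Inflates a complete zlib stream, failing fast instead of spinning
    // when the inflater stops making progress.
    static byte[] inflateAll(byte[] compressed) throws IOException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] buf = new byte[1024];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            while (!inflater.finished()) {
                int count = inflater.inflate(buf);
                // 0 bytes produced and still not finished: without this
                // check the loop would repeat forever on the same state.
                if (count == 0 && !inflater.finished()) {
                    throw new IOException("inflater stalled: needsInput="
                            + inflater.needsInput() + ", needsDictionary="
                            + inflater.needsDictionary());
                }
                out.write(buf, 0, count);
            }
        } catch (DataFormatException e) {
            throw new IOException(e.toString());
        } finally {
            inflater.end(); // release native resources
        }
        return out.toByteArray();
    }

    // Small helper so the sketch can produce its own test input.
    static byte[] deflateAll(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        byte[] buf = new byte[1024];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }
}
```

Feeding inflateAll a truncated stream is one way to hit the count == 0 case:
the unguarded loop would spin, while this version throws an IOException.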
Hope this helps.
Dennis
Daniel Varela Santoalla wrote:
Hello Dennis et al
Dennis Kubes wrote:
We have seen this before too. If it is the same problem, it is the regex
url filter. Comment out the -.*(/.+?)/.*?\1/.*?\1/ expression in
the regex-urlfilter.txt file and it should resolve itself.
I'm afraid I didn't have a line like that in my regex-urlfilter.txt.
Anyway I removed everything except the last line accepting all, but no
improvement.
Also search the forum for "Fetcher stops pushes cpu to 100%".
Which forum? I tried both nutch-user and nutch-dev without luck...
Dennis
BTW, I'm using 0.7.2.
Regards
Daniel