When I was researching this issue, I also first suspected the
deflateBytes method, but the problem persisted through my code changes
until I changed the regex URL filter. Maybe your problem really is in
the deflateBytes method. The forum I mentioned earlier was nutch-user,
but if you don't have that regex, those posts won't help you. Here is
the text of an earlier conversation I had with Stefan about this.
------
I suspect that the Inflater class in Java 1.5 is causing some processes
to spin, but I can't prove it. We are using roughly the same Java and
Linux versions.
I think the problem is at line 343 of SequenceFile.Reader:
    while (!inflater.finished()) {
      try {
        int count = inflater.inflate(inflateIn);
        inflateOut.write(inflateIn, 0, count);
      } catch (DataFormatException e) {
        throw new IOException(e.toString());
      }
    }
count can sometimes be 0, and I wonder whether inflater.finished() can
still return false when that happens. If so, this loop could never
exit, and some processes would sit and spin until they time out. It
would explain some things, because it would probably only happen while
inflating an unusual byte combination, and since inflate() drops into a
native method, it could affect one platform (in this case Linux) more
than another. What do you think?
Dennis
------
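A minimal sketch of how a loop like the one above could be guarded so it fails instead of spinning. It uses only java.util.zip; GuardedInflate and inflateAll are hypothetical names, not SequenceFile code, and it assumes the whole compressed value is available in one byte array. The Inflater documentation states that a return value of 0 from inflate() means needsInput() or needsDictionary() should be checked, so retrying with the same input cannot make progress.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

final class GuardedInflate {
    // Hypothetical helper (not Nutch/Hadoop code): inflates one complete
    // zlib stream and fails fast instead of looping when no progress is made.
    static byte[] inflateAll(byte[] compressed) throws IOException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int count;
                try {
                    count = inflater.inflate(buf);
                } catch (DataFormatException e) {
                    throw new IOException(e.toString());
                }
                // A final call may legitimately return 0 while finishing the
                // stream trailer, so only treat 0 as an error if not finished.
                if (count == 0 && !inflater.finished()) {
                    // 0 means more input or a preset dictionary is required;
                    // neither will appear by calling inflate() again.
                    throw new IOException("inflate made no progress (needsInput="
                            + inflater.needsInput() + ", needsDictionary="
                            + inflater.needsDictionary() + ")");
                }
                out.write(buf, 0, count);
            }
        } finally {
            inflater.end(); // release native zlib memory
        }
        return out.toByteArray();
    }
}
```

Throwing here turns a silent spin into an error the caller can log, e.g. for a truncated record.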
Hope this helps.
Dennis
Daniel Varela Santoalla wrote:
> Hello Dennis et al
>
> Dennis Kubes wrote:
>> We have seen this before too. If it is the same problem, it is the regex
>> URL filter. Comment out the -.*(/.+?)/.*?\1/.*?\1/ expression in
>> the regex-urlfilter.txt file and it should resolve itself.
>
> I'm afraid I didn't have a line like that in my regex-urlfilter.txt.
> Anyway, I removed everything except the last line (accept all), but
> there was no improvement.
>
>> Also search the forum for "Fetcher stops pushes cpu to 100%".
>
> Which forum? I tried both nutch-user and nutch-dev without luck...
>
>>
>> Dennis
>>
>
> BTW, I'm using 0.7.2.
>
> Regards
> Daniel
>
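The rule Dennis suggests commenting out rejects URLs in which a slash-delimited segment repeats three times, a common sign of crawler loops. The sketch below shows which URLs it catches. Two things are assumptions for illustration: that the filter applies the rule with Matcher.find(), and the tighter [^/]+ variant, which keeps each captured segment inside a single path component and so limits how much the regex engine can backtrack on long URLs. RepeatSegmentRule is a hypothetical name.

```java
import java.util.regex.Pattern;

final class RepeatSegmentRule {
    // The rule from regex-urlfilter.txt without its leading '-'
    // (in that file '-' marks a pattern whose matches are rejected).
    static final Pattern ORIGINAL = Pattern.compile(".*(/.+?)/.*?\\1/.*?\\1/");

    // Assumed tighter variant: [^/]+ cannot span slashes, so the group and
    // the gaps between repeats are each confined to one path segment.
    static final Pattern TIGHT = Pattern.compile(".*(/[^/]+)/[^/]+\\1/[^/]+\\1/");

    // Assumption: a match anywhere in the URL means the URL is rejected.
    static boolean rejects(Pattern rule, String url) {
        return rule.matcher(url).find();
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://example.com/a/b/a/c/a/d", // segment "/a" repeats three times
            "http://example.com/a/b/c/d"      // no repeated segment
        };
        for (String url : urls) {
            System.out.println(url + " original=" + rejects(ORIGINAL, url)
                    + " tight=" + rejects(TIGHT, url));
        }
    }
}
```

Both patterns reject the looping URL and accept the plain one; the difference is in how much work the engine may do on URLs that almost match.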
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general