I'm sending this to you because you are active on the nutch-users
list and I am too lazy to subscribe at this particular moment. Please
pass on / act as you see fit. Wiki itself seems immutable at least
to the likes of me.
-Jeff
= currently
By default the file plugin is
[ http://issues.apache.org/jira/browse/NUTCH-151?page=comments#action_12361242 ]
Paul Baclace commented on NUTCH-151:
Analysis:
CommandRunner uses a CyclicBarrier to synchronize the thread that does the
exec (let's call it the main thread) with the io
[ http://issues.apache.org/jira/browse/NUTCH-151?page=all ]
Paul Baclace updated NUTCH-151:
---
Attachment: CommandRunner.java
Minimal required changes to fix bug NUTCH-151:
1. The pipe io threads should be daemons.
2. The main thread should always
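Item 1 above (daemon pipe io threads) could look roughly like the sketch below. This is not the attached CommandRunner.java; the class and method names are hypothetical, and only the standard java.lang.Thread calls are real. The point is that a daemon thread does not keep the JVM alive if it is still blocked on a pipe read.

```java
// Hypothetical helper, not taken from the NUTCH-151 attachment.
final class DaemonPipeThreads {

    // Starts a thread that runs the given pipe-copying task as a daemon,
    // so a copier stuck on a blocking read cannot prevent JVM exit.
    static Thread startPipeThread(Runnable copier, String name) {
        Thread t = new Thread(copier, name);
        t.setDaemon(true); // must be called before start()
        t.start();
        return t;
    }
}
```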
[ http://issues.apache.org/jira/browse/NUTCH-151?page=all ]
Paul Baclace updated NUTCH-151:
---
Attachment: CommandRunner.java.patch
Here is the patch for CommandRunner (previously, I attached the actual file).
CommandRunner can hang after the main thread
[ http://issues.apache.org/jira/browse/NUTCH-152?page=all ]
Paul Baclace updated NUTCH-152:
---
Attachment: TaskRunner.java.patch
The patch addresses each issue listed in the detailed description of this bug.
The detailed description is suitable as a
TextParser is only supposed to parse plain text, but if given postscript, it
can take hours and then fail
-
Key: NUTCH-153
URL: http://issues.apache.org/jira/browse/NUTCH-153
[ http://issues.apache.org/jira/browse/NUTCH-153?page=all ]
Paul Baclace updated NUTCH-153:
---
Attachment: TextParser.java.patch
A patch to reject files with %!PS-Adobe in the first 40 characters of the
file.
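A minimal sketch of that kind of check, assuming the raw content is available as a byte array and that the marker must fall entirely within the first 40 bytes. It is not the attached TextParser.java.patch; the class, method, and constant names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not taken from the NUTCH-153 attachment.
final class PostScriptSniffer {

    static final String PS_MAGIC = "%!PS-Adobe";
    static final int SNIFF_LENGTH = 40; // assumption: 40 bytes ~ 40 characters

    // Returns true if the marker appears within the first SNIFF_LENGTH bytes.
    static boolean looksLikePostScript(byte[] content) {
        int n = Math.min(content.length, SNIFF_LENGTH);
        // ISO-8859-1 maps every byte to one char, so decoding cannot fail.
        String head = new String(content, 0, n, StandardCharsets.ISO_8859_1);
        return head.contains(PS_MAGIC);
    }
}
```

A plain-text parser could call looksLikePostScript before doing any work and return a failed parse status right away instead of grinding through the file.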
TextParser is only supposed to parse plain
[ http://issues.apache.org/jira/browse/NUTCH-128?page=comments#action_12361254 ]
Paul Baclace commented on NUTCH-128:
In general, it might be helpful to issue an INFO level log msg whenever a
configuration attribute is overridden. If the override