[jira] Commented: (NUTCH-620) BasicURLNormalizer should collapse runs of slashes with a single slash

2008-03-17 Thread Mark DeSpain (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579321#action_12579321 ] Mark DeSpain commented on NUTCH-620: Hi Andrzej, Though I'm very interested in using

[jira] Commented: (NUTCH-615) Redirected URL are fetched without setting any FetchInterval

2008-03-17 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579371#action_12579371 ] Andrzej Bialecki commented on NUTCH-615: I'll apply the parts of the current

[jira] Closed: (NUTCH-615) Redirected URL are fetched without setting any FetchInterval

2008-03-17 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki closed NUTCH-615. Resolution: Fixed; Assignee: Andrzej Bialecki. I applied the relevant parts of the

[jira] Closed: (NUTCH-616) Reset Fetch Retry counter when fetch is successful

2008-03-17 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki closed NUTCH-616. Resolution: Fixed. I applied the latest patch with minor changes, in rev. 637861. Thank you!

[jira] Commented: (NUTCH-620) BasicURLNormalizer should collapse runs of slashes with a single slash

2008-03-17 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579438#action_12579438 ] Andrzej Bialecki commented on NUTCH-620: It would be interesting to see the source

Re: Retire the original Fetcher before the release?

2008-03-17 Thread Dennis Kubes
We continue to run on Fetcher1. What are the benefits of moving to Fetcher2? Not opposed to it, just hadn't thought about it yet, as Fetcher1 seemed to be working fine for us. Dennis Andrzej Bialecki wrote: Hi all, I'd like to remove the original Fetcher in favor of Fetcher2. Maintaining

Re: Retire the original Fetcher before the release?

2008-03-17 Thread Andrzej Bialecki
Dennis Kubes wrote: We continue to run on Fetcher1. Since you're running large crawls, could you run one of them with Fetcher2 and comment on the results? Note that Fetcher2 needs a lot fewer threads than Fetcher - usually running a large crawl with 100 threads is more than sufficient.

Re: Retire the original Fetcher before the release?

2008-03-17 Thread Dennis Kubes
Andrzej Bialecki wrote: Dennis Kubes wrote: We continue to run on Fetcher1. Since you're running large crawls, could you run one of them with Fetcher2 and comment on the results? Note that Fetcher2 needs a lot fewer threads than Fetcher - usually running a large crawl with 100 threads

[jira] Closed: (NUTCH-220) PDF Box can't parse document: java.lang.NullPointerException

2008-03-17 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki closed NUTCH-220. Resolution: Fixed; Fix Version/s: 1.0.0; Assignee: Andrzej Bialecki. PDF Box

[jira] Commented: (NUTCH-243) Some meta-refresh urls get ignored due to matching regular expression

2008-03-17 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579507#action_12579507 ] Andrzej Bialecki commented on NUTCH-243: Duplicate of NUTCH-255. Some

[jira] Closed: (NUTCH-243) Some meta-refresh urls get ignored due to matching regular expression

2008-03-17 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki closed NUTCH-243. Resolution: Duplicate. Some meta-refresh urls get ignored due to matching regular expression

[jira] Closed: (NUTCH-610) Can't Update or modify an index while web gui is running

2008-03-17 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki closed NUTCH-610. Resolution: Invalid. Can't Update or modify an index while web gui is running

[jira] Commented: (NUTCH-620) BasicURLNormalizer should collapse runs of slashes with a single slash

2008-03-17 Thread Mark DeSpain (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579707#action_12579707 ] Mark DeSpain commented on NUTCH-620: Sure :) I'm a bit swamped at the moment, but I'll

Build failed in Hudson: Nutch-trunk #393

2008-03-17 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Nutch-trunk/393/changes Changes: [ab] Add missing license file. [ab] NUTCH-223 Crawl.java uses Integer.MAX_VALUE instead of Long.MAX_VALUE. [ab] NUTCH-220 Upgrade to PDFBox 0.7.3. [ab] NUTCH-616 Reset Fetch Retry counter when fetch is successful.

[jira] Commented: (NUTCH-616) Reset Fetch Retry counter when fetch is successful

2008-03-17 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579731#action_12579731 ] Hudson commented on NUTCH-616: Integrated in Nutch-trunk #393 (See