[jira] Commented: (NUTCH-472) NullPointerException in ZipTextExtractor if no MIME type for zipped file

2007-05-10 Thread Antony Bowesman (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494649 ] Antony Bowesman commented on NUTCH-472: --- Not sure how to turn source code in description into a patch file, but

Hudson build is back to normal: Nutch-Nightly #81

2007-05-10 Thread hudson
See http://lucene.zones.apache.org:8080/hudson/job/Nutch-Nightly/81/

[jira] Commented: (NUTCH-446) RobotRulesParser should ignore Crawl-delay values of other bots in robots.txt

2007-05-10 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494734 ] Doğacan Güney commented on NUTCH-446: - So, does anyone have objections to this? It fixes an annoying (albeit

[jira] Commented: (NUTCH-444) Possibly use a different library to parse RSS feed for improved performance and compatibility

2007-05-10 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494764 ] Chris A. Mattmann commented on NUTCH-444: - Hi Doğacan, Well I must say, with all the discussion that's

Re: [jira] Updated: (NUTCH-469) changes to geoPosition plugin to make it work on nutch 0.9

2007-05-10 Thread Mike Schwartz
Sami, Thanks for your response. In answer to the points you made:
> please use diffs against trunk in future, they're more easy to check (svn diff file)
will do
> there is no junit tests at all, however there is tiny piece of test code in class GeoIndexingFilter, at least this code could perhaps

[jira] Resolved: (NUTCH-456) parse msexcel plugin speedup

2007-05-10 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren resolved NUTCH-456. -- Resolution: Fixed committed with minor modifications (used StringBuilder instead of StringBuffer,

[jira] Updated: (NUTCH-424) CLONE - Problem persists with Nutch 0.8.1 (Nekohtml 0.9.4) - NekoHTML's DOMFragmentParser hangs on certain URLs

2007-05-10 Thread Mike Brzozowski (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Brzozowski updated NUTCH-424: -- Affects Version/s: 0.8.1 0.9.0 CLONE - Problem persists with Nutch

[jira] Commented: (NUTCH-424) CLONE - Problem persists with Nutch 0.8.1 (Nekohtml 0.9.4) - NekoHTML's DOMFragmentParser hangs on certain URLs

2007-05-10 Thread Mike Brzozowski (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494777 ] Mike Brzozowski commented on NUTCH-424: --- This problem appears to persist in nutch-0.9. Is there a workaround?

[jira] Updated: (NUTCH-424) NekoHTML's DOMFragmentParser hangs on certain URLs (CLONE: Problem persists with Nutch 0.9 and 0.8.1 (Nekohtml 0.9.4))

2007-05-10 Thread Mike Brzozowski (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Brzozowski updated NUTCH-424: -- Summary: NekoHTML's DOMFragmentParser hangs on certain URLs (CLONE: Problem persists with Nutch

[jira] Commented: (NUTCH-424) NekoHTML's DOMFragmentParser hangs on certain URLs (CLONE: Problem persists with Nutch 0.9 and 0.8.1 (Nekohtml 0.9.4))

2007-05-10 Thread Mike Brzozowski (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494782 ] Mike Brzozowski commented on NUTCH-424: --- It looks like if you have multiple threads they block on each other.

[jira] Resolved: (NUTCH-446) RobotRulesParser should ignore Crawl-delay values of other bots in robots.txt

2007-05-10 Thread Sami Siren (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren resolved NUTCH-446. -- Resolution: Fixed I just committed this, keep the patches coming Doğacan! RobotRulesParser should