[jira] [Updated] (NUTCH-966) Behavior of NOINDEX,FOLLOW is not intuitive

2013-01-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-966: --- Fix Version/s: 2.2 1.7 Behavior of NOINDEX,FOLLOW is not

[jira] [Updated] (NUTCH-911) recrawls file protocol causes Errors/Exceptions when actually not modified or gone

2013-01-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-911: --- Fix Version/s: 1.7 recrawls file protocol causes Errors/Exceptions when actually

[jira] [Resolved] (NUTCH-813) Repetitive crawl 403 status page

2013-01-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-813. --- Resolution: Duplicate The described problem is identical to that of NUTCH-578. The provided

[jira] [Resolved] (NUTCH-910) Cached.jsp has a bug with encoding

2013-01-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-910. Resolution: Won't Fix this is a legacy issue so we won't be fixing it.

[jira] [Updated] (NUTCH-923) Multilingual support for Solr-index-mapping

2013-01-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-923: --- Patch Info: Patch Available Fix Version/s: 1.7 Multilingual support for

[jira] [Updated] (NUTCH-829) duplicate hadoop temp files

2013-01-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-829: --- Fix Version/s: 2.2 1.7 duplicate hadoop temp files

[jira] [Resolved] (NUTCH-625) Non-ascii character broken in dumped content for mixed encoding (utf-8 and multi-byte)

2013-01-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-625. Resolution: Won't Fix as per Dogacan's comments Non-ascii

[jira] [Updated] (NUTCH-609) Allow Plugins to be Loaded from Jar File(s)

2013-01-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-609: --- Fix Version/s: 2.2 1.7 Allow Plugins to be Loaded from Jar

[jira] [Updated] (NUTCH-670) feed plugin does not parse RSS2 enclosures

2013-01-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-670: --- Fix Version/s: 2.2 1.7 feed plugin does not parse RSS2

[jira] [Updated] (NUTCH-664) Possibility to update already stored documents.

2013-01-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-664: --- Fix Version/s: 2.2 Possibility to update already stored documents.

[jira] [Updated] (NUTCH-718) urlfilter-subnets plugin

2013-01-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-718: --- Fix Version/s: 2.2 1.7 urlfilter-subnets plugin

[jira] [Updated] (NUTCH-750) HtmlParser plugin - page title extraction

2013-01-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-750: --- Fix Version/s: 1.7 HtmlParser plugin - page title extraction

[jira] [Updated] (NUTCH-737) urlnormalizer-unalias plugin

2013-01-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-737: --- Fix Version/s: 1.7 urlnormalizer-unalias plugin

[jira] [Commented] (NUTCH-1345) JAVA_HOME should not be required

2013-01-12 Thread Ben McCann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552041#comment-13552041 ] Ben McCann commented on NUTCH-1345: --- You've probably set $NUTCH_JAVA_HOME then. I don't

[jira] [Updated] (NUTCH-690) bug in DomContentUtils.shouldThrowAwayLink?

2013-01-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-690: --- Fix Version/s: 1.7 bug in DomContentUtils.shouldThrowAwayLink?

[jira] [Updated] (NUTCH-589) Hierarchical Classloaders

2013-01-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-589: --- Fix Version/s: 1.7 Hierarchical Classloaders -

[jira] [Updated] (NUTCH-569) Protocol plugins should report progress to the fetcher

2013-01-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-569: --- Fix Version/s: 1.7 Protocol plugins should report progress to the fetcher

[jira] [Updated] (NUTCH-566) Sun's URL class has bug in creation of relative query URLs

2013-01-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-566: --- Fix Version/s: 2.2 1.7 Sun's URL class has bug in creation of

[jira] [Updated] (NUTCH-431) Move plugin specific properties out of nutch-site.xml and into specific conf files for plugins

2013-01-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-431: --- Fix Version/s: 2.2 1.7 Move plugin specific properties out of

[jira] [Updated] (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmen

2013-01-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-427: --- Patch Info: Patch Available Fix Version/s: 2.2 1.7

[jira] [Updated] (NUTCH-410) Faster RegexNormalize with more features

2013-01-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-410: --- Patch Info: Patch Available Fix Version/s: 2.2 1.7

[jira] [Updated] (NUTCH-409) Add short circuit notion to filters to speedup mixed site/subsite crawling

2013-01-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-409: --- Patch Info: Patch Available Fix Version/s: 2.2 1.7 Add

[jira] [Updated] (NUTCH-449) Format of junit output should be configurable

2013-01-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-449: --- Fix Version/s: 2.2 1.7 Format of junit output should be

[jira] [Updated] (NUTCH-449) Format of junit output should be configurable

2013-01-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-449: --- Patch Info: Patch Available Format of junit output should be configurable

[jira] [Updated] (NUTCH-386) Plugin to index categories by url rules

2013-01-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-386: --- Patch Info: Patch Available Fix Version/s: 1.7 Plugin to index categories

[jira] [Updated] (NUTCH-351) Protocol forward proxy

2013-01-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-351: --- Patch Info: Patch Available Fix Version/s: 2.2 1.7

[jira] [Updated] (NUTCH-346) Improve readability of logs/hadoop.log

2013-01-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-346: --- Patch Info: Patch Available Fix Version/s: 2.2 1.7

[jira] [Updated] (NUTCH-477) Extend URLFilters to support different filtering chains

2013-01-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-477: --- Fix Version/s: 1.7 Extend URLFilters to support different filtering chains

[jira] [Updated] (NUTCH-490) Extension point with filters for Neko HTML parser (with patch)

2013-01-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-490: --- Patch Info: Patch Available Fix Version/s: 2.2 1.7

[jira] [Resolved] (NUTCH-248) add support for internationalized domain names

2013-01-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-248. Resolution: Won't Fix this is legacy add support for

[jira] [Updated] (NUTCH-213) checkstyle

2013-01-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-213: --- Patch Info: Patch Available Fix Version/s: 2.2 1.7

[jira] [Resolved] (NUTCH-215) Plugin execution order

2013-01-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-215. Resolution: Won't Fix we can now explicitly specify the order of indexing, parsing

[jira] [Resolved] (NUTCH-49) Flag for generate to fetch only new pages to complement the -refetchonly flag

2013-01-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-49?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-49. --- Resolution: Won't Fix This is well and truly a legacy issue. The FetchListTool no

[jira] [Resolved] (NUTCH-737) urlnormalizer-unalias plugin

2013-01-12 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-737. - Resolution: Duplicate urlnormalizer-unalias plugin

[jira] [Updated] (NUTCH-693) Add configurable option for treating nofollow behaviour.

2013-01-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-693: --- Fix Version/s: 2.2 1.7 Add configurable option for treating

[jira] [Updated] (NUTCH-1513) Support Robots.txt for Ftp urls

2013-01-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1513: Fix Version/s: 2.2 1.7 Support Robots.txt for Ftp urls

[jira] [Updated] (NUTCH-1500) bin/crawl fails on step solrindex with wrong path to segment

2013-01-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1500: Fix Version/s: 1.7 bin/crawl fails on step solrindex with wrong path to

[jira] [Closed] (NUTCH-1489) elasticindex should report the indexed documents like solrindex does

2013-01-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney closed NUTCH-1489. --- Resolution: Not A Problem This functionality is addressed both when deployed in

[jira] [Commented] (NUTCH-49) Flag for generate to fetch only new pages to complement the -refetchonly flag

2013-01-12 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-49?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552049#comment-13552049 ] Markus Jelsma commented on NUTCH-49: This has been implemented in NUTCH-1248.

[jira] [Commented] (NUTCH-693) Add configurable option for treating nofollow behaviour.

2013-01-12 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552050#comment-13552050 ] Markus Jelsma commented on NUTCH-693: - Vote for `won't fix`. We also don't implement an

[jira] [Commented] (NUTCH-693) Add configurable option for treating nofollow behaviour.

2013-01-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552053#comment-13552053 ] Lewis John McGibbney commented on NUTCH-693: +1 Markus. Please close off when

[jira] [Closed] (NUTCH-693) Add configurable option for treating nofollow behaviour.

2013-01-12 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-693. --- Resolution: Won't Fix Fix Version/s: (was: 2.2) (was: 1.7) Add

[jira] [Commented] (NUTCH-1345) JAVA_HOME should not be required

2013-01-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552082#comment-13552082 ] Sebastian Nagel commented on NUTCH-1345: JAVA_HOME (or NUTCH_JAVA_HOME) is

[jira] [Commented] (NUTCH-1345) JAVA_HOME should not be required

2013-01-12 Thread Ben McCann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552083#comment-13552083 ] Ben McCann commented on NUTCH-1345: --- I think it's fine to allow overriding the version

Build failed in Jenkins: Nutch-trunk #2082

2013-01-12 Thread Apache Jenkins Server
See https://builds.apache.org/job/Nutch-trunk/2082/ -- [...truncated 3965 lines...] [javac] /zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-trunk/trunk/src/test/org/apache/nutch/crawl/TestCrawlDbMerger.java:89: warning:
