[jira] [Updated] (NUTCH-1090) LinkDb (invertlinks) should inform the user when it ignores internal links

2011-09-28 Thread Julien Nioche (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1090: - Fix Version/s: (was: 1.3) 1.5 LinkDb (invertlinks) should inform the

Re: Prepare for 1.4 release?

2011-09-28 Thread Julien Nioche
+1 have created a 1.5 version in JIRA. Thanks Julien On 27 September 2011 22:01, Markus Jelsma markus.jel...@openindex.io wrote: Hi, There are some bad issues in 1.3 that are fixed in early 1.4 revisions. Also, 1.4 has some nice improvements and new features. I know some would like to

[jira] [Updated] (NUTCH-1078) Upgrade all instances of commons logging to slf4j (with log4j backend)

2011-09-28 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1078: - Priority: Blocker (was: Minor) Marked as blocker due to described issues in fetcher.

[jira] [Updated] (NUTCH-578) URL fetched with 403 is generated over and over again

2011-09-28 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-578: Fix Version/s: (was: 1.4) (was: 2.0) 1.5 URL

[jira] [Commented] (NUTCH-1078) Upgrade all instances of commons logging to slf4j (with log4j backend)

2011-09-28 Thread Julien Nioche (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116308#comment-13116308 ] Julien Nioche commented on NUTCH-1078: -- I had modified LogUtil in 2.0 (see

[jira] [Updated] (NUTCH-1040) Backport REST-API from 2.0

2011-09-28 Thread Julien Nioche (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1040: - Affects Version/s: (was: 1.4) Fix Version/s: 1.5 Issue Type: New Feature

[jira] [Assigned] (NUTCH-1039) Fetcher fails for pages without content-length header

2011-09-28 Thread Julien Nioche (Assigned) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche reassigned NUTCH-1039: Assignee: Markus Jelsma Markus - could you check that NUTCH-1096 fixes the issue and

[jira] [Updated] (NUTCH-1129) Any23 Nutch plugin

2011-09-28 Thread Julien Nioche (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1129: - Affects Version/s: (was: 1.4) Fix Version/s: (was: 1.4)

[jira] [Updated] (NUTCH-585) [PARSE-HTML plugin] Block certain parts of HTML code from being indexed

2011-09-28 Thread Julien Nioche (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-585: Fix Version/s: (was: 1.4) 1.5 Marking for 1.5. Needs reviewing and won't

[jira] [Updated] (NUTCH-1079) StringBuffer converted to StringBuilder

2011-09-28 Thread Julien Nioche (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1079: - Priority: Minor (was: Major) Affects Version/s: (was: 1.3) Fix

[jira] [Updated] (NUTCH-1047) Pluggable indexing backends

2011-09-28 Thread Julien Nioche (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1047: - Affects Version/s: (was: 1.4) Fix Version/s: (was: 1.4)

[jira] [Updated] (NUTCH-1088) Write Solr XML documents

2011-09-28 Thread Julien Nioche (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1088: - Priority: Minor (was: Major) Fix Version/s: (was: 1.4) 1.5

[jira] [Commented] (NUTCH-1078) Upgrade all instances of commons logging to slf4j (with log4j backend)

2011-09-28 Thread Lewis John McGibbney (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116357#comment-13116357 ] Lewis John McGibbney commented on NUTCH-1078: - OK I will be working on this

[jira] [Updated] (NUTCH-1117) JUnit test for index-anchor

2011-09-28 Thread Julien Nioche (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1117: - Fix Version/s: (was: 1.4) 1.5 JUnit test for index-anchor

[jira] [Updated] (NUTCH-1119) JUnit test for index-static

2011-09-28 Thread Julien Nioche (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1119: - Fix Version/s: (was: 1.4) 1.5 JUnit test for index-static

[jira] [Updated] (NUTCH-1118) JUnit test for index-basic

2011-09-28 Thread Julien Nioche (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1118: - Fix Version/s: (was: 1.4) 1.5 JUnit test for index-basic

[jira] [Updated] (NUTCH-1124) JUnit test for scoring-opic

2011-09-28 Thread Julien Nioche (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1124: - Fix Version/s: (was: 1.4) 1.5 JUnit test for scoring-opic

[jira] [Updated] (NUTCH-1123) JUnit test for scoring-link

2011-09-28 Thread Julien Nioche (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1123: - Fix Version/s: (was: 1.4) 1.5 JUnit test for scoring-link

[jira] [Updated] (NUTCH-1128) JUnit test for urlmeta

2011-09-28 Thread Julien Nioche (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1128: - Fix Version/s: (was: 1.4) 1.5 JUnit test for urlmeta

[jira] [Updated] (NUTCH-1127) JUnit test for urlfilter-validator

2011-09-28 Thread Julien Nioche (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1127: - Fix Version/s: (was: 1.4) 1.5 JUnit test for urlfilter-validator

[jira] [Updated] (NUTCH-1130) JUnit test for Any23 RDF plugin

2011-09-28 Thread Julien Nioche (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1130: - Fix Version/s: (was: 1.4) 1.5 JUnit test for Any23 RDF plugin

[jira] [Updated] (NUTCH-1120) JUnit test for microformats-reltag

2011-09-28 Thread Julien Nioche (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1120: - Fix Version/s: (was: 1.4) 1.5 JUnit test for microformats-reltag

[jira] [Updated] (NUTCH-1125) JUnit test for tld

2011-09-28 Thread Julien Nioche (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1125: - Fix Version/s: (was: 1.4) 1.5 JUnit test for tld --

[jira] [Updated] (NUTCH-1122) JUnit test for protocol-ftp

2011-09-28 Thread Julien Nioche (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1122: - Fix Version/s: (was: 1.4) 1.5 JUnit test for protocol-ftp

[jira] [Updated] (NUTCH-1121) JUnit test for parse-js

2011-09-28 Thread Julien Nioche (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1121: - Fix Version/s: (was: 1.4) 1.5 JUnit test for parse-js

[jira] [Commented] (NUTCH-1088) Write Solr XML documents

2011-09-28 Thread Markus Jelsma (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116732#comment-13116732 ] Markus Jelsma commented on NUTCH-1088: -- I believe we can. We need another output

[jira] [Updated] (NUTCH-865) Format source code in unique style

2011-09-28 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-865: Fix Version/s: (was: 2.0) Marked as blocker for 1.4. This issue should be addressed as the

[jira] [Updated] (NUTCH-1106) Options to skip url's based on length

2011-09-28 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1106: - Fix Version/s: (was: 1.4) (was: 2.0) 1.5 Marked as

[jira] [Updated] (NUTCH-1103) Port protocol-sftp to 1.4

2011-09-28 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1103: - Fix Version/s: (was: 1.4) 1.5 This is obviously not going to happen in

[jira] [Commented] (NUTCH-1005) Index headings plugin

2011-09-28 Thread Markus Jelsma (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116743#comment-13116743 ] Markus Jelsma commented on NUTCH-1005: -- I agree with Julien as it's the most flexible

[jira] [Commented] (NUTCH-1024) Dynamically set fetchInterval by MIME-type

2011-09-28 Thread Markus Jelsma (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116746#comment-13116746 ] Markus Jelsma commented on NUTCH-1024: -- Integration with sitemaps and crawler commons

[jira] [Updated] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2011-09-28 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1113: - Fix Version/s: (was: 1.4) 1.5 Merging segments causes URLs to vanish

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2011-09-28 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-961: Fix Version/s: (was: 1.4) (was: 2.0) 1.5

[jira] [Commented] (NUTCH-1106) Options to skip url's based on length

2011-09-28 Thread Sebastian Nagel (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116767#comment-13116767 ] Sebastian Nagel commented on NUTCH-1106: Why not just add a regex-urlfilter rule:

[jira] [Issue Comment Edited] (NUTCH-1106) Options to skip url's based on length

2011-09-28 Thread Sebastian Nagel (Issue Comment Edited) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116767#comment-13116767 ] Sebastian Nagel edited comment on NUTCH-1106 at 9/28/11 8:28 PM:

[jira] [Assigned] (NUTCH-1098) better url-normalizer basic

2011-09-28 Thread Markus Jelsma (Assigned) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma reassigned NUTCH-1098: Assignee: Markus Jelsma better url-normalizer basic ---

[jira] [Created] (NUTCH-1137) LinkDb / invertlinks: command line arguments ignored

2011-09-28 Thread Sebastian Nagel (Created) (JIRA)
LinkDb / invertlinks: command line arguments ignored Key: NUTCH-1137 URL: https://issues.apache.org/jira/browse/NUTCH-1137 Project: Nutch Issue Type: Bug Components: linkdb

[jira] [Updated] (NUTCH-1137) LinkDb / invertlinks: command line arguments ignored

2011-09-28 Thread Sebastian Nagel (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1137: --- Attachment: NUTCH-1137-1.5.patch LinkDb / invertlinks: command line arguments ignored

[jira] [Assigned] (NUTCH-1137) LinkDb / invertlinks: command line arguments ignored

2011-09-28 Thread Markus Jelsma (Assigned) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma reassigned NUTCH-1137: Assignee: Markus Jelsma LinkDb / invertlinks: command line arguments ignored

[jira] [Updated] (NUTCH-1137) LinkDb / invertlinks: command line arguments ignored

2011-09-28 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1137: - Affects Version/s: (was: 1.5) (was: 1.4)

Build failed in Jenkins: Nutch-trunk #1618

2011-09-28 Thread Apache Jenkins Server
See https://builds.apache.org/job/Nutch-trunk/1618/ -- [...truncated 936 lines...] A src/plugin/language-identifier/src/test/org/apache/nutch/analysis/lang/de.test A

Build failed in Jenkins: Nutch-nutchgora #20

2011-09-28 Thread Apache Jenkins Server
See https://builds.apache.org/job/Nutch-nutchgora/20/changes Changes: [jnioche] NUTCH-937 Put plugins in classes/plugins in job file (Claudio Martella, Ferdy Galema, jnioche) -- [...truncated 2526 lines...] resolve-default: [ivy:resolve] :: loading

[jira] [Commented] (NUTCH-937) When nutch is run on hadoop 0.20.2 (or cdh) it will not find plugins because MapReduce will not unpack plugin/ directory from the job's pack (due to MAPREDUCE-967)

2011-09-28 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116986#comment-13116986 ] Hudson commented on NUTCH-937: -- Integrated in Nutch-nutchgora #20 (See