[jira] Created: (NUTCH-950) Content-Length limit, URL filter and few minor issues

2011-01-01 Thread Alexis (JIRA)
Content-Length limit, URL filter and few minor issues
-

 Key: NUTCH-950
 URL: https://issues.apache.org/jira/browse/NUTCH-950
 Project: Nutch
  Issue Type: Bug
Affects Versions: 2.0
Reporter: Alexis


1. crawl command (nutch1.patch)

The class was renamed to Crawler but the references to it were not updated.


2. URL filter (nutch2.patch)

This avoids an NPE on bogus URLs whose host does not have a suffix.
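
As an illustration only (this is not the content of nutch2.patch), a guard of this kind might look like the Java sketch below. The class `SuffixGuardSketch`, the `KNOWN_SUFFIXES` table and the `findSuffix` and `filter` helpers are all hypothetical names, assumed for the example; the point is simply that a lookup which can return null is checked before its result is used.

```java
import java.net.URI;
import java.util.Set;

class SuffixGuardSketch {
    // Hypothetical stand-in for a real domain-suffix table.
    static final Set<String> KNOWN_SUFFIXES = Set.of("com", "org", "net");

    // Returns the matching suffix, or null when the host has none (e.g. "localhost").
    static String findSuffix(String host) {
        int dot = host.lastIndexOf('.');
        if (dot < 0) {
            return null;
        }
        String candidate = host.substring(dot + 1).toLowerCase();
        return KNOWN_SUFFIXES.contains(candidate) ? candidate : null;
    }

    // Returns the URL when accepted and null when rejected; odd hosts are rejected, not thrown on.
    static String filter(String url, Set<String> allowedSuffixes) {
        try {
            String host = URI.create(url).getHost();
            if (host == null) {
                return null; // no parsable host at all
            }
            String suffix = findSuffix(host);
            if (suffix == null) {
                return null; // the guard: no suffix, so nothing to dereference
            }
            return allowedSuffixes.contains(suffix) ? url : null;
        } catch (IllegalArgumentException e) {
            return null; // malformed URL string
        }
    }

    public static void main(String[] args) {
        Set<String> allowed = Set.of("org");
        System.out.println(filter("http://example.org/page", allowed));
        System.out.println(filter("http://localhost/", allowed));
    }
}
```

Rejecting a suffix-less host is one possible policy; a filter could equally let such URLs through, as long as the null case is handled explicitly.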


3. Content-Length limit (nutch3.patch)

This is related to NUTCH-899.
The patch prevents the entire flush operation on the Gora datastore from 
crashing because the MySQL blob limit was exceeded by a few bytes. Both the 
protocol-http and protocol-httpclient plugins were affected.
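
A minimal Java sketch of the general idea, not the actual nutch3.patch: read the response body through a hard cap so the stored content can never exceed the limit, even by one byte. The class `ContentLimitSketch`, the method `readCapped` and the `maxBytes` parameter are hypothetical names. The value 65535 in the demo assumes the content column is a plain MySQL BLOB, whose maximum size is 65,535 bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class ContentLimitSketch {
    // Reads at most maxBytes from the stream; anything beyond the cap is left unread.
    static byte[] readCapped(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int remaining = maxBytes;
        while (remaining > 0) {
            // Never request more than what is still allowed under the cap.
            int n = in.read(buf, 0, Math.min(buf.length, remaining));
            if (n < 0) {
                break; // end of stream reached before the cap
            }
            out.write(buf, 0, n);
            remaining -= n;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] body = new byte[70000]; // simulated oversized response body
        byte[] stored = readCapped(new ByteArrayInputStream(body), 65535);
        System.out.println(stored.length);
    }
}
```

Capping inside the read loop, rather than trusting a declared Content-Length header, keeps the bound valid even when the header is missing or wrong.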


4. Ivy configuration (nutch4.patch)
- Change the xercesImpl and restlet versions. Both changes are required: the 
current xercesImpl version makes a JUnit test crash, and the current restlet 
version is missing from the default Maven repository.

- Add gora-hbase and zookeeper (an HBase dependency), plus the MySQL connector. 
These jars are necessary to run Gora with the HBase or MySQL datastores (more a 
suggestion than a requirement here).

- Add com.jcraft/jsch, which is a protocol-sftp plugin dependency. 


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (NUTCH-950) Content-Length limit, URL filter and few minor issues

2011-01-01 Thread Alexis (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexis updated NUTCH-950:
-

Attachment: nutch4.patch

 Attachments: nutch1.patch, nutch2.patch, nutch3.patch, nutch4.patch





[jira] Commented: (NUTCH-950) Content-Length limit, URL filter and few minor issues

2011-01-01 Thread Julien Nioche (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976421#action_12976421
 ] 

Julien Nioche commented on NUTCH-950:
-

Will look into this next week; thanks for your contribution. In the future, 
please open separate JIRA issues instead of putting everything into a single one.




Build failed in Hudson: Nutch-trunk #1355

2011-01-01 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/Nutch-trunk/1355/

--
[...truncated 1007 lines...]
A src/plugin/subcollection/src/java/org/apache/nutch/collection
A src/plugin/subcollection/src/java/org/apache/nutch/collection/Subcollection.java
A src/plugin/subcollection/src/java/org/apache/nutch/collection/CollectionManager.java
A src/plugin/subcollection/src/java/org/apache/nutch/collection/package.html
A src/plugin/subcollection/src/java/org/apache/nutch/indexer
A src/plugin/subcollection/src/java/org/apache/nutch/indexer/subcollection
A src/plugin/subcollection/src/java/org/apache/nutch/indexer/subcollection/SubcollectionIndexingFilter.java
A src/plugin/subcollection/README.txt
A src/plugin/subcollection/plugin.xml
A src/plugin/subcollection/build.xml
A src/plugin/index-more
A src/plugin/index-more/ivy.xml
A src/plugin/index-more/src
A src/plugin/index-more/src/test
A src/plugin/index-more/src/test/org
A src/plugin/index-more/src/test/org/apache
A src/plugin/index-more/src/test/org/apache/nutch
A src/plugin/index-more/src/test/org/apache/nutch/indexer
A src/plugin/index-more/src/test/org/apache/nutch/indexer/more
A src/plugin/index-more/src/test/org/apache/nutch/indexer/more/TestMoreIndexingFilter.java
A src/plugin/index-more/src/java
A src/plugin/index-more/src/java/org
A src/plugin/index-more/src/java/org/apache
A src/plugin/index-more/src/java/org/apache/nutch
A src/plugin/index-more/src/java/org/apache/nutch/indexer
A src/plugin/index-more/src/java/org/apache/nutch/indexer/more
A src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java
A src/plugin/index-more/src/java/org/apache/nutch/indexer/more/package.html
A src/plugin/index-more/plugin.xml
A src/plugin/index-more/build.xml
AU src/plugin/plugin.dtd
A src/plugin/parse-ext
A src/plugin/parse-ext/ivy.xml
A src/plugin/parse-ext/src
A src/plugin/parse-ext/src/test
A src/plugin/parse-ext/src/test/org
A src/plugin/parse-ext/src/test/org/apache
A src/plugin/parse-ext/src/test/org/apache/nutch
A src/plugin/parse-ext/src/test/org/apache/nutch/parse
A src/plugin/parse-ext/src/test/org/apache/nutch/parse/ext
A src/plugin/parse-ext/src/test/org/apache/nutch/parse/ext/TestExtParser.java
A src/plugin/parse-ext/src/java
A src/plugin/parse-ext/src/java/org
A src/plugin/parse-ext/src/java/org/apache
A src/plugin/parse-ext/src/java/org/apache/nutch
A src/plugin/parse-ext/src/java/org/apache/nutch/parse
A src/plugin/parse-ext/src/java/org/apache/nutch/parse/ext
A src/plugin/parse-ext/src/java/org/apache/nutch/parse/ext/ExtParser.java
A src/plugin/parse-ext/plugin.xml
A src/plugin/parse-ext/build.xml
A src/plugin/parse-ext/command
A src/plugin/urlnormalizer-pass
A src/plugin/urlnormalizer-pass/ivy.xml
A src/plugin/urlnormalizer-pass/src
A src/plugin/urlnormalizer-pass/src/test
A src/plugin/urlnormalizer-pass/src/test/org
A src/plugin/urlnormalizer-pass/src/test/org/apache
A src/plugin/urlnormalizer-pass/src/test/org/apache/nutch
A src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net
A src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net/urlnormalizer
A src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net/urlnormalizer/pass
AU src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net/urlnormalizer/pass/TestPassURLNormalizer.java
A src/plugin/urlnormalizer-pass/src/java
A src/plugin/urlnormalizer-pass/src/java/org
A src/plugin/urlnormalizer-pass/src/java/org/apache
A src/plugin/urlnormalizer-pass/src/java/org/apache/nutch
A src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net
A src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net/urlnormalizer
A src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net/urlnormalizer/pass
AU src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net/urlnormalizer/pass/PassURLNormalizer.java
AU src/plugin/urlnormalizer-pass/plugin.xml
AU src/plugin/urlnormalizer-pass/build.xml
A src/plugin/parse-html
A src/plugin/parse-html/ivy.xml
A src/plugin/parse-html/lib
A src/plugin/parse-html/lib/tagsoup.LICENSE.txt
A src/plugin/parse-html/src
A src/plugin/parse-html/src/test
A src/plugin/parse-html/src/test/org
A src/plugin/parse-html/src/test/org/apache
A src/plugin/parse-html/src/test/org/apache/nutch
A src/plugin/parse-html/src/test/org/apache/nutch/parse
A