[jira] [Commented] (NUTCH-1342) Read time out protocol-http

2014-08-25 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108830#comment-14108830
 ] 

Markus Jelsma commented on NUTCH-1342:
--

Sebastian - I can no longer reproduce this problem for those URLs.

 Read time out protocol-http
 ---

 Key: NUTCH-1342
 URL: https://issues.apache.org/jira/browse/NUTCH-1342
 Project: Nutch
  Issue Type: Bug
  Components: protocol
Affects Versions: 1.4, 1.5
Reporter: Markus Jelsma
Assignee: Markus Jelsma
 Fix For: 1.10

 Attachments: NUTCH-1342-1.6-1.patch


 For some reason, some URLs always time out with protocol-http but not with 
 protocol-httpclient. The stack trace is always the same:
 {code}
 2012-04-20 11:25:44,275 ERROR http.Http - Failed to get protocol output
 java.net.SocketTimeoutException: Read timed out
 at java.net.SocketInputStream.socketRead0(Native Method)
 at java.net.SocketInputStream.read(SocketInputStream.java:129)
 at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
 at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
 at java.io.FilterInputStream.read(FilterInputStream.java:116)
 at java.io.PushbackInputStream.read(PushbackInputStream.java:169)
 at java.io.FilterInputStream.read(FilterInputStream.java:90)
 at org.apache.nutch.protocol.http.HttpResponse.readPlainContent(HttpResponse.java:228)
 at org.apache.nutch.protocol.http.HttpResponse.init(HttpResponse.java:157)
 at org.apache.nutch.protocol.http.Http.getResponse(Http.java:64)
 at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:138)
 {code}
 Some example URLs:
 * 404 http://www.fcgroningen.nl/tribunenamen/stemmen/
 * 301 http://shop.fcgroningen.nl/aanbieding



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (NUTCH-1828) bin/crawl : incorrect handling of nutch errors

2014-08-25 Thread Mathieu Bouchard (JIRA)
Mathieu Bouchard created NUTCH-1828:
---

 Summary: bin/crawl : incorrect handling of nutch errors
 Key: NUTCH-1828
 URL: https://issues.apache.org/jira/browse/NUTCH-1828
 Project: Nutch
  Issue Type: Bug
  Components: nutchNewbie
Affects Versions: 2.2.1, 1.9
 Environment: Ubuntu Server 14.04, OpenJDK 7
Reporter: Mathieu Bouchard


We are using Solr with Nutch to provide a complete search engine for our 
website.

I created a cron job that would use Nutch to crawl and update the Solr index 
each night. This cron job is trying to automatically correct some errors that 
could result in a corrupt crawldb. However, it seems that the bin/crawl command 
doesn't correctly propagate errors coming from bin/nutch.

Here is an example from the bin/crawl script:
$bin/nutch inject $CRAWL_PATH/crawldb $SEEDDIR

if [ $? -ne 0 ]
  then exit $?
fi

Even if there is an error in the nutch inject command, the crawl script always 
exits with 0 here. The way I understand it, the $? passed to exit holds the 
result of the shell test, not the result of the nutch inject command.

To correct this, we would need to modify the script with something like:
$bin/nutch inject $CRAWL_PATH/crawldb $SEEDDIR
RETCODE=$?

if [ $RETCODE -ne 0 ]
  then exit $RETCODE
fi
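
To illustrate the difference between the two snippets above, here is a minimal, 
self-contained sketch. The function fail_with_3 is a hypothetical stand-in for a 
failing bin/nutch call, and broken and fixed are illustrative names only; none 
of them exist in bin/crawl. Each function body runs in a subshell so that exit 
does not terminate the demonstration itself.

```shell
#!/bin/sh
# Hypothetical stand-in for a failing "bin/nutch inject" call.
fail_with_3() { return 3; }

# Pattern quoted from bin/crawl: by the time "exit $?" runs, $? holds
# the status of the "[ ... ]" test (0), not the status of the command.
broken() (
  fail_with_3
  if [ $? -ne 0 ]
    then exit $?
  fi
)

# Proposed pattern: save the status before anything can overwrite it.
fixed() (
  fail_with_3
  RETCODE=$?
  if [ $RETCODE -ne 0 ]
    then exit $RETCODE
  fi
)

broken; echo "broken: $?"   # prints "broken: 0"
fixed;  echo "fixed: $?"    # prints "fixed: 3"
```

The only change between the two functions is that the status is copied into 
RETCODE immediately after the command, before the test overwrites $?.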






[jira] [Updated] (NUTCH-1828) bin/crawl : incorrect handling of nutch errors

2014-08-25 Thread Mathieu Bouchard (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mathieu Bouchard updated NUTCH-1828:


Attachment: apache-nutch-1.9-crawl-fix-retcode.patch

Patch for Apache Nutch 1.9

 bin/crawl : incorrect handling of nutch errors
 --

 Key: NUTCH-1828
 URL: https://issues.apache.org/jira/browse/NUTCH-1828
 Project: Nutch
  Issue Type: Bug
  Components: nutchNewbie
Affects Versions: 1.9, 2.2.1
 Environment: Ubuntu Server 14.04, OpenJDK 7
Reporter: Mathieu Bouchard
 Attachments: apache-nutch-1.9-crawl-fix-retcode.patch


 We are using Solr with Nutch to provide a complete search engine for our 
 website.
 I created a cron job that would use Nutch to crawl and update the Solr index 
 each night. This cron job is trying to automatically correct some errors that 
 could result in a corrupt crawldb. However, it seems that the bin/crawl 
 command doesn't correctly propagate errors coming from bin/nutch.
 Here is an example from the bin/crawl script:
 $bin/nutch inject $CRAWL_PATH/crawldb $SEEDDIR
 if [ $? -ne 0 ]
   then exit $?
 fi
 Even if there is an error in the nutch inject command, the crawl script 
 always exits with 0 here. The way I understand it, the $? passed to exit holds 
 the result of the shell test, not the result of the nutch inject command.
 To correct this, we would need to modify the script with something like:
 $bin/nutch inject $CRAWL_PATH/crawldb $SEEDDIR
 RETCODE=$?
 if [ $RETCODE -ne 0 ]
   then exit $RETCODE
 fi





[jira] [Updated] (NUTCH-1829) Generator : unable to distinguish real errors

2014-08-25 Thread Mathieu Bouchard (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mathieu Bouchard updated NUTCH-1829:


Description: 
The bin/nutch generate command is returning the same error code (-1) if there 
is an error or no new segment to process, so there is no way to tell if the 
error is real or not from a shell script. This problem is related to NUTCH-1828.

The problem can be fixed by modifying the following Java source file:
http://svn.apache.org/viewvc/nutch/trunk/src/java/org/apache/nutch/crawl/Generator.java?revision=1619934

At line 711, if there is no new segment, the generator returns -1, which is 
the same return code returned at line 714 if there was an error.


  was:
The bin/nutch generate command is returning the same error code (-1) if there 
is an error or no new segment to process, so there is no way to tell if the 
error is real or not from a shell script. This problem is related to NUTCH-1828.

The problem can be fixed by modifying the following Java source file:
http://svn.apache.org/viewvc/nutch/trunk/src/java/org/apache/nutch/crawl/Generator.java

At line 711, if there is no new segment, the generator returns -1, which is 
the same return code returned at line 714 if there was an error.



 Generator : unable to distinguish real errors
 -

 Key: NUTCH-1829
 URL: https://issues.apache.org/jira/browse/NUTCH-1829
 Project: Nutch
  Issue Type: Bug
  Components: nutchNewbie
Affects Versions: 1.9, 2.2.1
 Environment: Ubuntu Server 14.04, OpenJDK 7
Reporter: Mathieu Bouchard

 The bin/nutch generate command is returning the same error code (-1) if there 
 is an error or no new segment to process, so there is no way to tell if the 
 error is real or not from a shell script. This problem is related to 
 NUTCH-1828.
 The problem can be fixed by modifying the following Java source file:
 http://svn.apache.org/viewvc/nutch/trunk/src/java/org/apache/nutch/crawl/Generator.java?revision=1619934
 At line 711, if there is no new segment, the generator returns -1, which is 
 the same return code returned at line 714 if there was an error.





[jira] [Updated] (NUTCH-1829) Generator : unable to distinguish real errors

2014-08-25 Thread Mathieu Bouchard (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mathieu Bouchard updated NUTCH-1829:


Description: 
The bin/nutch generate command is returning the same error code (-1) if there 
is an error or no new segment to process, so there is no way to tell if the 
error is real or not from a shell script. This problem is related to NUTCH-1828.

The problem can be fixed by modifying the following Java source file:
http://svn.apache.org/viewvc/nutch/trunk/src/java/org/apache/nutch/crawl/Generator.java?revision=1619934&view=markup

At line 711, if there is no new segment, the generator returns -1, which is 
the same return code returned at line 714 if there was an error.


  was:
The bin/nutch generate command is returning the same error code (-1) if there 
is an error or no new segment to process, so there is no way to tell if the 
error is real or not from a shell script. This problem is related to NUTCH-1828.

The problem can be fixed by modifying the following Java source file:
http://svn.apache.org/viewvc/nutch/trunk/src/java/org/apache/nutch/crawl/Generator.java?revision=1619934

At line 711, if there is no new segment, the generator returns -1, which is 
the same return code returned at line 714 if there was an error.



 Generator : unable to distinguish real errors
 -

 Key: NUTCH-1829
 URL: https://issues.apache.org/jira/browse/NUTCH-1829
 Project: Nutch
  Issue Type: Bug
  Components: nutchNewbie
Affects Versions: 1.9, 2.2.1
 Environment: Ubuntu Server 14.04, OpenJDK 7
Reporter: Mathieu Bouchard

 The bin/nutch generate command is returning the same error code (-1) if there 
 is an error or no new segment to process, so there is no way to tell if the 
 error is real or not from a shell script. This problem is related to 
 NUTCH-1828.
 The problem can be fixed by modifying the following Java source file:
 http://svn.apache.org/viewvc/nutch/trunk/src/java/org/apache/nutch/crawl/Generator.java?revision=1619934&view=markup
 At line 711, if there is no new segment, the generator returns -1, which is 
 the same return code returned at line 714 if there was an error.





[jira] [Commented] (NUTCH-1829) Generator : unable to distinguish real errors

2014-08-25 Thread lufeng (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110193#comment-14110193
 ] 

lufeng commented on NUTCH-1829:
---

Yes, I think we should distinguish the different outcomes by using different 
return codes, so that the caller can decide the next action based on the return code.
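
As a rough sketch of what a calling script could do once the outcomes are 
distinguishable: the convention below (0 = segment generated, 1 = no new 
segment, anything else = real error) is purely an assumption for illustration 
and is not what Generator returns today, and next_action is a hypothetical 
helper name.

```shell
#!/bin/sh
# Hypothetical exit-code convention, NOT the current Generator behaviour:
#   0 = segment generated, 1 = no new segment, anything else = real error.
next_action() {
  case "$1" in
    0) echo "fetch" ;;   # a new segment exists, carry on with the crawl cycle
    1) echo "stop" ;;    # nothing left to generate, end the crawl cleanly
    *) echo "abort" ;;   # a real failure, propagate it to the caller
  esac
}

# Assuming the -1 is passed on as the process exit status, a shell sees it
# as 255, because exit statuses are truncated to the range 0-255.
next_action 0     # prints "fetch"
next_action 1     # prints "stop"
next_action 255   # prints "abort"
```

With the current behaviour, both "no new segment" and a real error would fall 
into the last branch, which is the ambiguity described in this issue.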

 Generator : unable to distinguish real errors
 -

 Key: NUTCH-1829
 URL: https://issues.apache.org/jira/browse/NUTCH-1829
 Project: Nutch
  Issue Type: Bug
  Components: nutchNewbie
Affects Versions: 1.9, 2.2.1
 Environment: Ubuntu Server 14.04, OpenJDK 7
Reporter: Mathieu Bouchard

 The bin/nutch generate command is returning the same error code (-1) if there 
 is an error or no new segment to process, so there is no way to tell if the 
 error is real or not from a shell script. This problem is related to 
 NUTCH-1828.
 The problem can be fixed by modifying the following Java source file:
 http://svn.apache.org/viewvc/nutch/trunk/src/java/org/apache/nutch/crawl/Generator.java?revision=1619934&view=markup
 At line 711, if there is no new segment, the generator returns -1, which is 
 the same return code returned at line 714 if there was an error.





Build failed in Jenkins: Nutch-trunk #2753

2014-08-25 Thread Apache Jenkins Server
See https://builds.apache.org/job/Nutch-trunk/2753/

--
[...truncated 2637 lines...]

jar:
  [jar] Building jar: 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/parse-metatags/parse-metatags.jar

deps-test:

init:

init-plugin:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/ivy/ivysettings.xml

compile:

jar:

deps-test:

deploy:

copy-generated-lib:

init:

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/ivy/ivysettings.xml

compile:
 [echo] Compiling plugin: protocol-file

jar:

deps-test:

deploy:

copy-generated-lib:

deploy:
 [copy] Copying 1 file to 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/plugins/parse-metatags

copy-generated-lib:
 [copy] Copying 1 file to 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/plugins/parse-metatags
[mkdir] Created dir: 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/parse-swf/test/data
 [copy] Copying 1 file to 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/parse-swf/test/data
 [copy] Copying 1 file to 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/parse-swf/test/data
 [copy] Copying 1 file to 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/parse-swf/test/data
 [copy] Copying 1 file to 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/parse-swf/test/data
 [copy] Copying 1 file to 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/parse-swf/test/data
 [copy] Copying 1 file to 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/parse-swf/test/data

init:
[mkdir] Created dir: 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/parse-swf/classes
[mkdir] Created dir: 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/parse-swf/test/lib
[mkdir] Created dir: 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/plugins/parse-swf

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/ivy/ivysettings.xml

compile:
 [echo] Compiling plugin: parse-swf
[javac] Compiling 2 source files to 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/parse-swf/classes
[javac] warning: [options] bootstrap class path not set in conjunction with 
-source 1.6
[javac] 1 warning
[javac] Creating empty 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/parse-swf/classes/org/apache/nutch/parse/swf/package-info.class

jar:
  [jar] Building jar: 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/parse-swf/parse-swf.jar

deps-test:

init:

init-plugin:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/ivy/ivysettings.xml

compile:

jar:

deps-test:

deploy:

copy-generated-lib:

init:

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/ivy/ivysettings.xml

compile:
 [echo] Compiling plugin: protocol-file

jar:

deps-test:

deploy:

copy-generated-lib:

deploy:
 [copy] Copying 1 file to 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/plugins/parse-swf

copy-generated-lib:
 [copy] Copying 1 file to 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/plugins/parse-swf
 [copy] Copying 1 file to 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/plugins/parse-swf
[mkdir] Created dir: 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/parse-tika/test/data
 [copy] Copying 9 files to 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/parse-tika/test/data

init:
[mkdir] Created dir: 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/parse-tika/classes
[mkdir] Created dir: 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/parse-tika/test/lib
[mkdir] Created dir: 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/plugins/parse-tika

init-plugin:

deps-jar:

init:

init-plugin:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/ivy/ivysettings.xml

compile:

jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 
https://builds.apache.org/job/Nutch-trunk/ws/trunk/ivy/ivysettings.xml
[ivy:resolve] 
[ivy:resolve] :: problems summary ::
[ivy:resolve]  ERRORS
[ivy:resolve]   unknown resolver chain
[ivy:resolve]   unknown resolver null
[ivy:resolve]   unknown resolver chain
[ivy:resolve]   unknown resolver chain
[ivy:resolve]   unknown resolver null
[ivy:resolve] 
[ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS

BUILD FAILED
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build.xml:112: The 
following error