[jira] [Commented] (NUTCH-1342) Read time out protocol-http
[ https://issues.apache.org/jira/browse/NUTCH-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108830#comment-14108830 ]

Markus Jelsma commented on NUTCH-1342:
--

Sebastian - I cannot reproduce this problem anymore for those URLs.

Read time out protocol-http
---

Key: NUTCH-1342
URL: https://issues.apache.org/jira/browse/NUTCH-1342
Project: Nutch
Issue Type: Bug
Components: protocol
Affects Versions: 1.4, 1.5
Reporter: Markus Jelsma
Assignee: Markus Jelsma
Fix For: 1.10
Attachments: NUTCH-1342-1.6-1.patch

For some reason, some URLs always time out with protocol-http but not with protocol-httpclient. The stack trace is always the same:

{code}
2012-04-20 11:25:44,275 ERROR http.Http - Failed to get protocol output
java.net.SocketTimeoutException: Read timed out
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:129)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
	at java.io.FilterInputStream.read(FilterInputStream.java:116)
	at java.io.PushbackInputStream.read(PushbackInputStream.java:169)
	at java.io.FilterInputStream.read(FilterInputStream.java:90)
	at org.apache.nutch.protocol.http.HttpResponse.readPlainContent(HttpResponse.java:228)
	at org.apache.nutch.protocol.http.HttpResponse.init(HttpResponse.java:157)
	at org.apache.nutch.protocol.http.Http.getResponse(Http.java:64)
	at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:138)
{code}

Some example URLs:
* 404 http://www.fcgroningen.nl/tribunenamen/stemmen/
* 301 http://shop.fcgroningen.nl/aanbieding

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Created] (NUTCH-1828) bin/crawl : incorrect handling of nutch errors
Mathieu Bouchard created NUTCH-1828:
---

Summary: bin/crawl : incorrect handling of nutch errors
Key: NUTCH-1828
URL: https://issues.apache.org/jira/browse/NUTCH-1828
Project: Nutch
Issue Type: Bug
Components: nutchNewbie
Affects Versions: 2.2.1, 1.9
Environment: Ubuntu Server 14.04, OpenJDK 7
Reporter: Mathieu Bouchard

We are using Solr with Nutch to provide a complete search engine for our website. I created a cron job that uses Nutch to crawl and update the Solr index each night. This cron job tries to automatically correct some errors that could result in a corrupt crawldb. However, the bin/crawl command does not correctly propagate errors coming from bin/nutch. Here is an example from the bin/crawl script:

  $bin/nutch inject $CRAWL_PATH/crawldb $SEEDDIR
  if [ $? -ne 0 ]
  then exit $?
  fi

Even if the nutch inject command fails, the crawl script always returns 0. As I understand it, the exit code returned is the result of the shell test and not the result of the nutch inject command. To correct this, the script would need to be modified to something like:

  $bin/nutch inject $CRAWL_PATH/crawldb $SEEDDIR
  RETCODE=$?
  if [ $RETCODE -ne 0 ]
  then exit $RETCODE
  fi
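A minimal, self-contained sketch of the behaviour described above, runnable in bash. `fake_inject` is a hypothetical stand-in for `$bin/nutch inject` that always fails with status 3 (an arbitrary value chosen for illustration). Both variants use `return` in place of `exit` so they can run in one script; this does not change how `$?` is read.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for "$bin/nutch inject ...": always fails with status 3.
fake_inject() { return 3; }

# Pattern from bin/crawl. Inside "then", $? is the status of the test
# command "[ $? -ne 0 ]", which succeeded, so the function returns 0.
buggy() {
  fake_inject
  if [ $? -ne 0 ]
  then return $?
  fi
}

# Proposed fix: save the status right after the command, before any
# other command (including "[") overwrites $?.
fixed() {
  fake_inject
  RETCODE=$?
  if [ $RETCODE -ne 0 ]
  then return $RETCODE
  fi
}

buggy; echo "buggy: $?"   # prints "buggy: 0"
fixed; echo "fixed: $?"   # prints "fixed: 3"
```

The failure is swallowed because `$?` always refers to the most recently executed command, and `[` is itself a command.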
[jira] [Updated] (NUTCH-1828) bin/crawl : incorrect handling of nutch errors
[ https://issues.apache.org/jira/browse/NUTCH-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mathieu Bouchard updated NUTCH-1828:

Attachment: apache-nutch-1.9-crawl-fix-retcode.patch

Patch for Apache Nutch 1.9

bin/crawl : incorrect handling of nutch errors
--

Key: NUTCH-1828
URL: https://issues.apache.org/jira/browse/NUTCH-1828
Project: Nutch
Issue Type: Bug
Components: nutchNewbie
Affects Versions: 1.9, 2.2.1
Environment: Ubuntu Server 14.04, OpenJDK 7
Reporter: Mathieu Bouchard
Attachments: apache-nutch-1.9-crawl-fix-retcode.patch
[jira] [Updated] (NUTCH-1829) Generator : unable to distinguish real errors
[ https://issues.apache.org/jira/browse/NUTCH-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mathieu Bouchard updated NUTCH-1829:

Description:
The bin/nutch generate command returns the same error code (-1) both when there is an error and when there is no new segment to process, so a shell script has no way to tell whether the error is real. This problem is related to NUTCH-1828.
The problem can be fixed by modifying the following Java source file:
http://svn.apache.org/viewvc/nutch/trunk/src/java/org/apache/nutch/crawl/Generator.java?revision=1619934
At line 711, if there are no new segments, the generator returns -1, which is the same return code it returns at line 714 if there was an error.

was:
The bin/nutch generate command returns the same error code (-1) both when there is an error and when there is no new segment to process, so a shell script has no way to tell whether the error is real. This problem is related to NUTCH-1828.
The problem can be fixed by modifying the following Java source file:
http://svn.apache.org/viewvc/nutch/trunk/src/java/org/apache/nutch/crawl/Generator.java
At line 711, if there are no new segments, the generator returns -1, which is the same return code it returns at line 714 if there was an error.

Generator : unable to distinguish real errors
-

Key: NUTCH-1829
URL: https://issues.apache.org/jira/browse/NUTCH-1829
Project: Nutch
Issue Type: Bug
Components: nutchNewbie
Affects Versions: 1.9, 2.2.1
Environment: Ubuntu Server 14.04, OpenJDK 7
Reporter: Mathieu Bouchard
[jira] [Updated] (NUTCH-1829) Generator : unable to distinguish real errors
[ https://issues.apache.org/jira/browse/NUTCH-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mathieu Bouchard updated NUTCH-1829:

Description:
The bin/nutch generate command returns the same error code (-1) both when there is an error and when there is no new segment to process, so a shell script has no way to tell whether the error is real. This problem is related to NUTCH-1828.
The problem can be fixed by modifying the following Java source file:
http://svn.apache.org/viewvc/nutch/trunk/src/java/org/apache/nutch/crawl/Generator.java?revision=1619934view=markup
At line 711, if there are no new segments, the generator returns -1, which is the same return code it returns at line 714 if there was an error.

was:
The bin/nutch generate command returns the same error code (-1) both when there is an error and when there is no new segment to process, so a shell script has no way to tell whether the error is real. This problem is related to NUTCH-1828.
The problem can be fixed by modifying the following Java source file:
http://svn.apache.org/viewvc/nutch/trunk/src/java/org/apache/nutch/crawl/Generator.java?revision=1619934
At line 711, if there are no new segments, the generator returns -1, which is the same return code it returns at line 714 if there was an error.

Generator : unable to distinguish real errors
-

Key: NUTCH-1829
URL: https://issues.apache.org/jira/browse/NUTCH-1829
Project: Nutch
Issue Type: Bug
Components: nutchNewbie
Affects Versions: 1.9, 2.2.1
Environment: Ubuntu Server 14.04, OpenJDK 7
Reporter: Mathieu Bouchard
[jira] [Commented] (NUTCH-1829) Generator : unable to distinguish real errors
[ https://issues.apache.org/jira/browse/NUTCH-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110193#comment-14110193 ]

lufeng commented on NUTCH-1829:
---

Yes, I think we should use a different return code for each kind of result, so that the next action can be determined from the return code.

Generator : unable to distinguish real errors
-

Key: NUTCH-1829
URL: https://issues.apache.org/jira/browse/NUTCH-1829
Project: Nutch
Issue Type: Bug
Components: nutchNewbie
Affects Versions: 1.9, 2.2.1
Environment: Ubuntu Server 14.04, OpenJDK 7
Reporter: Mathieu Bouchard
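A sketch of what separate return codes would allow in a crawl script. The mapping used here (0 = segment generated, 1 = nothing new to generate, anything else = real error) is an assumed convention for illustration. It does not describe Generator's behaviour in the revision referenced above. `fake_generate` and `handle_generate` are hypothetical helpers; `fake_generate` stands in for `bin/nutch generate`. If Generator's -1 is passed to `System.exit`, the shell sees status 255, because exit statuses keep only the low 8 bits.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for "$bin/nutch generate ...": exits with whatever
# status it is given, so each case can be exercised.
fake_generate() { return "$1"; }

# ASSUMED convention, for illustration only:
#   0 = a new segment was generated
#   1 = nothing new to generate (not an error)
#   anything else = a real error
handle_generate() {
  fake_generate "$1"
  local retcode=$?   # capture immediately, before "[" overwrites $?
  if [ "$retcode" -eq 0 ]; then
    echo "segment generated"
  elif [ "$retcode" -eq 1 ]; then
    echo "no new segments, stopping crawl loop"
  else
    echo "generate failed with status $retcode"
    return "$retcode"
  fi
}

handle_generate 0    # prints "segment generated"
handle_generate 1    # prints "no new segments, stopping crawl loop"
# A Java exit code of -1 reaches the shell as 255.
handle_generate 255 || echo "crawl aborted (status $?)"
```

With distinct codes, a crawl loop can stop cleanly on "nothing to generate" and still abort on real failures. With the single -1 (255) described in the report, both cases reach the final branch and cannot be told apart.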
Build failed in Jenkins: Nutch-trunk #2753
See https://builds.apache.org/job/Nutch-trunk/2753/

--
[...truncated 2637 lines...]
jar:
    [jar] Building jar: https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/parse-metatags/parse-metatags.jar
deps-test:
init:
init-plugin:
clean-lib:
resolve-default:
    [ivy:resolve] :: loading settings :: file = https://builds.apache.org/job/Nutch-trunk/ws/trunk/ivy/ivysettings.xml
compile:
jar:
deps-test:
deploy:
copy-generated-lib:
init:
init-plugin:
deps-jar:
clean-lib:
resolve-default:
    [ivy:resolve] :: loading settings :: file = https://builds.apache.org/job/Nutch-trunk/ws/trunk/ivy/ivysettings.xml
compile:
    [echo] Compiling plugin: protocol-file
jar:
deps-test:
deploy:
copy-generated-lib:
deploy:
    [copy] Copying 1 file to https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/plugins/parse-metatags
copy-generated-lib:
    [copy] Copying 1 file to https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/plugins/parse-metatags
    [mkdir] Created dir: https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/parse-swf/test/data
    [copy] Copying 1 file to https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/parse-swf/test/data
    [copy] Copying 1 file to https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/parse-swf/test/data
    [copy] Copying 1 file to https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/parse-swf/test/data
    [copy] Copying 1 file to https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/parse-swf/test/data
    [copy] Copying 1 file to https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/parse-swf/test/data
    [copy] Copying 1 file to https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/parse-swf/test/data
init:
    [mkdir] Created dir: https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/parse-swf/classes
    [mkdir] Created dir: https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/parse-swf/test/lib
    [mkdir] Created dir: https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/plugins/parse-swf
init-plugin:
deps-jar:
clean-lib:
resolve-default:
    [ivy:resolve] :: loading settings :: file = https://builds.apache.org/job/Nutch-trunk/ws/trunk/ivy/ivysettings.xml
compile:
    [echo] Compiling plugin: parse-swf
    [javac] Compiling 2 source files to https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/parse-swf/classes
    [javac] warning: [options] bootstrap class path not set in conjunction with -source 1.6
    [javac] 1 warning
    [javac] Creating empty https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/parse-swf/classes/org/apache/nutch/parse/swf/package-info.class
jar:
    [jar] Building jar: https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/parse-swf/parse-swf.jar
deps-test:
init:
init-plugin:
clean-lib:
resolve-default:
    [ivy:resolve] :: loading settings :: file = https://builds.apache.org/job/Nutch-trunk/ws/trunk/ivy/ivysettings.xml
compile:
jar:
deps-test:
deploy:
copy-generated-lib:
init:
init-plugin:
deps-jar:
clean-lib:
resolve-default:
    [ivy:resolve] :: loading settings :: file = https://builds.apache.org/job/Nutch-trunk/ws/trunk/ivy/ivysettings.xml
compile:
    [echo] Compiling plugin: protocol-file
jar:
deps-test:
deploy:
copy-generated-lib:
deploy:
    [copy] Copying 1 file to https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/plugins/parse-swf
copy-generated-lib:
    [copy] Copying 1 file to https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/plugins/parse-swf
    [copy] Copying 1 file to https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/plugins/parse-swf
    [mkdir] Created dir: https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/parse-tika/test/data
    [copy] Copying 9 files to https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/parse-tika/test/data
init:
    [mkdir] Created dir: https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/parse-tika/classes
    [mkdir] Created dir: https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/parse-tika/test/lib
    [mkdir] Created dir: https://builds.apache.org/job/Nutch-trunk/ws/trunk/build/plugins/parse-tika
init-plugin:
deps-jar:
init:
init-plugin:
clean-lib:
resolve-default:
    [ivy:resolve] :: loading settings :: file = https://builds.apache.org/job/Nutch-trunk/ws/trunk/ivy/ivysettings.xml
compile:
jar:
clean-lib:
resolve-default:
    [ivy:resolve] :: loading settings :: file = https://builds.apache.org/job/Nutch-trunk/ws/trunk/ivy/ivysettings.xml
    [ivy:resolve]
    [ivy:resolve] :: problems summary ::
    [ivy:resolve] ERRORS
    [ivy:resolve] unknown resolver chain
    [ivy:resolve] unknown resolver null
    [ivy:resolve] unknown resolver chain
    [ivy:resolve] unknown resolver chain
    [ivy:resolve] unknown resolver null
    [ivy:resolve]
    [ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS

BUILD FAILED
https://builds.apache.org/job/Nutch-trunk/ws/trunk/build.xml:112: The following error