Please disregard my previous email - the command was launched from the wrong directory.
I don't see improvement for my latest run:

[r...@snv-qa-lin-domain-crawler1 software]# hfs -text /user/tomcatadmin/lpm/15-100226111258118-tomcatadmin/parse/0/part-m-00000
10/02/27 07:36:28 INFO util.NativeCodeLoader: Loaded the native-hadoop library
10/02/27 07:36:28 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
10/02/27 07:36:28 INFO compress.CodecPool: Got brand-new decompressor
text: java.io.IOException: WritableName can't load class: org.apache.nutch.parse.Parse

Here is the command line (see bold):

510 1255 38.3 0.1 1441444 62660 ? Sl 07:23 0:02
/usr/local/jdk1.6.0_14/bin/java -Xmx1000m
-Dhadoop.log.dir=/opt/kindsight/nutchbase/logs
-Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl
-Dhadoop.log.file=hadoop.log
-Djava.library.path=/opt/kindsight/nutchbase/lib/native/Linux-amd64-64
-classpath
/opt/kindsight/nutchbase:/opt/kindsight/nutchbase/conf:/opt/kindsight/nutchbase/conf/batchclient:/opt/kindsight/nutchbase/lib/batchplatform.jar:/opt/kindsight/nutchbase/lib/colo_common.jar:/opt/kindsight/nutchbase/lib/csreader.jar:/opt/kindsight/nutchbase/lib/pr_common.jar:/opt/kindsight/nutchbase/lib/nutch-1.0.job:/opt/kindsight/nutchbase/lib/3rdparty/commons-collections-3.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/servlet-api.jar:/opt/kindsight/nutchbase/lib/3rdparty/lucene-misc-2.4.0.jar:/opt/kindsight/nutchbase/lib/3rdparty/tika-0.1-incubating.jar:/opt/kindsight/nutchbase/lib/3rdparty/junit-3.8.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/oozie-core-0.20.0.o0.1-SNAPSHOT.jar:/opt/kindsight/nutchbase/lib/3rdparty/hbase-0.20.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/lucene-core-2.4.0.jar:/opt/kindsight/nutchbase/lib/3rdparty/apache-solr-solrj-1.3.0.jar:/opt/kindsight/nutchbase/lib/3rdparty/json_simple-1.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/xerces-2_6_2.jar:/opt/kindsight/nutchbase/lib/3rdparty/jetty-5.1.4.jar:/opt/kindsight/nutchbase/lib/3rdparty/jets3t-0.6.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-lang-2.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/xerces-2_6_2-apis.jar:/opt/kindsight/nutchbase/lib/3rdparty/apache-solr-common-1.3.0.jar:/opt/kindsight/nutchbase/lib/3rdparty/jdom-1.0.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-fileupload-1.3-SNAPSHOT.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-httpclient-3.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-logging-1.0.4.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-logging-api-1.0.4.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-beanutils-1.8.0.jar:/opt/kindsight/nutchbase/lib/3rdparty/nutch-1.0.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-io-1.3.2.jar:/opt/kindsight/nutchbase/lib/3rdparty/icu4j-4_0_1.jar:/opt/kindsight/nutchbase/lib/3rdparty/log4j-1.2.15.jar:/opt/kindsight/nutchbase/lib/3rdparty/oozie-client-0.20.0.o0.1-SNAPSHOT.jar:/opt/kindsight/nutchbase/lib/3rdparty/batch/hbase-0.20.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/batch/nutch-1.0.jar:/opt/kindsight/nutchbase/lib/3rdparty/batch/hadoop-core.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-logging-1.1.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-codec-1.3.jar:/opt/kindsight/nutchbase/lib/3rdparty/jakarta-oro-2.0.8.jar:/opt/kindsight/nutchbase/lib/3rdparty/hsqldb-1.8.0.7.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-pool-1.4.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-dbcp-1.2.2.jar:/opt/kindsight/nutchbase/lib/3rdparty/mysql-connector-java-5.1.10-bin.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-httpclient-3.0.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/zookeeper-3.2.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/taglibs-i18n.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-collections-3.2.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-cli-1.2.jar:/usr/local/jdk1.6.0_14/lib/tools.jar:/opt/kindsight/nutchbase/build/nutch-*.job:/opt/kindsight/nutchbase/nutch-*.job:/opt/kindsight/nutchbase/lib/batchplatform.jar:/opt/kindsight/nutchbase/lib/colo_common.jar:/opt/kindsight/nutchbase/lib/csreader.jar:/opt/kindsight/nutchbase/lib/pr_common.jar:/opt/kindsight/nutchbase/lib/jetty-ext/*.jar
com.rialto.nutchbase.fetcher.Fetcher
*-libjars /opt/kindsight/nutchbase/lib/3rdparty/nutch-1.0.jar,/opt/kindsight/nutchbase/lib/3rdparty/hbase-0.20.1.jar *
-D db.max.outlinks.per.page=1000
domaincrawltable lpm/15-100226111258118-tomcatadmin/generate/0
lpm/15-100226111258118-tomcatadmin/parse/0 -threads 10 -actionid
15-100226111258118-tomcatad...@domain_crawl

On Sat, Feb 27, 2010 at 7:29 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> Now I see this in the log:
> [r...@snv-qa-lin-domain-crawler1 webmap_workflow]# hfs -text
> /user/tomcatadmin/lpm/15-100226111258118-tomcatadmin/generate/0/part-r-00000
> 2010-02-27 07:25:08,062 WARN [main] conf.Configuration DEPRECATED:
> hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is
> deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to
> override properties of core-default.xml, mapred-default.xml and
> hdfs-default.xml respectively
> 2010-02-27 07:25:08,062 WARN [main] conf.Configuration DEPRECATED:
> hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is
> deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to
> override properties of core-default.xml, mapred-default.xml and
> hdfs-default.xml respectively
> *2010-02-27 07:25:08,342 WARN [main] util.NativeCodeLoader Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> *2010-02-27 07:25:08,342 WARN [main] util.NativeCodeLoader Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> 2010-02-27 07:25:08,342 INFO [main] compress.CodecPool Got brand-new
> decompressor
> 2010-02-27 07:25:08,342 INFO [main] compress.CodecPool Got brand-new
> decompressor
> text: null
>
> But we do have native library as specified by -Djava.library.path=/opt/
> kindsight/nutchbase/lib/native/Linux-amd64-64:
>
> [r...@snv-qa-lin-domain-crawler1 webmap_workflow]# ls
> /opt/kindsight/nutchbase/lib/native/Linux-amd64-64/
> libhadoop.a libhadoop.la libhadoop.so libhadoop.so.1
> libhadoop.so.1.0.0
>
> On Sat, Feb 27, 2010 at 5:52 AM, Julien Nioche <
> lists.digitalpeb...@gmail.com> wrote:
>
>> Look at the Hadoop option -libjars and use it to point to the
>> nutch-1.0.jar,
>> that should work
>> J.
>>
>> On 27 February 2010 13:08, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>> > Hi,
>> > We use nutch to perform domain crawl but I see strange 'can't load class'
>> > error:
>> >
>> > [r...@snv-qa-lin-domain-crawler1 software]# hfs -text
>> > /user/tomcatadmin/lpm/12-100226111258118-tomcatadmin/parse/0/part-m-00000
>> > 10/02/27 04:45:10 INFO util.NativeCodeLoader: Loaded the native-hadoop
>> > library
>> > 10/02/27 04:45:10 INFO zlib.ZlibFactory: Successfully loaded & initialized
>> > native-zlib library
>> > 10/02/27 04:45:10 INFO compress.CodecPool: Got brand-new decompressor
>> > text: java.io.IOException: WritableName can't load class:
>> > org.apache.nutch.parse.Parse
>> >
>> > Here is the commandline which includes nutch-1.0.jar that contains
>> > org.apache.nutch.parse.Parse (see bold):
>> >
>> > 510 32488 1.3 0.1 1370060 53264 ? Sl 04:35 0:02
>> > /usr/local/jdk1.6.0_14/bin/java -Xmx1000m
>> > -Dhadoop.log.dir=/opt/kindsight/nutchbase/logs
>> > -Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl
>> > -Dhadoop.log.file=hadoop.log
>> > -Djava.library.path=/opt/kindsight/nutchbase/lib/native/Linux-amd64-64
>> > -classpath
>> > /opt/kindsight/nutchbase:/opt/kindsight/nutchbase/conf:/opt/kindsight/nutchbase/conf/batchclient:/opt/kindsight/nutchbase/lib/batchplatform.jar:/opt/kindsight/nutchbase/lib/colo_common.jar:/opt/kindsight/nutchbase/lib/csreader.jar:/opt/kindsight/nutchbase/lib/pr_common.jar:/opt/kindsight/nutchbase/lib/nutch-1.0.job:/opt/kindsight/nutchbase/lib/3rdparty/commons-collections-3.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/servlet-api.jar:/opt/kindsight/nutchbase/lib/3rdparty/lucene-misc-2.4.0.jar:/opt/kindsight/nutchbase/lib/3rdparty/tika-0.1-incubating.jar:/opt/kindsight/nutchbase/lib/3rdparty/junit-3.8.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/oozie-core-0.20.0.o0.1-SNAPSHOT.jar:/opt/kindsight/nutchbase/lib/3rdparty/hbase-0.20.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/lucene-core-2.4.0.jar:/opt/kindsight/nutchbase/lib/3rdparty/apache-solr-solrj-1.3.0.jar:/opt/kindsight/nutchbase/lib/3rdparty/json_simple-1.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/xerces-2_6_2.jar:/opt/kindsight/nutchbase/lib/3rdparty/jetty-5.1.4.jar:/opt/kindsight/nutchbase/lib/3rdparty/jets3t-0.6.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-lang-2.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/xerces-2_6_2-apis.jar:/opt/kindsight/nutchbase/lib/3rdparty/apache-solr-common-1.3.0.jar:/opt/kindsight/nutchbase/lib/3rdparty/jdom-1.0.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-fileupload-1.3-SNAPSHOT.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-httpclient-3.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-logging-1.0.4.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-logging-api-1.0.4.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-beanutils-1.8.0.jar:*/opt/kindsight/nutchbase/lib/3rdparty/nutch-1.0.jar*:/opt/kindsight/nutchbase/lib/3rdparty/commons-io-1.3.2.jar:/opt/kindsight/nutchbase/lib/3rdparty/icu4j-4_0_1.jar:/opt/kindsight/nutchbase/lib/3rdparty/log4j-1.2.15.jar:/opt/kindsight/nutchbase/lib/3rdparty/oozie-client-0.20.0.o0.1-SNAPSHOT.jar:/opt/kindsight/nutchbase/lib/3rdparty/batch/hadoop-core.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-logging-1.1.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-codec-1.3.jar:/opt/kindsight/nutchbase/lib/3rdparty/jakarta-oro-2.0.8.jar:/opt/kindsight/nutchbase/lib/3rdparty/hsqldb-1.8.0.7.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-pool-1.4.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-dbcp-1.2.2.jar:/opt/kindsight/nutchbase/lib/3rdparty/mysql-connector-java-5.1.10-bin.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-httpclient-3.0.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/zookeeper-3.2.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/taglibs-i18n.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-collections-3.2.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-cli-1.2.jar:/usr/local/jdk1.6.0_14/lib/tools.jar:/opt/kindsight/nutchbase/build/nutch-*.job:/opt/kindsight/nutchbase/nutch-*.job:/opt/kindsight/nutchbase/lib/batchplatform.jar:/opt/kindsight/nutchbase/lib/colo_common.jar:/opt/kindsight/nutchbase/lib/csreader.jar:/opt/kindsight/nutchbase/lib/pr_common.jar:/opt/kindsight/nutchbase/lib/jetty-ext/*.jar
>> > com.rialto.nutchbase.fetcher.Fetcher -D db.max.outlinks.per.page=1000
>> > domaincrawltable lpm/12-100226111258118-tomcatadmin/generate/2
>> > lpm/12-100226111258118-tomcatadmin/parse/2 -threads 10 -actionid
>> > 12-100226111258118-tomcatad...@domain_crawl
>> >
>> > Please shed some light on the above error.
>> >
>> > Thanks
>> >
>>
>> --
>> DigitalPebble Ltd
>> http://www.digitalpebble.com
>>