Hello everyone,

I have added a set of seeds to crawl using this command:

./bin/crawl /largeSeeds 1 http://localhost:8983/solr/ddcd 4

In the first iteration, all of the commands (inject, generate, fetch, parse, update-table, indexer and delete duplicates) executed successfully. In the second iteration, the "update-table" command failed (please see the error log below for reference), and because of this failure the whole process was terminated.


****************************************************LOG START************************************************************************************************
CrawlDB update for 1
/usr/share/searchEngine/nutch-branch-2.3.1/runtime/deploy/bin/nutch updatedb -D mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true 1452969522-27478 -crawlId 1
16/01/17 02:10:17 INFO crawl.DbUpdaterJob: DbUpdaterJob: starting at 2016-01-17 02:10:17
16/01/17 02:10:17 INFO crawl.DbUpdaterJob: DbUpdaterJob: batchId: 1452969522-27478
16/01/17 02:10:17 INFO plugin.PluginRepository: Plugins: looking in: /tmp/hadoop-root/hadoop-unjar3649584948711945520/classes/plugins
16/01/17 02:10:18 INFO plugin.PluginRepository: Plugin Auto-activation mode: [true]
16/01/17 02:10:18 INFO plugin.PluginRepository: Registered Plugins:
16/01/17 02:10:18 INFO plugin.PluginRepository: Rel-Tag microformat Parser/Indexer/Querier (microformats-reltag)
16/01/17 02:10:18 INFO plugin.PluginRepository: HTTP Framework (lib-http)
16/01/17 02:10:18 INFO plugin.PluginRepository: Html Parse Plug-in (parse-html)
16/01/17 02:10:18 INFO plugin.PluginRepository: MetaTags (parse-metatags)
16/01/17 02:10:18 INFO plugin.PluginRepository: Http / Https Protocol Plug-in (protocol-httpclient)
16/01/17 02:10:18 INFO plugin.PluginRepository: the nutch core extension points (nutch-extensionpoints)
16/01/17 02:10:18 INFO plugin.PluginRepository: Basic Indexing Filter (index-basic)
16/01/17 02:10:18 INFO plugin.PluginRepository: XML Libraries (lib-xml)
16/01/17 02:10:18 INFO plugin.PluginRepository: JavaScript Parser (parse-js)
16/01/17 02:10:18 INFO plugin.PluginRepository: Anchor Indexing Filter (index-anchor)
16/01/17 02:10:18 INFO plugin.PluginRepository: Tika Parser Plug-in (parse-tika)
16/01/17 02:10:18 INFO plugin.PluginRepository: Top Level Domain Plugin (tld)
16/01/17 02:10:18 INFO plugin.PluginRepository: Language Identification Parser/Filter (language-identifier)
16/01/17 02:10:18 INFO plugin.PluginRepository: Regex URL Filter Framework (lib-regex-filter)
16/01/17 02:10:18 INFO plugin.PluginRepository: Metadata Indexing Filter (index-metadata)
16/01/17 02:10:18 INFO plugin.PluginRepository: CyberNeko HTML Parser (lib-nekohtml)
16/01/17 02:10:18 INFO plugin.PluginRepository: Subcollection indexing and query filter (subcollection)
16/01/17 02:10:18 INFO plugin.PluginRepository: Link Analysis Scoring Plug-in (scoring-link)
16/01/17 02:10:18 INFO plugin.PluginRepository: Pass-through URL Normalizer (urlnormalizer-pass)
16/01/17 02:10:18 INFO plugin.PluginRepository: OPIC Scoring Plug-in (scoring-opic)
16/01/17 02:10:18 INFO plugin.PluginRepository: More Indexing Filter (index-more)
16/01/17 02:10:18 INFO plugin.PluginRepository: Http Protocol Plug-in (protocol-http)
16/01/17 02:10:18 INFO plugin.PluginRepository: SOLRIndexWriter (indexer-solr)
16/01/17 02:10:18 INFO plugin.PluginRepository: Creative Commons Plugins (creativecommons)
16/01/17 02:10:18 INFO plugin.PluginRepository: Registered Extension-Points:
16/01/17 02:10:18 INFO plugin.PluginRepository: Parse Filter (org.apache.nutch.parse.ParseFilter) 16/01/17 02:10:18 INFO plugin.PluginRepository: Nutch Index Cleaning Filter (org.apache.nutch.indexer.IndexCleaningFilter) 16/01/17 02:10:18 INFO plugin.PluginRepository: Nutch Content Parser (org.apache.nutch.parse.Parser) 16/01/17 02:10:18 INFO plugin.PluginRepository: Nutch URL Filter (org.apache.nutch.net.URLFilter) 16/01/17 02:10:18 INFO plugin.PluginRepository: Nutch Scoring (org.apache.nutch.scoring.ScoringFilter) 16/01/17 02:10:18 INFO plugin.PluginRepository: Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer) 16/01/17 02:10:18 INFO plugin.PluginRepository: Nutch Protocol (org.apache.nutch.protocol.Protocol) 16/01/17 02:10:18 INFO plugin.PluginRepository: Nutch Index Writer (org.apache.nutch.indexer.IndexWriter) 16/01/17 02:10:18 INFO plugin.PluginRepository: Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter) 16/01/17 02:10:19 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative 16/01/17 02:10:19 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative 16/01/17 02:10:19 INFO Configuration.deprecation: mapred.compress.map.output is deprecated. Instead, use mapreduce.map.output.compress 16/01/17 02:10:19 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. 
Instead, use mapreduce.job.reduces 16/01/17 02:10:19 INFO zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x60a2630a connecting to ZooKeeper ensemble=localhost:2181 16/01/17 02:10:19 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT 16/01/17 02:10:19 INFO zookeeper.ZooKeeper: Client environment:host.name=cism479 16/01/17 02:10:19 INFO zookeeper.ZooKeeper: Client environment:java.version=1.8.0_65 16/01/17 02:10:19 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation 16/01/17 02:10:19 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/jdk1.8.0_65/jre 16/01/17 02:10:19 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/usr/share/searchEngine/hadoop-2.5.2/conf:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/commons-configuration-1.6.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/jasper-compiler-5.5.23.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/activation-1.1.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/jackson-jaxrs-1.9.13.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/jsp-api-2.1.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/apacheds-kerberos-codec-2.0.0-M15.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/commons-io-2.4.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/paranamer-2.3.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/httpclient-4.2.5.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/log4j-1.2.17.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/jets3t-0.9.0.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/zookeeper-3.4.6.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/jsr305-1.3.9.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/c
ommon/lib/hadoop-auth-2.5.2.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/java-xmlbuilder-0.4.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/commons-el-1.0.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/jettison-1.1.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/jersey-server-1.9.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/jackson-mapper-asl-1.9.13.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/avro-1.7.4.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/commons-codec-1.4.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/commons-cli-1.2.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/commons-net-3.1.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/jetty-util-6.1.26.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/protobuf-java-2.5.0.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/netty-3.6.2.Final.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/commons-digester-1.8.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/hadoop-annotations-2.5.2.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/guava-11.0.2.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/commons-compress-1.4.1.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/jsch-0.1.42.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/commons-beanutils-1.7.0.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/jersey-core-1.9.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/api-util-1.0.0-M20.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/api-asn1-api-1.0.0-M20.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/xz-1.0.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/commons-httpclient-3.1.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/l
ib/commons-beanutils-core-1.8.0.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/stax-api-1.0-2.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/asm-3.2.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/jackson-xc-1.9.13.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/commons-logging-1.1.3.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/jersey-json-1.9.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/slf4j-api-1.7.5.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/commons-collections-3.2.1.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/commons-math3-3.1.1.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/jetty-6.1.26.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/snappy-java-1.0.4.1.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/hamcrest-core-1.3.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/jaxb-api-2.2.2.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/commons-lang-2.6.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/junit-4.11.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/jackson-core-asl-1.9.13.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/mockito-all-1.8.5.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/servlet-api-2.5.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/httpcore-4.2.5.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/apacheds-i18n-2.0.0-M15.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/xmlenc-0.52.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/lib/jasper-runtime-5.5.23.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/hadoop-common-2.5.2-tests.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/hado
op-nfs-2.5.2.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/common/hadoop-common-2.5.2.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/hdfs:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/hdfs/lib/jsp-api-2.1.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/hdfs/lib/commons-io-2.4.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/hdfs/lib/log4j-1.2.17.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/hdfs/lib/jsr305-1.3.9.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/hdfs/lib/commons-el-1.0.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/hdfs/lib/jersey-server-1.9.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/hdfs/lib/jackson-mapper-asl-1.9.13.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/hdfs/lib/commons-codec-1.4.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/hdfs/lib/commons-cli-1.2.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/hdfs/lib/jetty-util-6.1.26.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/hdfs/lib/protobuf-java-2.5.0.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/hdfs/lib/netty-3.6.2.Final.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/hdfs/lib/guava-11.0.2.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/hdfs/lib/jersey-core-1.9.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/hdfs/lib/asm-3.2.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/hdfs/lib/commons-logging-1.1.3.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/hdfs/lib/jetty-6.1.26.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/hdfs/lib/commons-lang-2.6.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/hdfs/lib/jackson-core-asl-1.9.13.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/hdfs/lib/servlet-api-2.5.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/hdfs/lib/xmlenc-0.52.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/hdfs/lib/jasper-runtime-5.5.23.jar:/usr/share/
searchEngine/hadoop-2.5.2/share/hadoop/hdfs/hadoop-hdfs-2.5.2-tests.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/hdfs/hadoop-hdfs-2.5.2.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/hdfs/hadoop-hdfs-nfs-2.5.2.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/lib/jline-0.9.94.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/lib/activation-1.1.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/lib/jackson-jaxrs-1.9.13.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/lib/jaxb-impl-2.2.3-1.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/lib/commons-io-2.4.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/lib/log4j-1.2.17.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/lib/zookeeper-3.4.6.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/lib/jsr305-1.3.9.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/lib/jettison-1.1.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/lib/jersey-server-1.9.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/lib/jackson-mapper-asl-1.9.13.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/lib/commons-codec-1.4.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/lib/commons-cli-1.2.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/lib/jetty-util-6.1.26.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/lib/protobuf-java-2.5.0.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/lib/jersey-client-1.9.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/lib/aopalliance-1.0.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/lib/netty-3.6.2.Final.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/lib/guava-11.0.2.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/lib/commons-compress-1.4.1.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/lib/jersey-core-1.9.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/lib/jersey-guice-1.9.jar:/usr/share
/searchEngine/hadoop-2.5.2/share/hadoop/yarn/lib/guice-3.0.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/lib/xz-1.0.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/lib/commons-httpclient-3.1.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/lib/stax-api-1.0-2.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/lib/asm-3.2.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/lib/jackson-xc-1.9.13.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/lib/commons-logging-1.1.3.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/lib/jersey-json-1.9.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/lib/guice-servlet-3.0.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/lib/commons-collections-3.2.1.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/lib/jetty-6.1.26.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/lib/jaxb-api-2.2.2.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/lib/commons-lang-2.6.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/lib/leveldbjni-all-1.8.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/lib/jackson-core-asl-1.9.13.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/lib/servlet-api-2.5.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/lib/javax.inject-1.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/hadoop-yarn-server-tests-2.5.2.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.5.2.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/hadoop-yarn-api-2.5.2.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/hadoop-yarn-server-common-2.5.2.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/hadoop-yarn-server-web-proxy-2.5.2.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/hadoop-yarn-server-nodemanager-2.5.2.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launc
her-2.5.2.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-2.5.2.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/hadoop-yarn-client-2.5.2.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/hadoop-yarn-server-applicationhistoryservice-2.5.2.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/yarn/hadoop-yarn-common-2.5.2.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/mapreduce/lib/commons-io-2.4.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/mapreduce/lib/paranamer-2.3.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/mapreduce/lib/log4j-1.2.17.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/mapreduce/lib/jersey-server-1.9.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/mapreduce/lib/jackson-mapper-asl-1.9.13.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/mapreduce/lib/avro-1.7.4.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/mapreduce/lib/protobuf-java-2.5.0.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/mapreduce/lib/aopalliance-1.0.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/mapreduce/lib/netty-3.6.2.Final.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/mapreduce/lib/hadoop-annotations-2.5.2.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/mapreduce/lib/commons-compress-1.4.1.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/mapreduce/lib/jersey-core-1.9.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/mapreduce/lib/jersey-guice-1.9.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/mapreduce/lib/guice-3.0.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/mapreduce/lib/xz-1.0.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/mapreduce/lib/asm-3.2.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/mapreduce/lib/guice-servlet-3.0.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/mapreduce/lib/hamcrest-core-1.3.jar:/us
r/share/searchEngine/hadoop-2.5.2/share/hadoop/mapreduce/lib/junit-4.11.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/mapreduce/lib/leveldbjni-all-1.8.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/mapreduce/lib/jackson-core-asl-1.9.13.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/mapreduce/lib/javax.inject-1.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/mapreduce/hadoop-mapreduce-client-shuffle-2.5.2.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.2.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.5.2-tests.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.5.2.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-2.5.2.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/mapreduce/hadoop-mapreduce-client-app-2.5.2.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-plugins-2.5.2.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.5.2.jar:/usr/share/searchEngine/hadoop-2.5.2/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.5.2.jar:/usr/share/searchEngine/hadoop-2.5.2/contrib/capacity-scheduler/*.jar:/usr/share/searchEngine/hbase-0.98.8-hadoop2/lib/*.jar:/usr/share/searchEngine/hbase-0.98.8-hadoop2/conf 16/01/17 02:10:19 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/share/searchEngine/hadoop-2.5.2/lib/native 16/01/17 02:10:19 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp 16/01/17 02:10:19 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
16/01/17 02:10:19 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
16/01/17 02:10:19 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
16/01/17 02:10:19 INFO zookeeper.ZooKeeper: Client environment:os.version=3.16.0-30-generic 16/01/17 02:10:19 INFO zookeeper.ZooKeeper: Client environment:user.name=root 16/01/17 02:10:19 INFO zookeeper.ZooKeeper: Client environment:user.home=/root 16/01/17 02:10:19 INFO zookeeper.ZooKeeper: Client environment:user.dir=/usr/share/searchEngine/nutch-branch-2.3.1/runtime/deploy 16/01/17 02:10:19 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hconnection-0x60a2630a, quorum=localhost:2181, baseZNode=/hbase 16/01/17 02:10:19 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) 16/01/17 02:10:19 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session 16/01/17 02:10:19 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x152495dedd00143, negotiated timeout = 90000 16/01/17 02:10:21 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available 16/01/17 02:10:21 WARN store.HBaseStore: Mismatching schema's names. Mappingfile schema: 'webpage'. PersistentClass schema's name: '1_webpage'Assuming they are the same. 16/01/17 02:10:21 INFO zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x60a2630a connecting to ZooKeeper ensemble=localhost:2181 16/01/17 02:10:21 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=catalogtracker-on-hconnection-0x60a2630a, quorum=localhost:2181, baseZNode=/hbase 16/01/17 02:10:21 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. 
Will not attempt to authenticate using SASL (unknown error) 16/01/17 02:10:21 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session 16/01/17 02:10:21 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x152495dedd00144, negotiated timeout = 90000 16/01/17 02:10:23 INFO zookeeper.ZooKeeper: Session: 0x152495dedd00144 closed
16/01/17 02:10:23 INFO zookeeper.ClientCnxn: EventThread shut down
16/01/17 02:10:23 WARN store.HBaseStore: Mismatching schema's names. Mappingfile schema: 'webpage'. PersistentClass schema's name: '1_webpage'Assuming they are the same. 16/01/17 02:10:23 INFO zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x60a2630a connecting to ZooKeeper ensemble=localhost:2181 16/01/17 02:10:23 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=catalogtracker-on-hconnection-0x60a2630a, quorum=localhost:2181, baseZNode=/hbase 16/01/17 02:10:23 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) 16/01/17 02:10:23 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session 16/01/17 02:10:23 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x152495dedd00145, negotiated timeout = 90000 16/01/17 02:10:23 INFO zookeeper.ZooKeeper: Session: 0x152495dedd00145 closed
16/01/17 02:10:23 INFO zookeeper.ClientCnxn: EventThread shut down
16/01/17 02:10:23 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032 16/01/17 02:10:27 WARN store.HBaseStore: Mismatching schema's names. Mappingfile schema: 'webpage'. PersistentClass schema's name: '1_webpage'Assuming they are the same. 16/01/17 02:10:27 INFO zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x60a2630a connecting to ZooKeeper ensemble=localhost:2181 16/01/17 02:10:27 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=catalogtracker-on-hconnection-0x60a2630a, quorum=localhost:2181, baseZNode=/hbase 16/01/17 02:10:27 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) 16/01/17 02:10:27 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session 16/01/17 02:10:27 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x152495dedd00146, negotiated timeout = 90000 16/01/17 02:10:27 INFO zookeeper.ZooKeeper: Session: 0x152495dedd00146 closed
16/01/17 02:10:27 INFO zookeeper.ClientCnxn: EventThread shut down
16/01/17 02:10:27 INFO mapreduce.JobSubmitter: number of splits:2
16/01/17 02:10:27 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1452929501009_0024 16/01/17 02:10:28 INFO impl.YarnClientImpl: Submitted application application_1452929501009_0024 16/01/17 02:10:28 INFO mapreduce.Job: The url to track the job: http://cism479:8088/proxy/application_1452929501009_0024/
16/01/17 02:10:28 INFO mapreduce.Job: Running job: job_1452929501009_0024
16/01/17 02:10:39 INFO mapreduce.Job: Job job_1452929501009_0024 running in uber mode : false
16/01/17 02:10:39 INFO mapreduce.Job:  map 0% reduce 0%
16/01/17 02:11:37 INFO mapreduce.Job: Task Id : attempt_1452929501009_0024_m_000000_0, Status : FAILED
Error: java.net.MalformedURLException: For input string: "#10;from <a href="https:"
    at java.net.URL.<init>(URL.java:620)
    at java.net.URL.<init>(URL.java:483)
    at java.net.URL.<init>(URL.java:432)
    at org.apache.nutch.util.TableUtil.reverseUrl(TableUtil.java:43)
    at org.apache.nutch.crawl.DbUpdateMapper.map(DbUpdateMapper.java:96)
    at org.apache.nutch.crawl.DbUpdateMapper.map(DbUpdateMapper.java:38)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.NumberFormatException: For input string: "#10;from <a href="https:"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:569)
    at java.lang.Integer.parseInt(Integer.java:615)
    at java.net.URLStreamHandler.parseURL(URLStreamHandler.java:216)
    at java.net.URL.<init>(URL.java:615)
    ... 13 more

Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

16/01/17 02:12:13 INFO mapreduce.Job:  map 33% reduce 0%
16/01/17 02:12:24 INFO mapreduce.Job:  map 50% reduce 0%
16/01/17 02:12:44 INFO mapreduce.Job: Task Id : attempt_1452929501009_0024_m_000000_1, Status : FAILED
Error: java.net.MalformedURLException: For input string: "#10;from <a href="https:"
    at java.net.URL.<init>(URL.java:620)
    at java.net.URL.<init>(URL.java:483)
    at java.net.URL.<init>(URL.java:432)
    at org.apache.nutch.util.TableUtil.reverseUrl(TableUtil.java:43)
    at org.apache.nutch.crawl.DbUpdateMapper.map(DbUpdateMapper.java:96)
    at org.apache.nutch.crawl.DbUpdateMapper.map(DbUpdateMapper.java:38)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.NumberFormatException: For input string: "#10;from <a href="https:"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:569)
    at java.lang.Integer.parseInt(Integer.java:615)
    at java.net.URLStreamHandler.parseURL(URLStreamHandler.java:216)
    at java.net.URL.<init>(URL.java:615)
    ... 13 more

16/01/17 02:13:19 INFO mapreduce.Job: Task Id : attempt_1452929501009_0024_m_000000_2, Status : FAILED
Error: java.net.MalformedURLException: For input string: "#10;from <a href="https:"
    at java.net.URL.<init>(URL.java:620)
    at java.net.URL.<init>(URL.java:483)
    at java.net.URL.<init>(URL.java:432)
    at org.apache.nutch.util.TableUtil.reverseUrl(TableUtil.java:43)
    at org.apache.nutch.crawl.DbUpdateMapper.map(DbUpdateMapper.java:96)
    at org.apache.nutch.crawl.DbUpdateMapper.map(DbUpdateMapper.java:38)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.NumberFormatException: For input string: "#10;from <a href="https:"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:569)
    at java.lang.Integer.parseInt(Integer.java:615)
    at java.net.URLStreamHandler.parseURL(URLStreamHandler.java:216)
    at java.net.URL.<init>(URL.java:615)
    ... 13 more

16/01/17 02:13:42 INFO mapreduce.Job:  map 100% reduce 100%
16/01/17 02:13:43 INFO mapreduce.Job: Job job_1452929501009_0024 failed with state FAILED due to: Task failed task_1452929501009_0024_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0

16/01/17 02:13:44 INFO mapreduce.Job: Counters: 34
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=49949067
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=1193
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=1
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
    Job Counters
        Failed map tasks=4
        Launched map tasks=5
        Other local map tasks=3
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=829677
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=276559
        Total vcore-seconds taken by all map tasks=276559
        Total megabyte-seconds taken by all map tasks=849589248
    Map-Reduce Framework
        Map input records=30201
        Map output records=1164348
        Map output bytes=250659088
        Map output materialized bytes=49832245
        Input split bytes=1193
        Combine input records=0
        Spilled Records=1164348
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=3541
        CPU time spent (ms)=42980
        Physical memory (bytes) snapshot=2062766080
        Virtual memory (bytes) snapshot=5086490624
        Total committed heap usage (bytes)=2127036416
    File Input Format Counters
        Bytes Read=0
Exception in thread "main" java.lang.RuntimeException: job failed: name=[1]update-table, jobid=job_1452929501009_0024
    at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:120)
    at org.apache.nutch.crawl.DbUpdaterJob.run(DbUpdaterJob.java:111)
    at org.apache.nutch.crawl.DbUpdaterJob.updateTable(DbUpdaterJob.java:140)
    at org.apache.nutch.crawl.DbUpdaterJob.run(DbUpdaterJob.java:174)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.crawl.DbUpdaterJob.main(DbUpdaterJob.java:178)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Error running:
/usr/share/searchEngine/nutch-branch-2.3.1/runtime/deploy/bin/nutch updatedb -D mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true 1452969522-27478 -crawlId 1
Failed with exit value 1.
****************************************************LOG END ************************************************************************************************

As is pretty clear from the error, the failure is caused by malformed URLs. Is there a way to get rid of this kind of malformed URL? Or is there a solution that could skip or bypass such URLs, so that the subsequent steps get executed?
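To make the question concrete, here is a minimal sketch of the kind of check that would let such URLs be skipped. It assumes a hypothetical helper class `UrlSanityCheck` (my own name, not part of Nutch) and relies only on the `java.net.URL` constructor, which is the call that throws in the stack trace above. The sample URLs are made up for illustration, and where such a check would be hooked into Nutch is not shown here.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Nutch: drops strings that java.net.URL rejects.
final class UrlSanityCheck {

    // Returns true if the java.net.URL constructor accepts the string.
    static boolean isParsable(String candidate) {
        if (candidate == null) {
            return false;
        }
        try {
            new URL(candidate);
            return true;
        } catch (MalformedURLException e) {
            // Covers a missing protocol as well as a non-numeric port,
            // which is the NumberFormatException case seen in the log.
            return false;
        }
    }

    // Keeps only the parsable candidates, preserving their order.
    static List<String> keepParsable(List<String> candidates) {
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            if (isParsable(candidate)) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up sample input; the second entry has a non-numeric port.
        List<String> outlinks = Arrays.asList(
                "http://example.com/a",
                "http://example.com:notaport/",
                "https://example.org/b");
        for (String url : keepParsable(outlinks)) {
            System.out.println(url);
        }
    }
}
```

With something like this applied before the update step, a bad outlink would be dropped instead of failing the whole map task.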

Please advise.

Kshitij Shukla
Software developer
CIS

