Hi - not everyone here knows Nutch 2.x, and I don't know anything about Gora.
Some others do, but not everyone reads their email all day. Please be patient,
and perhaps try the Apache Gora list as well.
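
One thing the trace does show: 32767 is HBase's maximum row key length (Short.MAX_VALUE), and as far as I understand Nutch 2.x derives the row key from the URL, so a 41221-byte key probably means one extremely long URL came out of the parse step. Below is a minimal sketch for spotting such URLs, assuming you can dump the candidate URLs to a list first. The function name and the sample URLs are made up for illustration, and the UTF-8 length of the URL is only an approximation of the stored key length.

```python
# Limit reported in the stack trace (HBase row keys are capped at Short.MAX_VALUE bytes).
HBASE_MAX_ROW_LENGTH = 32767


def oversized_urls(urls, limit=HBASE_MAX_ROW_LENGTH):
    """Yield (byte_length, url) for every URL whose UTF-8 encoding exceeds limit."""
    for url in urls:
        size = len(url.encode("utf-8"))
        if size > limit:
            yield size, url


if __name__ == "__main__":
    # Hypothetical input: one normal URL and one with a very long query string.
    sample = [
        "http://example.com/",
        "http://example.com/?q=" + "a" * 41199,
    ]
    for size, url in oversized_urls(sample):
        # Print only the start of the URL so the output stays readable.
        print(size, url[:60] + "...")
```

If that is the cause, rejecting overly long URLs before they reach updatedb (for example with a URL filter rule) might avoid the exception, though that is a guess rather than a verified fix.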
Markus

 
 
-----Original message-----
> From:Kshitij Shukla <[email protected]>
> Sent: Thursday 21st January 2016 13:45
> To: [email protected]
> Subject: [CIS-CMMI-3] Re: IllegalArgumentException: Row length 41221 is > 32767
> 
> So, is no one willing to help or guide me through this error?
> 
> On Wednesday 20 January 2016 12:24 PM, Kshitij Shukla wrote:
> > Hello everyone,
> >
> > I have added a set of seeds to crawl using this command:
> >
> > ./bin/crawl /largeSeeds 1 http://localhost:8983/solr/ddcd 4
> >
> > For the first iteration, all of the commands (inject, generate, fetch,
> > parse, update-table, indexer and delete duplicates) executed
> > successfully.
> > For the second iteration, the "CrawlDB update" command failed (please
> > see the error log for reference), and because of this failure the
> > whole process is terminated.
> >
> >
> > ****************************************************LOG 
> > START************************************************************************************************
> > 16/01/20 02:45:19 INFO parse.ParserJob: ParserJob: finished at 
> > 2016-01-20 02:45:19, time elapsed: 00:06:57
> > CrawlDB update for 1
> > /usr/share/searchEngine/nutch-branch-2.3.1/runtime/deploy/bin/nutch 
> > updatedb -D mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m 
> > -D mapred.reduce.tasks.speculative.execution=false -D 
> > mapred.map.tasks.speculative.execution=false -D 
> > mapred.compress.map.output=true 1453230757-13191 -crawlId 1
> > 16/01/20 02:45:27 INFO crawl.DbUpdaterJob: DbUpdaterJob: starting at 
> > 2016-01-20 02:45:27
> > 16/01/20 02:45:27 INFO crawl.DbUpdaterJob: DbUpdaterJob: batchId: 
> > 1453230757-13191
> > 16/01/20 02:45:27 INFO plugin.PluginRepository: Plugins: looking in: 
> > /tmp/hadoop-root/hadoop-unjar5654418190157422003/classes/plugins
> > 16/01/20 02:45:28 INFO plugin.PluginRepository: Plugin Auto-activation 
> > mode: [true]
> > 16/01/20 02:45:28 INFO plugin.PluginRepository: Registered Plugins:
> > 16/01/20 02:45:28 INFO plugin.PluginRepository:     HTTP Framework 
> > (lib-http)
> > 16/01/20 02:45:28 INFO plugin.PluginRepository:     Html Parse Plug-in 
> > (parse-html)
> > 16/01/20 02:45:28 INFO plugin.PluginRepository:     MetaTags 
> > (parse-metatags)
> > 16/01/20 02:45:28 INFO plugin.PluginRepository:     the nutch core 
> > extension points (nutch-extensionpoints)
> > 16/01/20 02:45:28 INFO plugin.PluginRepository:     Basic Indexing 
> > Filter (index-basic)
> > 16/01/20 02:45:28 INFO plugin.PluginRepository:     XML Libraries 
> > (lib-xml)
> > 16/01/20 02:45:28 INFO plugin.PluginRepository:     Anchor Indexing 
> > Filter (index-anchor)
> > 16/01/20 02:45:28 INFO plugin.PluginRepository:     Basic URL 
> > Normalizer (urlnormalizer-basic)
> > 16/01/20 02:45:28 INFO plugin.PluginRepository:     Language 
> > Identification Parser/Filter (language-identifier)
> > 16/01/20 02:45:28 INFO plugin.PluginRepository:     Metadata Indexing 
> > Filter (index-metadata)
> > 16/01/20 02:45:28 INFO plugin.PluginRepository:     CyberNeko HTML 
> > Parser (lib-nekohtml)
> > 16/01/20 02:45:28 INFO plugin.PluginRepository:     Subcollection 
> > indexing and query filter (subcollection)
> > 16/01/20 02:45:28 INFO plugin.PluginRepository: SOLRIndexWriter 
> > (indexer-solr)
> > 16/01/20 02:45:28 INFO plugin.PluginRepository:     Rel-Tag 
> > microformat Parser/Indexer/Querier (microformats-reltag)
> > 16/01/20 02:45:28 INFO plugin.PluginRepository:     Http / Https 
> > Protocol Plug-in (protocol-httpclient)
> > 16/01/20 02:45:28 INFO plugin.PluginRepository:     JavaScript Parser 
> > (parse-js)
> > 16/01/20 02:45:28 INFO plugin.PluginRepository:     Tika Parser 
> > Plug-in (parse-tika)
> > 16/01/20 02:45:28 INFO plugin.PluginRepository:     Top Level Domain 
> > Plugin (tld)
> > 16/01/20 02:45:28 INFO plugin.PluginRepository:     Regex URL Filter 
> > Framework (lib-regex-filter)
> > 16/01/20 02:45:28 INFO plugin.PluginRepository:     Regex URL 
> > Normalizer (urlnormalizer-regex)
> > 16/01/20 02:45:28 INFO plugin.PluginRepository:     Link Analysis 
> > Scoring Plug-in (scoring-link)
> > 16/01/20 02:45:28 INFO plugin.PluginRepository:     OPIC Scoring 
> > Plug-in (scoring-opic)
> > 16/01/20 02:45:28 INFO plugin.PluginRepository:     More Indexing 
> > Filter (index-more)
> > 16/01/20 02:45:28 INFO plugin.PluginRepository:     Http Protocol 
> > Plug-in (protocol-http)
> > 16/01/20 02:45:28 INFO plugin.PluginRepository:     Creative Commons 
> > Plugins (creativecommons)
> > 16/01/20 02:45:28 INFO plugin.PluginRepository: Registered 
> > Extension-Points:
> > 16/01/20 02:45:28 INFO plugin.PluginRepository:     Parse Filter 
> > (org.apache.nutch.parse.ParseFilter)
> > 16/01/20 02:45:28 INFO plugin.PluginRepository:     Nutch Index 
> > Cleaning Filter (org.apache.nutch.indexer.IndexCleaningFilter)
> > 16/01/20 02:45:28 INFO plugin.PluginRepository:     Nutch Content 
> > Parser (org.apache.nutch.parse.Parser)
> > 16/01/20 02:45:28 INFO plugin.PluginRepository:     Nutch URL Filter 
> > (org.apache.nutch.net.URLFilter)
> > 16/01/20 02:45:28 INFO plugin.PluginRepository:     Nutch Scoring 
> > (org.apache.nutch.scoring.ScoringFilter)
> > 16/01/20 02:45:28 INFO plugin.PluginRepository:     Nutch URL 
> > Normalizer (org.apache.nutch.net.URLNormalizer)
> > 16/01/20 02:45:28 INFO plugin.PluginRepository:     Nutch Protocol 
> > (org.apache.nutch.protocol.Protocol)
> > 16/01/20 02:45:28 INFO plugin.PluginRepository:     Nutch Index Writer 
> > (org.apache.nutch.indexer.IndexWriter)
> > 16/01/20 02:45:28 INFO plugin.PluginRepository:     Nutch Indexing 
> > Filter (org.apache.nutch.indexer.IndexingFilter)
> > 16/01/20 02:45:29 INFO Configuration.deprecation: 
> > mapred.map.tasks.speculative.execution is deprecated. Instead, use 
> > mapreduce.map.speculative
> > 16/01/20 02:45:29 INFO Configuration.deprecation: 
> > mapred.reduce.tasks.speculative.execution is deprecated. Instead, use 
> > mapreduce.reduce.speculative
> > 16/01/20 02:45:29 INFO Configuration.deprecation: 
> > mapred.compress.map.output is deprecated. Instead, use 
> > mapreduce.map.output.compress
> > 16/01/20 02:45:29 INFO Configuration.deprecation: mapred.reduce.tasks 
> > is deprecated. Instead, use mapreduce.job.reduces
> > 16/01/20 02:45:29 INFO zookeeper.RecoverableZooKeeper: Process 
> > identifier=hconnection-0x60a2630a connecting to ZooKeeper 
> > ensemble=localhost:2181
> > 16/01/20 02:45:29 INFO zookeeper.ZooKeeper: Client 
> > environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
> > 16/01/20 02:45:29 INFO zookeeper.ZooKeeper: Client 
> > environment:host.name=cism479
> > 16/01/20 02:45:29 INFO zookeeper.ZooKeeper: Client 
> > environment:java.version=1.8.0_65
> > 16/01/20 02:45:29 INFO zookeeper.ZooKeeper: Client 
> > environment:java.vendor=Oracle Corporation
> > 16/01/20 02:45:29 INFO zookeeper.ZooKeeper: Client 
> > environment:java.home=/usr/lib/jvm/jdk1.8.0_65/jre
> > 16/01/20 02:45:35 INFO zookeeper.ClientCnxn: EventThread shut down
> > 16/01/20 02:45:35 INFO mapreduce.JobSubmitter: number of splits:2
> > 16/01/20 02:45:36 INFO mapreduce.JobSubmitter: Submitting tokens for 
> > job: job_1453210838763_0011
> > 16/01/20 02:45:36 INFO impl.YarnClientImpl: Submitted application 
> > application_1453210838763_0011
> > 16/01/20 02:45:36 INFO mapreduce.Job: The url to track the job: 
> > http://cism479:8088/proxy/application_1453210838763_0011/
> > 16/01/20 02:45:36 INFO mapreduce.Job: Running job: job_1453210838763_0011
> > 16/01/20 02:45:48 INFO mapreduce.Job: Job job_1453210838763_0011 
> > running in uber mode : false
> > 16/01/20 02:45:48 INFO mapreduce.Job:  map 0% reduce 0%
> > 16/01/20 02:47:31 INFO mapreduce.Job:  map 33% reduce 0%
> > 16/01/20 02:47:47 INFO mapreduce.Job:  map 50% reduce 0%
> > 16/01/20 02:48:08 INFO mapreduce.Job:  map 83% reduce 0%
> > 16/01/20 02:48:16 INFO mapreduce.Job:  map 100% reduce 0%
> > 16/01/20 02:48:31 INFO mapreduce.Job:  map 100% reduce 31%
> > 16/01/20 02:48:34 INFO mapreduce.Job:  map 100% reduce 33%
> > 16/01/20 02:50:30 INFO mapreduce.Job:  map 100% reduce 34%
> > 16/01/20 03:01:18 INFO mapreduce.Job:  map 100% reduce 35%
> > 16/01/20 03:11:58 INFO mapreduce.Job:  map 100% reduce 36%
> > 16/01/20 03:22:50 INFO mapreduce.Job:  map 100% reduce 37%
> > 16/01/20 03:24:22 INFO mapreduce.Job:  map 100% reduce 50%
> > 16/01/20 03:24:35 INFO mapreduce.Job:  map 100% reduce 82%
> > 16/01/20 03:24:38 INFO mapreduce.Job:  map 100% reduce 83%
> > 16/01/20 03:26:33 INFO mapreduce.Job:  map 100% reduce 84%
> > 16/01/20 03:37:35 INFO mapreduce.Job:  map 100% reduce 85%
> > 16/01/20 03:39:38 INFO mapreduce.Job: Task Id : 
> > attempt_1453210838763_0011_r_000001_0, Status : FAILED
> > *Error: java.lang.IllegalArgumentException: Row length 41221 is > 32767*
> >     at org.apache.hadoop.hbase.client.Mutation.checkRow(Mutation.java:506)
> >     at org.apache.hadoop.hbase.client.Mutation.checkRow(Mutation.java:487)
> >     at org.apache.hadoop.hbase.client.Get.<init>(Get.java:89)
> >     at org.apache.gora.hbase.store.HBaseStore.get(HBaseStore.java:208)
> >     at org.apache.gora.hbase.store.HBaseStore.get(HBaseStore.java:79)
> >     at 
> > org.apache.gora.store.impl.DataStoreBase.get(DataStoreBase.java:156)
> >     at org.apache.gora.store.impl.DataStoreBase.get(DataStoreBase.java:56)
> >     at 
> > org.apache.nutch.crawl.DbUpdateReducer.reduce(DbUpdateReducer.java:114)
> >     at 
> > org.apache.nutch.crawl.DbUpdateReducer.reduce(DbUpdateReducer.java:42)
> >     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
> >     at 
> > org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
> >     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
> >     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at javax.security.auth.Subject.doAs(Subject.java:422)
> >     at 
> > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> >     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> >
> > 16/01/20 03:39:39 INFO mapreduce.Job:  map 100% reduce 50%
> > 16/01/20 03:39:52 INFO mapreduce.Job:  map 100% reduce 82%
> > 16/01/20 03:39:55 INFO mapreduce.Job:  map 100% reduce 83%
> > 16/01/20 03:41:56 INFO mapreduce.Job:  map 100% reduce 84%
> > 16/01/20 03:53:39 INFO mapreduce.Job:  map 100% reduce 85%
> > 16/01/20 03:55:49 INFO mapreduce.Job: Task Id : 
> > attempt_1453210838763_0011_r_000001_1, Status : FAILED
> > *Error: java.lang.IllegalArgumentException: Row length 41221 is > 32767*
> >     at org.apache.hadoop.hbase.client.Mutation.checkRow(Mutation.java:506)
> >     at org.apache.hadoop.hbase.client.Mutation.checkRow(Mutation.java:487)
> >     at org.apache.hadoop.hbase.client.Get.<init>(Get.java:89)
> >     at org.apache.gora.hbase.store.HBaseStore.get(HBaseStore.java:208)
> >     at org.apache.gora.hbase.store.HBaseStore.get(HBaseStore.java:79)
> >     at 
> > org.apache.gora.store.impl.DataStoreBase.get(DataStoreBase.java:156)
> >     at org.apache.gora.store.impl.DataStoreBase.get(DataStoreBase.java:56)
> >     at 
> > org.apache.nutch.crawl.DbUpdateReducer.reduce(DbUpdateReducer.java:114)
> >     at 
> > org.apache.nutch.crawl.DbUpdateReducer.reduce(DbUpdateReducer.java:42)
> >     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
> >     at 
> > org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
> >     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
> >     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at javax.security.auth.Subject.doAs(Subject.java:422)
> >     at 
> > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> >     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> >
> > 16/01/20 03:55:50 INFO mapreduce.Job:  map 100% reduce 50%
> > 16/01/20 03:56:01 INFO mapreduce.Job:  map 100% reduce 83%
> > 16/01/20 03:58:02 INFO mapreduce.Job:  map 100% reduce 84%
> > 16/01/20 04:10:09 INFO mapreduce.Job:  map 100% reduce 85%
> > 16/01/20 04:12:33 INFO mapreduce.Job: Task Id : 
> > attempt_1453210838763_0011_r_000001_2, Status : FAILED
> > *Error: java.lang.IllegalArgumentException: Row length 41221 is > 32767*
> >     at org.apache.hadoop.hbase.client.Mutation.checkRow(Mutation.java:506)
> >     at org.apache.hadoop.hbase.client.Mutation.checkRow(Mutation.java:487)
> >     at org.apache.hadoop.hbase.client.Get.<init>(Get.java:89)
> >     at org.apache.gora.hbase.store.HBaseStore.get(HBaseStore.java:208)
> >     at org.apache.gora.hbase.store.HBaseStore.get(HBaseStore.java:79)
> >     at 
> > org.apache.gora.store.impl.DataStoreBase.get(DataStoreBase.java:156)
> >     at org.apache.gora.store.impl.DataStoreBase.get(DataStoreBase.java:56)
> >     at 
> > org.apache.nutch.crawl.DbUpdateReducer.reduce(DbUpdateReducer.java:114)
> >     at 
> > org.apache.nutch.crawl.DbUpdateReducer.reduce(DbUpdateReducer.java:42)
> >     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
> >     at 
> > org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
> >     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
> >     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at javax.security.auth.Subject.doAs(Subject.java:422)
> >     at 
> > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
> >     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> >
> > 16/01/20 04:12:34 INFO mapreduce.Job:  map 100% reduce 50%
> > 16/01/20 04:12:45 INFO mapreduce.Job:  map 100% reduce 82%
> > 16/01/20 04:12:48 INFO mapreduce.Job:  map 100% reduce 83%
> > 16/01/20 04:14:46 INFO mapreduce.Job:  map 100% reduce 84%
> > 16/01/20 04:26:53 INFO mapreduce.Job:  map 100% reduce 85%
> > 16/01/20 04:29:09 INFO mapreduce.Job:  map 100% reduce 100%
> > 16/01/20 04:29:10 INFO mapreduce.Job: Job job_1453210838763_0011 
> > failed with state FAILED due to: Task failed 
> > task_1453210838763_0011_r_000001
> > Job failed as tasks failed. failedMaps:0 failedReduces:1
> >
> > 16/01/20 04:29:11 INFO mapreduce.Job: Counters: 50
> >     File System Counters
> >         FILE: Number of bytes read=38378343
> >         FILE: Number of bytes written=115957636
> >         FILE: Number of read operations=0
> >         FILE: Number of large read operations=0
> >         FILE: Number of write operations=0
> >         HDFS: Number of bytes read=2382
> >         HDFS: Number of bytes written=0
> >         HDFS: Number of read operations=2
> >         HDFS: Number of large read operations=0
> >         HDFS: Number of write operations=0
> >     Job Counters
> >         Failed reduce tasks=4
> >         Launched map tasks=2
> >         Launched reduce tasks=5
> >         Data-local map tasks=2
> >         Total time spent by all maps in occupied slots (ms)=789909
> >         Total time spent by all reduces in occupied slots (ms)=30215090
> >         Total time spent by all map tasks (ms)=263303
> >         Total time spent by all reduce tasks (ms)=6043018
> >         Total vcore-seconds taken by all map tasks=263303
> >         Total vcore-seconds taken by all reduce tasks=6043018
> >         Total megabyte-seconds taken by all map tasks=808866816
> >         Total megabyte-seconds taken by all reduce tasks=30940252160
> >     Map-Reduce Framework
> >         Map input records=49929
> >         Map output records=1777904
> >         Map output bytes=382773368
> >         Map output materialized bytes=77228942
> >         Input split bytes=2382
> >         Combine input records=0
> >         Combine output records=0
> >         Reduce input groups=754170
> >         Reduce shuffle bytes=38318183
> >         Reduce input records=881156
> >         Reduce output records=754170
> >         Spilled Records=2659060
> >         Shuffled Maps =2
> >         Failed Shuffles=0
> >         Merged Map outputs=2
> >         GC time elapsed (ms)=17993
> >         CPU time spent (ms)=819690
> >         Physical memory (bytes) snapshot=4080136192
> >         Virtual memory (bytes) snapshot=15234293760
> >         Total committed heap usage (bytes)=4149739520
> >     Shuffle Errors
> >         BAD_ID=0
> >         CONNECTION=0
> >         IO_ERROR=0
> >         WRONG_LENGTH=0
> >         WRONG_MAP=0
> >         WRONG_REDUCE=0
> >     File Input Format Counters
> >         Bytes Read=0
> >     File Output Format Counters
> >         Bytes Written=0
> > Exception in thread "main" java.lang.RuntimeException: job failed: 
> > name=[1]update-table, jobid=job_1453210838763_0011
> >     at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:120)
> >     at org.apache.nutch.crawl.DbUpdaterJob.run(DbUpdaterJob.java:111)
> >     at 
> > org.apache.nutch.crawl.DbUpdaterJob.updateTable(DbUpdaterJob.java:140)
> >     at org.apache.nutch.crawl.DbUpdaterJob.run(DbUpdaterJob.java:174)
> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> >     at org.apache.nutch.crawl.DbUpdaterJob.main(DbUpdaterJob.java:178)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >     at 
> > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> >     at 
> > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >     at java.lang.reflect.Method.invoke(Method.java:497)
> >     at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> > Error running:
> > /usr/share/searchEngine/nutch-branch-2.3.1/runtime/deploy/bin/nutch 
> > updatedb -D mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m 
> > -D mapred.reduce.tasks.speculative.execution=false -D 
> > mapred.map.tasks.speculative.execution=false -D 
> > mapred.compress.map.output=true 1453230757-13191 -crawlId 1
> > Failed with exit value 1.
> > ****************************************************LOG 
> > END************************************************************************************************
> >
> > -- 
> >
> > Please let me know if you have any questions , concerns or updates.
> > Have a great day ahead :)
> >
> > Thanks and Regards,
> >
> > Kshitij Shukla
> > Software developer
> >
> > *Cyber Infrastructure(CIS)
> > **/The RightSourcing Specialists with 1250 man years of experience!/*
> >
> > DISCLAIMER:  INFORMATION PRIVACY is important for us, If you are not 
> > the intended recipient, you should delete this message and are 
> > notified that any disclosure, copying or distribution of this message, 
> > or taking any action based on it, is strictly prohibited by Law.
> >
> > Please don't print this e-mail unless you really need to.
> 
> 
> 
> -- 
> 
> ------------------------------
> 
> *Cyber Infrastructure (P) Limited, [CIS] **(CMMI Level 3 Certified)*
> 
> Central India's largest Technology company.
> 
> *Ensuring the success of our clients and partners through our highly 
> optimized Technology solutions.*
> 
> www.cisin.com | +Cisin <https://plus.google.com/+Cisin/> | Linkedin 
> <https://www.linkedin.com/company/cyber-infrastructure-private-limited> | 
> Offices: *Indore, India.* *Singapore. Silicon Valley, USA*.
> 
> 
