Hello list,
I'm new to Nutch and I'm trying to run Nutch with Cassandra.
So far I have set it up to the point where I can inject a seed list and
generate a list of URLs to crawl next.
The database tables webpage.f and webpage.sc have content.
But when I try to fetch these URLs, the fetch fails with the following error:
2016-02-15 18:39:43,817 INFO fetcher.FetcherJob - -activeThreads=0
2016-02-15 18:39:43,893 WARN mapreduce.GoraRecordWriter - Exception at GoraRecordWriter.class while closing datastore: InvalidRequestException(why:supercolumn parameter is not optional for super CF sc)
2016-02-15 18:39:43,893 WARN mapreduce.GoraRecordWriter - Trace: me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:52)
2016-02-15 18:39:43,895 WARN mapred.LocalJobRunner - job_local421637962_0001
java.lang.Exception: java.lang.RuntimeException: me.prettyprint.hector.api.exceptions.HInvalidRequestException: InvalidRequestException(why:supercolumn parameter is not optional for super CF sc)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: java.lang.RuntimeException: me.prettyprint.hector.api.exceptions.HInvalidRequestException: InvalidRequestException(why:supercolumn parameter is not optional for super CF sc)
    at org.apache.gora.mapreduce.GoraRecordWriter.close(GoraRecordWriter.java:60)
    at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.close(ReduceTask.java:550)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:629)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: me.prettyprint.hector.api.exceptions.HInvalidRequestException: InvalidRequestException(why:supercolumn parameter is not optional for super CF sc)
    at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:52)
    at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:260)
    at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:113)
    at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:243)
    at me.prettyprint.cassandra.model.MutatorImpl.insert(MutatorImpl.java:69)
    at org.apache.gora.cassandra.store.HectorUtils.insertColumn(HectorUtils.java:46)
    at org.apache.gora.cassandra.store.CassandraClient.addColumn(CassandraClient.java:293)
    at org.apache.gora.cassandra.store.CassandraStore.addOrUpdateField(CassandraStore.java:513)
    at org.apache.gora.cassandra.store.CassandraStore.addOrUpdateField(CassandraStore.java:599)
    at org.apache.gora.cassandra.store.CassandraStore.flush(CassandraStore.java:316)
    at org.apache.gora.cassandra.store.CassandraStore.close(CassandraStore.java:160)
    at org.apache.gora.mapreduce.GoraRecordWriter.close(GoraRecordWriter.java:56)
    ... 9 more
Caused by: InvalidRequestException(why:supercolumn parameter is not optional for super CF sc)
    at org.apache.cassandra.thrift.Cassandra$batch_mutate_result$batch_mutate_resultStandardScheme.read(Cassandra.java:28082)
    at org.apache.cassandra.thrift.Cassandra$batch_mutate_result$batch_mutate_resultStandardScheme.read(Cassandra.java:28068)
    at org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:28002)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
    at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:1060)
    at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:1046)
    at me.prettyprint.cassandra.model.MutatorImpl$3.execute(MutatorImpl.java:246)
    at me.prettyprint.cassandra.model.MutatorImpl$3.execute(MutatorImpl.java:243)
    at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:104)
    at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:253)
    ... 19 more
I found this ticket in the Gora JIRA:
https://issues.apache.org/jira/browse/GORA-416
but I don't know what the resolution is. Updating to gora-7-snapshot results
in a compile error when I run ant runtime.
I have used the latest release and also a code checkout from svn.
Does anybody have advice on how I can get Nutch to run with Cassandra?
Thanks!
Micha