Thunder,
thanks for your reply. The Hadoop job is now correctly configured (the
client was not getting the correct jars); however, I am getting Avro
formatting exceptions due to the format the schema-repo server follows. I
think I will do something similar and create our own branch that uses the
schema repo. Any gotchas you can advise on?
Thanks!
Max
On Wed, Mar 4, 2015 at 9:24 PM, Thunder Stumpges <tstump...@ntent.com> wrote:
What branch of Camus are you using? We have our own fork in which we updated
the Camus dependency from the Avro snapshot of the REST Schema Repository
to the new official one you mention at github.com/schema-repo. I was
not aware of a branch on the main LinkedIn Camus repo that has this.
That being said, we are doing essentially this same thing; however, we are
using a single shaded uber-jar. I believe the Maven project builds this
automatically, doesn't it?
I'll take a look at the details of how we are invoking this on our site
and get back to you.
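For what it's worth, since everything rides along inside the one shaded jar on our side, the invocation can be as simple as the following sketch (jar and properties-file names are illustrative, not our exact setup):

```shell
# Hedged sketch: with a shaded uber-jar, the schema-registry classes are on
# both the client and task classpaths, so no -libjars list is required
# (jar and properties-file names are illustrative).
hadoop jar camus-example-0.1.0-SNAPSHOT-shaded.jar \
  com.linkedin.camus.etl.kafka.CamusJob \
  -P config.properties
```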
Cheers,
Thunder
-Original Message-
From: max square [max2subscr...@gmail.com]
Received: Wednesday, 04 Mar 2015, 5:38PM
To: users@kafka.apache.org [users@kafka.apache.org]
Subject: Trying to get kafka data to Hadoop
Hi all,
I have browsed through different conversations around Camus, and bring this
up as a kind of Kafka question. I know it's not the most orthodox, but if someone
has some thoughts I'd appreciate it.
That said, I am trying to set up Camus against a 3-node Kafka 0.8.2.1
cluster, using a project that is trying to build an Avro Schema-Repo
(https://github.com/schema-repo/schema-repo). All of the Avro schemas for
the topics are published correctly. I am building Camus and using:
hadoop jar camus-example-0.1.0-SNAPSHOT-shaded.jar com.linkedin.camus.etl.kafka.CamusJob -libjars $CAMUS_LIBJARS -D mapreduce.job.user.classpath.first=true -P config.properties

as the command to start the job, where I have set up an environment
variable that holds all the libjars that the mvn package command generates.
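In case it's relevant, the environment variable is built roughly like this (paths are illustrative; my actual target layout may differ):

```shell
# Hedged sketch (paths illustrative): collect the jars that `mvn package`
# generates into a comma-separated list, the format -libjars expects.
CAMUS_LIBJARS=$(ls camus-example/target/lib/*.jar | paste -sd, -)
echo "$CAMUS_LIBJARS"
```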
I have also set the following properties to configure the job:

camus.message.decoder.class=com.linkedin.camus.etl.kafka.coders.LatestSchemaKafkaAvroMessageDecoder
kafka.message.coder.schema.registry.class=com.linkedin.camus.schemaregistry.AvroRestSchemaRegistry
etl.schema.registry.url=http://10.0.14.25:2876/schema-repo/
When I execute the job I get an exception indicating that the
AvroRestSchemaRegistry class can't be found (I've double-checked it's part
of the libjars). I wanted to ask if this is the correct way to set up this
integration, and if anyone has pointers on why the job is not finding the
AvroRestSchemaRegistry class.
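In case it helps, this is roughly how I verified the class is inside one of the libjars (the jar path is illustrative):

```shell
# Hedged sketch (jar path illustrative): list the jar's contents and look for
# the schema-registry class that the job reports as missing.
unzip -l camus-schema-registry/target/camus-schema-registry-0.1.0-SNAPSHOT.jar \
  | grep 'com/linkedin/camus/schemaregistry/AvroRestSchemaRegistry.class'
```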
Thanks in advance for the help!
Max
The complete stack trace follows:
[CamusJob] - failed to create decoder
com.linkedin.camus.coders.MessageDecoderException: com.linkedin.camus.coders.MessageDecoderException: java.lang.ClassNotFoundException: com.linkedin.camus.schemaregistry.AvroRestSchemaRegistry
        at com.linkedin.camus.etl.kafka.coders.MessageDecoderFactory.createMessageDecoder(MessageDecoderFactory.java:29)
        at com.linkedin.camus.etl.kafka.mapred.EtlInputFormat.createMessageDecoder(EtlInputFormat.java:391)
        at com.linkedin.camus.etl.kafka.mapred.EtlInputFormat.getSplits(EtlInputFormat.java:256)
        at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1107)
        at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1124)
        at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:178)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:1023)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:976)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:976)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:582)
        at com.linkedin.camus.etl.kafka.CamusJob.run(CamusJob.java:335)
        at com.linkedin.camus.etl.kafka.CamusJob.run(CamusJob.java:563)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at com.linkedin.camus.etl.kafka.CamusJob.main(CamusJob.java:518)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.