Ted, many thanks.  I'm not used to Java dependencies so this was a real
head-scratcher for me.

Downloading the two metrics packages (metrics-core and metrics-annotation)
from the Maven repository and supplying them on the spark-submit command
line worked.
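
In case it helps anyone else, the jars can be fetched directly from Maven
Central. A sketch, assuming the standard Central repository layout and the
2.2.0 version that kafka_2.10 0.8.1.1 depends on:

    # download the two Yammer metrics jars into a local directory
    mkdir -p /home/ubuntu/jars && cd /home/ubuntu/jars
    wget https://repo1.maven.org/maven2/com/yammer/metrics/metrics-core/2.2.0/metrics-core-2.2.0.jar
    wget https://repo1.maven.org/maven2/com/yammer/metrics/metrics-annotation/2.2.0/metrics-annotation-2.2.0.jar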

My final spark-submit command for a Python project using Kafka as an input
source:

/home/ubuntu/spark/spark-1.3.1/bin/spark-submit \
    --packages TargetHolding/pyspark-cassandra:0.1.4,org.apache.spark:spark-streaming-kafka_2.10:1.3.1 \
    --jars /home/ubuntu/jars/metrics-core-2.2.0.jar,/home/ubuntu/jars/metrics-annotation-2.2.0.jar \
    --conf spark.cassandra.connection.host=10.10.103.172,10.10.102.160,10.10.101.79 \
    --master spark://127.0.0.1:7077 \
    affected_hosts.py
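
As an aside, it looks like the manual jar download could be avoided
entirely, since --packages takes plain Maven coordinates and the metrics
artifacts are on Maven Central. An untested sketch of the equivalent
command:

/home/ubuntu/spark/spark-1.3.1/bin/spark-submit \
    --packages TargetHolding/pyspark-cassandra:0.1.4,org.apache.spark:spark-streaming-kafka_2.10:1.3.1,com.yammer.metrics:metrics-core:2.2.0,com.yammer.metrics:metrics-annotation:2.2.0 \
    --conf spark.cassandra.connection.host=10.10.103.172,10.10.102.160,10.10.101.79 \
    --master spark://127.0.0.1:7077 \
    affected_hosts.py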

Now we're seeing data from the stream.  Thanks again!

On Mon, May 11, 2015 at 2:43 PM Sean Owen <so...@cloudera.com> wrote:

> Ah yes, the Kafka + streaming code isn't in the assembly, is it? You'd
> have to provide it and all its dependencies with your app. You could
> also build this into your own app jar. Tools like Maven will add in
> the transitive dependencies.
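>
> For example, with a Maven project you could check which module pulls in
> metrics-core transitively. A sketch, assuming the standard Maven
> dependency plugin:
>
>     mvn dependency:tree -Dincludes=com.yammer.metrics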
>
> On Mon, May 11, 2015 at 10:04 PM, Lee McFadden <splee...@gmail.com> wrote:
> > Thanks Ted,
> >
> > The issue is that I'm using packages (see the spark-submit definition)
> > and I do not know how to add com.yammer.metrics:metrics-core to my
> > classpath so Spark can see it.
> >
> > Should metrics-core not be part of the
> > org.apache.spark:spark-streaming-kafka_2.10:1.3.1 package so it can work
> > correctly?
> >
> > If not, any clues as to how I can add metrics-core to my project (bearing
> > in mind that I'm using Python, not a JVM language) would be much
> > appreciated.
> >
> > Thanks, and apologies for my newbness with Java/Scala.
> >
> > On Mon, May 11, 2015 at 1:42 PM Ted Yu <yuzhih...@gmail.com> wrote:
> >>
> >> com.yammer.metrics.core.Gauge is in the metrics-core jar;
> >> e.g., in the master branch:
> >> [INFO] |  \- org.apache.kafka:kafka_2.10:jar:0.8.1.1:compile
> >> [INFO] |     +- com.yammer.metrics:metrics-core:jar:2.2.0:compile
> >>
> >> Please make sure metrics-core jar is on the classpath.
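> >>
> >> A quick way to confirm the class is inside a candidate jar (a sketch,
> >> assuming a JDK's jar tool is on the PATH):
> >>
> >>     jar tf metrics-core-2.2.0.jar | grep Gauge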
> >>
> >> On Mon, May 11, 2015 at 1:32 PM, Lee McFadden <splee...@gmail.com>
> wrote:
> >>>
> >>> Hi,
> >>>
> >>> We've been having some issues getting Spark Streaming running correctly
> >>> using a Kafka stream, and we've been going around in circles trying to
> >>> resolve this dependency issue.
> >>>
> >>> Details of our environment and the error below, if anyone can help
> >>> resolve this it would be much appreciated.
> >>>
> >>> Submit command line:
> >>>
> >>> /home/ubuntu/spark/spark-1.3.1/bin/spark-submit \
> >>>     --packages TargetHolding/pyspark-cassandra:0.1.4,org.apache.spark:spark-streaming-kafka_2.10:1.3.1 \
> >>>     --conf spark.cassandra.connection.host=10.10.103.172,10.10.102.160,10.10.101.79 \
> >>>     --master spark://127.0.0.1:7077 \
> >>>     affected_hosts.py
> >>>
> >>> When we run the streaming job, everything starts just fine, but then we
> >>> see the following in the logs:
> >>>
> >>> 15/05/11 19:50:46 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 70, ip-10-10-102-53.us-west-2.compute.internal): java.lang.NoClassDefFoundError: com/yammer/metrics/core/Gauge
> >>>         at kafka.consumer.ZookeeperConsumerConnector.createFetcher(ZookeeperConsumerConnector.scala:151)
> >>>         at kafka.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:115)
> >>>         at kafka.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:128)
> >>>         at kafka.consumer.Consumer$.create(ConsumerConnector.scala:89)
> >>>         at org.apache.spark.streaming.kafka.KafkaReceiver.onStart(KafkaInputDStream.scala:100)
> >>>         at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:121)
> >>>         at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:106)
> >>>         at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverLauncher$$anonfun$8.apply(ReceiverTracker.scala:298)
> >>>         at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverLauncher$$anonfun$8.apply(ReceiverTracker.scala:290)
> >>>         at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1498)
> >>>         at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1498)
> >>>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
> >>>         at org.apache.spark.scheduler.Task.run(Task.scala:64)
> >>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
> >>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >>>         at java.lang.Thread.run(Thread.java:745)
> >>> Caused by: java.lang.ClassNotFoundException: com.yammer.metrics.core.Gauge
> >>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
> >>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
> >>>         at java.security.AccessController.doPrivileged(Native Method)
> >>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
> >>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> >>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> >>>         ... 17 more
> >>>
> >>>
> >>
> >
>
