Thanks guys.

On Wed, Sep 23, 2015 at 3:54 PM, Tathagata Das <t...@databricks.com> wrote:
> I believe SPARK_CLASSPATH is deprecated right now, so I am not surprised
> that there is some difference in the code paths.
>
> On Tue, Sep 22, 2015 at 9:45 PM, Saisai Shao <sai.sai.s...@gmail.com> wrote:
>
>> I think it is something related to the class loader; the behavior is
>> different for the classpath and --jars. If you want to know the details,
>> I think you'd better dig into the source code.
>>
>> Thanks
>> Jerry
>>
>> On Tue, Sep 22, 2015 at 9:10 PM, ayan guha <guha.a...@gmail.com> wrote:
>>
>>> I must have gone mad :) Thanks for pointing it out. I downloaded the
>>> 1.5.0 assembly jar and added it to SPARK_CLASSPATH.
>>>
>>> However, I am getting a new error now:
>>>
>>> >>> kvs = KafkaUtils.createDirectStream(ssc, ['spark'], {"metadata.broker.list": 'localhost:9092'})
>>>
>>> ________________________________________________________________________
>>>
>>> Spark Streaming's Kafka libraries not found in class path. Try one of
>>> the following.
>>>
>>> 1. Include the Kafka library and its dependencies with in the
>>>    spark-submit command as
>>>
>>>    $ bin/spark-submit --packages org.apache.spark:spark-streaming-kafka:1.5.0 ...
>>>
>>> 2. Download the JAR of the artifact from Maven Central http://search.maven.org/,
>>>    Group Id = org.apache.spark, Artifact Id = spark-streaming-kafka-assembly,
>>>    Version = 1.5.0.
>>>    Then, include the jar in the spark-submit command as
>>>
>>>    $ bin/spark-submit --jars <spark-streaming-kafka-assembly.jar> ...
>>>
>>> ________________________________________________________________________
>>>
>>> Traceback (most recent call last):
>>>   File "<stdin>", line 1, in <module>
>>>   File "D:\sw\spark-1.5.0-bin-hadoop2.6\spark-1.5.0-bin-hadoop2.6\python\pyspark\streaming\kafka.py", line 130, in createDirectStream
>>>     raise e
>>> py4j.protocol.Py4JJavaError: An error occurred while calling o30.loadClass.
>>> : java.lang.ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper
>>>         at java.net.URLClassLoader.findClass(Unknown Source)
>>>         at java.lang.ClassLoader.loadClass(Unknown Source)
>>>         at java.lang.ClassLoader.loadClass(Unknown Source)
>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>>>         at java.lang.reflect.Method.invoke(Unknown Source)
>>>         at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>>>         at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>>>         at py4j.Gateway.invoke(Gateway.java:259)
>>>         at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>>>         at py4j.commands.CallCommand.execute(CallCommand.java:79)
>>>         at py4j.GatewayConnection.run(GatewayConnection.java:207)
>>>         at java.lang.Thread.run(Unknown Source)
>>>
>>> >>> os.environ['SPARK_CLASSPATH']
>>> 'D:\\sw\\spark-streaming-kafka-assembly_2.10-1.5.0'
>>>
>>> So I launched pyspark with --jars with the assembly jar. Now it is
>>> working.
>>>
>>> THANK YOU for the help.
>>>
>>> Curiosity: why did adding it to SPARK_CLASSPATH not work?
>>>
>>> Best
>>> Ayan
>>>
>>> On Wed, Sep 23, 2015 at 2:25 AM, Saisai Shao <sai.sai.s...@gmail.com> wrote:
>>>
>>>> I think you're using the wrong version of the Kafka assembly jar. The
>>>> Python API for the direct Kafka stream is not supported in Spark 1.3.0,
>>>> so you'd better change to version 1.5.0. It looks like you're using
>>>> Spark 1.5.0, so why did you choose the Kafka assembly 1.3.0?
>>>>
>>>> D:\sw\spark-streaming-kafka-assembly_2.10-1.3.0.jar
>>>>
>>>> On Tue, Sep 22, 2015 at 6:41 AM, ayan guha <guha.a...@gmail.com> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> I have added the Spark Kafka assembly jar to SPARK_CLASSPATH:
>>>>>
>>>>> >>> print os.environ['SPARK_CLASSPATH']
>>>>> D:\sw\spark-streaming-kafka-assembly_2.10-1.3.0.jar
>>>>>
>>>>> Now I am facing the below issue with a test topic:
>>>>>
>>>>> >>> ssc = StreamingContext(sc, 2)
>>>>> >>> kvs = KafkaUtils.createDirectStream(ssc, ['spark'], {"metadata.broker.list": 'localhost:9092'})
>>>>> Traceback (most recent call last):
>>>>>   File "<stdin>", line 1, in <module>
>>>>>   File "D:\sw\spark-1.5.0-bin-hadoop2.6\spark-1.5.0-bin-hadoop2.6\python\pyspark\streaming\kafka.py", line 126, in createDirectStream
>>>>>     jstream = helper.createDirectStream(ssc._jssc, kafkaParams, set(topics), jfromOffsets)
>>>>>   File "D:\sw\spark-1.5.0-bin-hadoop2.6\spark-1.5.0-bin-hadoop2.6\python\lib\py4j-0.8.2.1-src.zip\py4j\java_gateway.py", line 538, in __call__
>>>>>   File "D:\sw\spark-1.5.0-bin-hadoop2.6\spark-1.5.0-bin-hadoop2.6\python\pyspark\sql\utils.py", line 36, in deco
>>>>>     return f(*a, **kw)
>>>>>   File "D:\sw\spark-1.5.0-bin-hadoop2.6\spark-1.5.0-bin-hadoop2.6\python\lib\py4j-0.8.2.1-src.zip\py4j\protocol.py", line 304, in get_return_value
>>>>> py4j.protocol.Py4JError: An error occurred while calling o22.createDirectStream.
>>>>> Trace:
>>>>> py4j.Py4JException: Method createDirectStream([class org.apache.spark.streaming.
>>>>> api.java.JavaStreamingContext, class java.util.HashMap, class java.util.HashSet,
>>>>> class java.util.HashMap]) does not exist
>>>>>         at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:333)
>>>>>         at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:342)
>>>>>         at py4j.Gateway.invoke(Gateway.java:252)
>>>>>         at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>>>>>         at py4j.commands.CallCommand.execute(CallCommand.java:79)
>>>>>         at py4j.GatewayConnection.run(GatewayConnection.java:207)
>>>>>         at java.lang.Thread.run(Unknown Source)
>>>>>
>>>>> Am I doing something wrong?
>>>>>
>>>>> --
>>>>> Best Regards,
>>>>> Ayan Guha
>>>>
>>>
>>> --
>>> Best Regards,
>>> Ayan Guha
>>
>

--
Best Regards,
Ayan Guha
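
For anyone landing on this thread later, the resolution was twofold: match the
Kafka assembly jar version to the Spark version (a 1.3.0 assembly against Spark
1.5.0 produces the "Method createDirectStream(...) does not exist" error above),
and pass the jar via --jars or --packages at launch instead of the deprecated
SPARK_CLASSPATH. A minimal sketch of the launch, using the versions and paths
from this thread (adjust for your own setup; the sanity check is just an
illustrative string comparison, not anything Spark itself does):

```shell
#!/bin/sh
# Versions from the thread: Spark 1.5.0 needs the 1.5.0 Kafka assembly.
SPARK_VERSION=1.5.0
KAFKA_ASSEMBLY_JAR=spark-streaming-kafka-assembly_2.10-${SPARK_VERSION}.jar

# Sanity check: the version embedded in the jar name should match Spark's,
# otherwise you get the Py4JError about a missing createDirectStream signature.
case "$KAFKA_ASSEMBLY_JAR" in
  *"${SPARK_VERSION}"*) echo "versions match" ;;
  *) echo "version mismatch: $KAFKA_ASSEMBLY_JAR vs Spark $SPARK_VERSION" ;;
esac

# Option 1 (what worked for Ayan): ship the downloaded assembly jar explicitly.
#   bin/pyspark --jars "$KAFKA_ASSEMBLY_JAR"
# Option 2 (from the error banner): let Spark resolve it from Maven Central.
#   bin/spark-submit --packages org.apache.spark:spark-streaming-kafka:${SPARK_VERSION} app.py
```

As Jerry notes above, the difference between SPARK_CLASSPATH and --jars comes
down to class-loader behavior; jars passed with --jars are made visible to the
class loader that the PySpark gateway uses, which is why the helper class was
found only after relaunching with --jars.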