Hi Raymond,

Run this command and it should work, provided you also have Kafka set up, with ZooKeeper listening on localhost at port 2181:

spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.1 kafka_wordcount.py localhost:2181 test

But if you are a beginner, I suggest using the wordcount from the Spark examples instead. I believe it reads from a local directory, so you avoid setting up Kafka, which is extra overhead you don't really need.

If you want to go ahead with Kafka, the two links below can give you a start:

https://dzone.com/articles/running-apache-kafka-on-windows-os (I believe a similar setup can be used on Linux)
https://spark.apache.org/docs/latest/streaming-kafka-integration.html

kr

On Sat, Feb 25, 2017 at 11:12 PM, Marco Mistroni <mmistr...@gmail.com> wrote:

> Hi, I'll have a look at the GitHub project tomorrow and let you know. You have a
> py script to run and dependencies to specify. Please check the Spark docs in the
> meantime. I do all my coding in Scala and specify dependencies using
> --packages <groupid>:<packageid>:<version>.
> Kr
>
> On 25 Feb 2017 11:06 pm, "Raymond Xie" <xie3208...@gmail.com> wrote:
>
>> Thank you very much Marco,
>>
>> I am a beginner in this area. Is it possible for you to show me what you
>> think the right script should be to get it executed in the terminal?
>>
>> *------------------------------------------------*
>> *Sincerely yours,*
>>
>> *Raymond*
>>
>> On Sat, Feb 25, 2017 at 6:00 PM, Marco Mistroni <mmistr...@gmail.com>
>> wrote:
>>
>>> Try to use --packages to include the jars. From the error it seems it's
>>> looking for a main class in the jars, but you are running a Python script.
>>>
>>> On 25 Feb 2017 10:36 pm, "Raymond Xie" <xie3208...@gmail.com> wrote:
>>>
>>> That's right Anahita, however, the class name is not indicated in the
>>> original github project so I don't know what class should be used here.
>>> The github only says:
>>> and then run the example
>>> `$ bin/spark-submit --jars \
>>> external/kafka-assembly/target/scala-*/spark-streaming-kafka-assembly-*.jar \
>>> examples/src/main/python/streaming/kafka_wordcount.py \
>>> localhost:2181 test`
>>> """
>>> Can anyone give any thought on how to find out? Thank you very much
>>> in advance.
>>>
>>> *------------------------------------------------*
>>> *Sincerely yours,*
>>>
>>> *Raymond*
>>>
>>> On Sat, Feb 25, 2017 at 5:27 PM, Anahita Talebi <
>>> anahita.t.am...@gmail.com> wrote:
>>>
>>>> You're welcome.
>>>> You need to specify the class. I meant like that:
>>>>
>>>> spark-submit /usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>>>> --class "give the name of the class"
>>>>
>>>> On Saturday, February 25, 2017, Raymond Xie <xie3208...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thank you, it is still not working:
>>>>>
>>>>> [image: Inline image 1]
>>>>>
>>>>> By the way, here is the original source:
>>>>>
>>>>> https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/kafka_wordcount.py
>>>>>
>>>>> *------------------------------------------------*
>>>>> *Sincerely yours,*
>>>>>
>>>>> *Raymond*
>>>>>
>>>>> On Sat, Feb 25, 2017 at 4:48 PM, Anahita Talebi <
>>>>> anahita.t.am...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I think if you remove --jars, it will work. Like:
>>>>>>
>>>>>> spark-submit /usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>>>>>>
>>>>>> I had the same problem before and solved it by removing --jars.
>>>>>>
>>>>>> Cheers,
>>>>>> Anahita
>>>>>>
>>>>>> On Saturday, February 25, 2017, Raymond Xie <xie3208...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I am doing a spark streaming on a hortonworks sandbox and am stuck
>>>>>>> here now, can anyone tell me what's wrong with the following code and
>>>>>>> the exception it causes and how do I fix it? Thank you very much in advance.
>>>>>>>
>>>>>>> spark-submit --jars /usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>>>>>>> /usr/hdp/2.5.0.0-1245/kafka/libs/kafka-streams-0.10.0.2.5.0.0-1245.jar
>>>>>>> /root/hdp/kafka_wordcount.py 192.168.128.119:2181 test
>>>>>>>
>>>>>>> Error:
>>>>>>> No main class set in JAR; please specify one with --class
>>>>>>>
>>>>>>> spark-submit --class /usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>>>>>>> /usr/hdp/2.5.0.0-1245/kafka/libs/kafka-streams-0.10.0.2.5.0.0-1245.jar
>>>>>>> /root/hdp/kafka_wordcount.py 192.168.128.119:2181 test
>>>>>>>
>>>>>>> Error:
>>>>>>> java.lang.ClassNotFoundException: /usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>>>>>>>
>>>>>>> spark-submit --class /usr/hdp/2.5.0.0-1245/kafka/libs/kafka-streams-0.10.0.2.5.0.0-1245.jar
>>>>>>> /usr/hdp/2.5.0.0-1245/spark/lib/spark-assembly-1.6.2.2.5.0.0-1245-hadoop2.7.3.2.5.0.0-1245.jar
>>>>>>> /root/hdp/kafka_wordcount.py 192.168.128.119:2181 test
>>>>>>>
>>>>>>> Error:
>>>>>>> java.lang.ClassNotFoundException: /usr/hdp/2.5.0.0-1245/kafka/libs/kafka-streams-0.10.0.2.5.0.0-1245.jar
>>>>>>>
>>>>>>> *------------------------------------------------*
>>>>>>> *Sincerely yours,*
>>>>>>>
>>>>>>> *Raymond*
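
A note on the errors in the original message above, in case it helps. spark-submit expects --jars (and --packages) as a single comma-separated value, and it treats the first argument that is not an option as the application to run. In the first command the two jars are separated by a space, so the second jar is taken as the application, which is why it asks for a main class. In the other two commands --class is given a file path, but it expects a class name, hence the ClassNotFoundException. The sketch below only assembles the argument list in the right shape; it does not call Spark. The helper name build_submit_command is hypothetical, and the package coordinate is simply the one from the reply at the top of this thread, not verified against the sandbox's Spark 1.6.2 install (the coordinate generally has to match the installed Spark and Scala versions).

```python
import shlex


def build_submit_command(script, app_args, packages=(), jars=()):
    """Assemble a spark-submit argument list for a Python application.

    Hypothetical helper for illustration only; it builds the list of
    arguments and never launches Spark.
    """
    cmd = ["spark-submit"]
    if packages:
        # Maven coordinates (groupId:artifactId:version), passed as ONE
        # comma-separated value.
        cmd += ["--packages", ",".join(packages)]
    if jars:
        # Local jar paths are also passed as ONE comma-separated value.
        # A space between jars would make the second jar the application.
        cmd += ["--jars", ",".join(jars)]
    # The first non-option argument is the application (here a .py file);
    # everything after it is handed to the application itself.
    cmd.append(script)
    cmd += list(app_args)
    return cmd


cmd = build_submit_command(
    "kafka_wordcount.py",
    ["localhost:2181", "test"],
    packages=["org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.1"],
)
# Print the command as a single shell-safe line.
print(" ".join(shlex.quote(part) for part in cmd))
```

If you do need local jars, pass them through the jars parameter so they end up as one comma-separated --jars value ahead of the script path.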