Re: PySpark, Structured Streaming and Kafka

2017-08-24 Thread Brian Wylie
Resolved :) Hi, just a loopback on this (thanks for everyone's help). In a Jupyter notebook the following command works and properly loads the Kafka jar files.

# Spin up a local Spark Session
spark = SparkSession.builder.appName('my_awesome')\
    .config('spark.jars.packages',
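For completeness, a minimal sketch of that pattern as it reads from the truncated snippet above, assuming Spark 2.2 and the spark-sql-kafka-0-10_2.11:2.2.0 coordinate suggested later in the thread; the package string is the part cut off in the preview, so treat it as an inference rather than Brian's exact value.

from pyspark.sql import SparkSession

# Spin up a local Spark Session and pull the Kafka connector via spark.jars.packages.
# The coordinate below is the one suggested elsewhere in this thread; adjust it to
# your Spark/Scala versions.
spark = (SparkSession.builder
         .appName('my_awesome')
         .config('spark.jars.packages',
                 'org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0')
         .getOrCreate())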

Re: PySpark, Structured Streaming and Kafka

2017-08-23 Thread Brian Wylie
Shixiong, Your suggestion works if I use the pyspark-shell directly. In this case I want to set up a Spark Session from within my Jupyter Notebook. My question/issue is related to this SO question: https://stackoverflow.com/questions/35762459/add-jar-to-standalone-pyspark so basically I want to
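One workaround discussed in that SO question (an assumption here; Brian's message is cut off before his own approach) is to set PYSPARK_SUBMIT_ARGS before the notebook creates any Spark context:

import os

# Sketch based on the linked SO question: pass --packages through PYSPARK_SUBMIT_ARGS
# before the SparkSession is built; the trailing 'pyspark-shell' token is required for
# PySpark to pick the arguments up.
os.environ['PYSPARK_SUBMIT_ARGS'] = (
    '--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0 pyspark-shell'
)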

Re: PySpark, Structured Streaming and Kafka

2017-08-23 Thread Shixiong(Ryan) Zhu
You can use `bin/pyspark --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0` to start "pyspark". If you want to use "spark-submit", you also need to provide your Python file.
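For the spark-submit route, the invocation would look roughly like this, where your_script.py is a placeholder for the application file:

bin/spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0 \
  your_script.py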

Re: PySpark, Structured Streaming and Kafka

2017-08-23 Thread Riccardo Ferrari
Hi Brian, Very nice work you have done! WRT your issue: Can you clarify how you are adding the kafka dependency when using Jupyter? The ClassNotFoundException really tells you about the missing dependency. The IllegalArgumentException error is a bit different; that is simply because you are not

PySpark, Structured Streaming and Kafka

2017-08-23 Thread Brian Wylie
Hi All, I'm trying the new hotness of using Kafka and Structured Streaming. Resources that I've looked at:
- https://spark.apache.org/docs/latest/streaming-programming-guide.html
- https://databricks.com/blog/2016/07/28/structured-streaming-in-apache-spark.html
-
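For reference, a minimal Structured Streaming read from Kafka looks roughly like the sketch below; 'localhost:9092' and 'my_topic' are placeholder values, not details from Brian's setup.

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName('kafka_structured_streaming')
         .config('spark.jars.packages',
                 'org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0')
         .getOrCreate())

# Subscribe to a Kafka topic; the broker address and topic name are placeholders.
df = (spark.readStream
      .format('kafka')
      .option('kafka.bootstrap.servers', 'localhost:9092')
      .option('subscribe', 'my_topic')
      .load())

# Kafka rows carry binary key/value columns; cast value to a string and echo to console.
query = (df.selectExpr('CAST(value AS STRING)')
         .writeStream
         .format('console')
         .start())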