Got it. Thanks for the clarification, TD!

On Thu, 1 Feb 2018 at 11:36 AM, Tathagata Das <tathagata.das1...@gmail.com> wrote:
> The code uses the format "socket", which is only for text sent over a
> simple socket. That is completely different from how the Twitter APIs work,
> so this won't work at all.
>
> Fundamentally, for Structured Streaming, we have focused only on those
> streaming sources that support record-level offset tracking
> (e.g. Kafka offsets) and replayability, in order to give strong exactly-once
> fault-tolerance guarantees. Hence we have focused on files, Kafka, and Kinesis
> (socket is just for testing, as is documented). Twitter APIs as a source
> do not provide those, hence we have not focused on building one. In
> general, for such sources (ones that are not perfectly replayable), there
> are two possible solutions.
>
> 1. Build your own source: A quick Google search shows that others in the
> community have attempted to build Structured Streaming sources for Twitter.
> It won't provide the same fault-tolerance guarantees as Kafka, etc. However,
> I don't recommend this now because the DataSource APIs to build streaming
> sources are not public yet, and are in flux.
>
> 2. Use Kafka/Kinesis as an intermediate system: Write something simple
> that uses Twitter APIs directly to read tweets and write them into
> Kafka/Kinesis. And then just read from Kafka/Kinesis.
>
> Hope this helps.
>
> TD
>
> On Wed, Jan 31, 2018 at 7:18 PM, Divya Gehlot <divya.htco...@gmail.com>
> wrote:
>
>> Hi,
>> I see. Does that mean Spark Structured Streaming doesn't work with
>> Twitter streams?
>> I could see that people used Kafka or other streaming tools and then used
>> Spark to process the data in Structured Streaming.
>>
>> So the code below won't work directly with a Twitter stream until I set up
>> Kafka?
>>
>>> import org.apache.spark.sql.SparkSession
>>>
>>> val spark = SparkSession
>>>   .builder()
>>>   .appName("Spark SQL basic example")
>>>   .config("spark.some.config.option", "some-value")
>>>   .getOrCreate()
>>>
>>> // For implicit conversions like converting RDDs to DataFrames
>>> import spark.implicits._
>>>
>>> // Read text from socket
>>> val socketDF = spark
>>>   .readStream
>>>   .format("socket")
>>>   .option("host", "localhost")
>>>   .option("port", 9999)
>>>   .load()
>>>
>>> // Returns true for DataFrames that have streaming sources
>>> socketDF.isStreaming
>>>
>>> socketDF.printSchema
>>
>> Thanks,
>> Divya
>>
>> On 1 February 2018 at 10:30, Tathagata Das <tathagata.das1...@gmail.com>
>> wrote:
>>
>>> Hello Divya,
>>>
>>> To add further clarification, Apache Bahir does not have any
>>> Structured Streaming support for Twitter. It only has support for
>>> Twitter + DStreams.
>>>
>>> TD
>>>
>>> On Wed, Jan 31, 2018 at 2:44 AM, vermanurag
>>> <anurag.ve...@fnmathlogic.com> wrote:
>>>
>>>> Twitter functionality is not part of core Spark. We have successfully
>>>> used the following package from Maven Central in the past:
>>>>
>>>> org.apache.bahir:spark-streaming-twitter_2.11:2.2.0
>>>>
>>>> Earlier there used to be a Twitter package under Spark, but I find that
>>>> it has not been updated beyond Spark 1.6:
>>>>
>>>> org.apache.spark:spark-streaming-twitter_2.11:1.6.0
>>>>
>>>> Anurag
>>>> www.fnmathlogic.com
>>>>
>>>> --
>>>> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
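
For reference, the reading half of option 2 in TD's reply (Kafka as an intermediate system) might look roughly like the sketch below. It is only an illustration under several assumptions that are not in the thread: a separate producer already writes each tweet's text as the message value into a Kafka topic named "tweets", the broker is reachable at localhost:9092, and the spark-sql-kafka-0-10 package matching your Spark version is on the classpath. The object name TweetsFromKafka and the kafkaOptions helper are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object TweetsFromKafka {
  // Assumed values: adjust the broker address and topic name to your setup.
  val kafkaOptions: Map[String, String] = Map(
    "kafka.bootstrap.servers" -> "localhost:9092",
    "subscribe" -> "tweets"
  )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("TweetsFromKafka")
      .getOrCreate()

    // Read from Kafka instead of the test-only "socket" source.
    // The Kafka source exposes key and value as binary columns,
    // so the value is cast to a string here.
    val tweets = spark
      .readStream
      .format("kafka")
      .options(kafkaOptions)
      .load()
      .selectExpr("CAST(value AS STRING) AS tweet")

    // Print each micro-batch to the console; this sink is for debugging only.
    val query = tweets
      .writeStream
      .format("console")
      .outputMode("append")
      .start()

    query.awaitTermination()
  }
}
```

The writing half (Twitter APIs into Kafka) is left out on purpose, since it depends on whichever Twitter client library you choose.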