Hi there,

from the Apache Spark streaming documentation (see links below),

   - the batch-interval is set when a Spark StreamingContext is constructed
   (see example (a) quoted below)
   - the StreamingContext is available in both older and newer Spark
   versions (v1.6, v2.2 to v2.3.0) (see
   https://spark.apache.org/docs/1.6.0/streaming-programming-guide.html
   and https://spark.apache.org/docs/2.3.0/streaming-programming-guide.html
   )
   - however, example (b) below doesn't use a StreamingContext, but a
   SparkSession object to set up a streaming flow;

What does the usage difference between (a) and (b) mean? I was wondering
whether it reflects a different streaming approach ("traditional" DStream
streaming vs. Structured Streaming)?

Basically, I need to find a way to set the batch interval in (b), similar to
what is done in (a) below (see also my sketch after example (b)).

Would be great if someone can please share some insights here.

Thanks!

Peter

(a)
( from https://spark.apache.org/docs/2.3.0/streaming-programming-guide.html )

import org.apache.spark._
import org.apache.spark.streaming._

val conf = new SparkConf().setAppName(appName).setMaster(master)
val ssc = new StreamingContext(conf, Seconds(1))


(b)
( from databricks'
https://databricks.com/blog/2017/04/26/processing-data-in-apache-kafka-with-structured-streaming-in-apache-spark-2-2.html
)

val spark = SparkSession.builder()
  .appName(appName)
  .getOrCreate()
...

jsonOptions = { "timestampFormat": nestTimestampFormat }
parsed = spark \
  .readStream \
  .format("kafka") \
  .option("kafka.bootstrap.servers", "localhost:9092") \
  .option("subscribe", "nest-logs") \
  .load() \
  .select(from_json(col("value").cast("string"), schema, jsonOptions).alias("parsed_value"))
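
For what it's worth, the closest knob I could find so far on the Structured
Streaming side is the trigger option on the write side of the query. Below is
a minimal, self-contained sketch (in Scala, using the built-in "rate" source
as a stand-in for the Kafka DataFrame from (b)) of how I understand it would
be used; I'm not sure whether Trigger.ProcessingTime actually plays the same
role as the batch interval in (a), so corrections are very welcome.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder()
  .appName("trigger-sketch")
  .getOrCreate()

// Stand-in streaming source so the sketch runs on its own;
// in (b) this would be the Kafka DataFrame "parsed".
val stream = spark.readStream.format("rate").load()

val query = stream.writeStream
  .format("console")
  .trigger(Trigger.ProcessingTime("1 second"))  // request a micro-batch roughly every second
  .start()

query.awaitTermination()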
