Hi there,

From the Apache Spark streaming documentation (see links below):
- The batch interval is set when a Spark StreamingContext is constructed (see example (a) quoted below).
- The StreamingContext is available in both older and newer Spark versions (v1.6 through v2.3.0); see https://spark.apache.org/docs/1.6.0/streaming-programming-guide.html and https://spark.apache.org/docs/2.3.0/streaming-programming-guide.html
- However, example (b) below doesn't use a StreamingContext but a SparkSession object to set up the streaming flow.

What does the usage difference between (a) and (b) mean? I was wondering whether this reflects two different streaming approaches ("traditional" DStream streaming vs. Structured Streaming?). Basically, I need to find a way to set the batch interval in (b), similar to how it's done in (a).

It would be great if someone could share some insights here. Thanks!

Peter

(a) (from https://spark.apache.org/docs/2.3.0/streaming-programming-guide.html)

    import org.apache.spark._
    import org.apache.spark.streaming._

    val conf = new SparkConf().setAppName(appName).setMaster(master)
    // The batch interval is the second constructor argument: here, one second.
    val ssc = new StreamingContext(conf, Seconds(1))

(b) (from Databricks' https://databricks.com/blog/2017/04/26/processing-data-in-apache-kafka-with-structured-streaming-in-apache-spark-2-2.html )

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder()
      .appName(appName)
      .getOrCreate()

    ...

    // Read JSON records from a Kafka topic; note that no batch interval
    // is set anywhere here.
    val jsonOptions = Map("timestampFormat" -> nestTimestampFormat)
    val parsed = spark
      .readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "nest-logs")
      .load()
      .select(from_json(col("value").cast("string"), schema, jsonOptions).alias("parsed_value"))
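PS: The closest analogue I could find so far in the Structured Streaming guide is the trigger interval, set on the query via writeStream. Below is a minimal sketch of what I have in mind, assuming Trigger.ProcessingTime really is the counterpart of the batch interval in (a); the console sink is just a placeholder I picked for illustration, not part of the blog post. Is this the right knob?

    import org.apache.spark.sql.streaming.Trigger

    // Sketch only: fire a micro-batch every second, intended as the
    // counterpart of Seconds(1) in (a). Continues from `parsed` in (b);
    // the console sink is a placeholder for whatever sink is actually used.
    val query = parsed.writeStream
      .format("console")
      .trigger(Trigger.ProcessingTime("1 second"))
      .start()

    query.awaitTermination()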