Thanks for the tips!
I think I figured out what might be causing it. It's the checkpointing to
Microsoft Azure Data Lake Storage (ADLS).
When I use "local checkpointing" it works, but when I checkpoint to ADLS it
fails whenever there's a groupBy in the stream. Weirdly, it works when there
is no groupBy clause.
I am now testing streaming into a Delta table. Interestingly, I have gotten
it working within a community version of Databricks, which leads me to think
it might have something to do with my dependencies. I am checkpointing to
ADLS Gen2 with the following dependencies added:
This SO post is pretty much the exact same issue:
https://stackoverflow.com/questions/59962680/spark-structured-streaming-error-while-sending-aggregated-result-to-kafka-topic
The user mentions it's an issue with
org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.4
--
Hi all:
I am having a strange issue incorporating `groupBy` statements into a
structured streaming job when trying to write to Kafka or Delta. Weirdly it
only appears to work if I write to console, or to memory...
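For reference, here is a minimal sketch of the kind of job described above, not the actual job. It assumes PySpark and the built-in "rate" source; the function names, broker address, topic, and checkpoint paths are all hypothetical placeholders. The idea is to keep the groupBy and the Kafka sink fixed and vary only the checkpoint location (a local path versus an ADLS Gen2 `abfss://` path).

```python
def kafka_sink_options(bootstrap_servers, topic, checkpoint_dir):
    """Writer options for a Kafka sink (hypothetical helper).

    checkpoint_dir is the only value meant to change between runs, e.g. a
    local directory versus an ADLS Gen2 path such as
    abfss://<container>@<account>.dfs.core.windows.net/checkpoints/repro
    """
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "topic": topic,
        "checkpointLocation": checkpoint_dir,
    }


def start_grouped_query(spark, sink_options):
    """Start a streaming groupBy that writes to Kafka and return the query."""
    # Imported here so the options helper can be used without Spark installed.
    from pyspark.sql import functions as F

    # The "rate" source emits (timestamp, value) rows, which is enough for a repro.
    events = spark.readStream.format("rate").option("rowsPerSecond", 5).load()

    # The streaming aggregation under suspicion.
    counts = events.groupBy((F.col("value") % 10).alias("bucket")).count()

    # The Kafka sink expects string or binary "key" and "value" columns.
    out = counts.select(
        F.col("bucket").cast("string").alias("key"),
        F.col("count").cast("string").alias("value"),
    )

    return (
        out.writeStream.format("kafka")
        .outputMode("update")
        .options(**sink_options)
        .start()
    )
```

Running start_grouped_query(spark, kafka_sink_options(...)) once with a local checkpoint directory and once with the ADLS path should show whether the checkpoint store is the variable that matters, assuming the Kafka and ADLS connector jars are on the classpath.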
I'm running Spark 3.0.1 with the following dependencies: