Re: Structured Streaming Spark 3.0.1

2021-01-21 Thread gshen
Thanks for the tips! I think I figured out what might be causing it: the checkpointing to Microsoft Azure Data Lake Storage (ADLS). When I use "local checkpointing" it works, but when I checkpoint to ADLS it fails when there's a groupBy in the stream. Weirdly, it works when there is no groupBy clause in the
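
For anyone trying to reproduce the comparison described above, here is a minimal PySpark sketch, not the poster's actual job. It assumes the built-in `rate` source and the `console` sink as stand-ins. The function name `start_grouped_stream`, both checkpoint paths, and the `<container>` / `<account>` placeholders are hypothetical and need to be replaced with real values.

```python
# Hypothetical checkpoint locations, used only for illustration.
LOCAL_CHECKPOINT = "/tmp/checkpoints/groupby-test"
# ADLS Gen2 path; <container> and <account> are placeholders, not real names.
ADLS_CHECKPOINT = (
    "abfss://<container>@<account>.dfs.core.windows.net/checkpoints/groupby-test"
)


def start_grouped_stream(spark, checkpoint_dir, use_group_by=True):
    """Start a small streaming query that checkpoints to checkpoint_dir.

    With use_group_by=True the query is a stateful aggregation, so Spark
    keeps state store files under the checkpoint directory. With
    use_group_by=False it is a plain pass-through query.
    """
    # Imported here so this module can be loaded without PySpark installed.
    from pyspark.sql import functions as F

    # The rate source emits (timestamp, value) rows at a fixed pace.
    events = spark.readStream.format("rate").option("rowsPerSecond", 5).load()

    if use_group_by:
        # Count rows per 10-second window; this is the groupBy case.
        result = events.groupBy(F.window(F.col("timestamp"), "10 seconds")).count()
        output_mode = "update"
    else:
        result = events
        output_mode = "append"

    return (
        result.writeStream.outputMode(output_mode)
        .format("console")
        .option("checkpointLocation", checkpoint_dir)
        .start()
    )
```

Calling `start_grouped_stream(spark, LOCAL_CHECKPOINT)` and then `start_grouped_stream(spark, ADLS_CHECKPOINT)` with an existing `SparkSession`, with and without `use_group_by`, should show whether the failure follows the checkpoint location or the aggregation. The console sink can be swapped for Kafka or Delta once the right connector packages are on the classpath.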

Re: Structured Streaming Spark 3.0.1

2021-01-21 Thread gshen
I am now testing streaming into a Delta table. Interestingly, I have gotten it working within a community version of Databricks, which leads me to think the problem might have something to do with my dependencies. I am checkpointing to ADLS Gen2, adding the following dependencies:

Re: Structured Streaming Spark 3.0.1

2021-01-20 Thread gshen
This SO post describes pretty much the exact same issue: https://stackoverflow.com/questions/59962680/spark-structured-streaming-error-while-sending-aggregated-result-to-kafka-topic The user mentions it's an issue with org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.4.

Structured Streaming Spark 3.0.1

2021-01-20 Thread gshen
Hi all, I am having a strange issue incorporating `groupBy` statements into a Structured Streaming job when trying to write to Kafka or Delta. Weirdly, it only appears to work if I write to console or to memory... I'm running Spark 3.0.1 with the following dependencies: