Re: Structured Streaming Spark 3.0.1

2021-01-21 Thread Gabor Somogyi
The most interesting part is that you've added this: kafka-clients-0.10.2.2.jar. Spark 3.0.1 uses Kafka clients 2.4.1. Downgrading by such a big step doesn't help. Please remove that as well, together with the Spark-Kafka dependency. G On Thu, 21 Jan 2021, 22:45 gshen wrote: > Thanks for the tips! > >
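
For anyone wanting to check their own jar list for this kind of mismatch, here is a minimal sketch. It assumes the expected version is 2.4.1, as stated above. The helper name `find_mismatched_kafka_clients` and the sample jar list are hypothetical and only for illustration.

```python
import re

# Assumption: Spark 3.0.1 expects kafka-clients 2.4.1, as stated in the message above.
EXPECTED_KAFKA_CLIENTS = "2.4.1"

# Matches file names such as kafka-clients-0.10.2.2.jar and captures the version part.
_JAR_PATTERN = re.compile(r"^kafka-clients-(\d+(?:\.\d+)*)\.jar$")


def find_mismatched_kafka_clients(jar_names, expected=EXPECTED_KAFKA_CLIENTS):
    """Return the kafka-clients jar names whose version differs from `expected`.

    Hypothetical helper: it only inspects file names, not jar contents.
    """
    mismatched = []
    for name in jar_names:
        match = _JAR_PATTERN.match(name)
        if match and match.group(1) != expected:
            mismatched.append(name)
    return mismatched


if __name__ == "__main__":
    # Illustrative jar list; replace it with the jars actually passed to Spark.
    jars = ["kafka-clients-0.10.2.2.jar", "delta-core_2.12-0.7.0.jar"]
    for jar in find_mismatched_kafka_clients(jars):
        print(f"Unexpected kafka-clients version: {jar}")
```

This only flags the mismatch. The fix is still to remove the manually added jar and let the Spark-Kafka integration bring in its own client version.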

Re: Structured Streaming Spark 3.0.1

2021-01-21 Thread gshen
Thanks for the tips! I think I figured out what might be causing it. It's the checkpointing to Microsoft Azure Data Lake Storage (ADLS). When I use "local checkpointing" it works, but when I use ADLS it fails when there's a groupBy in the stream. Weirdly it works when there is no groupBy clause in the st

Re: Structured Streaming Spark 3.0.1

2021-01-21 Thread Jungtaek Lim
Looks like it's a driver-side error log, and I think the executor log would have many more warning/error entries, probably with stack traces. I'd also suggest excluding external dependencies wherever possible while experimenting/investigating. If you're suspecting Apache Spark I'd rather say you'll

Re: Structured Streaming Spark 3.0.1

2021-01-21 Thread gshen
I am now testing streaming into a Delta table. Interestingly, I have gotten it working within a community version of Databricks, which leads me to think it might have something to do with my dependencies. I am checkpointing to ADLS Gen2, adding the following dependencies: delta-core_2.12-0.7.0

Re: Structured Streaming Spark 3.0.1

2021-01-21 Thread Gabor Somogyi
If you know exactly which version of the Google connector is used, then the source can be checked to see what really happened: https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/83a6c9809ad49a44895d59558e666e5fc183e0bf/gcs/src/main/java/com/google/cloud/hadoop/fs/gcs/GoogleHadoop

Re: Structured Streaming Spark 3.0.1

2021-01-21 Thread Gabor Somogyi
I've double-checked this and came to the same conclusion as Jungtaek. I've added a comment to the Stack Overflow post to reach more people with the answer. G On Thu, Jan 21, 2021 at 6:53 AM Jungtaek Lim wrote: > I quickly looked into the attached log in SO post, and the problem doesn't

Re: Structured Streaming Spark 3.0.1

2021-01-20 Thread Jungtaek Lim
I quickly looked into the log attached to the SO post, and the problem doesn't seem to be related to Kafka. The error stack trace is from checkpointing to GCS, and the implementation of OutputStream for GCS seems to be provided by Google. Could you please elaborate on the stack trace or upload the log

Re: Structured Streaming Spark 3.0.1

2021-01-20 Thread German Schiavon
Hi, I couldn't reproduce this error :/ I wonder if there is some other underlying cause... *Input* ➜ kafka_2.12-2.5.0 ./bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic test1 {"name": "pedro", "age": 50} >{"name": "pedro", "age": 50} >{"name": "pedro", "age": 50} >

Re: Structured Streaming Spark 3.0.1

2021-01-20 Thread gshen
This SO post is pretty much the exact same issue: https://stackoverflow.com/questions/59962680/spark-structured-streaming-error-while-sending-aggregated-result-to-kafka-topic The user mentions it's an issue with org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.4