The most interesting part is that you've added this:
kafka-clients-0.10.2.2.jar
Spark 3.0.1 uses Kafka clients 2.4.1. Downgrading with such a big step
doesn't help. Please remove that as well, together with the Spark-Kafka dependency.
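As a side note, the usual way is to let spark-submit resolve the connector so the
matching kafka-clients comes in transitively, rather than putting jars on the
classpath by hand. A sketch, assuming a Spark 3.0.1 / Scala 2.12 build
(your_streaming_app.py is just a placeholder):

    spark-submit \
      --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1 \
      your_streaming_app.py

With that, kafka-clients 2.4.1 is pulled in transitively, so no kafka-clients jar
needs to be added manually.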
G
On Thu, 21 Jan 2021, 22:45 gshen, wrote:
Thanks for the tips!
I think I figured out what might be causing it. It's the checkpointing to
Microsoft Azure Data Lake Storage (ADLS).
When I use local checkpointing it works, but checkpointing to ADLS fails
when there's a groupBy in the stream. Weirdly, it works when there is no
groupBy clause in the stream.
Looks like it's a driver side error log, and I think executor log would
have much more warning/error logs and probably with stack traces.
I'd also suggest excluding external dependencies wherever possible while
experimenting/investigating. If you're suspecting Apache Spark I'd rather
say you'll
I am now testing streaming into a Delta table. Interestingly I have
gotten it working within a community version of Databricks, which leads me
to think it might have something to do with my dependencies. I am
checkpointing to ADLS Gen2 and adding the following dependencies:
delta-core_2.12-0.7.0
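For what it's worth, a sketch of a spark-submit line carrying these dependencies
with versions matched to Spark 3.0.1 / Scala 2.12 (the hadoop-azure coordinate is
an assumption about what abfss:// checkpointing to ADLS Gen2 needs, and
your_streaming_app.py is a placeholder):

    spark-submit \
      --packages io.delta:delta-core_2.12:0.7.0,org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1,org.apache.hadoop:hadoop-azure:3.2.0 \
      your_streaming_app.py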
If you know exactly which version of the Google connector is used,
then the source can be checked to see what really happened:
https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/83a6c9809ad49a44895d59558e666e5fc183e0bf/gcs/src/main/java/com/google/cloud/hadoop/fs/gcs/GoogleHadoop
I've double-checked this and came to the same conclusion as Jungtaek.
I've added a comment to the Stack Overflow post to reach more people with
the answer.
G
On Thu, Jan 21, 2021 at 6:53 AM Jungtaek Lim
wrote:
I quickly looked into the attached log in SO post, and the problem doesn't
seem to be related to Kafka. The error stack trace is from checkpointing to
GCS, and the implementation of OutputStream for GCS seems to be provided
by Google.
Could you please elaborate on the stack trace or upload the log
Hi,
I couldn't reproduce this error :/ I wonder if there is something else
underneath causing it...
*Input*
➜ kafka_2.12-2.5.0 ./bin/kafka-console-producer.sh --bootstrap-server
localhost:9092 --topic test1
{"name": "pedro", "age": 50}
>{"name": "pedro", "age": 50}
>{"name": "pedro", "age": 50}
>
This SO post is pretty much the exact same issue:
https://stackoverflow.com/questions/59962680/spark-structured-streaming-error-while-sending-aggregated-result-to-kafka-topic
The user mentions it's an issue with
org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.4