Oh wow. I never thought this would be up for debate. I use complete mode
VERY frequently for all my dashboarding use cases. Here are some of my
thoughts:
> 1. It destroys the purpose of watermark and forces Spark to maintain all
of state rows, growing incrementally. It only works when all keys
Thanks for the input, Burak!
The reason I started to think complete mode is a niche case is that the
mode is most probably only helpful for the memory sink, once we address
update mode properly. Kafka has compacted topics, JDBC can upsert, Delta
can merge, and AFAIK Iceberg is discussing it
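To illustrate the point about update mode making complete mode unnecessary for upsert-capable sinks, here is a minimal sketch in Scala. It assumes a running Spark session; `upsertIntoSink` is a hypothetical placeholder for whatever merge/upsert mechanism the target sink provides (e.g. Delta's MERGE INTO, or an ON CONFLICT statement over JDBC):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder
  .appName("update-mode-sketch")
  .getOrCreate()
import spark.implicits._

// A running aggregation over the built-in "rate" test source.
val counts = spark.readStream
  .format("rate")
  .load()
  .groupBy(($"value" % 10).as("key"))
  .count()

// In update mode, each trigger emits only the rows that changed,
// so the sink merely has to upsert those keys. Complete mode would
// instead re-emit the entire result table every trigger, which is
// what forces Spark to keep all state rows around.
val query = counts.writeStream
  .outputMode("update")
  .foreachBatch { (batch: DataFrame, batchId: Long) =>
    // Hypothetical helper standing in for a sink-specific upsert.
    upsertIntoSink(batch)
  }
  .start()
```

Under this reading, complete mode only remains useful where the sink cannot upsert at all and genuinely needs the full table each time, which in practice is the memory sink.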
It looks like some new blocker issues have been identified:
* https://issues.apache.org/jira/browse/SPARK-31786
* https://issues.apache.org/jira/browse/SPARK-31761 (not yet marked as a
blocker, but according to the JIRA comments it's a regression as well as a
correctness issue, IMHO)
Let's collect the
Worth noting that I've gotten similar questions from the local community as
well. These reporters didn't hit an edge case; they encountered the
critical issue in the normal running of a streaming query.
On Fri, May 8, 2020 at 4:49 PM Jungtaek Lim
wrote:
> (bump to expose the discussion to more
Hi there,
This will be a little long so please bear with me. There is a buildable example
available at https://github.com/sfcoy/sfcoy-spark-cce-test.
Say I have the following three tables:
Machines
Id,MachineType
11,A
12,B
23,B
24,A
25,B
Bolts
MachineType,Description
Another related issue for backwards compatibility: in DataSource.scala,
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L415-L416
will get triggered even when the class is a valid DataSourceV2 but is being
used in a