Re: [DISCUSS] "complete" streaming output mode

2020-05-21 Thread Burak Yavuz
Oh wow. I never thought this would be up for debate. I use complete mode VERY frequently for all my dashboarding use cases. Here are some of my thoughts: > 1. It destroys the purpose of watermarks and forces Spark to maintain all of the state rows, growing incrementally. It only works when all keys

Re: [DISCUSS] "complete" streaming output mode

2020-05-21 Thread Jungtaek Lim
Thanks for the input, Burak! The reason I started to think complete mode is for a niche case is that the mode is most probably only helpful for the memory sink, once we address update mode properly. Kafka has compacted topics, JDBC can upsert, Delta can merge, and AFAIK Iceberg is in discussion
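The sinks listed above all share upsert-by-key semantics: the sink keeps only the latest row per key, so update mode (emitting only changed rows each micro-batch) still converges to the full result downstream, without Spark retaining every state row as complete mode requires. A minimal sketch of that key-compaction idea in plain Java (toy names, not Spark API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class UpsertSinkSketch {
    // A toy "sink" with upsert semantics: the latest value per key wins,
    // like a Kafka compacted topic, a JDBC upsert target, or a Delta merge.
    private final Map<String, Long> table = new LinkedHashMap<>();

    // Update mode delivers only the rows whose aggregate changed this batch...
    void applyUpdate(String key, long value) {
        table.put(key, value); // ...yet the sink still converges to the full result
    }

    Map<String, Long> snapshot() {
        return table;
    }

    public static void main(String[] args) {
        UpsertSinkSketch sink = new UpsertSinkSketch();
        // Three micro-batches, each emitting only the updated keys
        sink.applyUpdate("a", 1);
        sink.applyUpdate("b", 5);
        sink.applyUpdate("a", 3); // upsert: overwrites the earlier count for "a"
        System.out.println(sink.snapshot()); // {a=3, b=5}
    }
}
```

With a sink like this, a dashboard can read the sink's current state instead of asking Spark to re-emit the entire result table every trigger, which is the argument for update mode covering most of complete mode's use cases.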

Re: [VOTE] Apache Spark 3.0 RC2

2020-05-21 Thread Jungtaek Lim
Looks like new blocker issues have been found. * https://issues.apache.org/jira/browse/SPARK-31786 * https://issues.apache.org/jira/browse/SPARK-31761 (not yet marked as a blocker, but according to a JIRA comment it's a regression issue as well as a correctness issue IMHO) Let's collect the

Re: Handling user-facing metadata issues on file stream source & sink

2020-05-21 Thread Jungtaek Lim
Worth noting that I got a similar question from the local community as well. These reporters didn't encounter the edge case; they encountered the critical issue in the normal running of a streaming query. On Fri, May 8, 2020 at 4:49 PM Jungtaek Lim wrote: > (bump to expose the discussion to more

Weird ClassCastException when using generics from Java

2020-05-21 Thread Stephen Coy
Hi there, This will be a little long, so please bear with me. There is a buildable example available at https://github.com/sfcoy/sfcoy-spark-cce-test. Say I have the following three tables:

Machines
Id,MachineType
11,A
12,B
23,B
24,A
25,B

Bolts
MachineType,Description
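The full report is in the linked repo; as general background, the usual mechanism behind a "weird" ClassCastException with Java generics is type erasure: an unchecked generic cast succeeds silently at runtime, and the exception only surfaces later at the use site where the compiler inserts a checkcast. A minimal, Spark-free illustration (hypothetical names, not the reporter's actual code):

```java
import java.util.ArrayList;
import java.util.List;

public class ErasureDemo {
    // After erasure, List<Integer> and List<String> are the same runtime class,
    // so this cast cannot be checked when it executes.
    @SuppressWarnings("unchecked")
    static <T> List<T> unsafeCast(List<?> raw) {
        return (List<T>) raw; // unchecked: no runtime check happens here
    }

    public static void main(String[] args) {
        List<Integer> ints = new ArrayList<>();
        ints.add(42);
        // The bad cast succeeds silently...
        List<String> strings = unsafeCast(ints);
        try {
            // ...and the ClassCastException only appears at the use site,
            // far from the line that actually introduced the bad cast.
            String s = strings.get(0);
            System.out.println(s);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException at use site");
        }
    }
}
```

This is why such exceptions often point at a line that looks type-correct: the faulty cast happened earlier, possibly inside framework code that moved data through erased generic types.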

Re: [DatasourceV2] Default Mode for DataFrameWriter not Dependent on DataSource Version

2020-05-21 Thread Russell Spitzer
Another related issue for backwards compatibility: in DataSource.scala, https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L415-L416 will get triggered even when the class is a valid DataSourceV2 but is being used in a