Hi devs,

We have a capability check in DSv2 defining which operations can be
performed against a data source, for both read and write. The concept was
introduced in DSv2, so it's unsurprising that DSv1 lacks it.

In SS the problem arises: if I understand correctly, we would like to
couple the output mode of the query with the output table. That is,
complete mode should require the output table to truncate its content,
and update mode should require the output table to "upsert" or "delete
and append" the content.

Nothing has been done for DSv1 sinks - Spark doesn't enforce anything and
effectively works in append mode, though the query still respects the
output mode in stateful operations.

I understand we don't want to surprise end users with broken
compatibility, but shouldn't that be a "temporary", "exceptional" case
that DSv2 never repeats? I'm seeing many built-in data sources being
migrated to DSv2 with the exception of "do nothing for update/truncate",
which simply undermines the rationale for capabilities.

In addition, they don't add TRUNCATE to their capabilities but do add
SupportsTruncate to the WriteBuilder, which is inconsistent. It works as
of now because SS misses checking capabilities on the writer side (I guess
it only checks STREAMING_WRITE), but once we check capabilities in the
first place, things will break.
(I'm looking into adding a writer plan in SS before the analyzer, and
checking capabilities there.)
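A minimal Scala sketch of the mismatch, using simplified stand-ins for the DSv2 shapes (the real SupportsTruncate, WriteBuilder, and TableCapability live in Spark's connector packages; these mocks only mirror their structure):

```scala
// Mock of the inconsistency: the table's capabilities() omits TRUNCATE,
// yet its WriteBuilder still mixes in SupportsTruncate. These are
// simplified stand-ins, not Spark's real interfaces.
object Capability extends Enumeration { val StreamingWrite, Truncate = Value }

trait WriteBuilder
trait SupportsTruncate extends WriteBuilder { def truncate(): WriteBuilder }

class SomeSinkTable {
  // Reports only STREAMING_WRITE...
  def capabilities: Set[Capability.Value] = Set(Capability.StreamingWrite)
  // ...yet the write builder happily accepts truncate().
  def newWriteBuilder(): WriteBuilder = new SupportsTruncate {
    def truncate(): WriteBuilder = this // "do nothing" truncate
  }
}

// A capability check done up front would reject this table for complete
// mode, even though the builder itself would have accepted the truncate.
def passesTruncateCheck(t: SomeSinkTable): Boolean =
  t.capabilities.contains(Capability.Truncate)
```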

What would be our best fix for this issue? Would we leave the
responsibility of handling "truncate" to the data source (so doing nothing
is fine if that's intended), and just add TRUNCATE to its capabilities?
(That should be documented in the data source's description, though.) Or
drop support for truncate when the data source is unable to truncate?
(Foreach and Kafka output tables would then be unable to run in complete
mode.)
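For reference, a sketch of what such an up-front check could look like: derive the capabilities the output mode demands and fail fast when the sink lacks one. The capability names mirror Spark's TableCapability constants, but this check itself is hypothetical, not existing Spark code:

```scala
// Hypothetical capability check before the analyzer: reject the query
// early when the sink's declared capabilities can't satisfy the output
// mode. Capability names mirror TableCapability; the logic is a sketch.
def checkStreamingWrite(outputMode: String,
                        sinkCapabilities: Set[String]): Either[String, Unit] = {
  val required = "STREAMING_WRITE" :: (outputMode match {
    case "complete" => List("TRUNCATE") // complete mode must truncate first
    case _          => Nil              // append/update: no extra capability here
  })
  val missing = required.filterNot(sinkCapabilities)
  if (missing.isEmpty) Right(())
  else Left(s"Sink cannot run in $outputMode mode; missing: ${missing.mkString(", ")}")
}
```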

Looking forward to hearing everyone's thoughts.

Thanks,
Jungtaek Lim (HeartSaVioR)