Re: Output mode in Structured Streaming and DSv1 sink/DSv2 table

2020-09-27 Thread Jungtaek Lim
Bumping this to see whether anyone is interested in or concerned about it.

On Sun, Sep 20, 2020 at 1:59 PM Jungtaek Lim wrote:

> Hi devs,
>
> DSv2 has a capability check that defines which operations can be performed
> against a data source, for both reads and writes. The concept was
> introduced in DSv2, so it's not surprising that DSv1 doesn't have it.
>
> The problem arises in SS: if I understand correctly, we would like to
> couple the output mode of the query with the output table. That is,
> complete mode should require the output table to truncate its content, and
> update mode should require the output table to "upsert" or "delete and
> append" the content.
>
> Nothing has been done for the DSv1 sink: Spark doesn't enforce anything
> and the sink behaves as in append mode, though the query still respects
> the output mode for stateful operations.
>
> I understand we don't want to surprise end users by breaking
> compatibility, but shouldn't that be a "temporary", "exceptional" case
> that DSv2 never repeats? I'm seeing many built-in data sources being
> migrated to DSv2 with a "do nothing for update/truncate" exception,
> which simply undermines the rationale for having capabilities.
>
> In addition, they don't add TRUNCATE to their capabilities but do implement
> SupportsTruncate in WriteBuilder, which is weird. It works for now
> because SS doesn't check capabilities on the writer side (I guess it only
> checks STREAMING_WRITE), but once we check capabilities up front,
> things will break.
> (I'm looking into adding a writer plan in SS before the analyzer and
> checking capabilities there.)
>
> What would be our best fix for this issue? Should we leave the
> responsibility for handling "truncate" to the data source (so doing nothing
> is fine if it's intended) and just add TRUNCATE to the capabilities? (That
> should be documented in the data source's description, though.) Or should
> we drop support for truncate if the data source is unable to truncate?
> (Foreach and Kafka output tables would then be unable to use complete mode.)
>
> Looking forward to hearing everyone's thoughts.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>


Output mode in Structured Streaming and DSv1 sink/DSv2 table

2020-09-19 Thread Jungtaek Lim
Hi devs,

DSv2 has a capability check that defines which operations can be performed
against a data source, for both reads and writes. The concept was introduced
in DSv2, so it's not surprising that DSv1 doesn't have it.

The problem arises in SS: if I understand correctly, we would like to
couple the output mode of the query with the output table. That is,
complete mode should require the output table to truncate its content, and
update mode should require the output table to "upsert" or "delete and
append" the content.
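
To make the intended coupling concrete, here is a rough sketch of the mapping
as I understand it. This is a standalone model and not Spark code: the
OutputMode enum, the behavior labels, and required_table_behavior are all
hypothetical names made up for illustration.

```python
from enum import Enum


class OutputMode(Enum):
    """Stand-in for the three streaming output modes (not Spark's class)."""
    APPEND = "append"
    COMPLETE = "complete"
    UPDATE = "update"


# Hypothetical labels for what the output table would have to do on each
# micro-batch if the output mode were enforced on the table.
REQUIRED_TABLE_BEHAVIOR = {
    OutputMode.APPEND: "append",
    OutputMode.COMPLETE: "truncate_then_append",
    OutputMode.UPDATE: "upsert_or_delete_then_append",
}


def required_table_behavior(mode: OutputMode) -> str:
    """Return the behavior the table would need to honor for `mode`."""
    return REQUIRED_TABLE_BEHAVIOR[mode]
```

A table that cannot provide the behavior for a given mode would then be
rejected for that mode, instead of silently appending.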

Nothing has been done for the DSv1 sink: Spark doesn't enforce anything
and the sink behaves as in append mode, though the query still respects
the output mode for stateful operations.

I understand we don't want to surprise end users by breaking
compatibility, but shouldn't that be a "temporary", "exceptional" case
that DSv2 never repeats? I'm seeing many built-in data sources being
migrated to DSv2 with a "do nothing for update/truncate" exception,
which simply undermines the rationale for having capabilities.

In addition, they don't add TRUNCATE to their capabilities but do implement
SupportsTruncate in WriteBuilder, which is weird. It works for now because
SS doesn't check capabilities on the writer side (I guess it only checks
STREAMING_WRITE), but once we check capabilities up front, things will
break.
(I'm looking into adding a writer plan in SS before the analyzer and
checking capabilities there.)
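
A rough sketch of why an up-front check would break such sources follows. It
is a standalone model and not Spark code: capabilities are plain strings,
lenient_check reflects my guess at today's behavior (only STREAMING_WRITE is
checked), and strict_check is a hypothetical version of the check I'm
considering. Update mode isn't modeled here.

```python
def lenient_check(capabilities: set, output_mode: str) -> bool:
    """Guess at today's behavior: only STREAMING_WRITE is looked at."""
    return "STREAMING_WRITE" in capabilities


def strict_check(capabilities: set, output_mode: str) -> bool:
    """Hypothetical up-front check: complete mode also needs TRUNCATE."""
    if "STREAMING_WRITE" not in capabilities:
        return False
    if output_mode == "complete":
        return "TRUNCATE" in capabilities
    # Append passes as-is; update has no dedicated rule in this sketch.
    return True


# A made-up sink that handles truncate in its write builder but never
# declares TRUNCATE as a table capability.
undeclared_sink = {"STREAMING_WRITE"}
declared_sink = {"STREAMING_WRITE", "TRUNCATE"}

print(lenient_check(undeclared_sink, "complete"))
print(strict_check(undeclared_sink, "complete"))
print(strict_check(declared_sink, "complete"))
```

In this model the undeclared sink passes the lenient check but fails the
strict one for complete mode, which is the breakage I'm worried about.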

What would be our best fix for this issue? Should we leave the responsibility
for handling "truncate" to the data source (so doing nothing is fine if it's
intended) and just add TRUNCATE to the capabilities? (That should be
documented in the data source's description, though.) Or should we drop
support for truncate if the data source is unable to truncate? (Foreach and
Kafka output tables would then be unable to use complete mode.)

Looking forward to hearing everyone's thoughts.

Thanks,
Jungtaek Lim (HeartSaVioR)