Hello,

I am interested in using the new Structured Streaming feature of Spark SQL
and am currently experimenting with the code at HEAD. I would like to have
a better understanding of how deletions should be handled in a Structured
Streaming setting.

Given an incremental query computing an arbitrary aggregation over some
dataset, inserting new values is fairly obvious: simply update the
aggregate computation tree with whatever new values arrive on the input
datasets/streams. But things are less obvious for updates and deletions:
do they have any representation in the input streams at all? If I have a
query that aggregates some value over some key, and I then delete all
rows for that key, I would expect the query to emit a result with that
key's aggregated value removed. The same applies to updates...
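
To make the question concrete, here is a minimal sketch of the kind of
query I have in mind, written against the Dataset API as I understand it
from browsing HEAD (the socket source, the "key,value" line format, and
the output mode are just assumptions I made for the example):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

object StreamingAggSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("StreamingAggSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical input: lines of the form "key,value" from a socket.
    val input = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()
      .as[String]
      .map { line =>
        val Array(k, v) = line.split(",")
        (k, v.toLong)
      }
      .toDF("key", "value")

    // Inserts are easy: each new row updates the running aggregate.
    // My question is how deleting every row for a given key would be
    // represented in this stream and reflected in the output below.
    val aggregated = input.groupBy($"key").agg(sum($"value").as("total"))

    val query = aggregated.writeStream
      .outputMode("complete") // re-emit the whole result table each trigger
      .format("console")
      .start()

    query.awaitTermination()
  }
}

With only appends, the "total" column just grows; what I cannot see is
any way for the stream itself to carry a retraction that would make an
aggregated value shrink or disappear.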

Thanks for any insights you might want to share.

Regards,
-- 
Arnaud Bailly

twitter: abailly
skype: arnaud-bailly
linkedin: http://fr.linkedin.com/in/arnaudbailly/
