Hello, I am interested in using the new Structured Streaming feature of Spark SQL and am currently experimenting with the code at HEAD. I would like to better understand how deletions should be handled in a structured streaming setting.
Given some incremental query computing an arbitrary aggregation over a dataset, inserting new values is fairly obvious: simply update the aggregate computation tree with whatever new values arrive on the input datasets/streams. But things are less obvious for updates and deletions: do they have a representation in the input streams? If I have a query that aggregates some value over some key, and I delete all instances of that key, I would expect the query to output a result with that key's aggregated value removed. The same goes for updates.

Thanks for any insights you might want to share.

Regards,
--
Arnaud Bailly
twitter: abailly
skype: arnaud-bailly
linkedin: http://fr.linkedin.com/in/arnaudbailly/
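P.S.: To make the question concrete, here is a toy sketch of the retraction semantics I have in mind. This is plain Scala, not the actual Spark API; the Insert/Delete event types and the per-key state shape are my own invention, just to illustrate that deleting the last occurrence of a key should drop it from the aggregate.

```scala
// Toy model of an incremental keyed sum that handles retractions:
// each input event is either an Insert or a Delete of a value.
sealed trait Event { def key: String; def value: Long }
case class Insert(key: String, value: Long) extends Event
case class Delete(key: String, value: Long) extends Event

object IncrementalSum {
  // Fold one event into the running per-key (sum, count) state;
  // a key whose count drops to zero disappears from the result.
  def step(state: Map[String, (Long, Long)], e: Event): Map[String, (Long, Long)] =
    e match {
      case Insert(k, v) =>
        val (s, c) = state.getOrElse(k, (0L, 0L))
        state.updated(k, (s + v, c + 1))
      case Delete(k, v) =>
        val (s, c) = state.getOrElse(k, (0L, 0L))
        if (c <= 1) state - k
        else state.updated(k, (s - v, c - 1))
    }

  // Replay a stream of events and expose only the final sums.
  def run(events: Seq[Event]): Map[String, Long] =
    events
      .foldLeft(Map.empty[String, (Long, Long)])(step)
      .map { case (k, (s, _)) => k -> s }
}
```

For example, `run(Seq(Insert("a", 3), Insert("a", 4), Delete("a", 3)))` should yield `Map("a" -> 4)`, and deleting every instance of `"a"` should yield an empty result, which is the behaviour I would hope the streaming aggregation exposes.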