I need to write a Spark Structured Streaming pipeline that involves
multiple aggregations, splitting the data into multiple sub-pipes and
unioning them. It also needs to have a stateful aggregation with a timeout.
Spark Structured Streaming supports all of the required functionality, but
not as one stream. I
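To pin down what "split, aggregate, union, with a state timeout" is supposed to compute, here is a toy model of that dataflow in plain Python. This is not the Spark API, just a batch mock of the intended semantics; the event shape, the routing condition, and the 60-second timeout are all assumptions for illustration.

```python
STATE_TIMEOUT_S = 60  # assumption: keys idle longer than this are evicted


def run_pipeline(events, now):
    """Model of the desired shape. events: iterable of (key, kind, value, event_time)."""
    state = {}  # key -> (running_sum, last_seen) : the stateful aggregation
    branch_a, branch_b = [], []
    for key, kind, value, ts in events:
        # split the stream into sub-pipes by a routing condition
        (branch_a if kind == "a" else branch_b).append((key, value, ts))
    # union the sub-pipes back together, then aggregate per key
    for key, value, ts in branch_a + branch_b:
        total, _ = state.get(key, (0, ts))
        state[key] = (total + value, ts)
    # stateful timeout: drop keys not seen within STATE_TIMEOUT_S of `now`
    return {k: total for k, (total, last) in state.items()
            if now - last <= STATE_TIMEOUT_S}
```

For example, `run_pipeline([("x", "a", 1, 0), ("x", "b", 2, 10), ("y", "a", 5, 0)], now=30)` returns `{"x": 3, "y": 5}`, while with `now=100` both keys have timed out and the result is empty.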
Hi,
I need to write a nightly job that ingests large CSV files (~15 GB each) and
adds/updates/deletes the changed rows in a relational database.
If a row is identical to what is in the database, I don't want to re-write the
row to the database. Also, if the same item comes from multiple sources (files)
I need