Hi
The RDBMS context is quite broad: it has both large fact tables with
billions of rows and hundreds of small normalized tables. Depending
on the Spark transformation, the source data can be one table or
several, and anywhere from a few rows to millions or even billions of
them. When new data is inserted, the job should pick it up and
process it.
Yes, you can certainly use Spark Streaming, but reading from the original
source table may still be time-consuming and resource-intensive.
Having some context on the RDBMS platform, the data sizes/volumes involved,
and the tolerable lag (between a change being made and it being processed
by Spark) would help in suggesting an approach.
Hi
I have this living RDBMS, and I'd like to apply a Spark job to several
tables whenever new data gets in.
I could run batch Spark jobs through cron every minute, but each job
takes time and resources just to start up (creating the SparkContext,
negotiating with YARN).
I wonder if I could run one instance of a Spark Streaming job instead.
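To make the idea concrete, something like the following rough sketch is
what I have in mind: the SparkSession is created once, so the
SparkContext/YARN startup cost is paid a single time, and a loop polls
the source for new rows. The table name (events), watermark column (id),
JDBC URL, and output path below are all made up for illustration:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.max

object PollingJob {
  def main(args: Array[String]): Unit = {
    // Created once; every iteration below reuses the same context
    val spark = SparkSession.builder.appName("rdbms-poller").getOrCreate()

    var lastMaxId = 0L // would need to be persisted across restarts

    while (true) {
      // Fetch only rows inserted since the last poll; cache so the
      // emptiness check, the write, and the max() don't re-query the DB
      val batch = spark.read
        .format("jdbc")
        .option("url", "jdbc:postgresql://dbhost:5432/mydb")
        .option("dbtable", s"(SELECT * FROM events WHERE id > $lastMaxId) AS t")
        .load()
        .cache()

      if (batch.head(1).nonEmpty) {
        batch.write.mode("append").parquet("hdfs:///out/events")
        lastMaxId = batch.agg(max("id")).first().getLong(0)
      }
      batch.unpersist()

      Thread.sleep(60 * 1000) // poll interval, i.e. the tolerable lag
    }
  }
}

Whether this counts as "streaming" or just a resident batch job, the point
is to avoid paying the job-submission cost every minute.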