It's important to note that running multiple streaming queries, as of today,
reads the input data once per query. So there is a trade-off between the two
approaches.
So even though scenario 1 won't benefit much from Catalyst optimization, it
may be more efficient overall in terms of resources.
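To make the read amplification concrete, here is a toy model in plain Python.
It is not Spark code: CountingSource, single_pass and one_query_per_rule are
made-up names, and the three rules are arbitrary examples. It only counts how
often rows are read and says nothing about what Catalyst or vectorization
might gain in scenario 2.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, int]
Rule = Callable[[Row], bool]


class CountingSource:
    """Hypothetical stand-in for a streaming source; counts every row it hands out."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows
        self.rows_read = 0

    def read(self) -> Iterator[Row]:
        for row in self._rows:
            self.rows_read += 1
            yield row


def single_pass(source: CountingSource, rules: Dict[str, Rule]) -> Dict[str, int]:
    """Scenario 1 (assumed shape): read each row once, apply every rule to it."""
    hits = {name: 0 for name in rules}
    for row in source.read():
        for name, rule in rules.items():
            if rule(row):
                hits[name] += 1
    return hits


def one_query_per_rule(source: CountingSource, rules: Dict[str, Rule]) -> Dict[str, int]:
    """Scenario 2 (assumed shape): each rule is its own query and re-reads the input."""
    return {
        name: sum(1 for row in source.read() if rule(row))
        for name, rule in rules.items()
    }


rows = [{"value": v} for v in range(10)]
rules: Dict[str, Rule] = {
    "is_even": lambda r: r["value"] % 2 == 0,
    "above_five": lambda r: r["value"] > 5,
    "is_zero": lambda r: r["value"] == 0,
}

src_single = CountingSource(rows)
src_multi = CountingSource(rows)
hits_single = single_pass(src_single, rules)
hits_multi = one_query_per_rule(src_multi, rules)

# Same answers either way; only the number of reads differs.
print(hits_single == hits_multi)                   # True
print(src_single.rows_read, src_multi.rows_read)   # 10 30
```

With three rules the per-rule variant reads the ten rows three times. Whether
that extra I/O outweighs the optimizer's benefit is what has to be measured
on the real workload.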
This is hard to say without testing. It depends on the type of computation,
among other things, and also on the Spark version. Generally, vectorization /
SIMD could make scenario 2 much faster if Spark / the JVM applies it.
> On 9. Aug 2017, at 07:05, Raghavendra Pandey
I am using structured streaming to evaluate multiple rules on the same running
stream.
I have two options to do that. One is to use foreach and evaluate all the
rules on each row.
The other option is to express the rules in the Spark SQL DSL and run multiple
queries.
I was wondering if option 1 will result in