It's important to note that running multiple streaming queries, as of today,
would read the input data once per query. So there is a trade-off
between the two approaches.
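
To make the repeated reads concrete, here is a toy sketch in plain Python. It is not Spark code and does not model Catalyst; CountingSource and the two run_* functions are made-up names for illustration only. It just counts how many records are pulled from the input when two aggregations run as independent scans versus one shared scan.

```python
# Toy model only: plain Python, not Spark. All names here are hypothetical.
class CountingSource:
    """Wraps a list of records and counts every record handed to a reader."""

    def __init__(self, records):
        self.records = list(records)
        self.records_read = 0

    def scan(self):
        # Each call starts a fresh pass over the input.
        for record in self.records:
            self.records_read += 1
            yield record


def run_separate_queries(source):
    # Each "query" scans the input on its own, like independent queries would.
    total = sum(source.scan())
    maximum = max(source.scan())
    return total, maximum


def run_combined_query(source):
    # A single scan feeds both aggregations.
    total, maximum = 0, None
    for value in source.scan():
        total += value
        if maximum is None or value > maximum:
            maximum = value
    return total, maximum


data = [3, 1, 4, 1, 5]
separate = CountingSource(data)
combined = CountingSource(data)
separate_result = run_separate_queries(separate)
combined_result = run_combined_query(combined)
print(separate.records_read, combined.records_read)  # prints "10 5"
```

Both variants compute the same answers; only the number of records read differs. Whether the extra reads matter in a real job depends on the source and the workload, so this shows the shape of the trade-off, not its size.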
So even though scenario 1 won't get great Catalyst optimization, it may be
more efficient overall in terms of resource u
This is not easy to say without testing. It depends on the type of computation,
among other things, and also on the Spark version. Generally, scenario 2 could be
much faster if Spark / the JVM applies vectorization / SIMD to it.
> On 9. Aug 2017, at 07:05, Raghavendra Pandey
> wrote:
>
> I am using s