Just to clarify a subtle difference between DStreams and Structured Streaming: multiple input streams in a DStreamGraph most likely means they are all being processed/computed in the same way, since there can be only one streaming query/context active in the StreamingContext. In Structured Streaming, however, there can be any number of independent streaming queries (i.e., different computations), and each streaming query can have any number of separate input sources. So Michael's comment that "each stream will have a thread on the driver" is correct when many independent queries with different computations are running simultaneously. However, if all your streams need to be processed in the same way, then it's one streaming query with many inputs, and it will require only one thread.
Hope this helps.
TD

On Wed, Jan 31, 2018 at 12:39 PM, Michael Armbrust <mich...@databricks.com> wrote:

> -dev +user
>
>> Similarly for structured streaming, would there be any limit on number of
>> streaming sources I can have?
>
> There is no fundamental limit, but each stream will have a thread on the
> driver that is doing coordination of execution. We comfortably run 20+
> streams on a single cluster in production, but I have not pushed the
> limits. You'd want to test with your specific application.