Thanks for the input.  What I am interested in is how to have multiple workers read and process the small files in parallel, with one file per worker at a time.  Repartitioning the data frame doesn't make sense, since the data frame is already small.
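
For reference, one pattern that would match what I'm after is to distribute the file paths themselves rather than the rows, so each Spark task opens exactly one file.  A rough sketch -- the input directory, partition count, and per-record handling are placeholders, and the files would have to sit on storage every executor can reach:

import java.io.File
import org.apache.spark.sql.SparkSession

object OneFilePerTask {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("one-file-per-task").getOrCreate()
    val sc = spark.sparkContext

    // List the small CSV files on the driver (placeholder path; it must
    // also be readable from every executor).
    val dir = new File("/data/incoming")
    val paths = Option(dir.listFiles()).getOrElse(Array.empty[File])
      .map(_.getPath)
      .filter(_.endsWith(".csv"))
      .toSeq

    // One partition per file => each task handles exactly one file at a time,
    // spread across however many executor cores are available.
    sc.parallelize(paths, math.max(paths.size, 1)).foreach { path =>
      val source = scala.io.Source.fromFile(path)
      try {
        source.getLines().foreach { line =>
          // parse and handle one CSV record here
        }
      } finally {
        source.close()
      }
    }

    spark.stop()
  }
}

The trade-off is that each file is read with plain JVM I/O instead of Spark's CSV reader, so schema and malformed-record handling become my responsibility.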

On 10/15/20 9:14 AM, Lalwani, Jayesh wrote:
The parallelism of a streaming query depends on the input source. If you are getting
one small file per microbatch, Spark will read it on a single worker. You can
always repartition your data frame after reading it to increase the parallelism.
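
A minimal sketch of that repartition approach, assuming a CSV file source; the schema, paths, partition count, and the parquet write standing in for the real per-batch work are all illustrative:

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{LongType, StringType, StructType}

object RepartitionedMicrobatches {
  // Runs once per microbatch; the repartition spreads the rows over several
  // tasks so more than one worker does the downstream work.
  def processBatch(batch: DataFrame, batchId: Long): Unit = {
    batch.repartition(8)                          // partition count is illustrative
      .write.mode("append").parquet("/data/out")  // stand-in for the real per-batch work
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("repartitioned-microbatches").getOrCreate()

    // Streaming CSV sources need an explicit schema; this one is made up.
    val schema = new StructType().add("id", LongType).add("value", StringType)

    val query = spark.readStream
      .schema(schema)
      .option("maxFilesPerTrigger", "8")  // group several small files into each microbatch
      .csv("/data/incoming")              // hypothetical input directory
      .writeStream
      .option("checkpointLocation", "/data/checkpoints")
      .foreachBatch(processBatch _)
      .start()

    query.awaitTermination()
  }
}

maxFilesPerTrigger only groups several small files into one microbatch; the repartition inside foreachBatch is what actually spreads the rows across workers.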

On 10/14/20, 11:26 PM, "Artemis User" <arte...@dtechspace.com> wrote:

     Hi,

     We have a streaming application that reads microbatch CSV files and
     uses the foreachBatch call.  Each microbatch can be processed
     independently.  I noticed that only one worker node is being utilized.
     Is there any way, or any explicit method, to distribute the batch
     workload across multiple workers?  I would have thought Spark would run
     the foreachBatch method on different workers, since each batch can be
     treated as atomic.

     Thanks!

     ND




