Hi Vijay:
I am using spark-shell because I am still prototyping the steps involved.
Regarding executors: I have 280 executors, and the UI only shows a few straggler
tasks on each trigger. The UI does not show much time spent on GC. I
suspect the delay is due to fetching data from Kafka. The num
Instead of spark-shell, have you tried running it as a job?
How many executors and cores are you using? Can you share the RDD graph and
event timeline from the UI? Did you find which tasks are taking more time,
and whether there is any GC?
Please look at the UI if you have not already; it can provide a lot of information.
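To put the executor question in context, here is a rough back-of-the-envelope sketch using only the figures quoted in this thread: about 500k messages/second, a 1-minute trigger, and 280 executors. It assumes a steady ingestion rate and an even split of work across executors, which is an idealization. With the Kafka source in Spark 2.2, read parallelism follows the number of Kafka topic partitions, so the real per-task share can be very uneven. The function names are illustrative only.

```python
# Back-of-the-envelope load estimate from the figures in this thread.
# Assumption: steady ingestion rate and an even split across executors.
# In practice the Kafka partition count bounds how many tasks read in parallel.

INGEST_RATE_PER_SEC = 500_000   # ~500k messages/second (from the thread)
TRIGGER_INTERVAL_SEC = 60       # 1-minute trigger interval
NUM_EXECUTORS = 280             # executor count reported in the thread


def messages_per_trigger(rate_per_sec: int, interval_sec: int) -> int:
    """Messages arriving during one trigger interval."""
    return rate_per_sec * interval_sec


def messages_per_executor(total: int, executors: int) -> float:
    """Average share of one trigger's messages per executor (even split)."""
    return total / executors


if __name__ == "__main__":
    total = messages_per_trigger(INGEST_RATE_PER_SEC, TRIGGER_INTERVAL_SEC)
    per_exec = messages_per_executor(total, NUM_EXECUTORS)
    print(f"messages per trigger: {total:,}")
    print(f"average per executor: {per_exec:,.0f}")
```

With these numbers the script prints 30,000,000 messages per trigger and roughly 107,143 messages per executor. The few straggler tasks seen in the UI suggest the actual distribution is far from even.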
-
Hi:
I am working with Spark Structured Streaming (2.2.1), reading data from Kafka
(0.11).
I need to aggregate the data ingested every minute, and I am using spark-shell at
the moment. The message ingestion rate is approximately 500k messages/second. During
some trigger intervals (1 minute), especially when t