Re: BUG :: UI Spark

2024-05-26 Thread Mich Talebzadeh
Sorry, I thought I gave an explanation. The issue you are encountering with incorrect record numbers in the "Shuffle Write Size/Records" column in the Spark DAG UI when data is read from cache/persist is a known limitation. This discrepancy arises due to the way Spark handles and reports shuffle

Re: BUG :: UI Spark

2024-05-26 Thread Mich Talebzadeh
Just to clarify further: the "Shuffle Write Size/Records" column in the Spark UI can be misleading when working with cached/persisted data because it reflects the shuffled data size and record count, not the entire cached/persisted data. So it is fair to say that this is a limitation of the

Re: BUG :: UI Spark

2024-05-26 Thread Mich Talebzadeh
Yep, the Spark UI's "Shuffle Write Size/Records" column can sometimes show incorrect record counts *when data is retrieved from cache or persisted data*. This happens because the record count reflects the number of records written to disk for shuffling, and not the actual number of records in the
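To make the distinction above concrete, here is a toy model in plain Python (not Spark code). It assumes the stage performs a map-side pre-aggregation by key before the shuffle, which is one common reason the shuffle-write record count ends up lower than the cached record count; the thread itself does not say which operation was involved. The function name and the sample data are hypothetical.

```python
from collections import Counter


def shuffle_write_records(partitions):
    """Toy model: each map task pre-aggregates its partition by key,
    then writes one shuffle record per distinct key it saw."""
    return sum(len(Counter(key for key, _ in part)) for part in partitions)


# Hypothetical cached dataset: 2 partitions, 8 records, 3 distinct keys.
cached_partitions = [
    [("a", 1), ("a", 2), ("b", 3), ("a", 4)],
    [("b", 5), ("c", 6), ("c", 7), ("b", 8)],
]

cached_records = sum(len(part) for part in cached_partitions)
written_records = shuffle_write_records(cached_partitions)

print(cached_records)   # 8 records sit in the cache
print(written_records)  # 4 records are written for the shuffle
```

In this sketch the two numbers differ even though nothing is lost: they simply count different things, which may be what the UI column is showing.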

Re: BUG :: UI Spark

2024-05-26 Thread Sathi Chowdhury
Can you please explain how you realized it's wrong? Did you check CloudWatch for the same metrics and compare? Also, are you using df.cache() and expecting shuffle read/write to go away? On Sunday, May 26, 2024, 7:53 AM, Prem Sahoo wrote: Can anyone

Re: BUG :: UI Spark

2024-05-26 Thread Prem Sahoo
Can anyone please assist me?
On Fri, May 24, 2024 at 12:29 AM Prem Sahoo wrote:
> Does anyone have a clue?
>
> On Thu, May 23, 2024 at 11:40 AM Prem Sahoo wrote:
>
>> Hello Team,
>> in spark DAG UI, we have Stages tab. Once you click on each stage you
>> can view the tasks.
>>
>> In each