Re: shuffle mathematical formula

2020-02-04 Aironman DirtDiver
I would have to check, but in principle this could be done by examining the streaming logs: once you detect when a shuffle operation starts and when it ends, you know its total duration. See also https://stackoverflow.com/questions/27276884/what-is-shuffle-read-shuffle-write-in-apache-spark
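As a rough sketch of the "detect start and end in the logs" idea, the snippet below reads a Spark event log (the JSON-lines file written when `spark.eventLog.enabled` is true) and computes each stage's wall-clock duration from the "Submission Time" and "Completion Time" fields of `SparkListenerStageCompleted` events. The field names are as I recall them from Spark's JSON event-log format, so verify them against your Spark version. Note that a stage's duration covers computation as well as the shuffle write (map side) or shuffle read (reduce side), so it is an upper bound on shuffle time, not a pure shuffle measurement. The function name is hypothetical.

```python
import json
from typing import Dict, Iterable


def stage_durations_ms(event_lines: Iterable[str]) -> Dict[int, int]:
    """Return {stage_id: completion_time - submission_time} in milliseconds.

    Assumption: the input is a JSON-lines Spark event log in which each
    finished stage emits a "SparkListenerStageCompleted" event whose
    "Stage Info" object holds "Submission Time" and "Completion Time"
    (epoch milliseconds). Verify field names for your Spark version.
    """
    durations: Dict[int, int] = {}
    for line in event_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        if event.get("Event") != "SparkListenerStageCompleted":
            continue  # only stage-completion events carry both timestamps
        info = event["Stage Info"]
        start = info.get("Submission Time")
        end = info.get("Completion Time")
        if start is not None and end is not None:
            # A retried stage reuses its id; the last attempt seen wins.
            durations[info["Stage ID"]] = end - start
    return durations
```

Usage: `with open(path) as f: print(stage_durations_ms(f))`, where `path` points to an uncompressed event-log file (compressed logs must be decompressed first).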

shuffle mathematical formula

2020-02-04 asma zgolli
Dear Spark contributors, I'm looking for a way to model the cost of the Spark shuffle, and I wonder whether there are mathematical formulas to compute the "Shuffle Read" and "Shuffle Write" sizes shown in the stages view of the Spark UI. If there aren't, are there any references that would give me a head start on this?
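As far as I know, the UI figures are not produced by a closed-form formula. They are sums of per-task metrics: a stage's shuffle read is roughly the total of remote plus local bytes read by its tasks, and its shuffle write is the total of bytes written by its tasks. The sketch below reproduces that aggregation from a JSON-lines event log, assuming the `SparkListenerTaskEnd` field names "Task Metrics", "Shuffle Read Metrics" ("Remote Bytes Read", "Local Bytes Read") and "Shuffle Write Metrics" ("Shuffle Bytes Written") as I recall them from Spark's event-log format. The function name is hypothetical, and totals may differ slightly from the UI when there are failed or speculative tasks.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def shuffle_bytes_per_stage(event_lines: Iterable[str]) -> Dict[int, Tuple[int, int]]:
    """Return {stage_id: (shuffle_read_bytes, shuffle_write_bytes)}.

    Assumption: per-task metrics come from "SparkListenerTaskEnd" events:
      shuffle read  = Remote Bytes Read + Local Bytes Read
      shuffle write = Shuffle Bytes Written
    """
    read: Dict[int, int] = defaultdict(int)
    written: Dict[int, int] = defaultdict(int)
    for line in event_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        if event.get("Event") != "SparkListenerTaskEnd":
            continue  # only task-end events carry task metrics
        metrics = event.get("Task Metrics")
        if not metrics:
            continue  # some failed tasks report no metrics
        stage = event["Stage ID"]
        r = metrics.get("Shuffle Read Metrics", {})
        w = metrics.get("Shuffle Write Metrics", {})
        read[stage] += r.get("Remote Bytes Read", 0) + r.get("Local Bytes Read", 0)
        written[stage] += w.get("Shuffle Bytes Written", 0)
    # Union of stage ids so map-only and reduce-only stages both appear.
    return {s: (read[s], written[s]) for s in set(read) | set(written)}
```

For a cost model, these measured per-stage byte counts could serve as the observed target values against which candidate formulas are fitted.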