Hi Spark users, I'm very much hoping someone can help me out. I have a strict performance requirement on a particular query, and one of its stages shows great variance in duration -- from 300 ms to 10 sec.
The stage is mapPartitionsWithIndex at Operator.scala:210 (running Spark 0.8). I have run the job quite a few times, and the task details within the stage do not account for the overall duration shown for the stage. What could be taking up time that doesn't show in the stage breakdown in the UI? I'm thinking that reading the data in is already reflected in the Duration column, so caching should not be a factor (I'm not caching explicitly)?

The details within the stage always show roughly the following, for both the 10 sec and the 600 ms runs -- very little variation, no task over 500 ms, and the shuffle write sizes are comparable:

Task ID  Status   Locality Level  Executor/Launch Time  Duration  GC Time  Shuffle Write
1864     SUCCESS  NODE_LOCAL      #######               301 ms    8 ms     111.0 B
1863     SUCCESS  NODE_LOCAL      #######               273 ms             102.0 B
1862     SUCCESS  NODE_LOCAL      #######               245 ms             111.0 B
1861     SUCCESS  NODE_LOCAL      #######               326 ms    4 ms     102.0 B
1860     SUCCESS  NODE_LOCAL      #######               217 ms    6 ms     102.0 B
1859     SUCCESS  NODE_LOCAL      #######               277 ms             111.0 B
1858     SUCCESS  NODE_LOCAL      #######               262 ms             108.0 B
1857     SUCCESS  NODE_LOCAL      #######               217 ms    14 ms    112.0 B
1856     SUCCESS  NODE_LOCAL      #######               208 ms             109.0 B
1855     SUCCESS  NODE_LOCAL      #######               242 ms             74.0 B
1854     SUCCESS  NODE_LOCAL      #######               218 ms    3 ms     58.0 B
1853     SUCCESS  NODE_LOCAL      #######               254 ms    12 ms    102.0 B
1852     SUCCESS  NODE_LOCAL      #######               274 ms    8 ms     77.0 B