Another test I just did was to run with local[X], and the problem doesn't happen there. Could it be a communication problem between nodes?
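A minimal sketch of that test, assuming a Spark Streaming job (the thread count, app name and batch interval below are illustrative, not from the actual job):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // local[4]: driver and 4 worker threads in a single JVM, so no
    // executor-to-executor network traffic is involved at all.
    val conf = new SparkConf()
      .setMaster("local[4]")
      .setAppName("cache-latency-repro") // hypothetical name
    val ssc = new StreamingContext(conf, Seconds(1))

If the 40s stalls never appear under local[X] but do appear on the cluster, that would be consistent with the communication hypothesis (block replication or transfer between block managers) rather than with the computation itself.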
2018-08-23 22:43 GMT+02:00 Guillermo Ortiz <konstt2...@gmail.com>:

> It's a complex DAG before the point where I cache the RDD: some joins,
> filters and maps before caching the data, but most of the time caching
> takes almost no time at all. I could understand it if processing or
> caching the data took the same long time on every run, but it seems
> random, and there is no weird data in the input.
>
> Another test I tried was disabling caching, and then all the
> microbatches lasted the same time, so it does seem related to caching
> these RDDs.
>
> On Thu, Aug 23, 2018 at 15:29, Sonal Goyal (<sonalgoy...@gmail.com>)
> wrote:
>
>> How are these small RDDs created? Could the blockage be in their
>> computation/creation instead of their caching?
>>
>> Thanks,
>> Sonal
>> Nube Technologies <http://www.nubetech.co>
>>
>> <http://in.linkedin.com/in/sonalgoyal>
>>
>> On Thu, Aug 23, 2018 at 6:38 PM, Guillermo Ortiz <konstt2...@gmail.com>
>> wrote:
>>
>>> I use Spark caching with the persist method. I cache several RDDs, and
>>> some of them are pretty small (about 300 KB). Most of the time it works
>>> well and the whole job lasts about 1s, but sometimes it takes about 40s
>>> to store those 300 KB to cache.
>>>
>>> In the Storage tab of the Spark UI, I can see the cached fraction
>>> increase up to 83% (250 KB) and then stall for a while. If I check the
>>> event timeline in the Spark UI, I can see that when this happens there
>>> is one node where tasks take a very long time. It can be any node in
>>> the cluster; it's not always the same one.
>>>
>>> In the Spark executor logs I can see that it takes about 40s to store
>>> 3.7 KB when this problem occurs:
>>>
>>> INFO 2018-08-23 12:46:58 Logging.scala:54 -
>>> org.apache.spark.storage.BlockManager: Found block rdd_1705_23 locally
>>> INFO 2018-08-23 12:47:38 Logging.scala:54 -
>>> org.apache.spark.storage.memory.MemoryStore: Block rdd_1692_7 stored as
>>> bytes in memory (estimated size 3.7 KB, free 1048.0 MB)
>>> INFO 2018-08-23 12:47:38 Logging.scala:54 -
>>> org.apache.spark.storage.BlockManager: Found block rdd_1692_7 locally
>>>
>>> I have tried MEMORY_ONLY, MEMORY_ONLY_SER and so on, with the same
>>> results. I have also checked the disk I/O (although with MEMORY_ONLY
>>> that shouldn't matter) and I can't see any problem there. It happens
>>> randomly, but in maybe 25% of the jobs.
>>>
>>> Any idea what could be happening?
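For anyone who wants to reproduce the pattern under discussion, a minimal, self-contained sketch: a small DAG of joins, filters and maps, persisted and then materialized twice, timing the compute-plus-cache pass separately from the cached read. That separation is one way to get at Sonal's question of whether the cost is in computing the blocks or in storing them. All names, sizes and the storage level here are illustrative, not taken from the job above.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    object CacheTimingSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setMaster("local[4]").setAppName("cache-timing"))

        // Stand-ins for the real inputs, which the thread does not show.
        val left  = sc.parallelize(1 to 1000).map(i => (i % 100, i))
        val right = sc.parallelize(1 to 1000).map(i => (i % 100, i.toString))

        // A small DAG of joins, filters and maps before the cache point.
        val small = left.join(right)
          .filter { case (_, (n, _)) => n % 2 == 0 }
          .map { case (k, (n, s)) => (k, s + ":" + n) }
          .persist(StorageLevel.MEMORY_ONLY) // also try MEMORY_ONLY_SER, etc.

        def time[A](label: String)(body: => A): A = {
          val t0 = System.nanoTime()
          val result = body
          println(f"$label: ${(System.nanoTime() - t0) / 1e9}%.3f s")
          result
        }

        // First count computes the DAG and stores the blocks; the second
        // count should only read the already-cached blocks.
        time("compute + cache")(small.count())
        time("cached read")(small.count())

        sc.stop()
      }
    }

If the first timing stays large even with the persist call removed, the cost is in computing the blocks; if it only stalls when persist is enabled (as the disabled-caching test above suggests), the cost is in storing them.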