There could be multiple reasons for the caching stopping at 90%:
1. Not enough aggregate memory in the cluster - increase cluster memory.
2. Data is skewed among the executors, so one executor tries to cache too
much while the others sit idle - repartition the data using RDD.repartition
to force an even distribution.
The Storage tab of the Spark web UI shows how much of each RDD is cached,
which can help tell these cases apart.
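To see why RDD.repartition helps with the skew case, here is a minimal sketch in plain Python (not Spark itself; the helper name is illustrative) of what a repartition shuffle does: each record is assigned to a partition by hash, so records that were piled onto one executor end up spread roughly evenly.

```python
# Illustrative sketch of hash partitioning, the mechanism behind
# RDD.repartition: records are reassigned to partitions by hash,
# which breaks up skew regardless of how they were distributed before.

def hash_partition(records, num_partitions):
    """Assign each record to a partition by hash, as a shuffle would."""
    partitions = [[] for _ in range(num_partitions)]
    for r in records:
        partitions[hash(r) % num_partitions].append(r)
    return partitions

# Skewed input: imagine all 10,000 records currently sit on one executor.
records = list(range(10_000))
parts = hash_partition(records, 8)
sizes = [len(p) for p in parts]
print(sizes)  # eight partitions of 1250 records each
```

With an even spread, no single executor has to cache more than its share, so it is less likely to run out of memory while the others are underused.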
Thank you, Tathagata, for your response. It makes sense to use
MEMORY_AND_DISK.
But sometimes when I start the job it does not cache everything at the
start; it only caches 90%. The LRU scheme should only take effect later,
when the data is not in use, so why is it failing to cache all the data at
the start?
If the RDD is not constantly in use, then the LRU scheme in each executor
can kick out some of the partitions from memory.
If you want to avoid recomputing in such cases, you could persist with
StorageLevel.MEMORY_AND_DISK, where the partitions will be dropped to disk
when kicked out of memory. That will avoid recomputation.
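The difference between the two storage levels can be sketched in plain Python (illustrative names, not Spark's actual implementation): both evict the least-recently-used partition when memory runs out, but MEMORY_ONLY simply drops it (so it must be recomputed later), while MEMORY_AND_DISK spills it to disk.

```python
# Sketch contrasting MEMORY_ONLY vs MEMORY_AND_DISK eviction behavior.
# Class and field names are illustrative, not Spark internals.
from collections import OrderedDict

class PartitionCache:
    def __init__(self, memory_slots, spill_to_disk):
        self.memory = OrderedDict()   # partition_id -> data, in LRU order
        self.disk = {}                # only populated when spill_to_disk
        self.memory_slots = memory_slots
        self.spill_to_disk = spill_to_disk

    def put(self, pid, data):
        if len(self.memory) >= self.memory_slots:
            victim, victim_data = self.memory.popitem(last=False)  # evict LRU
            if self.spill_to_disk:
                self.disk[victim] = victim_data   # MEMORY_AND_DISK: spill
            # else: MEMORY_ONLY - victim is dropped and must be recomputed
        self.memory[pid] = data

mem_only = PartitionCache(memory_slots=9, spill_to_disk=False)
mem_and_disk = PartitionCache(memory_slots=9, spill_to_disk=True)
for pid in range(10):              # 10 partitions, memory for only 9 (~90%)
    mem_only.put(pid, f"data-{pid}")
    mem_and_disk.put(pid, f"data-{pid}")

print(len(mem_only.memory), len(mem_only.disk))          # 9 0 - one lost
print(len(mem_and_disk.memory), len(mem_and_disk.disk))  # 9 1 - one on disk
```

In the MEMORY_ONLY case the evicted partition is gone and Spark has to rebuild it from lineage; in the MEMORY_AND_DISK case it is read back from disk, which is slower than memory but avoids the recomputation.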
Hello All,
We have a Spark Streaming job that reads data from a DB (three tables),
caches it in memory ONLY at the start, and then happily carries out the
incremental calculation with the new data. What we've noticed occasionally
is that one of the RDDs caches only 90% of the data, and the uncached
partitions then have to be recomputed.