Hi, I'd just like to clarify a few doubts I have about how the BlockManager behaves. This is mostly in regard to the Spark Streaming context.
There are two possible cases in which blocks may get dropped or not stored in memory.

Case 1: While writing a block with the MEMORY_ONLY setting, if the node's BlockManager does not have enough memory to unroll the block, the block won't be stored in memory and the Receiver will throw an error while writing it. If the StorageLevel uses disk (as with MEMORY_AND_DISK), blocks are stored to disk ONLY IF the BlockManager is not able to unroll them into memory. This is fine while receiving the blocks, but the logic has an issue when old blocks are later chosen to be dropped from memory, as in Case 2.

Case 2: Now say that, under either the MEMORY_ONLY or the MEMORY_AND_DISK setting, blocks were successfully stored in memory as in Case 1. If memory usage then goes beyond a certain threshold, the BlockManager starts dropping LRU blocks from memory, blocks that were stored successfully while receiving.

The primary issue I see is that, while dropping blocks in Case 2, Spark does not check whether the storage level uses disk (MEMORY_AND_DISK), so even with a disk-backed storage level a block is dropped from memory without being written to disk. Or perhaps the real issue is that blocks are not written to disk simultaneously in Case 1 in the first place. I understand doing so would impact throughput, but the current design may throw a BlockNotFoundException if blocks are chosen to be dropped, even when the StorageLevel uses disk.

Any thoughts?

Regards,
Dibyendu
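
P.S. For concreteness, here is a minimal sketch of the kind of receiver setup I mean. The host, port, and names are placeholders, and whether the failure actually reproduces depends on memory pressure on the executor; it only illustrates the storage-level configuration in question:

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

object BlockDropSketch {
  def main(args: Array[String]): Unit = {
    // local[2]: one core for the receiver, one for processing (local test only)
    val conf = new SparkConf()
      .setAppName("BlockDropSketch")
      .setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(2))

    // Case 1: with MEMORY_AND_DISK, the receiver's blocks go to disk only
    // if the initial unroll into memory fails; otherwise they stay in memory.
    val lines = ssc.socketTextStream("localhost", 9999,
      StorageLevel.MEMORY_AND_DISK)

    // Case 2: if the BlockManager later evicts one of those in-memory blocks
    // under memory pressure without first spilling it to disk, a job that
    // reads the block can fail with BlockNotFoundException.
    lines.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}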