Re: difference between rdd.collect().toMap to rdd.collectAsMap() ?

2015-10-20 Thread Adrian Tanase
If you look at the source code, you’ll see that collectAsMap() is merely a convenience function on PairRDDs. The only interesting detail is that it uses a mutable HashMap to optimize creating maps with many keys. That being said, .collect() is called anyway, so all the data is still pulled to the driver either way.
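
To make that concrete, here is a minimal sketch in plain Scala (no Spark dependency) of roughly what the method does, as far as I recall from the source. The names CollectAsMapSketch and collectAsMapLike are made up for illustration, and the Array stands in for whatever rdd.collect() has already returned on the driver. Please check the source for your Spark version rather than trusting this.

```scala
import scala.collection.mutable

object CollectAsMapSketch {
  // Hypothetical helper: `collected` plays the role of the Array[(K, V)]
  // that rdd.collect() has already materialized on the driver.
  def collectAsMapLike[K, V](collected: Array[(K, V)]): scala.collection.Map[K, V] = {
    val map = new mutable.HashMap[K, V]
    // Pre-size the table so it is not repeatedly resized for many keys.
    map.sizeHint(collected.length)
    // For duplicate keys, the later pair overwrites the earlier one.
    collected.foreach { case (k, v) => map.put(k, v) }
    map
  }

  def main(args: Array[String]): Unit = {
    val collected = Array("a" -> 1, "b" -> 2, "a" -> 3)
    val viaToMap = collected.toMap                // immutable Map, like rdd.collect().toMap
    val viaHashMap = collectAsMapLike(collected)  // mutable HashMap underneath
    // Both hold the same entries; only the way the map is built differs.
    println(viaToMap == viaHashMap)
  }
}
```

In both variants the whole data set has to fit in driver memory, so the choice between them does not help with large RDDs.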

difference between rdd.collect().toMap to rdd.collectAsMap() ?

2015-10-20 Thread kali.tumm...@gmail.com
Hi All, Is there any performance impact when I use collectAsMap on my RDD instead of rdd.collect().toMap? I have a key-value RDD and I want to convert it to a HashMap. As far as I know, collect() is not efficient on large data sets because it runs on the driver. Can I use collectAsMap instead? Is there any