If you look at the source code, you’ll see that this is merely a convenience
function on pair RDDs (PairRDDFunctions). The only interesting detail is that
it uses a mutable HashMap to optimize creating maps with many keys. That said,
.collect() is called anyway, so all of the data still ends up on the driver.
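
To make that concrete, here is a rough sketch of the pattern in plain Scala,
with no Spark dependency. The array stands in for what rdd.collect() would
return on the driver, and the names CollectAsMapSketch and toMutableMap are
made up for illustration; this is not the Spark source itself.

```scala
import scala.collection.mutable

object CollectAsMapSketch {
  // Hypothetical helper: copies already-collected pairs into a pre-sized
  // mutable HashMap, which is the pattern described above.
  def toMutableMap[K, V](pairs: Array[(K, V)]): mutable.HashMap[K, V] = {
    val map = new mutable.HashMap[K, V]
    map.sizeHint(pairs.length) // hint the expected size before inserting
    pairs.foreach { case (k, v) => map.put(k, v) } // later duplicates overwrite earlier ones
    map
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for the Array that rdd.collect() would bring to the driver.
    val collected = Array("a" -> 1, "b" -> 2, "a" -> 3)

    println(toMutableMap(collected)) // collectAsMap-style result
    println(collected.toMap)         // collect().toMap-style result
  }
}
```

Either way the whole data set has to fit in driver memory, and with duplicate
keys only one value per key survives.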
Hi All,
Is there any performance impact when I use collectAsMap on my RDD instead of
rdd.collect().toMap?
I have a key-value RDD and I want to convert it to a HashMap. As far as I
know, collect() is not efficient on large data sets because it runs on the
driver. Can I use collectAsMap instead? Is there any