Doing RDD "count" in parallel, or at least parallelizing it as much as possible?

shahab Thu, 30 Oct 2014 10:27:17 -0700

Hi,

I noticed that the RDD "count" is the most time-consuming step in many of my
queries, as it runs in the "driver" process rather than being done in parallel
by the worker nodes.


Is there any way to perform "count" in parallel, or at least to parallelize
it as much as possible?
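For reference, a partition-parallel count usually looks like the minimal sketch below: each partition is counted independently by a worker, and only the small per-partition totals are summed at the end. This is a plain-Python simulation, not Spark code; `count_partition`, `parallel_count`, and the use of a thread pool as a stand-in for worker nodes are all illustrative assumptions. (As far as I understand, PySpark's own `RDD.count()` follows the same shape, counting inside `mapPartitions` and then calling `sum()`.)

```python
from concurrent.futures import ThreadPoolExecutor

def count_partition(partition):
    # Hypothetical per-partition task: count elements by iterating once.
    return sum(1 for _ in partition)

def parallel_count(partitions, max_workers=4):
    # Hypothetical helper: each partition is counted by a separate worker
    # (threads stand in for worker nodes here); only the per-partition
    # integers come back to the "driver", which just adds them up.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sum(pool.map(count_partition, partitions))

partitions = [range(0, 1000), range(1000, 2500), range(2500, 3000)]
print(parallel_count(partitions))  # 3000
```

The point of the design is that the driver only ever sees one number per partition, so the expensive part (iterating over the data) is spread across workers.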

best,
/Shahab
