Hi,

I noticed that "count" (on an RDD) is the most time-consuming step in many of
my queries, as it runs in the "driver" process rather than being done in
parallel by the worker nodes.

Is there any way to perform "count" in parallel, or at least parallelize
it as much as possible?

best,
/Shahab
