Could be a number of issues... maybe your CSV is not being split into
multiple map tasks, or the file is not local to the processing nodes.
How many tasks are you seeing in the Spark web UI for the map and store
stages? Are all the nodes being used when you look at the task level?
Is the time taken by each task roughly equal?
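
A minimal sketch of how to check the split, assuming a PySpark shell
where sc is the SparkContext (the file path here is made up):

    rdd = sc.textFile("/data/input.csv")

    # Number of partitions = number of map tasks for this stage;
    # a single partition means one task is doing all the work.
    print(rdd.getNumPartitions())

    # Ask for more splits up front...
    rdd = sc.textFile("/data/input.csv", minPartitions=16)
    # ...or repartition an existing RDD (this triggers a shuffle).
    rdd = rdd.repartition(16)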
Hi,
I have a CSV file (say "n" columns).
I am trying to do a filter operation like:
query = rdd.filter(lambda x: x[1] == "1234")
query.take(20)
Basically, this should return the rows where that column has the specific value, right?
This operation is taking quite some time to execute (if I can
compare, maybe slower than I would expect).
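
One more thing worth checking: if rdd comes straight from sc.textFile,
each element is a whole line of text, so x[1] is the second character
of the line, not the second column. A minimal sketch of splitting
first, assuming a plain comma delimiter with no quoted fields:

    rdd = sc.textFile("/data/input.csv")
    # Split each line into its columns before filtering.
    columns = rdd.map(lambda line: line.split(","))
    # Keep rows whose second column equals "1234".
    query = columns.filter(lambda x: x[1] == "1234")
    query.take(20)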