Hello,

(Sorry for the sensationalist title) :)

If I run Spark on files from S3 and do basic transformations like:

textFile()
filter
groupByKey
count

I get one number (e.g. 40,000).
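
To be concrete, the job looks roughly like this (the paths, the filter
predicate and the key extraction below are simplified placeholders, not
my actual code):

  val lines    = sc.textFile("s3n://my-bucket/input/*") // or hdfs://... for the HDFS run
  val filtered = lines.filter(line => line.nonEmpty)    // placeholder predicate
  // groupByKey needs (key, value) pairs, so there is a map in between:
  val grouped  = filtered.map(line => (line.split("\t")(0), line)).groupByKey()
  println(grouped.count())  // ~40,000 from S3, ~13,000 from HDFS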

If I run the same job on the same files in HDFS, the number spat out is
completely different (VERY different - something like 13,000).

What would one do in a situation like this? How do I even go about figuring
out what the problem is? This runs on a cluster of 15 instances on Amazon.
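
For example, would counting after each stage on both inputs be a sane way
to narrow down where the numbers start to diverge? Something like this
(again with placeholder paths and predicate):

  def stageCounts(path: String): (Long, Long, Long) = {
    val lines    = sc.textFile(path)
    val filtered = lines.filter(_.nonEmpty)  // same placeholder predicate as above
    val grouped  = filtered.map(l => (l.split("\t")(0), l)).groupByKey()
    (lines.count(), filtered.count(), grouped.count())
  }
  println(stageCounts("s3n://my-bucket/input/*"))
  println(stageCounts("hdfs:///user/ognen/input/*"))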

Thanks,
Ognen
