Re: Non-deterministic behavior in spark

尹绪森 Fri, 24 Jan 2014 04:41:34 -0800

Is there any non-deterministic code in your filter, such as
Random.nextInt()? If so, the program is no longer idempotent: each run
(and each recomputation) can keep a different set of records. You
should give the random generator a fixed seed.
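
To illustrate the point about seeding, here is a minimal sketch in plain
Python rather than Spark, so it runs without a cluster. The function name
sampled_count, the seed 42 and the 0.4 keep fraction are made up for the
example and are not taken from your job. It stands in for a filter followed
by a count:

```python
import random


def sampled_count(records, keep_fraction, seed=None):
    """Count the records that survive a random filter (filter + count stand-in)."""
    # seed=None lets Python seed from OS entropy, so every run differs.
    rng = random.Random(seed)
    return sum(1 for _ in records if rng.random() < keep_fraction)


records = range(100_000)

# Seeded: repeated runs over the same input keep exactly the same records.
seeded_a = sampled_count(records, 0.4, seed=42)
seeded_b = sampled_count(records, 0.4, seed=42)

# Unseeded: two runs over identical input will usually disagree.
unseeded_a = sampled_count(records, 0.4)
unseeded_b = sampled_count(records, 0.4)

print("seeded runs agree:", seeded_a == seeded_b)
print("unseeded counts:", unseeded_a, unseeded_b)
```

If the two seeded counts match while the unseeded ones drift, the filter is
the likely source of the difference. If your filter has no randomness at all,
this is not the cause and the input data itself would be the next thing to
compare.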



2014/1/24 Ognen Duzlevski <[email protected]>

> Hello,
>
> (Sorry for the sensationalist title) :)
>
> If I run Spark on files from S3 and do basic transformations like:
>
> textfile()
> filter
> groupByKey
> count
>
> I get one number (e.g. 40,000).
>
> If I do the same on the same files from HDFS, the number spat out is
> completely different (VERY different - something like 13,000).
>
> What would one do in a situation like this? How do I even go about
> figuring out what the problem is? This is run on a cluster of 15 instances
> on Amazon.
>
> Thanks,
> Ognen
>



-- 
Best Regards
-----------------------------------
Xusen Yin    尹绪森
Beijing Key Laboratory of Intelligent Telecommunications Software and
Multimedia
Beijing University of Posts & Telecommunications
Intel Labs China
Homepage: http://yinxusen.github.io/
