No. It is a filter that splits a line in a JSON file and extracts a position
for it - every run is the same. That's what bothers me about this.

Ognen

On Fri, Jan 24, 2014 at 12:40 PM, 尹绪森 <[email protected]> wrote:

> Is there any non-deterministic code in the filter, such as
> Random.nextInt()? If so, the program loses its idempotence. You
> should specify a seed for it.
>
>
> 2014/1/24 Ognen Duzlevski <[email protected]>
>
>> Hello,
>>
>> (Sorry for the sensationalist title) :)
>>
>> If I run Spark on files from S3 and do a basic transformation like:
>>
>> textfile()
>> filter
>> groupByKey
>> count
>>
>> I get one number (e.g. 40,000).
>>
>> If I do the same on the same files from HDFS, the number spat out is
>> completely different (VERY different - something like 13,000).
>>
>> What would one do in a situation like this? How do I even go about
>> figuring out what the problem is? This is run on a cluster of 15 instances
>> on Amazon.
>>
>> Thanks,
>> Ognen
>>
>
>
>
> --
> Best Regards
> -----------------------------------
> Xusen Yin 尹绪森
> Beijing Key Laboratory of Intelligent Telecommunications Software and
> Multimedia
> Beijing University of Posts & Telecommunications
> Intel Labs China
> Homepage: http://yinxusen.github.io/
>

--
"Le secret des grandes fortunes sans cause apparente est un crime oublié,
parce qu'il a été proprement fait" - Honore de Balzac
