1. Is there any in-place operation in your code, such as addi() for DoubleMatrix? This kind of operation modifies the original data.
2. You could try the Spark replay debugger; it has an assert function. Hope that helps.
http://spark-replay-debugger-overview.readthedocs.org/en/latest/

2014/1/24 Ognen Duzlevski

> No. It is a filter that splits a line in a json file and extracts a
> position for it - every run is the same.
>
> That's what bothers me about this.
>
> Ognen
>
>
> On Fri, Jan 24, 2014 at 12:40 PM, 尹绪森 wrote:
>
>> Is there any non-deterministic code in the filter, such as
>> Random.nextInt()? If so, the program is no longer idempotent. You
>> should specify a seed for it.
>>
>>
>> 2014/1/24 Ognen Duzlevski
>>
>>> Hello,
>>>
>>> (Sorry for the sensationalist title) :)
>>>
>>> If I run Spark on files from S3 and do basic transformation like:
>>>
>>> textfile()
>>> filter
>>> groupByKey
>>> count
>>>
>>> I get one number (e.g. 40,000).
>>>
>>> If I do the same on the same files from HDFS, the number spat out is
>>> completely different (VERY different - something like 13,000).
>>>
>>> What would one do in a situation like this? How do I even go about
>>> figuring out what the problem is? This is run on a cluster of 15 instances
>>> on Amazon.
>>>
>>> Thanks,
>>> Ognen
>>>
>
> --
> "The secret of great fortunes without apparent cause is a crime forgotten,
> because it was properly done" - Honore de Balzac
>

--
Best Regards
-----------------------------------
Xusen Yin 尹绪森
Beijing Key Laboratory of Intelligent Telecommunications Software and Multimedia
Beijing University of Posts & Telecommunications
Intel Labs China
Homepage: http://yinxusen.github.io/
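To illustrate point 1, here is a minimal sketch of why an in-place operation can make two passes over the same data disagree. It uses plain double[] arrays rather than jblas, so addInPlace and addCopy are hypothetical stand-ins for an in-place call like addi() and a copying call; the class and method names are made up for the example. Whether this is what happens in your job depends on whether the same objects are reused or recomputed, which I cannot tell from the description.

```java
import java.util.Arrays;

public class InPlaceDemo {
    // Hypothetical stand-in for an in-place add such as addi(): mutates `target`.
    public static double[] addInPlace(double[] target, double[] delta) {
        for (int i = 0; i < target.length; i++) {
            target[i] += delta[i];
        }
        return target;
    }

    // Copying counterpart: leaves `source` untouched and returns a new array.
    public static double[] addCopy(double[] source, double[] delta) {
        double[] result = Arrays.copyOf(source, source.length);
        for (int i = 0; i < result.length; i++) {
            result[i] += delta[i];
        }
        return result;
    }

    public static double sum(double[] values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        double[] shared = {1.0, 2.0};
        double[] delta = {10.0, 10.0};

        // Two passes with the copying version see the same input.
        System.out.println(sum(addCopy(shared, delta)));
        System.out.println(sum(addCopy(shared, delta)));

        // Two passes with the in-place version do not: the first pass changed `shared`.
        System.out.println(sum(addInPlace(shared, delta)));
        System.out.println(sum(addInPlace(shared, delta)));
    }
}
```

With the copying version both passes sum to 23; with the in-place version the first pass sums to 23 and the second to 43, because the input was modified in between.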
