Thank you, Meisam! But I have found something interesting (for me, as a novice in Spark). Working with 400k elements, count() takes 30 seconds while .take(Int.MaxValue).size runs in less than a second! The problem appears when working with 1400k elements: .take(Int.MaxValue).size is not so quick anymore. Best regards, Valentin
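(A side note on why take() can look faster than count(): take(n) only evaluates as many elements/partitions as it needs to produce n results, while count() must process everything. The sketch below is a plain-Python analogy, not Spark code; the `expensive` function and the counters are hypothetical names used only to make the early-exit behaviour visible.)

```python
# Plain-Python analogy: "take(n)" on a lazy pipeline stops after n
# elements, whereas counting the whole pipeline touches every element.
from itertools import islice

calls = 0

def expensive(x):
    """Stand-in for a costly per-element computation; counts invocations."""
    global calls
    calls += 1
    return x * 2

# A lazy pipeline over 100,000 elements; nothing runs yet.
data = (expensive(x) for x in range(100_000))

# Like take(5): pulls only 5 elements through the pipeline, then stops.
first_five = list(islice(data, 5))

print(calls)       # only 5 calls were made
print(first_five)  # [0, 2, 4, 6, 8]
```

With 400k elements and Int.MaxValue as the argument, take() may still return early in some configurations, which would explain the sub-second timing; once the dataset grows, take() has to materialize everything and the difference to count() disappears.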
2013/11/14 Meisam Fathi <[email protected]>:
> Hi Valentin,
>
> data.filter() and rdd.map() do not actually do the computation. When
> you call count() or collect(), your RDD first does the filter(), then
> the map(), and then the count() or collect().
> See this for more info:
> https://github.com/mesos/spark/wiki/Spark-Programming-Guide#transformations
>
> Thanks,
> Meisam
>
> On Thu, Nov 14, 2013 at 2:02 PM, Valentin Michajlenko
> <[email protected]> wrote:
>> Hi!
>> I load data from a list ( sc.parallelize() ) with about 1,400,000
>> items. After that I run data.filter(func1).map(func2). This operation
>> runs in less than a second. But after that, count() (or
>> collect()) takes about 30 seconds. Please help me to reduce this
>> time!
>> Best Regards,
>> Valentin
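(The lazy-evaluation point Meisam describes can be illustrated without Spark at all. The sketch below uses plain-Python lazy iterators as an analogy: building the filter/map pipeline does no work, and only the final "action" that consumes the iterator triggers the computation. The `work` counter and `is_even` function are hypothetical names for illustration.)

```python
# Plain-Python analogy for Spark's transformations vs. actions:
# filter()/map() over an iterator are lazy; consuming the result
# (the "action") is what actually runs the functions.
work = 0

def is_even(x):
    """Stand-in for func1; counts how many times it actually runs."""
    global work
    work += 1
    return x % 2 == 0

nums = range(10)

# Build the pipeline: like data.filter(func1).map(func2) in Spark.
pipeline = map(lambda x: x + 1, filter(is_even, nums))

print(work)  # 0 -- nothing has executed yet

# The "action": consuming the iterator triggers filter, then map.
result = list(pipeline)

print(work)    # 10 -- is_even ran once per element, only now
print(result)  # [1, 3, 5, 7, 9]
```

This is why the filter/map step appears to run in under a second: it only builds the pipeline. The 30 seconds observed at count() or collect() is the cost of the whole computation, deferred to the first action.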
