Hi Valentin,

data.filter() and rdd.map() do not actually do the computation; they are lazy transformations. When you call count() or collect(), your RDD first runs the filter(), then the map(), and then the count() or collect(). See this for more info: https://github.com/mesos/spark/wiki/Spark-Programming-Guide#transformations
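You can see the same pattern in plain Python with generators, which are also lazy. This is only an illustrative sketch (no Spark involved; `load` and `work_log` are made-up names), but it shows why the transformations look instant while the terminal operation carries all the cost:

```python
# Sketch of lazy evaluation using Python generators as a stand-in
# for RDD transformations. `load` and `work_log` are illustrative.
work_log = []

def load(n):
    for i in range(n):
        work_log.append(i)   # record that real work happened
        yield i

data = load(5)
filtered = (x for x in data if x % 2 == 0)   # like rdd.filter(): lazy, no work yet
mapped = (x * 10 for x in filtered)          # like rdd.map(): lazy, no work yet

print(len(work_log))   # 0 -- nothing has been computed yet
result = list(mapped)  # like collect(): forces the whole pipeline to run
print(result)          # [0, 20, 40]
print(len(work_log))   # 5 -- all the work happened at "collect" time
```

So in your case the filter()/map() call returning in under a second is expected: the 30 seconds you see at count()/collect() is the entire pipeline actually executing over the ~1,400,000 items.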
Thanks,
Meisam

On Thu, Nov 14, 2013 at 2:02 PM, Valentin Michajlenko <[email protected]> wrote:
> Hi!
> I load data from a list ( sc.parallelize() ) with length about 1400000
> items. After that I run data.filter(func1).map(func2). This operation
> runs in less than a second. But after that, count() (or collect())
> takes about 30 seconds. Please help me to reduce this time!
> Best Regards,
> Valentin
