Hi Valentin,

data.filter() and rdd.map() do not actually do the computation. When
you call count() or collect(), your RDD first does the filter(), then
the map(), and then the count() or collect().
See this for more info:
https://github.com/mesos/spark/wiki/Spark-Programming-Guide#transformations
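You can see the same lazy behavior with plain Python iterators, which is a rough analogy for what Spark does (func1 and func2 below are hypothetical stand-ins for your functions, not Spark API):

```python
# Analogy for Spark's lazy evaluation: filter() and map() only build a
# pipeline; no work runs until a terminal operation like count()/collect().
calls = []

def func1(x):          # hypothetical predicate (like your filter function)
    calls.append("filter")
    return x % 2 == 0

def func2(x):          # hypothetical mapper (like your map function)
    calls.append("map")
    return x * 10

data = range(10)
pipeline = map(func2, filter(func1, data))  # lazy: nothing executed yet
assert calls == []                          # no work has happened so far

result = list(pipeline)                     # like collect(): work runs now
assert result == [0, 20, 40, 60, 80]
assert len(calls) == 15                     # 10 filter calls + 5 map calls
```

So the sub-second filter()/map() step is just pipeline construction; the 30 seconds you measured is the whole job running inside count()/collect().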

Thanks,
Meisam

On Thu, Nov 14, 2013 at 2:02 PM, Valentin Michajlenko
<[email protected]> wrote:
> Hi!
> I load data from a list ( sc.parallelize() ) with about 1400000
> items. After that I run data.filter(func1).map(func2). This operation
> runs in less than a second. But after that, count() (or
> collect() ) takes about 30 seconds. Please help me to reduce this
> time!
> Best Regards,
> Valentin
