Hi Valentin,

data.filter() and rdd.map() do not actually do the computation; they are lazy transformations. When you call count() or collect(), your RDD first runs the filter(), then the map(), and then the count() or collect(). See this for more info: https://github.com/mesos/spark/wiki/Spark-Programming-Guide#transformations
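You can see the same pattern in plain Python with generators, which are also lazy. This is only an illustrative sketch (no Spark involved; `load` and `work_log` are made-up names), but it shows why the transformations look instant while the terminal operation carries all the cost:

```python
# Sketch of lazy evaluation using Python generators as a stand-in
# for RDD transformations. `load` and `work_log` are illustrative.
work_log = []

def load(n):
    for i in range(n):
        work_log.append(i)   # record that real work happened
        yield i

data = load(5)
filtered = (x for x in data if x % 2 == 0)   # like rdd.filter(): lazy, no work yet
mapped = (x * 10 for x in filtered)          # like rdd.map(): lazy, no work yet

print(len(work_log))   # 0 -- nothing has been computed yet
result = list(mapped)  # like collect(): forces the whole pipeline to run
print(result)          # [0, 20, 40]
print(len(work_log))   # 5 -- all the work happened at "collect" time
```

So in your case the filter()/map() call returning in under a second is expected: the 30 seconds you see at count()/collect() is the entire pipeline actually executing over the ~1,400,000 items.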
Thanks,
Meisam

On Thu, Nov 14, 2013 at 2:02 PM, Valentin Michajlenko <[email protected]> wrote:
> Hi!
> I load data from a list ( sc.parallelize() ) with length about 1400000
> items. After that I run data.filter(func1).map(func2). This operation
> runs in less than a second. But after that, count() (or collect())
> takes about 30 seconds. Please help me to reduce this time!
> Best Regards,
> Valentin
