Guillaume, this is RDD.count():

  /**
   * Return the number of elements in the RDD.
   */
  def count(): Long = {
    sc.runJob(this, (iter: Iterator[T]) => {
      // Use a while loop to count the number of elements rather than iter.size because
      // iter.size uses a for loop, which is slightly slower in current version of Scala.
      var result = 0L
      while (iter.hasNext) {
        result += 1L
        iter.next()
      }
      result
    }).sum
  }
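To make the structure of count() concrete outside a cluster, here is a minimal plain-Scala sketch. It has no Spark dependency, and `countPartition` and `countAll` are hypothetical names. Each partition is modelled as a local Iterator, the per-partition function is the same while loop, and the final `.sum` plays the role of the `.sum` applied to the array that `runJob` returns to the driver.

```scala
// Plain-Scala model of RDD.count() (no Spark).
// Hypothetical names: countPartition, countAll.

/** Per-partition step: walk the iterator with a while loop and count. */
def countPartition[T](iter: Iterator[T]): Long = {
  var result = 0L
  while (iter.hasNext) {
    result += 1L
    iter.next() // the element is pulled (and computed) but discarded
  }
  result
}

/** Stand-in for sc.runJob(...).sum: one Long per partition, summed at the end. */
def countAll[T](partitions: Seq[Iterator[T]]): Long =
  partitions.map(it => countPartition(it)).sum
```

For example, countAll(Seq(Iterator(1, 2, 3), Iterator.empty[Int], Iterator(4, 5))) gives 5L: the partitions contribute 3, 0 and 2, which are summed on the "driver".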
So if you want something cheaper, you could try:

  sc.runJob(rdd, (iter: Iterator[_]) => {})
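The no-op job is cheaper because its function never pulls from the iterator. The plain-Scala sketch below (no Spark; `lazyPartition`, `noOp` and `countLoop` are hypothetical names) shows the effect on a lazily mapped iterator: the no-op computes zero elements, while the count loop computes all of them. This only models the iterator. Whether Spark still does work for a no-op job, for example materializing a cached partition or running an upstream shuffle stage, depends on how the RDD produces its iterator and is not captured here.

```scala
// Plain-Scala illustration (no Spark). Hypothetical names throughout.

/** A lazy "partition": the map body runs only when an element is pulled. */
def lazyPartition(n: Int, evaluated: Array[Int]): Iterator[Int] =
  Iterator.range(0, n).map { x =>
    evaluated(0) += 1 // records each element actually computed
    x * 2
  }

/** Mirrors (iter: Iterator[_]) => {}: never touches the iterator. */
def noOp(iter: Iterator[_]): Unit = ()

/** Mirrors the while loop inside count(): pulls every element. */
def countLoop(iter: Iterator[_]): Long = {
  var result = 0L
  while (iter.hasNext) { result += 1L; iter.next() }
  result
}
```

With 1000 elements, noOp leaves the counter at 0, whereas countLoop returns 1000L and leaves the counter at 1000.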
--
Christopher T. Nguyen
Co-founder & CEO, Adatao <http://adatao.com>
linkedin.com/in/ctnguyen
On Wed, Jan 22, 2014 at 12:09 AM, Reynold Xin <[email protected]> wrote:
> You can also do
>
> rdd.foreach(_ => ())
>
> I actually suspect count is even cheaper than this.
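For comparison, here is a plain-Scala sketch of what foreach does within one partition. It has no Spark dependency, and `trackedPartition` and `foreachPartitionModel` are hypothetical names. Iterator.foreach pulls every element, so every lazily mapped element is computed, just as in count()'s while loop. The visible difference is what each partition hands back: Unit for foreach versus one Long for count.

```scala
// Plain-Scala model (no Spark). Hypothetical names throughout.

/** A lazy "partition" whose map step bumps a counter per computed element. */
def trackedPartition(n: Int, computed: Array[Int]): Iterator[Int] =
  Iterator.range(0, n).map { x => computed(0) += 1; x + 1 }

/** Mirrors the per-partition body of rdd.foreach(f): apply f to every element. */
def foreachPartitionModel[T](iter: Iterator[T])(f: T => Unit): Unit =
  iter.foreach(f)
```

Calling foreachPartitionModel(trackedPartition(500, computed))(_ => ()) computes all 500 elements even though the function ignores them.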
>
> On Tue, Jan 21, 2014 at 5:05 AM, Guillaume Pitel <[email protected]> wrote:
>
>> Thanks. So you mean that first() triggers the computation of the WHOLE
>> RDD? That does not sound right; I thought it was lazy.
>>
>> Guillaume
>>
>> Hi,
>> You can call less expensive operations like first or take to trigger the
>> computation.
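On the laziness question: in a simplified model, first() and take(n) pull only as many elements as they need, opening partitions one after another. The plain-Scala sketch below (no Spark; `makePartitions` and `takeModel` are hypothetical names) counts how many elements are actually computed. Spark's own take() is more involved (it can launch several jobs over growing numbers of partitions), and if the RDD depends on a shuffle, the stages before the shuffle still run in full, so treat this as an illustration of the iterator-level behaviour only.

```scala
// Plain-Scala model (no Spark) of a lazy take(n). Hypothetical names throughout.

/** Partition i lazily yields `size` elements; each computed element bumps the counter. */
def makePartitions(numPartitions: Int, size: Int, computed: Array[Int]): Seq[() => Iterator[Int]] =
  (0 until numPartitions).map { i =>
    () => Iterator.range(0, size).map { x => computed(0) += 1; i * size + x }
  }

/** Takes the first n elements, opening partitions only while more are needed. */
def takeModel[T](partitions: Seq[() => Iterator[T]], n: Int): Vector[T] = {
  val buf = Vector.newBuilder[T]
  var taken = 0
  val parts = partitions.iterator
  while (taken < n && parts.hasNext) {
    // Build this partition's iterator on demand and pull at most what is still missing.
    val it = parts.next()().take(n - taken)
    while (it.hasNext) { buf += it.next(); taken += 1 }
  }
  buf.result()
}
```

With four partitions of ten elements, takeModel(parts, 3) returns Vector(0, 1, 2) and computes exactly 3 elements; asking for 15 computes 15 (all of partition 0, then 5 from partition 1), never the whole data set.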
>>
>> --
>> *Guillaume PITEL, President*
>> +33(0)6 25 48 86 80
>>
>> eXenSa S.A.S. <http://www.exensa.com/>
>> 41, rue Périer - 92120 Montrouge - FRANCE
>> Tel +33(0)1 84 16 36 77 / Fax +33(0)9 72 28 37 05
>>
>
>
