Guillaume, this is RDD.count()

  /**
   * Return the number of elements in the RDD.
   */
  def count(): Long = {
    sc.runJob(this, (iter: Iterator[T]) => {
      // Use a while loop to count the number of elements rather than iter.size because
      // iter.size uses a for loop, which is slightly slower in current version of Scala.
      var result = 0L
      while (iter.hasNext) {
        result += 1L
        iter.next()
      }
      result
    }).sum
  }


So if you want something cheaper, you could try:

  sc.runJob(rdd, (iter: Iterator[_]) => {})
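
To make the shape of the two calls concrete, here is a minimal sketch in plain Scala with no Spark dependency. The names CountSketch, countPartition and runLocally are hypothetical stand-ins I made up for illustration: runLocally only mimics the way sc.runJob applies a function to each partition's iterator and collects one result per partition, and it does not model Spark's scheduling, caching or laziness.

```scala
object CountSketch {
  // Same loop as in RDD.count(): drain the iterator and count its elements.
  def countPartition[T](iter: Iterator[T]): Long = {
    var result = 0L
    while (iter.hasNext) {
      result += 1L
      iter.next()
    }
    result
  }

  // Hypothetical stand-in for sc.runJob: apply func to each partition's
  // iterator and return one result per partition.
  def runLocally[T, U](partitions: Seq[Seq[T]], func: Iterator[T] => U): Seq[U] =
    partitions.map(p => func(p.iterator))

  def main(args: Array[String]): Unit = {
    // Assumption: three "partitions" modelled as plain in-memory sequences.
    val partitions = Seq(Seq(1, 2, 3), Seq(4, 5), Seq.empty[Int])

    // count-style job: one Long per partition, summed at the end.
    val total = runLocally(partitions, (it: Iterator[Int]) => countPartition(it)).sum
    println(total) // prints 5

    // no-op job: the function ignores the iterator and returns Unit per partition.
    val noOp = runLocally(partitions, (_: Iterator[Int]) => ())
    println(noOp.size) // prints 3
  }
}
```

The count-style function pulls every element through the iterator, while the no-op function never touches it, so the no-op does strictly less work per partition in this sketch.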

--
Christopher T. Nguyen
Co-founder & CEO, Adatao <http://adatao.com>
linkedin.com/in/ctnguyen



On Wed, Jan 22, 2014 at 12:09 AM, Reynold Xin <[email protected]> wrote:

> You can also do
>
> rdd.foreach(a => Unit)
>
> I actually suspect count is even cheaper than this.
>
>
>
> On Tue, Jan 21, 2014 at 5:05 AM, Guillaume Pitel <
> [email protected]> wrote:
>
>>  Thanks. So you mean that first() triggers the computation of the WHOLE
>> RDD? That does not sound right; I thought it was lazy.
>>
>> Guillaume
>>
>> Hi,
>> You can call less expensive operations like first or take to trigger the
>> computation.
>>
>>
>>
>>
>> --
>>  *Guillaume PITEL, Président*
>> +33(0)6 25 48 86 80
>>
>> eXenSa S.A.S. <http://www.exensa.com/>
>>  41, rue Périer - 92120 Montrouge - FRANCE
>> Tel +33(0)1 84 16 36 77 / Fax +33(0)9 72 28 37 05
>>
>
>
