I think you could do something like that : (but Cheng's solution allows to avoid testing, so it's better) :

val isEmpty = rdd.mapPartitions(iter => Iterator(! iter.hasNext)).reduce(_&&_)

but it would be bad if the rdd is not persisted...

Guillaume
Unfortunately there isn't an easy way to test whether an RDD is empty unless you count it.  But fortunately, RDD.fold can solve your problem.  All you need to do is to provide a zero value.

For an RDD r, r.reduce(f) is roughly equivalent to r.fold(r.first)(f), that is, the first element of r is considered as the zero value of the fold call.  For example:

    val r = sc.makeRDD(1 to 4)
    r.reduce(_ + _)  // can't handle empty RDD
    r.fold(0)(_ + _)  // no problem with empty RDD



On Tue, Feb 18, 2014 at 8:33 PM, Sampo Niskanen <[email protected]> wrote:
Hi,

I couldn't find any documentation on how to test whether an RDD is empty.  I'm doing a reduce operation, but it throws an UnsupportedOperationException if the RDD is empty.  I'd like to check if the RDD is empty before calling reduce.

On the Google Groups list Reynold Xin had suggested using rdd.first, but it throws the same exception in case the RDD is empty.  rdd.count() on the other hand might do a lot of unnecessary processing just to check whether the RDD is empty.

Is there some way to do rdd.isEmpty?


Catching the UnsupportedOperationException works, but it seems like bad practice, as it could be an indication of some other error as well.  (I'd suggest changing it to EmptyRddException or similar, which could be a subclass of UnsupportedOperationException.  That would make the cause explicit.)


Thanks.

    Sampo Niskanen
    Lead developer / Wellmo
    [email protected]
 



--
eXenSa
Guillaume PITEL, Président
+33(0)6 25 48 86 80

eXenSa S.A.S.
41, rue Périer - 92120 Montrouge - FRANCE
Tel +33(0)1 84 16 36 77 / Fax +33(0)9 72 28 37 05

Reply via email to