Unfortunately there isn't an easy way to test whether an RDD is empty
unless you count it. But fortunately, RDD.fold can solve your problem.
All you need to do is to provide a zero value.
For an RDD r, r.reduce(f) is roughly equivalent to r.fold(r.first)(f), that
is, the first element of r is considered as the zero value of the fold
call. For example:
val r = sc.makeRDD(1 to 4)
r.reduce(_ + _) // can't handle empty RDD
r.fold(0)(_ + _) // no problem with empty RDD
On Tue, Feb 18, 2014 at 8:33 PM, Sampo Niskanen
<[email protected]>wrote:
> Hi,
>
> I couldn't find any documentation on how to test whether an RDD is empty.
> I'm doing a reduce operation, but it throws an
> UnsupportedOperationException if the RDD is empty. I'd like to check if
> the RDD is empty before calling reduce.
>
> On the Google Groups list Reynold Xin had suggested using rdd.first, but
> it throws the same exception in case the RDD is empty. rdd.count() on the
> other hand might do a lot of unnecessary processing just to check whether
> the RDD is empty.
>
> Is there some way to do rdd.isEmpty?
>
>
> Catching the UnsupportedOperationException works, but it seems like bad
> practice, as it could be an indication of some other error as well. (I'd
> suggest changing it to EmptyRddException or similar, which could be a
> subclass of UnsupportedOperationException. That would make the cause
> explicit.)
>
>
> Thanks.
>
> * Sampo Niskanen*
> *Lead developer / Wellmo*
> [email protected]
>
>