Does your job fail with a serialization error? One situation I can
think of is something like this:

class NotSerializable(val n: Int)
val obj = new NotSerializable(1)
sc.makeRDD(1 to 3).filter(_ > obj.n)

If you reference a field of an object inside a closure, not only that
field but the whole enclosing object is pulled into the closure. If that
outer object is not serializable, serialization of the RDD DAG fails.
You can work around this by copying the field into a separate local
variable and referencing that instead:

class NotSerializable(val n: Int)
val obj = new NotSerializable(1)
val x = obj.n
sc.makeRDD(1 to 3).filter(_ > x)
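
Alternatively, a rough sketch assuming you control the class definition
(the names NowSerializable and obj2 below are just placeholders): if you
mark the class itself as Serializable, capturing the whole object in the
closure works too:

class NowSerializable(val n: Int) extends Serializable
val obj2 = new NowSerializable(1)
// obj2 is still captured by the closure, but now it serializes cleanly
sc.makeRDD(1 to 3).filter(_ > obj2.n)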



On Wed, Apr 23, 2014 at 5:45 PM, randylu <randyl...@gmail.com> wrote:

>   my code is like:
>     rdd2 = rdd1.filter(_._2.length > 1)
>     rdd2.collect()
>   it works well, but if i use a variable /num/ instead of 1:
>     var num = 1
>     rdd2 = rdd1.filter(_._2.length > num)
>     rdd2.collect()
>   it fails at rdd2.collect()
>   so strange?
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/about-rdd-filter-tp4657.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
