Does your job fail because of a serialization error? One situation I can think of is something like this:
    class NotSerializable(val n: Int)
    val obj = new NotSerializable(1)
    sc.makeRDD(1 to 3).filter(_ > obj.n)

If you enclose a field member of an object in a closure, not only that field but the whole outer object gets pulled into the closure. If the outer object is not serializable, serialization of the RDD DAG will fail. You can work around this by copying the field into a separate local variable and referencing that instead:

    class NotSerializable(val n: Int)
    val obj = new NotSerializable(1)
    val x = obj.n
    sc.makeRDD(1 to 3).filter(_ > x)

(Another option, making the class itself serializable, is sketched after the quoted message below.)

On Wed, Apr 23, 2014 at 5:45 PM, randylu <randyl...@gmail.com> wrote:

> my code is like:
>   rdd2 = rdd1.filter(_._2.length > 1)
>   rdd2.collect()
> it works well, but if i use a variable /num/ instead of 1:
>   var num = 1
>   rdd2 = rdd1.filter(_._2.length > num)
>   rdd2.collect()
> it fails at rdd2.collect()
> so strange?
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/about-rdd-filter-tp4657.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
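
P.S. The sketch mentioned above: if you control the class definition, you can also just mark it Serializable so the captured instance can be shipped to the executors. This is only a rough sketch under that assumption (the class name is illustrative, and it only helps when the object itself, rather than some non-serializable enclosing object, is what the closure captures):

    // Sketch: instead of copying the field into a local val, make the
    // class itself serializable, assuming it carries no other
    // non-serializable state.
    class Threshold(val n: Int) extends Serializable

    val obj = new Threshold(1)
    // obj is captured by the closure; since it is serializable,
    // the task can be serialized and sent to the executors.
    sc.makeRDD(1 to 3).filter(_ > obj.n)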