Hi Yitong,

It's not as simple as that.

In your very simple example, the only things referenced by the closure
are (i) the input arguments and (ii) a Scala "object". So there are no
external references to serialize in that case, just the closure
instance itself. Note that something is still being serialized; you
just don't see it.
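
To make that concrete, here is a minimal sketch using plain Java
serialization outside of Spark (which, as far as I know, is also what
Spark uses for closures by default). The bodies of func1 and func2, the
ClosureDemo object and the serialize helper are made up for
illustration; they are not from your code. It compares a closure that
only calls Utils.func1 with one that captures a local array.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical stand-in for the Utils object from the question.
object Utils {
  def func1(x: Int): Int = x + 1
  def func2(x: Int): Int = x * 2
}

object ClosureDemo {
  // Serialize an object with plain Java serialization and return the bytes.
  def serialize(obj: AnyRef): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    out.writeObject(obj)
    out.close()
    buffer.toByteArray
  }

  def main(args: Array[String]): Unit = {
    // References only its argument and the Utils singleton: nothing is captured.
    val plain: Int => Int = r => Utils.func1(r)

    // Captures a local array, so the array is serialized along with the closure.
    val table = Array.fill(1000)(1)
    val capturing: Int => Int = r => r + table.length

    println(s"plain closure:     ${serialize(plain).length} bytes")
    println(s"capturing closure: ${serialize(capturing).length} bytes")
  }
}
```

The second closure should serialize to noticeably more bytes because
the captured array travels with it. In neither case is Utils itself
written out; it only has to be loadable as a class on the other end.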

What happens to the output of a "map" depends on what other
transformations or actions you perform on the RDD it returns. So
whether "only its output will be serialized" cannot be determined
by just looking at that code.

I really suggest that if you're curious, you study the disassembled
bytecode, which is really not hard to understand. There are also
plenty of previous messages on this list that have covered this topic.


On Mon, Feb 9, 2015 at 7:56 PM, Yitong Zhou <timyit...@gmail.com> wrote:
> Hi Marcelo,
> Thanks for the explanation! So you mean in this way, actually only the
> output of the map closure would need to be serialized so that it could be
> passed further for other operations (maybe reduce or something else)? And we don't
> have to worry about Utils.funcX because for each closure instance we would
> load a new instance containing the func1 and func2 from jars that are
> already cached into local nodes?
>
> Thanks,
> Yitong
>
> 2015-02-09 14:35 GMT-08:00 Marcelo Vanzin <van...@cloudera.com>:
>
>> `func1` and `func2` never get serialized. They must exist on the other
>> end in the form of a class loaded by the JVM.
>>
>> What gets serialized is an instance of a particular closure (the
>> argument to your "map" function). That's a separate class. The
>> instance of that class that is serialized contains references to all
>> other instances it needs to execute its "apply" method (or "run" or
>> whatever is the correct method name). In this case, nothing is needed,
>> since all it does is pass its argument in a call to a static method
>> (Utils.func1).
>>
>> Hope that helps; these things can be really confusing. You can play
>> with "javap -c" to disassemble the class files to understand better
>> how it all happens under the hood.
>>
>>
>> On Mon, Feb 9, 2015 at 1:56 PM, Yitong Zhou <timyit...@gmail.com> wrote:
>> > If we define a Utils object:
>> >
>> > object Utils {
>> >   def func1 = {..}
>> >   def func2 = {..}
>> > }
>> >
>> > And then in an RDD we refer to one of the functions:
>> >
>> > rdd.map{r => Utils.func1(r)}
>> >
>> > Will Utils.func2 also get serialized or not?
>> >
>> > Thanks,
>> > Yitong
>> >
>> >
>> >
>> > --
>> > View this message in context:
>> > http://apache-spark-user-list.1001560.n3.nabble.com/Will-Spark-serialize-an-entire-Object-or-just-the-method-referred-in-an-object-tp21566.html
>> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> > For additional commands, e-mail: user-h...@spark.apache.org
>> >
>>
>>
>>
>> --
>> Marcelo
>
>



-- 
Marcelo

