Re: Python vs Java SDK Performance

Luke Cwik Wed, 30 Oct 2019 12:36:59 -0700

To my knowledge, we haven't compared the cost of the "dill/pickle/..." coder
to Java's SerializableCoder. Even so, you can always write your own coders
if you don't believe the default Python coders perform well enough.
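
As a rough illustration of what writing your own coder could look like, here
is a minimal sketch. The record shape (an int id, a float score, a string
name) and the class name UserEventCoder are hypothetical, chosen only for the
example. The encode/decode/is_deterministic methods follow the Beam Python
Coder interface; the import falls back to a plain base class so the snippet
also runs where Beam is not installed.

```python
import struct

try:
    # Real base class when Apache Beam is installed.
    from apache_beam.coders import Coder
except ImportError:
    # Stand-in so this sketch still runs without Beam (assumption for demo only).
    Coder = object


class UserEventCoder(Coder):
    """Hypothetical coder for (user_id: int, score: float, name: str) tuples."""

    # Fixed-width header: big-endian int64 followed by float64 (16 bytes).
    _HEADER = struct.Struct(">qd")

    def encode(self, value):
        user_id, score, name = value
        # The name is appended as UTF-8 and fills the remainder of the payload.
        return self._HEADER.pack(user_id, score) + name.encode("utf-8")

    def decode(self, encoded):
        user_id, score = self._HEADER.unpack_from(encoded)
        name = encoded[self._HEADER.size:].decode("utf-8")
        return (user_id, score, name)

    def is_deterministic(self):
        # The same value always produces the same bytes.
        return True


coder = UserEventCoder()
record = (42, 0.5, "shannon")
roundtrip = coder.decode(coder.encode(record))
print(roundtrip == record)
```

In a pipeline, a coder like this is usually tied to a type, for example
through beam.coders.registry.register_coder; check the docs for your SDK
version for the exact wiring.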

Note that a lot of the Beam Python coders use Cython for speed, so it may
be less of a concern than you think.

But please try it out and report any perf issues that you discover since
they can be fixed within the Python SDK.
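
One way to get a first number before running a full pipeline is a small
local microbenchmark. The sketch below uses only the standard library and
compares a pickle round trip against a hand-written struct encoding for a
made-up record shape. It does not exercise Beam's own coders, so treat the
result as a rough indication, not as a measurement of the SDK.

```python
import pickle
import struct
import timeit

# Made-up sample data: (int id, float score, str name) tuples.
records = [(i, i * 0.5, "user-%d" % i) for i in range(1000)]

# Fixed-width header: big-endian int64 followed by float64.
header = struct.Struct(">qd")


def pickle_roundtrip():
    """Serialize and deserialize every record with stdlib pickle."""
    return [
        pickle.loads(pickle.dumps(r, pickle.HIGHEST_PROTOCOL)) for r in records
    ]


def struct_roundtrip():
    """Serialize and deserialize every record with a hand-written layout."""
    out = []
    for user_id, score, name in records:
        data = header.pack(user_id, score) + name.encode("utf-8")
        uid, sc = header.unpack_from(data)
        out.append((uid, sc, data[header.size:].decode("utf-8")))
    return out


# Time 20 passes over the data for each approach.
pickle_seconds = timeit.timeit(pickle_roundtrip, number=20)
struct_seconds = timeit.timeit(struct_roundtrip, number=20)

print("pickle: %.4fs  struct: %.4fs" % (pickle_seconds, struct_seconds))
```

Timings vary by machine and Python version, so compare on the same host and
with data that resembles your real pipeline elements.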

On Mon, Oct 14, 2019 at 6:52 AM Shannon Duncan <joseph.dun...@liveramp.com>
wrote:

> Has anyone done any testing around the performance difference of Python
> SDK vs Java SDK on Google Dataflow?
>
> We recently dropped our requirement for sequence files in our pipeline,
> which opens the door to using the Python SDK instead of the Java SDK. But
> my concern is loss of performance.
>
> In Java we control our serialization very carefully between pipeline items,
> and my fear is losing control of that in Python, so I'm curious about the
> speed of serialization of generic Python items like dictionaries, lists,
> tuples, etc. in the context of Dataflow.
>
> Thanks!
> Shannon Duncan
>
