To my knowledge, we haven't compared the cost of the "dill/pickle/..." coder with Java's SerializableCoder. Even so, you can always write your own coders if you find that the default coders don't perform well in Python.
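As a rough sketch of what a hand-written coder could look like, the example below encodes a fixed-shape record with `struct` instead of relying on pickling. The `Event` type and `EventCoder` class are hypothetical names made up for illustration. The `encode`/`decode`/`is_deterministic` methods follow the `apache_beam.coders.Coder` interface, and the fallback to `object` is only there so the snippet runs without Beam installed.

```python
import pickle
import struct
from typing import NamedTuple

try:
    from apache_beam.coders import Coder
except ImportError:
    # Assumption for this sketch only: fall back to a plain base class
    # so the encode/decode logic can be tried without Beam installed.
    Coder = object


class Event(NamedTuple):
    """Hypothetical record type passed between pipeline steps."""
    user_id: int
    score: float


class EventCoder(Coder):
    """Hypothetical coder that packs an Event into 16 bytes."""

    # Big-endian signed 64-bit int followed by a 64-bit float.
    _FORMAT = struct.Struct(">qd")

    def encode(self, value):
        return self._FORMAT.pack(value.user_id, value.score)

    def decode(self, encoded):
        user_id, score = self._FORMAT.unpack(encoded)
        return Event(user_id, score)

    def is_deterministic(self):
        # The same Event always produces the same bytes.
        return True


event = Event(user_id=42, score=0.5)
coder = EventCoder()
payload = coder.encode(event)

# Round-trip check, plus a size comparison against plain pickle.
assert coder.decode(payload) == event
print("struct bytes:", len(payload))
print("pickle bytes:", len(pickle.dumps(event)))
```

In a real pipeline, a coder like this would typically be associated with its type via `beam.coders.registry.register_coder(Event, EventCoder)`. Whether it actually beats the default coders on Dataflow is something to measure rather than assume.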
Note that many of the Beam Python coders use Cython for speed, so this may be less of a concern than you think. But please try it out and report any performance issues you discover, since they can be fixed within the Python SDK.

On Mon, Oct 14, 2019 at 6:52 AM Shannon Duncan <joseph.dun...@liveramp.com> wrote:

> Has anyone done any testing around the performance difference of Python
> SDK vs Java SDK on Google Dataflow?
>
> We recently dropped our requirement for sequence files in our pipeline
> which opens the door to using the python SDK vs the Java SDK. But my
> concern is loss of performance.
>
> In Java we control our serialization very carefully between pipeline items
> and my fear is losing control of that in Python, so I'm curious about the
> speed of serialization of generic python items like dictionaries, lists,
> tuples, etc in context of dataflow.
>
> Thanks!
> Shannon Duncan