I would like to point out that the Java code has been around much longer
and has had more time to be optimized, while the Python code is much more
recent and is still changing quickly, with much larger performance
improvements still landing. The gap between Python and Java has been
steadily decreasing over the past couple of months.
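
On the question of comparing the two pipelines in a (semi-)scientific way,
one option is to run each pipeline several times on identical input and
worker settings and compare median wall-clock times rather than a single
run. The following is only a minimal sketch: `benchmark` and `slowdown` are
hypothetical helper names, and `run_pipeline` is assumed to be any callable
that launches a job and blocks until it finishes. It is not part of the
Beam API.

```python
import statistics
import time


def benchmark(run_pipeline, repeats=5):
    """Run a blocking callable several times and summarize wall-clock time."""
    if repeats < 2:
        raise ValueError("need at least 2 runs to estimate spread")
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_pipeline()  # assumption: launches the job and waits for completion
        durations.append(time.perf_counter() - start)
    return {
        "runs": repeats,
        "median_s": statistics.median(durations),
        "stdev_s": statistics.stdev(durations),
        "min_s": min(durations),
        "max_s": max(durations),
    }


def slowdown(baseline, candidate):
    """Ratio of median times; a value above 1 means the candidate is slower."""
    return candidate["median_s"] / baseline["median_s"]
```

Passing the Java launcher as the baseline and the Python launcher as the
candidate gives a slowdown factor that is less sensitive to one-off effects
such as worker startup time. The spread (`stdev_s`) indicates whether a
measured difference is larger than the run-to-run noise.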

On Fri, Nov 18, 2016 at 11:42 AM, Matthias Baetens <
[email protected]> wrote:

> Hi Apache Beam users!
>
> Over the last few months I have played around a bit with Google
> Dataflow/Apache Beam (first in Java and lately in Python as well).
>
> This week I did a quick implementation of the same pipeline in both Java
> and Python involving some processing (String operations and int operations)
> and a GroupBy using an Accumulator.
>
> When running the pipeline on Google Cloud, the Java pipeline performed
> 4-5 times faster than the Python pipeline. Now, this probably makes sense
> since Python is in general slower than Java, but I was wondering if there
> is more to it and how I could potentially profile the pipelines in a
> (semi)-scientific way... Maybe some of you have thoughts/input or had
> similar experiences? Happy to hear your input!
>
> Best regards,
>
> Matthias
>
