Hi Matthias!

Glad to hear you're interested in performance. I've been doing some
investigation into benchmarking Beam over the last couple of weeks and I'm
getting fairly close to having something I think will be workable, probably
next week or the week after. I'm very interested in hearing opinions from
the community (I solicited feedback from the dev list a few weeks ago but
neglected to include user@), so I'd love to hear any thoughts you have.

Best,

Jason

On Fri, Nov 18, 2016 at 11:11 AM, Lukasz Cwik <[email protected]> wrote:

> I would like to point out that the Java code has been around a lot longer
> and has had more time to be optimized, while the Python code is much more
> recent and is still changing a lot, with much larger performance
> improvements still coming in. The performance gap between Python and Java
> has been steadily decreasing over the past couple of months.
>
> On Fri, Nov 18, 2016 at 11:42 AM, Matthias Baetens <
> [email protected]> wrote:
>
>> Hi Apache Beam users!
>>
>> Over the past few months I have played around a bit with Google
>> Dataflow/Apache Beam (first in Java and lately in Python as well).
>>
>> This week I did a quick implementation of the same pipeline in both Java
>> and Python involving some processing (String operations and int operations)
>> and a GroupBy using an Accumulator.
>>
>> When running the pipeline on Google Cloud, the Java pipeline performed
>> 4-5 times faster than the Python pipeline. Now, this probably makes sense
>> since Python is in general slower than Java, but I was wondering if there
>> is more to it and how I could potentially profile the pipelines in a
>> (semi-)scientific way. Maybe some of you have thoughts/input or had
>> similar experiences? Happy to hear your input!
>>
>> Best regards,
>>
>> Matthias
>>
>
>


-- 
-------
Jason Kuster
Apache Beam (Incubating) / Google Cloud Dataflow
