Hi Jason,
How/what are you going to benchmark? I have been doing it for some time.
I want to make sure I know the objective gaps, if there are any.
Thanks,
Amir

      From: Jason Kuster <[email protected]>
 To: [email protected] 
 Sent: Friday, November 18, 2016 11:28 AM
 Subject: Re: Apache Beam Java vs Python performance on Google Cloud
   
Hi Matthias!
Glad to hear you're interested in performance. I've been doing some 
investigation into benchmarking Beam over the last couple of weeks and I'm 
getting fairly close to having something I think will be workable, probably 
next week or the week after. I'm very interested in hearing opinions from the 
community (I solicited feedback from the dev list a few weeks ago but neglected 
to include user@), so I'd love to hear any thoughts you have.
Best,
Jason
On Fri, Nov 18, 2016 at 11:11 AM, Lukasz Cwik <[email protected]> wrote:

I would like to point out that the Java code has been around a lot longer and 
has had more time to be optimized, while the Python code is much more recent 
and is still changing a lot, with much larger improvements in performance. The 
gap between Python and Java has been steadily decreasing over the past couple 
of months.
On Fri, Nov 18, 2016 at 11:42 AM, Matthias Baetens 
<matthias.baetens@datatonic.com> wrote:

Hi Apache Beam users!
Over the last few months I have played around a bit with Google Dataflow/Apache 
Beam (first in Java and lately in Python as well).
This week I did a quick implementation of the same pipeline in both Java and 
Python involving some processing (String operations and int operations) and a 
GroupBy using an Accumulator.
When running the pipeline on Google Cloud, the Java pipeline performed 4-5 
times faster than the Python pipeline. Now, this probably makes sense since 
Python is in general slower than Java, but I was wondering if there is more to 
it and how I could potentially profile the pipelines in a (semi)-scientific 
way... Maybe some of you have thoughts/input or had similar experiences? Happy 
to hear your input!
Best regards,
Matthias





-- 
Jason Kuster
Apache Beam (Incubating) / Google Cloud Dataflow

   
