Re: Poor Python 3.x performance on Dataflow?

2020-02-12 Thread Valentyn Tymofieiev
To close the loop: the regression reported in this thread is not specific to
Beam or Dataflow. The difference in performance is caused by a 'regression'
in the deprecated numpy random number generator, which we use to generate
synthetic input for the load test pipeline. Since new releases of numpy
don't support Python 2, our Py2 tests use an older numpy version in which
that generator happens to perform faster.

You can follow BEAM-9085 for further details.
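
For anyone who wants to check this outside of Beam, below is a minimal
sketch of a micro-benchmark for the legacy numpy generator. It is not the
load test code: generate_records, best_time, and the record counts and sizes
are hypothetical names and values chosen for illustration, and the per-record
seeding pattern is an assumption. Run it under each Python/numpy combination
you want to compare and look at the relative timings.

```python
import sys
import timeit

import numpy as np


def generate_records(num_records, record_size):
    """Build synthetic records with the legacy numpy generator.

    Assumption for illustration: one seeded RandomState per record,
    followed by a draw of random bytes from it.
    """
    return [
        np.random.RandomState(seed).bytes(record_size)
        for seed in range(num_records)
    ]


def best_time(func, repeats=3):
    """Return the best wall-clock time in seconds over several runs."""
    return min(timeit.repeat(func, number=1, repeat=repeats))


if __name__ == "__main__":
    # Print the environment so results from different setups can be compared.
    print("python %s, numpy %s" % (sys.version.split()[0], np.__version__))
    elapsed = best_time(lambda: generate_records(10000, 100))
    print("best of 3: %.3f s" % elapsed)
```

The absolute numbers depend on the machine, so only the ratio between two
environments on the same host is meaningful.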

On Fri, Jan 10, 2020 at 9:26 AM Valentyn Tymofieiev 
wrote:

> Thanks, Kamil. I self-assigned the issue, but if anyone else is
> interested, feel free to take a look in parallel and post your findings on
> the Jira.
>
> On Fri, Jan 10, 2020 at 4:29 AM Kamil Wasilewski <
> kamil.wasilew...@polidea.com> wrote:
>
>> Our first Python3 performance test has just been implemented and we have
>> just started gathering results. Here[1] you can find dashboards with a
>> side-by-side comparison.
>> I also opened a Jira ticket to investigate the difference [2]. Anyone,
>> please feel free to assign it to yourself.
>>
>> Thanks,
>> Kamil
>>
>> [1]
>> https://apache-beam-testing.appspot.com/explore?dashboard=5678187241537536
>> [2] https://issues.apache.org/jira/browse/BEAM-9085
>>
>> On Mon, Dec 9, 2019 at 8:38 PM Valentyn Tymofieiev 
>> wrote:
>>
>>> For now we should run Py3 and Py2 tests alongside each other to get a
>>> side-by-side comparison. I suggest we open a Jira ticket to investigate the
>>> difference in performance. We have limited performance test coverage on
>>> Python 3 in Beam, so more Py3 tests would help a lot here, thanks for
>>> adding them.
>>>
>>> On Fri, Dec 6, 2019 at 9:43 AM Robert Bradshaw 
>>> wrote:
>>>
>>>> This is very surprising--I would expect the times to be quite similar. Do
>>>> you have profiles for where the (difference in) time is spent? With
>>>> differences like these, I wonder if there are issues with container
>>>> setup (e.g. some things not being installed or cached) for Python 3.
>>>>
>>>> On Fri, Dec 6, 2019 at 9:06 AM Kamil Wasilewski
>>>>  wrote:
>>>> >
>>>> > Hi all,
>>>> >
>>>> > Python 2.7 won't be maintained past 2020 and that's why we want to
>>>> migrate all Python performance tests in Beam from Python 2.7 to Python 3.7.
>>>> However, I was surprised by seeing that after switching Dataflow tests to
>>>> Python 3.x they are a few times slower. For example, the same ParDo test
>>>> that takes approx. 8 minutes to run on Python 2.7 needs approx. 21 minutes
>>>> on Python 3.x. You can find all the results I gathered and the setup here.
>>>> >
>>>> > Do you know any possible reason for this? This issue makes it
>>>> impossible to do the migration, because of the limited resources on Jenkins
>>>> (almost every job would be aborted).
>>>> >
>>>> > Thanks,
>>>> > Kamil
>>>>
>>>


Re: Poor Python 3.x performance on Dataflow?

2020-01-10 Thread Valentyn Tymofieiev
Thanks, Kamil. I self-assigned the issue, but if anyone else is interested,
feel free to take a look in parallel and post your findings on the Jira.

On Fri, Jan 10, 2020 at 4:29 AM Kamil Wasilewski <
kamil.wasilew...@polidea.com> wrote:

> Our first Python3 performance test has just been implemented and we have
> just started gathering results. Here[1] you can find dashboards with a
> side-by-side comparison.
> I also opened a Jira ticket to investigate the difference [2]. Anyone,
> please feel free to assign it to yourself.
>
> Thanks,
> Kamil
>
> [1]
> https://apache-beam-testing.appspot.com/explore?dashboard=5678187241537536
> [2] https://issues.apache.org/jira/browse/BEAM-9085
>
> On Mon, Dec 9, 2019 at 8:38 PM Valentyn Tymofieiev 
> wrote:
>
>> For now we should run Py3 and Py2 tests alongside each other to get a
>> side-by-side comparison. I suggest we open a Jira ticket to investigate the
>> difference in performance. We have limited performance test coverage on
>> Python 3 in Beam, so more Py3 tests would help a lot here, thanks for
>> adding them.
>>
>> On Fri, Dec 6, 2019 at 9:43 AM Robert Bradshaw 
>> wrote:
>>
>>> This is very surprising--I would expect the times to be quite similar. Do
>>> you have profiles for where the (difference in) time is spent? With
>>> differences like these, I wonder if there are issues with container
>>> setup (e.g. some things not being installed or cached) for Python 3.
>>>
>>> On Fri, Dec 6, 2019 at 9:06 AM Kamil Wasilewski
>>>  wrote:
>>> >
>>> > Hi all,
>>> >
>>> > Python 2.7 won't be maintained past 2020 and that's why we want to
>>> migrate all Python performance tests in Beam from Python 2.7 to Python 3.7.
>>> However, I was surprised by seeing that after switching Dataflow tests to
>>> Python 3.x they are a few times slower. For example, the same ParDo test
>>> that takes approx. 8 minutes to run on Python 2.7 needs approx. 21 minutes
>>> on Python 3.x. You can find all the results I gathered and the setup here.
>>> >
>>> > Do you know any possible reason for this? This issue makes it
>>> impossible to do the migration, because of the limited resources on Jenkins
>>> (almost every job would be aborted).
>>> >
>>> > Thanks,
>>> > Kamil
>>>
>>


Re: Poor Python 3.x performance on Dataflow?

2020-01-10 Thread Kamil Wasilewski
Our first Python3 performance test has just been implemented and we have
just started gathering results. Here[1] you can find dashboards with a
side-by-side comparison.
I also opened a Jira ticket to investigate the difference [2]. Anyone,
please feel free to assign it to yourself.

Thanks,
Kamil

[1]
https://apache-beam-testing.appspot.com/explore?dashboard=5678187241537536
[2] https://issues.apache.org/jira/browse/BEAM-9085

On Mon, Dec 9, 2019 at 8:38 PM Valentyn Tymofieiev 
wrote:

> For now we should run Py3 and Py2 tests alongside each other to get a
> side-by-side comparison. I suggest we open a Jira ticket to investigate the
> difference in performance. We have limited performance test coverage on
> Python 3 in Beam, so more Py3 tests would help a lot here, thanks for
> adding them.
>
> On Fri, Dec 6, 2019 at 9:43 AM Robert Bradshaw 
> wrote:
>
>> This is very surprising--I would expect the times to be quite similar. Do
>> you have profiles for where the (difference in) time is spent? With
>> differences like these, I wonder if there are issues with container
>> setup (e.g. some things not being installed or cached) for Python 3.
>>
>> On Fri, Dec 6, 2019 at 9:06 AM Kamil Wasilewski
>>  wrote:
>> >
>> > Hi all,
>> >
>> > Python 2.7 won't be maintained past 2020 and that's why we want to
>> migrate all Python performance tests in Beam from Python 2.7 to Python 3.7.
>> However, I was surprised by seeing that after switching Dataflow tests to
>> Python 3.x they are a few times slower. For example, the same ParDo test
>> that takes approx. 8 minutes to run on Python 2.7 needs approx. 21 minutes
>> on Python 3.x. You can find all the results I gathered and the setup here.
>> >
>> > Do you know any possible reason for this? This issue makes it
>> impossible to do the migration, because of the limited resources on Jenkins
>> (almost every job would be aborted).
>> >
>> > Thanks,
>> > Kamil
>>
>


Re: Poor Python 3.x performance on Dataflow?

2019-12-09 Thread Valentyn Tymofieiev
For now we should run Py3 and Py2 tests alongside each other to get a
side-by-side comparison. I suggest we open a Jira ticket to investigate the
difference in performance. We have limited performance test coverage on
Python 3 in Beam, so more Py3 tests would help a lot here, thanks for
adding them.

On Fri, Dec 6, 2019 at 9:43 AM Robert Bradshaw  wrote:

> This is very surprising--I would expect the times to be quite similar. Do
> you have profiles for where the (difference in) time is spent? With
> differences like these, I wonder if there are issues with container
> setup (e.g. some things not being installed or cached) for Python 3.
>
> On Fri, Dec 6, 2019 at 9:06 AM Kamil Wasilewski
>  wrote:
> >
> > Hi all,
> >
> > Python 2.7 won't be maintained past 2020 and that's why we want to
> migrate all Python performance tests in Beam from Python 2.7 to Python 3.7.
> However, I was surprised by seeing that after switching Dataflow tests to
> Python 3.x they are a few times slower. For example, the same ParDo test
> that takes approx. 8 minutes to run on Python 2.7 needs approx. 21 minutes
> on Python 3.x. You can find all the results I gathered and the setup here.
> >
> > Do you know any possible reason for this? This issue makes it impossible
> to do the migration, because of the limited resources on Jenkins (almost
> every job would be aborted).
> >
> > Thanks,
> > Kamil
>


Re: Poor Python 3.x performance on Dataflow?

2019-12-06 Thread Robert Bradshaw
This is very surprising--I would expect the times to be quite similar. Do
you have profiles for where the (difference in) time is spent? With
differences like these, I wonder if there are issues with container
setup (e.g. some things not being installed or cached) for Python 3.
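
In case it helps, here is a rough sketch of one way to collect such a
profile locally with the standard library (Python 3). profile_top and
sample_workload are hypothetical names; sample_workload is only a stand-in
for whatever the ParDo actually does, not code from the tests.

```python
import cProfile
import io
import pstats


def profile_top(func, limit=10):
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


def sample_workload():
    # Hypothetical stand-in for the per-element work of the test pipeline.
    return sum(i * i for i in range(100000))


if __name__ == "__main__":
    print(profile_top(sample_workload))
```

Comparing the two reports from Python 2.7 and Python 3.x should show which
calls account for the extra time.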

On Fri, Dec 6, 2019 at 9:06 AM Kamil Wasilewski
 wrote:
>
> Hi all,
>
> Python 2.7 won't be maintained past 2020 and that's why we want to migrate 
> all Python performance tests in Beam from Python 2.7 to Python 3.7. However, 
> I was surprised by seeing that after switching Dataflow tests to Python 3.x 
> they are a few times slower. For example, the same ParDo test that takes 
> approx. 8 minutes to run on Python 2.7 needs approx. 21 minutes on Python 
> 3.x. You can find all the results I gathered and the setup here.
>
> Do you know any possible reason for this? This issue makes it impossible to 
> do the migration, because of the limited resources on Jenkins (almost every 
> job would be aborted).
>
> Thanks,
> Kamil