Re: [VOTE] Release 2.12.0, release candidate #4

2019-04-16 Thread Kenneth Knowles
+1 Ran the verification scripts. Caveats: - I input a GCS bucket that did not exist, expecting it to be created, so the Dataflow tests failed. - I also skipped the Python tests that asked to write to GitHub. - You also have not built, staged, & signed the Python wheels. It is a bit hidden in

Re: Python SDK timestamp precision

2019-04-16 Thread Kenneth Knowles
I am not so sure this is a good idea. Here are some systems and their precision: Arrow - microseconds BigQuery - microseconds New Java instant - nanoseconds Firestore - microseconds Protobuf - nanoseconds Dataflow backend - microseconds Postgresql - microseconds Pubsub publish time - nanoseconds

Python SDK timestamp precision

2019-04-16 Thread Thomas Weise
The Python SDK currently uses timestamps in microsecond resolution while Java SDK, as most would probably expect, uses milliseconds. This causes a few difficulties with portability (Python coders need to convert to millis for WindowedValue and Timers, which is related to a bug I'm looking into:
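As a rough illustration of the conversion mentioned above, the following is a minimal sketch of moving between microsecond and millisecond timestamps. The helper names are hypothetical and are not taken from the Beam codebase; the only point is that the micros-to-millis direction drops sub-millisecond precision.

```python
MICROS_PER_MILLI = 1000

def micros_to_millis(micros: int) -> int:
    """Hypothetical helper: floor a microsecond timestamp to milliseconds.

    Floor division rounds toward negative infinity, so pre-epoch
    (negative) timestamps keep their relative order.
    """
    return micros // MICROS_PER_MILLI

def millis_to_micros(millis: int) -> int:
    """Hypothetical helper: widen a millisecond timestamp to microseconds."""
    return millis * MICROS_PER_MILLI

original_micros = 1_555_400_000_123_456
as_millis = micros_to_millis(original_micros)
round_tripped = millis_to_micros(as_millis)
# The last three digits (456 microseconds) do not survive the round trip.
lost_micros = original_micros - round_tripped
```

Floor division is one possible rounding choice here; whether a real coder should floor, round, or reject sub-millisecond values is a design decision this sketch does not settle.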

Re: Insufficient CPU quota in apache-beam-testing causes test flakes

2019-04-16 Thread Valentyn Tymofieiev
Thanks, Yifan. 1. It appears that there are 32 jenkins-related instances, 16 cores each, which consume over 2/3 of available CPU quota. 2. Among old VMs there are 6 1-core VMs, that look like "gke-io-datastores-*" and "gke-metrics-*". They don't consume much quota, but I am curious why do we have
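The arithmetic behind point 1 can be written out as a short sketch. The instance and core counts come from the message; the quota bound is only what "over 2/3" implies and is not a figure stated in the thread.

```python
instances = 32
cores_per_instance = 16

# Total cores held by the Jenkins-related instances.
total_cores = instances * cores_per_instance

# If total_cores is more than 2/3 of the quota, then
# quota < total_cores * 3 / 2 (an implied upper bound, not a stated number).
quota_upper_bound = total_cores * 3 // 2
```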

Re: [DISCUSS] change the encoding scheme of Python StrUtf8Coder

2019-04-16 Thread Thomas Weise
I opened https://github.com/apache/beam/pull/8319 to eliminate the duplicate yaml file (and cover timestamp coder for the Python SDK). Would appreciate if someone could take a look. (PR doesn't affect the StrUtf8Coder subject, but it is required to fix a timer bug.) Thanks, Thomas On Fri, Apr

Re: Insufficient CPU quota in apache-beam-testing causes test flakes

2019-04-16 Thread Yifan Zou
We recently created 16 compute instances for Jenkins. Each of them has 16 CPUs, which means they consume 256 CPUs in total. I guess that is why the CPU usage in us-central1 remains high. We're working on migrating the rest of the old Jenkins agents, and the old instances will be removed once

Insufficient CPU quota in apache-beam-testing causes test flakes

2019-04-16 Thread Valentyn Tymofieiev
FYI, I have recently observed a large number of test failures in Beam test suites where Dataflow jobs failed due to a lack of CPU quota in the apache-beam-testing project. We have been adding new suites for Python 3.x versions, which may have contributed to this problem. I have not investigated yet

Re: pickler.py issue with nested classes

2019-04-16 Thread Udi Meiri
Not sure: my case is using a nested class and the error is a stack overflow (or infinite recursion detection is triggered). It is odd though that they have the same workaround.

Re: pickler.py issue with nested classes

2019-04-16 Thread Valentyn Tymofieiev
This looks very similar to https://github.com/uqfoundation/dill/issues/300, however we observed that bug on Python 3, and not on Python 2.7. On Tue, Apr 16, 2019 at 10:58 AM Udi Meiri wrote: > I was looking at migrating unit tests to pytest and found this test which > doesn't pass: >

Re: Removing :beam-website:testWebsite from gradle build target

2019-04-16 Thread Kyle Weaver
> it would be good to have a sort of weekly report on dead links Seeing as checking for broken external links returns a lot of false positives, I'd rather not spam everyone with them. However, I don't know if making it a postcommit will give it sufficient visibility. Not sure what the best way to

pickler.py issue with nested classes

2019-04-16 Thread Udi Meiri
I was looking at migrating unit tests to pytest and found this test which doesn't pass: https://gist.github.com/udim/a71fcb278b56a9a5b7962f4588e14efb (stack overflow) (requires installing python3.7 and "python3.7 -m pip install pytest".) The same command passes with python2.7 and python3.5. I

Re: [DISCUSS] Adding GroupByKeyAndSort

2019-04-16 Thread Kenneth Knowles
On Tue, Apr 16, 2019 at 9:18 AM Reuven Lax wrote: > A common request (especially in streaming) is to support sorting values by > timestamp, not by the full value. > On this point, I think an explicit secondary key probably addresses the need. Naively implemented, the "sort by values" use case

Re: [DISCUSS] Adding GroupByKeyAndSort

2019-04-16 Thread Reuven Lax
This is a good conversation. Some things to consider: Since Beam is cross language, the "shufflers" can usually only sort by binary value. This is different than other systems where custom comparators can be used for sorting. We might need to introduce OrderPreservingCoder, and mark the coders
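To illustrate why sorting by binary value constrains the choice of encoding, here is a minimal sketch using Python's standard `struct` module. It is not Beam code, and the function names are hypothetical: it only shows that a fixed-width big-endian encoding of unsigned integers sorts bytewise in numeric order, while a little-endian encoding of the same values does not.

```python
import struct

def encode_big_endian(n: int) -> bytes:
    """Fixed-width unsigned big-endian: byte order matches numeric order."""
    return struct.pack(">Q", n)

def encode_little_endian(n: int) -> bytes:
    """Fixed-width unsigned little-endian: byte order does not match numeric order."""
    return struct.pack("<Q", n)

values = [1, 256, 2, 65536]

# Sort the values purely by comparing their encoded bytes,
# as a shuffler with no custom comparator would.
sorted_by_big = sorted(values, key=encode_big_endian)
sorted_by_little = sorted(values, key=encode_little_endian)
```

Under these assumptions, only the big-endian encoding would qualify as order preserving in the sense discussed above; variable-length or signed encodings would need their own analysis.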

Re: [DISCUSS] Adding GroupByKeyAndSort

2019-04-16 Thread Kenneth Knowles
1. This is clearly useful, and extensively used. Agree with all that. I think it can work for batch and streaming equally well if sorting is required only per "pane", though I might be overlooking something. 2. A transform need not be primitive to be well-defined and executed in a special way by

Re: [ANNOUNCE] New committer announcement: Boyuan Zhang

2019-04-16 Thread Gleb Kanterov
Congratulations! On Sat, Apr 13, 2019 at 12:53 AM Thomas Weise wrote: > Congrats! > > > On Thu, Apr 11, 2019 at 6:03 PM Reuven Lax wrote: > >> Congratulations Boyuan! >> >> On Thu, Apr 11, 2019 at 4:53 PM Ankur Goenka wrote: >> >>> Congrats Boyuan! >>> >>> On Thu, Apr 11, 2019 at 4:52 PM Mark

Re: Removing :beam-website:testWebsite from gradle build target

2019-04-16 Thread Ismaël Mejía
+1 to removing link validation for website changes. However it would be good to have a sort of weekly report on dead links or another alternative to be aware of them. On Tue, Apr 16, 2019 at 2:43 AM Kyle Weaver wrote: > I agree with Andrew that the external links checks are ultra-flaky and >