Re: [EXT] Re: using avro instead of json for BigQueryIO.Write

2019-11-26 Thread Chamikara Jayalath
I don't believe so, please create one (we can dedup if we happen to find
another issue).

Even better if you can contribute to fix this :)

Thanks,
Cham

On Tue, Nov 26, 2019 at 7:07 PM Chuck Yang  wrote:

> Has anyone looked into implementing this for the Python SDK? It would
> be nice to have it if only for the ability to write float values with
> NaN and infinity values. I didn't see anything in Jira, happy to
> create a ticket, but wanted to ask around first.
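As a quick aside on the float limitation described above (standard-library behavior only, not Beam code): JSON has no representation for NaN or Infinity, which is exactly the gap an Avro-based load path would close.

    import json

    # json emits the non-standard tokens NaN/Infinity by default...
    print(json.dumps({"score": float("nan")}))   # -> {"score": NaN}  (not valid JSON)

    # ...and strict JSON refuses to serialize such values at all.
    try:
        json.dumps({"score": float("inf")}, allow_nan=False)
    except ValueError as err:
        print(err)   # "Out of range float values are not JSON compliant"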
>
> On Thu, Oct 17, 2019 at 12:53 PM Reuven Lax  wrote:
> >
> > I'll take a look as well. Thanks for doing this!
> >
> > On Fri, Oct 4, 2019 at 9:16 PM Pablo Estrada  wrote:
> >>
> >> Thanks Steve!
> >> I'll take a look next week. Sorry about the delay so far.
> >> Best
> >> -P.
> >>
> >> On Fri, Sep 27, 2019 at 10:37 AM Steve Niemitz 
> wrote:
> >>>
> >>> I put up a semi-WIP pull request
> https://github.com/apache/beam/pull/9665 for this.  The initial results
> look good.  I'll spend some time soon adding unit tests and documentation,
> but I'd appreciate it if someone could take a first pass over it.
> >>>
> >>> On Wed, Sep 18, 2019 at 6:14 PM Pablo Estrada 
> wrote:
> 
>  Thanks for offering to work on this! It would be awesome to have it.
> I can say that we don't have that for Python ATM.
> 
>  On Mon, Sep 16, 2019 at 10:56 AM Steve Niemitz 
> wrote:
> >
> > Our experience has actually been that avro is more efficient than
> even parquet, but that might also be skewed by our datasets.
> >
> > I might try to take a crack at this, I found
> https://issues.apache.org/jira/browse/BEAM-2879 tracking it (which
> coincidentally references my thread from a couple years ago on the read
> side of this :) ).
> >
> > On Mon, Sep 16, 2019 at 1:38 PM Reuven Lax  wrote:
> >>
> >> It's been talked about, but nobody's done anything. There are some
> difficulties related to type conversion (json and avro don't support the
> same types), but if those are overcome then an avro version would be much
> more efficient. I believe Parquet files would be even more efficient if you
> wanted to go that path, but there might be more code to write (as we
> already have some code in the codebase to convert between TableRows and
> Avro).
> >>
> >> Reuven
> >>
> >> On Mon, Sep 16, 2019 at 10:33 AM Steve Niemitz 
> wrote:
> >>>
> >>> Has anyone investigated using avro rather than json to load data
> into BigQuery using BigQueryIO (+ FILE_LOADS)?
> >>>
> >>> I'd be interested in enhancing it to support this, but I'm curious
> if there's any prior work here.
>


Re: cython test instability

2019-11-26 Thread Chad Dombrova
yeah, I've excised both tests_require and setup_requires in my test
simplification PR:  https://github.com/apache/beam/pull/10038

I'm happy to see those go sooner rather than later, as it'll reduce the
scope of my PR.  The rest of my PR is about ensuring that build
dependencies like cython and grpc are available at "build" time (i.e. when
setup.py gets called), and the modern solution for this is a
pep517/518-compliant build system, of which tox is one.

-chad



On Tue, Nov 26, 2019 at 6:39 PM Udi Meiri  wrote:

> I'm not sure where the error with the simplegeneric and timeloop .eggs
> directories comes from,
> but I did figure out that they don't get installed as eggs if you add them
> to the "test" extras in setup.py, e.g.:
>
> extras_require={
>     'docs': ['Sphinx>=1.5.2,<2.0'],
>     'test': REQUIRED_TEST_PACKAGES + INTERACTIVE_BEAM,
>     'gcp': GCP_REQUIREMENTS,
>     'interactive': INTERACTIVE_BEAM,
> },
>
> This is further proof of the wisdom of the pytest-runner deprecation
> notice  (emphasis mine):
> """
> Remove ‘pytest’ and any other testing requirements from ‘*tests_require*’,
> preferably removing the setup_requires option.
> """
>
> I believe we don't rely on the tests_require definition. Removing it might
> break developers running "python setup.py test", but the alternative is a
> simple "setup.py && pip install".
>
>
> On Tue, Nov 26, 2019 at 5:14 PM Chad Dombrova  wrote:
>
>> Sorry wrong link:  https://github.com/apache/beam/pull/9915
>>
>>
>>
>> On Tue, Nov 26, 2019 at 5:12 PM Udi Meiri  wrote:
>>
>>> I looked at #9959 but it doesn't seem to modify setup.py?
>>> The additional eggs for timeloop etc. are troubling though. Not sure
>>> where those come from.
>>>
>>> On Tue, Nov 26, 2019 at 4:59 PM Chad Dombrova  wrote:
>>>
 Is setup_requires being used somewhere else, because I'm still getting
 errors after removing it from sdks/python/setup.py.

 I removed it from this PR: https://github.com/apache/beam/pull/9959

 Here's the gradle scan:
 https://scans.gradle.com/s/oinh5xpaly3dk/failure#top=0

 The error shows up differently than before when
 setup_requires=['pytest-runner'] was present -- it's in a gradle traceback
 now rather than the console log.  I've also seen different packages listed
 as the culprit (simplegeneric, timeloop).

 -chad



 On Tue, Nov 26, 2019 at 4:47 PM Udi Meiri  wrote:

> Chad, I believe the answer is the "setup_requires" line is causing the
> sdks/python/.eggs directory to be created.
>
> This command fails with the setup_requires line (same Errno 17), but
> succeeds without it:
> $ \rm -r .eggs/; ../../gradlew installGcpTest
> [~8 failed tasks]
> $ ls .eggs
> pytest_runner-5.2-py2.7.egg  pytest_runner-5.2-py3.5.egg
>  pytest_runner-5.2-py3.6.egg  pytest_runner-5.2-py3.7.egg  README.txt
>
> I'll go ahead and create a PR to remove setup_requires from setup.py.
>
> On Tue, Nov 26, 2019 at 4:16 PM Chad Dombrova 
> wrote:
>
>> It seems like the offending packages are those that only have source
>> distributions (i.e. no wheels).  But why are the eggs being installed in
>> sdks/python/.eggs instead of into the virtualenv created by 
>> setupVirtualenv
>> gradle task or by tox?
>>
>>
>> On Tue, Nov 26, 2019 at 3:59 PM Udi Meiri  wrote:
>>
>>> Basically, I believe what's happening is that a new Gradle task was
>>> added that uses setup.py but doesn't have the same dependency on some 
>>> main
>>> setup.py task that all others depend on (like sdist).
>>>
>>> On Tue, Nov 26, 2019 at 3:49 PM Udi Meiri  wrote:
>>>
 Correction: the error is not gone after removing the line. I get
 instead:
 error: [Errno 17] File exists:
 '/usr/local/google/home/ehudm/src/beam/sdks/python/.eggs/dill-0.3.1.1-py2.7.egg'


 On Tue, Nov 26, 2019 at 3:45 PM Udi Meiri  wrote:

> I managed to recreate one of the issues with this command:
> ~/src/beam/sdks/python$ \rm -r .eggs/ && for i in $(seq 2); do
> echo "python setup.py -q nosetests --tests
> apache_beam.pipeline_test:DoFnTest.test_incomparable_default &" | sh 
> ; done
>
> This reliably gives me:
> OSError: [Errno 17] File exists:
> '/usr/local/google/home/ehudm/src/beam/sdks/python/.eggs/pytest_runner-5.2-py2.7.egg'
>
> If I remove this line from setup.py the error is gone:
>   setup_requires=['pytest_runner'],
>
>
> On Tue, Nov 26, 2019 at 2:54 PM Chad Dombrova 
> wrote:
>
>> Thanks for looking into this. It seems like it might be something
>> to do with data that is cached on the Jenkins slaves between runs, 
>> which
>> may be what prevents this from showing up 

Re: [DISCUSS] AWS IOs V1 Deprecation Plan

2019-11-26 Thread Chamikara Jayalath
On Tue, Nov 26, 2019 at 6:17 PM Reza Rokni  wrote:

> Hi Alexey,
>
> With regards to @Experimental there are a couple of discussions around its
> usage ( or rather over usage! ) on dev@. It is something that we need to
> clean up ( some of those IO are now being used on production env for
> years!).
>

I agree that we should move some IO connectors out of the experimental state,
and probably this should be a separate discussion. Also, this issue
probably goes beyond IO connectors, since there are other parts of the
code that are marked as experimental as well, sometimes for a good reason
(for example, SDF).



>
> Cheers
>
> Reza
>
> On Wed, 27 Nov 2019 at 04:50, Luke Cwik  wrote:
>
>> I suggested the wrapper because sometimes the intent of the APIs can be
>> translated easily but this is not always the case.
>>
>> Good to know that it is all marked @Experimental.
>>
>> On Tue, Nov 26, 2019 at 12:30 PM Cam Mach  wrote:
>>
>>> Thank you, Alex for sharing the information, and Luke for the questions.
>>> I like the idea of just deprecating the V1 IOs and maintaining only the V2
>>> IOs, so we can support whoever wants to continue with V1.
>>> Just as Alex said, a lot of users, including my teams :-), use the V1
>>> IOs in production for real workloads. So it'll be hard to remove the V1 IOs or
>>> force them to migrate to V2. But let's hear if there are any other ideas.
>>>
>>> Btw, making V1 a wrapper around V2 is not very appealing; the code will get
>>> more complicated since the V2 API is very different from V1's.
>>>
>>> Thanks,
>>>
>>>
>>>
>>> On Tue, Nov 26, 2019 at 8:21 AM Alexey Romanenko <
>>> aromanenko@gmail.com> wrote:
>>>
 AFAICT, all AWS SDK V1 IOs (SnsIO, SqsIO, DynamoDBIO, KinesisIO) are
 marked as "Experimental". So, it should not be a problem to gracefully
 deprecate and finally remove them. We already did a similar procedure for
 “HadoopInputFormatIO”, which was renamed to just “HadoopFormatIO” (since it
 started to support HadoopOutputFormat as well). The old “HadoopInputFormatIO”
 was deprecated and removed after *3 consecutive* Beam releases (as we
 agreed on the mailing list).

 At the same time, some users, for various reasons, would not be able or
 willing to move to AWS SDK V2. So, I’d prefer to just deprecate AWS SDK V1 IOs
 and accept new features/fixes *only* for V2 IOs.

>>>
+1 as well for deprecating the AWS V1 IO connectors as opposed to removing them,
unless we can confirm that usage is extremely limited.


>
 Talking about the “Experimental” annotation: sorry in advance if I missed
 that and am switching the subject a bit, but do we have clear rules or an
 agreement for when an IO becomes stable and should not be marked as experimental anymore?
 *Most* of our Java IOs are marked as Experimental, but many of them have
 been used in production by real users under real load. Does it mean that
 they are ready to be stable in terms of API? Perhaps, this topic deserves a
 new discussion if there are several opinions on that.

>>>
Probably, the decision to move component APIs (for example, an IO connector)
out of the experimental state should be made on a case-by-case basis.

Thanks,
Cham


>
 On 26 Nov 2019, at 00:39, Luke Cwik  wrote:

 Phase I sounds fine.

 Apache Beam follows semantic versioning and I believe removing the IOs
 will be a backwards incompatible change unless they were marked
 experimental which will be a problem for Phase 2.

 What is the feasibility of making the V1 transforms wrappers around V2?

 On Mon, Nov 25, 2019 at 1:46 PM Cam Mach  wrote:

> Hello Beam Devs,
>
> I have been working on the migration of Amazon Web Services IO
> connectors into the new AWS SDK for Java V2. The goal is to have an 
> updated
> implementation aligned with the most recent AWS improvements. So far we
> have already migrated the connectors for AWS SNS, SQS and  DynamoDB.
>
> In the meantime, some contributions are still going into the V1 IOs. So far
> we have dealt with those by porting (or asking contributors to port) the
> changes into the V2 IOs too, because we don’t want features of both versions to
> be unaligned, but this may quickly become a maintenance issue, so we want 
> to
> discuss a plan to stop supporting (deprecate) V1 IOs and encourage users 
> to
> move to V2.
>
> Phase I (ASAP):
>
>- Mark migrated AWS V1 IOs as deprecated
>- Document migration path to V2
>
> Phase II (end of 2020):
>
>- Decide a date or Beam release to remove the V1 IOs
>- Send a notification to the community 3 months before we remove
>them
>- Completely get rid of V1 IOs
>
>
> Please let me know what you think or if you see any potential issues?
>
> Thanks,
> Cam Mach
>
>

>

Re: cython test instability

2019-11-26 Thread Udi Meiri
I'm not sure where the error with the simplegeneric and timeloop .eggs
directories comes from,
but I did figure out that they don't get installed as eggs if you add them
to the "test" extras in setup.py, e.g.:

extras_require={
    'docs': ['Sphinx>=1.5.2,<2.0'],
    'test': REQUIRED_TEST_PACKAGES + INTERACTIVE_BEAM,
    'gcp': GCP_REQUIREMENTS,
    'interactive': INTERACTIVE_BEAM,
},

This is further proof of the wisdom of the pytest-runner deprecation notice
 (emphasis mine):
"""
Remove ‘pytest’ and any other testing requirements from ‘*tests_require*’,
preferably removing the setup_requires option.
"""

I believe we don't rely on the tests_require definition. Removing it might
break developers running "python setup.py test", but the alternative is a
simple "setup.py && pip install".


On Tue, Nov 26, 2019 at 5:14 PM Chad Dombrova  wrote:

> Sorry wrong link:  https://github.com/apache/beam/pull/9915
>
>
>
> On Tue, Nov 26, 2019 at 5:12 PM Udi Meiri  wrote:
>
>> I looked at #9959 but it doesn't seem to modify setup.py?
>> The additional eggs for timeloop etc. are troubling though. Not sure
>> where those come from.
>>
>> On Tue, Nov 26, 2019 at 4:59 PM Chad Dombrova  wrote:
>>
>>> Is setup_requires being used somewhere else, because I'm still getting
>>> errors after removing it from sdks/python/setup.py.
>>>
>>> I removed it from this PR: https://github.com/apache/beam/pull/9959
>>>
>>> Here's the gradle scan:
>>> https://scans.gradle.com/s/oinh5xpaly3dk/failure#top=0
>>>
>>> The error shows up differently than before when
>>> setup_requires=['pytest-runner'] was present -- it's in a gradle traceback
>>> now rather than the console log.  I've also seen different packages listed
>>> as the culprit (simplegeneric, timeloop).
>>>
>>> -chad
>>>
>>>
>>>
>>> On Tue, Nov 26, 2019 at 4:47 PM Udi Meiri  wrote:
>>>
 Chad, I believe the answer is the "setup_requires" line is causing the
 sdks/python/.eggs directory to be created.

 This command fails with the setup_requires line (same Errno 17), but
 succeeds without it:
 $ \rm -r .eggs/; ../../gradlew installGcpTest
 [~8 failed tasks]
 $ ls .eggs
 pytest_runner-5.2-py2.7.egg  pytest_runner-5.2-py3.5.egg
  pytest_runner-5.2-py3.6.egg  pytest_runner-5.2-py3.7.egg  README.txt

 I'll go ahead and create a PR to remove setup_requires from setup.py.

 On Tue, Nov 26, 2019 at 4:16 PM Chad Dombrova 
 wrote:

> It seems like the offending packages are those that only have source
> distributions (i.e. no wheels).  But why are the eggs being installed in
> sdks/python/.eggs instead of into the virtualenv created by 
> setupVirtualenv
> gradle task or by tox?
>
>
> On Tue, Nov 26, 2019 at 3:59 PM Udi Meiri  wrote:
>
>> Basically, I believe what's happening is that a new Gradle task was
>> added that uses setup.py but doesn't have the same dependency on some 
>> main
>> setup.py task that all others depend on (like sdist).
>>
>> On Tue, Nov 26, 2019 at 3:49 PM Udi Meiri  wrote:
>>
>>> Correction: the error is not gone after removing the line. I get
>>> instead:
>>> error: [Errno 17] File exists:
>>> '/usr/local/google/home/ehudm/src/beam/sdks/python/.eggs/dill-0.3.1.1-py2.7.egg'
>>>
>>>
>>> On Tue, Nov 26, 2019 at 3:45 PM Udi Meiri  wrote:
>>>
 I managed to recreate one of the issues with this command:
 ~/src/beam/sdks/python$ \rm -r .eggs/ && for i in $(seq 2); do echo
 "python setup.py -q nosetests --tests
 apache_beam.pipeline_test:DoFnTest.test_incomparable_default &" | sh ; 
 done

 This reliably gives me:
 OSError: [Errno 17] File exists:
 '/usr/local/google/home/ehudm/src/beam/sdks/python/.eggs/pytest_runner-5.2-py2.7.egg'

 If I remove this line from setup.py the error is gone:
   setup_requires=['pytest_runner'],


 On Tue, Nov 26, 2019 at 2:54 PM Chad Dombrova 
 wrote:

> Thanks for looking into this. It seems like it might be something
> to do with data that is cached on the Jenkins slaves between runs, 
> which
> may be what prevents this from showing up locally?
>
> If your theory about setuptools is correct, and it sounds likely,
> we should be able to lock down the version, which we should 
> definitely be
> doing for all of our dependencies.
>
> -chad
>
>
>
> On Tue, Nov 26, 2019 at 1:33 PM Ahmet Altay 
> wrote:
>
>> I tried to debug but did not make much progress. I cannot
>> reproduce locally, however all python precommits and postcommits are
>> failing.
>>
>> One guess is, setuptools released a new version that does not
>> support eggs a few 

Re: [DISCUSS] AWS IOs V1 Deprecation Plan

2019-11-26 Thread Reza Rokni
Hi Alexey,

With regards to @Experimental there are a couple of discussions around its
usage ( or rather over usage! ) on dev@. It is something that we need to
clean up ( some of those IO are now being used on production env for
years!).

Cheers

Reza

On Wed, 27 Nov 2019 at 04:50, Luke Cwik  wrote:

> I suggested the wrapper because sometimes the intent of the APIs can be
> translated easily but this is not always the case.
>
> Good to know that it is all marked @Experimental.
>
> On Tue, Nov 26, 2019 at 12:30 PM Cam Mach  wrote:
>
>> Thank you, Alex for sharing the information, and Luke for the questions.
>> I like the idea of just deprecating the V1 IOs and maintaining only the V2
>> IOs, so we can support whoever wants to continue with V1.
>> Just as Alex said, a lot of users, including my teams :-), use the V1
>> IOs in production for real workloads. So it'll be hard to remove the V1 IOs or
>> force them to migrate to V2. But let's hear if there are any other ideas.
>>
>> Btw, making V1 a wrapper around V2 is not very appealing; the code will get
>> more complicated since the V2 API is very different from V1's.
>>
>> Thanks,
>>
>>
>>
>> On Tue, Nov 26, 2019 at 8:21 AM Alexey Romanenko <
>> aromanenko@gmail.com> wrote:
>>
>>> AFAICT, all AWS SDK V1 IOs (SnsIO, SqsIO, DynamoDBIO, KinesisIO) are
>>> marked as "Experimental". So, it should not be a problem to gracefully
>>> deprecate and finally remove them. We already did a similar procedure for
>>> “HadoopInputFormatIO”, which was renamed to just “HadoopFormatIO” (since it
>>> started to support HadoopOutputFormat as well). The old “HadoopInputFormatIO”
>>> was deprecated and removed after *3 consecutive* Beam releases (as we
>>> agreed on the mailing list).
>>>
>>> At the same time, some users, for various reasons, would not be able or
>>> willing to move to AWS SDK V2. So, I’d prefer to just deprecate AWS SDK V1 IOs
>>> and accept new features/fixes *only* for V2 IOs.
>>>
>>> Talking about the “Experimental” annotation: sorry in advance if I missed
>>> that and am switching the subject a bit, but do we have clear rules or an
>>> agreement for when an IO becomes stable and should not be marked as experimental anymore?
>>> *Most* of our Java IOs are marked as Experimental, but many of them have
>>> been used in production by real users under real load. Does it mean that they
>>> are ready to be stable in terms of API? Perhaps, this topic deserves a new
>>> discussion if there are several opinions on that.
>>>
>>> On 26 Nov 2019, at 00:39, Luke Cwik  wrote:
>>>
>>> Phase I sounds fine.
>>>
>>> Apache Beam follows semantic versioning and I believe removing the IOs
>>> will be a backwards incompatible change unless they were marked
>>> experimental which will be a problem for Phase 2.
>>>
>>> What is the feasibility of making the V1 transforms wrappers around V2?
>>>
>>> On Mon, Nov 25, 2019 at 1:46 PM Cam Mach  wrote:
>>>
 Hello Beam Devs,

 I have been working on the migration of Amazon Web Services IO
 connectors into the new AWS SDK for Java V2. The goal is to have an updated
 implementation aligned with the most recent AWS improvements. So far we
 have already migrated the connectors for AWS SNS, SQS and  DynamoDB.

 In the meantime, some contributions are still going into the V1 IOs. So far we
 have dealt with those by porting (or asking contributors to port) the
 changes into the V2 IOs too, because we don’t want features of both versions to
 be unaligned, but this may quickly become a maintenance issue, so we want to
 discuss a plan to stop supporting (deprecate) V1 IOs and encourage users to
 move to V2.

 Phase I (ASAP):

- Mark migrated AWS V1 IOs as deprecated
- Document migration path to V2

 Phase II (end of 2020):

- Decide a date or Beam release to remove the V1 IOs
- Send a notification to the community 3 months before we remove
them
- Completely get rid of V1 IOs


 Please let me know what you think or if you see any potential issues?

 Thanks,
 Cam Mach


>>>



Re: [discuss] Using a logger hierarchy in Python

2019-11-26 Thread Pablo Estrada
Ah I'll try to add this tomorrow before going out for the weekend.
-P.

On Wed, Nov 20, 2019 at 12:15 PM Valentyn Tymofieiev 
wrote:

> Based on my recent debugging experience for
> https://issues.apache.org/jira/browse/BEAM-8651, I think it may be
> helpful to include thread IDs in the log entries, or have an option to
> easily enable this. I imagine that having the process ID may also be helpful in
> other situations.
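A minimal sketch of how the pieces fit together with the standard logging module:
per-module loggers plus a root handler whose format carries process and thread IDs
(the format string is illustrative, not Beam's actual configuration):

    import logging

    # Per-module logger: the name becomes e.g. "apache_beam.io.gcp.bigquery",
    # so levels and handlers can be tuned per subtree of the hierarchy.
    _LOGGER = logging.getLogger(__name__)

    # Root handler; %(process)d and %(thread)d add the IDs mentioned above.
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(process)d %(thread)d %(name)s %(levelname)s %(message)s",
    )

    _LOGGER.info("worker starting")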
>
> On Tue, Nov 19, 2019 at 11:17 AM Chad Dombrova  wrote:
>
>> Pablo, it might be necessary to set up a root logging handler if one does
>> not exist already.  I noticed that a LocalJobServicer that I was testing
>> against stopped emitting tracebacks when I rebased onto the latest from
>> master.  Setting up the root handler fixed it.  I'm still testing this, and
>> I might be misinterpreting what I saw, but I wanted to get eyes on it in
>> case I don't have time to get a definitive answer.
>>
>> -chad
>>
>>
>>
>> On Fri, Nov 15, 2019 at 4:30 PM Pablo Estrada  wrote:
>>
>>> Thanks all,
>>> 2/3 of PRs are merged (using _LOGGER). It should be pretty easy to
>>> switch the variable name to _log via sed.
>>> Best
>>> -P.
>>>
>>> On Fri, Nov 15, 2019 at 2:08 PM Kyle Weaver  wrote:
>>>
 +1 for per-module loggers (what Robert said).

 On Fri, Nov 15, 2019 at 1:48 PM Udi Meiri  wrote:

> +1, but can we use something less verbose and shift-key-heavy than
> _LOGGER like log or _log?
>
> Also please dedupe with these existing bugs:
> https://issues.apache.org/jira/browse/BEAM-3523
> https://issues.apache.org/jira/browse/BEAM-1825
>
> On Thu, Nov 14, 2019 at 8:02 AM Thomas Weise  wrote:
>
>> Awesome, thanks Chad!
>>
>> On Wed, Nov 13, 2019 at 10:26 PM Chad Dombrova 
>> wrote:
>>
>>> Hi Thomas,
>>>
>>>
 Will this include the ability for users to configure logging via
 pipeline options?

>>>
>>> We're working on a proposal to allow pluggable logging handlers that
>>> can be configured via pipeline options.  For example, it would allow 
>>> you to
>>> add a new logging handler for StackDriver or Elasticsearch.  Will 
>>> hopefully
>>> have a document to share soon.
>>>
>>> -chad
>>>
>>>


Re: cython test instability

2019-11-26 Thread Chad Dombrova
Sorry wrong link:  https://github.com/apache/beam/pull/9915



On Tue, Nov 26, 2019 at 5:12 PM Udi Meiri  wrote:

> I looked at #9959 but it doesn't seem to modify setup.py?
> The additional eggs for timeloop etc. are troubling though. Not sure where
> those come from.
>
> On Tue, Nov 26, 2019 at 4:59 PM Chad Dombrova  wrote:
>
>> Is setup_requires being used somewhere else, because I'm still getting
>> errors after removing it from sdks/python/setup.py.
>>
>> I removed it from this PR: https://github.com/apache/beam/pull/9959
>>
>> Here's the gradle scan:
>> https://scans.gradle.com/s/oinh5xpaly3dk/failure#top=0
>>
>> The error shows up differently than before when
>> setup_requires=['pytest-runner'] was present -- it's in a gradle traceback
>> now rather than the console log.  I've also seen different packages listed
>> as the culprit (simplegeneric, timeloop).
>>
>> -chad
>>
>>
>>
>> On Tue, Nov 26, 2019 at 4:47 PM Udi Meiri  wrote:
>>
>>> Chad, I believe the answer is the "setup_requires" line is causing the
>>> sdks/python/.eggs directory to be created.
>>>
>>> This command fails with the setup_requires line (same Errno 17), but
>>> succeeds without it:
>>> $ \rm -r .eggs/; ../../gradlew installGcpTest
>>> [~8 failed tasks]
>>> $ ls .eggs
>>> pytest_runner-5.2-py2.7.egg  pytest_runner-5.2-py3.5.egg
>>>  pytest_runner-5.2-py3.6.egg  pytest_runner-5.2-py3.7.egg  README.txt
>>>
>>> I'll go ahead and create a PR to remove setup_requires from setup.py.
>>>
>>> On Tue, Nov 26, 2019 at 4:16 PM Chad Dombrova  wrote:
>>>
 It seems like the offending packages are those that only have source
 distributions (i.e. no wheels).  But why are the eggs being installed in
 sdks/python/.eggs instead of into the virtualenv created by setupVirtualenv
 gradle task or by tox?


 On Tue, Nov 26, 2019 at 3:59 PM Udi Meiri  wrote:

> Basically, I believe what's happening is that a new Gradle task was
> added that uses setup.py but doesn't have the same dependency on some main
> setup.py task that all others depend on (like sdist).
>
> On Tue, Nov 26, 2019 at 3:49 PM Udi Meiri  wrote:
>
>> Correction: the error is not gone after removing the line. I get
>> instead:
>> error: [Errno 17] File exists:
>> '/usr/local/google/home/ehudm/src/beam/sdks/python/.eggs/dill-0.3.1.1-py2.7.egg'
>>
>>
>> On Tue, Nov 26, 2019 at 3:45 PM Udi Meiri  wrote:
>>
>>> I managed to recreate one of the issues with this command:
>>> ~/src/beam/sdks/python$ \rm -r .eggs/ && for i in $(seq 2); do echo
>>> "python setup.py -q nosetests --tests
>>> apache_beam.pipeline_test:DoFnTest.test_incomparable_default &" | sh ; 
>>> done
>>>
>>> This reliably gives me:
>>> OSError: [Errno 17] File exists:
>>> '/usr/local/google/home/ehudm/src/beam/sdks/python/.eggs/pytest_runner-5.2-py2.7.egg'
>>>
>>> If I remove this line from setup.py the error is gone:
>>>   setup_requires=['pytest_runner'],
>>>
>>>
>>> On Tue, Nov 26, 2019 at 2:54 PM Chad Dombrova 
>>> wrote:
>>>
 Thanks for looking into this. It seems like it might be something
 to do with data that is cached on the Jenkins slaves between runs, 
 which
 may be what prevents this from showing up locally?

 If your theory about setuptools is correct, and it sounds likely,
 we should be able to lock down the version, which we should definitely 
 be
 doing for all of our dependencies.

 -chad



 On Tue, Nov 26, 2019 at 1:33 PM Ahmet Altay 
 wrote:

> I tried to debug but did not make much progress. I cannot
> reproduce locally, however all python precommits and postcommits are
> failing.
>
> One guess is, setuptools released a new version that does not
> support eggs a few days ago, that might be the cause (
> https://github.com/pypa/setuptools/blob/master/CHANGES.rst) but
> that should have reproduced locally.
> Maybe something is wrong with the jenkins machines, and we could
> perhaps bring them to a clean state.
>
> I suspected this being related to pytest somehow (as the first 4
> JIRAs had pytest in the error line) but the error Chad saw is 
> different.
>
> +Valentyn Tymofieiev  and +Yifan Zou
>  could you help with looking into this?
>
>
> Ahmet
>
>
>
> On Tue, Nov 26, 2019 at 9:14 AM Luke Cwik 
> wrote:
>
>> I also started to see this on PRs that I'm reviewing.
>> BEAM-8793, BEAM-8653, BEAM-8631, BEAM-8249 mention issues with 
>> setup.py and
>> egg_info but this looks different then all of those so I filed 
>> BEAM-8831.
>>
>>
>> On Mon, Nov 25, 2019 at 

Re: cython test instability

2019-11-26 Thread Udi Meiri
I looked at #9959 but it doesn't seem to modify setup.py?
The additional eggs for timeloop etc. are troubling though. Not sure where
those come from.

On Tue, Nov 26, 2019 at 4:59 PM Chad Dombrova  wrote:

> Is setup_requires being used somewhere else, because I'm still getting
> errors after removing it from sdks/python/setup.py.
>
> I removed it from this PR: https://github.com/apache/beam/pull/9959
>
> Here's the gradle scan:
> https://scans.gradle.com/s/oinh5xpaly3dk/failure#top=0
>
> The error shows up differently than before when
> setup_requires=['pytest-runner'] was present -- it's in a gradle traceback
> now rather than the console log.  I've also seen different packages listed
> as the culprit (simplegeneric, timeloop).
>
> -chad
>
>
>
> On Tue, Nov 26, 2019 at 4:47 PM Udi Meiri  wrote:
>
>> Chad, I believe the answer is the "setup_requires" line is causing the
>> sdks/python/.eggs directory to be created.
>>
>> This command fails with the setup_requires line (same Errno 17), but
>> succeeds without it:
>> $ \rm -r .eggs/; ../../gradlew installGcpTest
>> [~8 failed tasks]
>> $ ls .eggs
>> pytest_runner-5.2-py2.7.egg  pytest_runner-5.2-py3.5.egg
>>  pytest_runner-5.2-py3.6.egg  pytest_runner-5.2-py3.7.egg  README.txt
>>
>> I'll go ahead and create a PR to remove setup_requires from setup.py.
>>
>> On Tue, Nov 26, 2019 at 4:16 PM Chad Dombrova  wrote:
>>
>>> It seems like the offending packages are those that only have source
>>> distributions (i.e. no wheels).  But why are the eggs being installed in
>>> sdks/python/.eggs instead of into the virtualenv created by setupVirtualenv
>>> gradle task or by tox?
>>>
>>>
>>> On Tue, Nov 26, 2019 at 3:59 PM Udi Meiri  wrote:
>>>
 Basically, I believe what's happening is that a new Gradle task was
 added that uses setup.py but doesn't have the same dependency on some main
 setup.py task that all others depend on (like sdist).

 On Tue, Nov 26, 2019 at 3:49 PM Udi Meiri  wrote:

> Correction: the error is not gone after removing the line. I get
> instead:
> error: [Errno 17] File exists:
> '/usr/local/google/home/ehudm/src/beam/sdks/python/.eggs/dill-0.3.1.1-py2.7.egg'
>
>
> On Tue, Nov 26, 2019 at 3:45 PM Udi Meiri  wrote:
>
>> I managed to recreate one of the issues with this command:
>> ~/src/beam/sdks/python$ \rm -r .eggs/ && for i in $(seq 2); do echo
>> "python setup.py -q nosetests --tests
>> apache_beam.pipeline_test:DoFnTest.test_incomparable_default &" | sh ; 
>> done
>>
>> This reliably gives me:
>> OSError: [Errno 17] File exists:
>> '/usr/local/google/home/ehudm/src/beam/sdks/python/.eggs/pytest_runner-5.2-py2.7.egg'
>>
>> If I remove this line from setup.py the error is gone:
>>   setup_requires=['pytest_runner'],
>>
>>
>> On Tue, Nov 26, 2019 at 2:54 PM Chad Dombrova 
>> wrote:
>>
>>> Thanks for looking into this. It seems like it might be something to
>>> do with data that is cached on the Jenkins slaves between runs, which 
>>> may
>>> be what prevents this from showing up locally?
>>>
>>> If your theory about setuptools is correct, and it sounds likely, we
>>> should be able to lock down the version, which we should definitely be
>>> doing for all of our dependencies.
>>>
>>> -chad
>>>
>>>
>>>
>>> On Tue, Nov 26, 2019 at 1:33 PM Ahmet Altay 
>>> wrote:
>>>
 I tried to debug but did not make much progress. I cannot reproduce
 locally, however all python precommits and postcommits are failing.

 One guess is, setuptools released a new version that does not
 support eggs a few days ago, that might be the cause (
 https://github.com/pypa/setuptools/blob/master/CHANGES.rst) but
 that should have reproduced locally.
 Maybe something is wrong with the jenkins machines, and we could
 perhaps bring them to a clean state.

 I suspected this being related to pytest somehow (as the first 4
 JIRAs had pytest in the error line) but the error Chad saw is 
 different.

 +Valentyn Tymofieiev  and +Yifan Zou
  could you help with looking into this?


 Ahmet



 On Tue, Nov 26, 2019 at 9:14 AM Luke Cwik  wrote:

> I also started to see this on PRs that I'm reviewing.
> BEAM-8793, BEAM-8653, BEAM-8631, BEAM-8249 mention issues with 
> setup.py and
> egg_info but this looks different then all of those so I filed 
> BEAM-8831.
>
>
> On Mon, Nov 25, 2019 at 10:27 PM Chad Dombrova 
> wrote:
>
>> Actually, it looks like I'm getting the same error on multiple
>> PRs: https://scans.gradle.com/s/ihfmrxr7evslw
>>
>>
>>
>>
>> On Mon, Nov 25, 2019 at 10:26 PM Chad 

Re: cython test instability

2019-11-26 Thread Chad Dombrova
Is setup_requires being used somewhere else, because I'm still getting
errors after removing it from sdks/python/setup.py.

I removed it from this PR: https://github.com/apache/beam/pull/9959

Here's the gradle scan:
https://scans.gradle.com/s/oinh5xpaly3dk/failure#top=0

The error shows up differently than before when
setup_requires=['pytest-runner'] was present -- it's in a gradle traceback
now rather than the console log.  I've also seen different packages listed
as the culprit (simplegeneric, timeloop).

-chad



On Tue, Nov 26, 2019 at 4:47 PM Udi Meiri  wrote:

> Chad, I believe the answer is the "setup_requires" line is causing the
> sdks/python/.eggs directory to be created.
>
> This command fails with the setup_requires line (same Errno 17), but
> succeeds without it:
> $ \rm -r .eggs/; ../../gradlew installGcpTest
> [~8 failed tasks]
> $ ls .eggs
> pytest_runner-5.2-py2.7.egg  pytest_runner-5.2-py3.5.egg
>  pytest_runner-5.2-py3.6.egg  pytest_runner-5.2-py3.7.egg  README.txt
>
> I'll go ahead and create a PR to remove setup_requires from setup.py.
>
> On Tue, Nov 26, 2019 at 4:16 PM Chad Dombrova  wrote:
>
>> It seems like the offending packages are those that only have source
>> distributions (i.e. no wheels).  But why are the eggs being installed in
>> sdks/python/.eggs instead of into the virtualenv created by setupVirtualenv
>> gradle task or by tox?
>>
>>
>> On Tue, Nov 26, 2019 at 3:59 PM Udi Meiri  wrote:
>>
>>> Basically, I believe what's happening is that a new Gradle task was
>>> added that uses setup.py but doesn't have the same dependency on some main
>>> setup.py task that all others depend on (like sdist).
>>>
>>> On Tue, Nov 26, 2019 at 3:49 PM Udi Meiri  wrote:
>>>
 Correction: the error is not gone after removing the line. I get
 instead:
 error: [Errno 17] File exists:
 '/usr/local/google/home/ehudm/src/beam/sdks/python/.eggs/dill-0.3.1.1-py2.7.egg'


 On Tue, Nov 26, 2019 at 3:45 PM Udi Meiri  wrote:

> I managed to recreate one of the issues with this command:
> ~/src/beam/sdks/python$ \rm -r .eggs/ && for i in $(seq 2); do echo
> "python setup.py -q nosetests --tests
> apache_beam.pipeline_test:DoFnTest.test_incomparable_default &" | sh ; 
> done
>
> This reliably gives me:
> OSError: [Errno 17] File exists:
> '/usr/local/google/home/ehudm/src/beam/sdks/python/.eggs/pytest_runner-5.2-py2.7.egg'
>
> If I remove this line from setup.py the error is gone:
>   setup_requires=['pytest_runner'],
>
>
> On Tue, Nov 26, 2019 at 2:54 PM Chad Dombrova 
> wrote:
>
>> Thanks for looking into this. It seems like it might be something to
>> do with data that is cached on the Jenkins slaves between runs, which may
>> be what prevents this from showing up locally?
>>
>> If your theory about setuptools is correct, and it sounds likely, we
>> should be able to lock down the version, which we should definitely be
>> doing for all of our dependencies.
>>
>> -chad
>>
>>
>>
>> On Tue, Nov 26, 2019 at 1:33 PM Ahmet Altay  wrote:
>>
>>> I tried to debug but did not make much progress. I cannot reproduce
>>> locally, however all python precommits and postcommits are failing.
>>>
>>> One guess is, setuptools released a new version that does not
>>> support eggs a few days ago, that might be the cause (
>>> https://github.com/pypa/setuptools/blob/master/CHANGES.rst) but
>>> that should have reproduced locally.
>>> Maybe something is wrong with the jenkins machines, and we could
>>> perhaps bring them to a clean state.
>>>
>>> I suspected this being related to pytest somehow (as the first 4
>>> JIRAs had pytest in the error line) but the error Chad saw is different.
>>>
>>> +Valentyn Tymofieiev  and +Yifan Zou
>>>  could you help with looking into this?
>>>
>>>
>>> Ahmet
>>>
>>>
>>>
>>> On Tue, Nov 26, 2019 at 9:14 AM Luke Cwik  wrote:
>>>
 I also started to see this on PRs that I'm reviewing.
 BEAM-8793, BEAM-8653, BEAM-8631, BEAM-8249 mention issues with 
 setup.py and
 egg_info but this looks different then all of those so I filed 
 BEAM-8831.


 On Mon, Nov 25, 2019 at 10:27 PM Chad Dombrova 
 wrote:

> Actually, it looks like I'm getting the same error on multiple
> PRs: https://scans.gradle.com/s/ihfmrxr7evslw
>
>
>
>
> On Mon, Nov 25, 2019 at 10:26 PM Chad Dombrova 
> wrote:
>
>> Hi all,
>> The cython tests started failing on one of my PRs which were
>> succeeding before.   The error is one that I've never seen before
>> (separated onto different lines to make it easier to read):
>>
>> Caused by: org.gradle.api.GradleException:
>> Could not copy 

Re: cython test instability

2019-11-26 Thread Udi Meiri
Chad, I believe the answer is that the "setup_requires" line is causing the
sdks/python/.eggs directory to be created.

This command fails with the setup_requires line (same Errno 17), but
succeeds without it:
$ \rm -r .eggs/; ../../gradlew installGcpTest
[~8 failed tasks]
$ ls .eggs
pytest_runner-5.2-py2.7.egg  pytest_runner-5.2-py3.5.egg
 pytest_runner-5.2-py3.6.egg  pytest_runner-5.2-py3.7.egg  README.txt

I'll go ahead and create a PR to remove setup_requires from setup.py.

On Tue, Nov 26, 2019 at 4:16 PM Chad Dombrova  wrote:

> It seems like the offending packages are those that only have source
> distributions (i.e. no wheels).  But why are the eggs being installed in
> sdks/python/.eggs instead of into the virtualenv created by setupVirtualenv
> gradle task or by tox?
>
>
> On Tue, Nov 26, 2019 at 3:59 PM Udi Meiri  wrote:
>
>> Basically, I believe what's happening is that a new Gradle task was added
>> that uses setup.py but doesn't have the same dependency on some main
>> setup.py task that all others depend on (like sdist).
>>
>> On Tue, Nov 26, 2019 at 3:49 PM Udi Meiri  wrote:
>>
>>> Correction: the error is not gone after removing the line. I get instead:
>>> error: [Errno 17] File exists:
>>> '/usr/local/google/home/ehudm/src/beam/sdks/python/.eggs/dill-0.3.1.1-py2.7.egg'
>>>
>>>
>>> On Tue, Nov 26, 2019 at 3:45 PM Udi Meiri  wrote:
>>>
 I managed to recreate one of the issues with this command:
 ~/src/beam/sdks/python$ \rm -r .eggs/ && for i in $(seq 2); do echo
 "python setup.py -q nosetests --tests
 apache_beam.pipeline_test:DoFnTest.test_incomparable_default &" | sh ; done

 This reliably gives me:
 OSError: [Errno 17] File exists:
 '/usr/local/google/home/ehudm/src/beam/sdks/python/.eggs/pytest_runner-5.2-py2.7.egg'

 If I remove this line from setup.py the error is gone:
   setup_requires=['pytest_runner'],


 On Tue, Nov 26, 2019 at 2:54 PM Chad Dombrova 
 wrote:

> Thanks for looking into this. It seems like it might be something to
> do with data that is cached on the Jenkins slaves between runs, which may
> be what prevents this from showing up locally?
>
> If your theory about setuptools is correct, and it sounds likely, we
> should be able to lock down the version, which we should definitely be
> doing for all of our dependencies.
>
> -chad
>
>
>
> On Tue, Nov 26, 2019 at 1:33 PM Ahmet Altay  wrote:
>
>> I tried to debug but did not make much progress. I cannot reproduce
>> locally, however all python precommits and postcommits are failing.
>>
>> One guess is, setuptools released a new version that does not support
>> eggs a few days ago, that might be the cause (
>> https://github.com/pypa/setuptools/blob/master/CHANGES.rst) but that
>> should have reproduced locally.
>> Maybe something is wrong with the jenkins machines, and we could
>> perhaps bring them to a clean state.
>>
>> I suspected this being related to pytest somehow (as the first 4
>> JIRAs had pytest in the error line) but the error Chad saw is different.
>>
>> +Valentyn Tymofieiev  and +Yifan Zou
>>  could you help with looking into this?
>>
>>
>> Ahmet
>>
>>
>>
>> On Tue, Nov 26, 2019 at 9:14 AM Luke Cwik  wrote:
>>
>>> I also started to see this on PRs that I'm reviewing.
>>> BEAM-8793, BEAM-8653, BEAM-8631, BEAM-8249 mention issues with setup.py 
>>> and
>>> egg_info but this looks different then all of those so I filed 
>>> BEAM-8831.
>>>
>>>
>>> On Mon, Nov 25, 2019 at 10:27 PM Chad Dombrova 
>>> wrote:
>>>
 Actually, it looks like I'm getting the same error on multiple PRs:
 https://scans.gradle.com/s/ihfmrxr7evslw




 On Mon, Nov 25, 2019 at 10:26 PM Chad Dombrova 
 wrote:

> Hi all,
> The cython tests started failing on one of my PRs which were
> succeeding before.   The error is one that I've never seen before
> (separated onto different lines to make it easier to read):
>
> Caused by: org.gradle.api.GradleException:
> Could not copy file
>
> '/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit@2
> /src/sdks/python/.eggs/simplegeneric-0.8.1-py2.7.egg'
> to
>
> '/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit@2
> /src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/.eggs/simplegeneric-0.8.1-py2.7.egg'.
>
> Followed immediately by an error about could not create a
> directory of the same name.  Here's the gradle scan:
>
>
> https://scans.gradle.com/s/ihfmrxr7evslw/failure?openFailures=WzFd=WzZd#top=0
>
> Any ideas?
>
> -chad
>
>
>
>

Re: real real-time beam

2019-11-26 Thread Kenneth Knowles
On Tue, Nov 26, 2019 at 1:00 AM Jan Lukavský  wrote:

> > I will not try to formalize this notion in this email. But I will note
> that since it is universally assured, it would be zero cost and
> significantly safer to formalize it and add an annotation noting it was
> required. It has nothing to do with event time ordering, only trigger
> firing ordering.
>
> I cannot agree with the last sentence (and I'm really not doing this on
> purpose :-)). Panes generally arrive out of order, as mentioned several
> times in the discussions linked from this thread. If we want to ensure
> "trigger firing ordering", we can use the pane index, that is correct. But
> - that is actually equivalent to sorting by event time, because pane index
> order will be (nearly) the same as event time order. This is due to the
> fact that pane index and event time correlate (both are monotonic).
>
Trigger firings can have decreasing event timestamps w/ the minimum
timestamp combiner*. I do think the issue at hand is best analyzed in terms
of the explicit ordering on panes. And I do think we need to have an
explicit guarantee or annotation strong enough to describe a
correct-under-all-allowed runners sink. Today an antagonistic runner could
probably break a lot of things.

Kenn

*In fact, they can decrease via the "maximum" timestamp combiner because
actually timestamp combiners only apply to the elements in that particular
pane. This is weird, and maybe a design bug, but good to know about.


> The pane index "only" solves the issue of preserving ordering even in case
> where there are multiple firings within the same timestamp (regardless of
> granularity). This was mentioned in the initial discussion about event time
> ordering, and is part of the design doc - users should be allowed to
> provide UDF for extracting time-correlated ordering field (which means
> ability to choose a preferred, or authoritative, observer which assigns
> unambiguous ordering to events). Example of this might include Kafka
> offsets as well, or any queue index for that matter. This is not yet
> implemented, but could (should) be in the future.
>
> The only case where these two things are (somewhat) different is the case
> mentioned by @Steve - if the output is a stateless ParDo, which will get
> fused. But that is only because the processing is single-threaded per key,
> and therefore the ordering is implied by timer ordering (and careful here,
> many runners don't have this ordering 100% correct, as of now - this
> problem luckily appears only when there are multiple timers per key).
> Moreover, if there should be a failure, then the output might (would) get
> back in time anyway. If there would be a shuffle operation after
> GBK/Combine, then the ordering is no longer guaranteed and must be
> explicitly taken care of.
>
> Last note, I must agree with @Rui that all these discussions are very much
> related to retractions (precisely the ability to implement them).
>
> Jan
> On 11/26/19 7:34 AM, Kenneth Knowles wrote:
>
> Hi Aaron,
>
> Another insightful observation.
>
> Whenever an aggregation (GBK / Combine per key) has a trigger firing,
> there is a per-key sequence number attached. It is included in metadata
> known as "PaneInfo" [1]. The value of PaneInfo.getIndex() is colloquially
> referred to as the "pane index". You can also make use of the "on time
> index" if you like. The best way to access this metadata is to add a
> parameter of type PaneInfo to your DoFn's @ProcessElement method. This
> works for stateful or stateless DoFn.
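A short sketch of the same pattern in the Python SDK (an assumption that the Python
analogue is beam.DoFn.PaneInfoParam with its index attribute; the Java route is the
PaneInfo parameter described above):

    import apache_beam as beam

    class TagWithPaneIndex(beam.DoFn):
        # Declaring a PaneInfoParam default asks the runner to pass in the pane metadata.
        def process(self, element, pane_info=beam.DoFn.PaneInfoParam):
            key, value = element
            # pane_info.index is the per-key sequence number of this trigger firing.
            yield key, (pane_info.index, value)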
>
> Most of Beam's IO connectors do not explicitly enforce that outputs occur
> in pane index order but instead rely on the hope that the runner delivers
> panes in order to the sink. IMO this is dangerous but it has not yet caused
> a known issue. In practice, each "input key to output key 'path' " through
> a pipeline's logic does preserve order for all existing runners AFAIK and
> it is the formalization that is missing. It is related to an observation by 
> +Rui
> Wang  that processing retractions requires the same
> key-to-key ordering.
>
> I will not try to formalize this notion in this email. But I will note
> that since it is universally assured, it would be zero cost and
> significantly safer to formalize it and add an annotation noting it was
> required. It has nothing to do with event time ordering, only trigger
> firing ordering.
>
> Kenn
>
> [1]
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/PaneInfo.java
> [2]
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java#L557
>
>
> On Mon, Nov 25, 2019 at 4:06 PM Pablo Estrada  wrote:
>
>> The blog posts on stateful and timely computation with Beam should help
>> clarify a lot about how to use state and timers to do this:
>> https://beam.apache.org/blog/2017/02/13/stateful-processing.html
>> 

Re: cython test instability

2019-11-26 Thread Chad Dombrova
It seems like the offending packages are those that only have source
distributions (i.e. no wheels).  But why are the eggs being installed in
sdks/python/.eggs instead of into the virtualenv created by setupVirtualenv
gradle task or by tox?


On Tue, Nov 26, 2019 at 3:59 PM Udi Meiri  wrote:

> Basically, I believe what's happening is that a new Gradle task was added
> that uses setup.py but doesn't have the same dependency on some main
> setup.py task that all others depend on (like sdist).
>
> On Tue, Nov 26, 2019 at 3:49 PM Udi Meiri  wrote:
>
>> Correction: the error is not gone after removing the line. I get instead:
>> error: [Errno 17] File exists:
>> '/usr/local/google/home/ehudm/src/beam/sdks/python/.eggs/dill-0.3.1.1-py2.7.egg'
>>
>>
>> On Tue, Nov 26, 2019 at 3:45 PM Udi Meiri  wrote:
>>
>>> I managed to recreate one of the issues with this command:
>>> ~/src/beam/sdks/python$ \rm -r .eggs/ && for i in $(seq 2); do echo
>>> "python setup.py -q nosetests --tests
>>> apache_beam.pipeline_test:DoFnTest.test_incomparable_default &" | sh ; done
>>>
>>> This reliably gives me:
>>> OSError: [Errno 17] File exists:
>>> '/usr/local/google/home/ehudm/src/beam/sdks/python/.eggs/pytest_runner-5.2-py2.7.egg'
>>>
>>> If I remove this line from setup.py the error is gone:
>>>   setup_requires=['pytest_runner'],
>>>
>>>
>>> On Tue, Nov 26, 2019 at 2:54 PM Chad Dombrova  wrote:
>>>
 Thanks for looking into this. It seems like it might be something to do
 with data that is cached on the Jenkins slaves between runs, which may be
 what prevents this from showing up locally?

 If your theory about setuptools is correct, and it sounds likely, we
 should be able to lock down the version, which we should definitely be
 doing for all of our dependencies.

 -chad



 On Tue, Nov 26, 2019 at 1:33 PM Ahmet Altay  wrote:

> I tried to debug but did not make much progress. I cannot reproduce
> locally, however all python precommits and postcommits are failing.
>
> One guess is, setuptools released a new version that does not support
> eggs a few days ago, that might be the cause (
> https://github.com/pypa/setuptools/blob/master/CHANGES.rst) but that
> should have reproduced locally.
> Maybe something is wrong with the jenkins machines, and we could
> perhaps bring them to a clean state.
>
> I suspected this being related to pytest somehow (as the first 4 JIRAs
> had pytest in the error line) but the error Chad saw is different.
>
> +Valentyn Tymofieiev  and +Yifan Zou
>  could you help with looking into this?
>
>
> Ahmet
>
>
>
> On Tue, Nov 26, 2019 at 9:14 AM Luke Cwik  wrote:
>
>> I also started to see this on PRs that I'm reviewing.
>> BEAM-8793, BEAM-8653, BEAM-8631, BEAM-8249 mention issues with setup.py 
>> and
>> egg_info but this looks different then all of those so I filed BEAM-8831.
>>
>>
>> On Mon, Nov 25, 2019 at 10:27 PM Chad Dombrova 
>> wrote:
>>
>>> Actually, it looks like I'm getting the same error on multiple PRs:
>>> https://scans.gradle.com/s/ihfmrxr7evslw
>>>
>>>
>>>
>>>
>>> On Mon, Nov 25, 2019 at 10:26 PM Chad Dombrova 
>>> wrote:
>>>
 Hi all,
 The cython tests started failing on one of my PRs which were
 succeeding before.   The error is one that I've never seen before
 (separated onto different lines to make it easier to read):

 Caused by: org.gradle.api.GradleException:
 Could not copy file

 '/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit@2
 /src/sdks/python/.eggs/simplegeneric-0.8.1-py2.7.egg'
 to

 '/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit@2
 /src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/.eggs/simplegeneric-0.8.1-py2.7.egg'.

 Followed immediately by an error about could not create a directory
 of the same name.  Here's the gradle scan:


 https://scans.gradle.com/s/ihfmrxr7evslw/failure?openFailures=WzFd=WzZd#top=0

 Any ideas?

 -chad







Re: cython test instability

2019-11-26 Thread Udi Meiri
Basically, I believe what's happening is that a new Gradle task was added
that uses setup.py but doesn't have the same dependency on some main
setup.py task that all others depend on (like sdist).

On Tue, Nov 26, 2019 at 3:49 PM Udi Meiri  wrote:

> Correction: the error is not gone after removing the line. I get instead:
> error: [Errno 17] File exists:
> '/usr/local/google/home/ehudm/src/beam/sdks/python/.eggs/dill-0.3.1.1-py2.7.egg'
>
>
> On Tue, Nov 26, 2019 at 3:45 PM Udi Meiri  wrote:
>
>> I managed to recreate one of the issues with this command:
>> ~/src/beam/sdks/python$ \rm -r .eggs/ && for i in $(seq 2); do echo
>> "python setup.py -q nosetests --tests
>> apache_beam.pipeline_test:DoFnTest.test_incomparable_default &" | sh ; done
>>
>> This reliably gives me:
>> OSError: [Errno 17] File exists:
>> '/usr/local/google/home/ehudm/src/beam/sdks/python/.eggs/pytest_runner-5.2-py2.7.egg'
>>
>> If I remove this line from setup.py the error is gone:
>>   setup_requires=['pytest_runner'],
>>
>>
>> On Tue, Nov 26, 2019 at 2:54 PM Chad Dombrova  wrote:
>>
>>> Thanks for looking into this. It seems like it might be something to do
>>> with data that is cached on the Jenkins slaves between runs, which may be
>>> what prevents this from showing up locally?
>>>
>>> If your theory about setuptools is correct, and it sounds likely, we
>>> should be able to lock down the version, which we should definitely be
>>> doing for all of our dependencies.
>>>
>>> -chad
>>>
>>>
>>>
>>> On Tue, Nov 26, 2019 at 1:33 PM Ahmet Altay  wrote:
>>>
 I tried to debug but did not make much progress. I cannot reproduce
 locally, however all python precommits and postcommits are failing.

 One guess is, setuptools released a new version that does not support
 eggs a few days ago, that might be the cause (
 https://github.com/pypa/setuptools/blob/master/CHANGES.rst) but that
 should have reproduced locally.
 Maybe something is wrong with the jenkins machines, and we could
 perhaps bring them to a clean state.

 I suspected this being related to pytest somehow (as the first 4 JIRAs
 had pytest in the error line) but the error Chad saw is different.

 +Valentyn Tymofieiev  and +Yifan Zou
  could you help with looking into this?


 Ahmet



 On Tue, Nov 26, 2019 at 9:14 AM Luke Cwik  wrote:

> I also started to see this on PRs that I'm reviewing.
> BEAM-8793, BEAM-8653, BEAM-8631, BEAM-8249 mention issues with setup.py 
> and
> egg_info but this looks different then all of those so I filed BEAM-8831.
>
>
> On Mon, Nov 25, 2019 at 10:27 PM Chad Dombrova 
> wrote:
>
>> Actually, it looks like I'm getting the same error on multiple PRs:
>> https://scans.gradle.com/s/ihfmrxr7evslw
>>
>>
>>
>>
>> On Mon, Nov 25, 2019 at 10:26 PM Chad Dombrova 
>> wrote:
>>
>>> Hi all,
>>> The cython tests started failing on one of my PRs which were
>>> succeeding before.   The error is one that I've never seen before
>>> (separated onto different lines to make it easier to read):
>>>
>>> Caused by: org.gradle.api.GradleException:
>>> Could not copy file
>>> '/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit@2
>>> /src/sdks/python/.eggs/simplegeneric-0.8.1-py2.7.egg'
>>> to
>>> '/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit@2
>>> /src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/.eggs/simplegeneric-0.8.1-py2.7.egg'.
>>>
>>> Followed immediately by an error about could not create a directory
>>> of the same name.  Here's the gradle scan:
>>>
>>>
>>> https://scans.gradle.com/s/ihfmrxr7evslw/failure?openFailures=WzFd=WzZd#top=0
>>>
>>> Any ideas?
>>>
>>> -chad
>>>
>>>
>>>
>>>
>>>




Re: cython test instability

2019-11-26 Thread Udi Meiri
Correction: the error is not gone after removing the line. I get instead:
error: [Errno 17] File exists:
'/usr/local/google/home/ehudm/src/beam/sdks/python/.eggs/dill-0.3.1.1-py2.7.egg'


On Tue, Nov 26, 2019 at 3:45 PM Udi Meiri  wrote:

> I managed to recreate one of the issues with this command:
> ~/src/beam/sdks/python$ \rm -r .eggs/ && for i in $(seq 2); do echo
> "python setup.py -q nosetests --tests
> apache_beam.pipeline_test:DoFnTest.test_incomparable_default &" | sh ; done
>
> This reliably gives me:
> OSError: [Errno 17] File exists:
> '/usr/local/google/home/ehudm/src/beam/sdks/python/.eggs/pytest_runner-5.2-py2.7.egg'
>
> If I remove this line from setup.py the error is gone:
>   setup_requires=['pytest_runner'],
>
>
> On Tue, Nov 26, 2019 at 2:54 PM Chad Dombrova  wrote:
>
>> Thanks for looking into this. It seems like it might be something to do
>> with data that is cached on the Jenkins slaves between runs, which may be
>> what prevents this from showing up locally?
>>
>> If your theory about setuptools is correct, and it sounds likely, we
>> should be able to lock down the version, which we should definitely be
>> doing for all of our dependencies.
>>
>> -chad
>>
>>
>>
>> On Tue, Nov 26, 2019 at 1:33 PM Ahmet Altay  wrote:
>>
>>> I tried to debug but did not make much progress. I cannot reproduce
>>> locally, however all python precommits and postcommits are failing.
>>>
>>> One guess is, setuptools released a new version that does not support
>>> eggs a few days ago, that might be the cause (
>>> https://github.com/pypa/setuptools/blob/master/CHANGES.rst) but that
>>> should have reproduced locally.
>>> Maybe something is wrong with the jenkins machines, and we could perhaps
>>> bring them to a clean state.
>>>
>>> I suspected this being related to pytest somehow (as the first 4 JIRAs
>>> had pytest in the error line) but the error Chad saw is different.
>>>
>>> +Valentyn Tymofieiev  and +Yifan Zou
>>>  could you help with looking into this?
>>>
>>>
>>> Ahmet
>>>
>>>
>>>
>>> On Tue, Nov 26, 2019 at 9:14 AM Luke Cwik  wrote:
>>>
 I also started to see this on PRs that I'm reviewing.
 BEAM-8793, BEAM-8653, BEAM-8631, BEAM-8249 mention issues with setup.py and
 egg_info but this looks different than all of those so I filed BEAM-8831.


 On Mon, Nov 25, 2019 at 10:27 PM Chad Dombrova 
 wrote:

> Actually, it looks like I'm getting the same error on multiple PRs:
> https://scans.gradle.com/s/ihfmrxr7evslw
>
>
>
>
> On Mon, Nov 25, 2019 at 10:26 PM Chad Dombrova 
> wrote:
>
>> Hi all,
>> The cython tests started failing on one of my PRs which were
>> succeeding before.   The error is one that I've never seen before
>> (separated onto different lines to make it easier to read):
>>
>> Caused by: org.gradle.api.GradleException:
>> Could not copy file
>> '/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit@2
>> /src/sdks/python/.eggs/simplegeneric-0.8.1-py2.7.egg'
>> to
>> '/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit@2
>> /src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/.eggs/simplegeneric-0.8.1-py2.7.egg'.
>>
>> Followed immediately by an error about could not create a directory
>> of the same name.  Here's the gradle scan:
>>
>>
>> https://scans.gradle.com/s/ihfmrxr7evslw/failure?openFailures=WzFd=WzZd#top=0
>>
>> Any ideas?
>>
>> -chad
>>
>>
>>
>>
>>




Re: cython test instability

2019-11-26 Thread Udi Meiri
I managed to recreate one of the issues with this command:
~/src/beam/sdks/python$ \rm -r .eggs/ && for i in $(seq 2); do echo "python
setup.py -q nosetests --tests
apache_beam.pipeline_test:DoFnTest.test_incomparable_default &" | sh ; done

This reliably gives me:
OSError: [Errno 17] File exists:
'/usr/local/google/home/ehudm/src/beam/sdks/python/.eggs/pytest_runner-5.2-py2.7.egg'

If I remove this line from setup.py the error is gone:
  setup_requires=['pytest_runner'],


On Tue, Nov 26, 2019 at 2:54 PM Chad Dombrova  wrote:

> Thanks for looking into this. It seems like it might be something to do
> with data that is cached on the Jenkins slaves between runs, which may be
> what prevents this from showing up locally?
>
> If your theory about setuptools is correct, and it sounds likely, we
> should be able to lock down the version, which we should definitely be
> doing for all of our dependencies.
>
> -chad
>
>
>
> On Tue, Nov 26, 2019 at 1:33 PM Ahmet Altay  wrote:
>
>> I tried to debug but did not make much progress. I cannot reproduce
>> locally, however all python precommits and postcommits are failing.
>>
>> One guess is, setuptools released a new version that does not support
>> eggs a few days ago, that might be the cause (
>> https://github.com/pypa/setuptools/blob/master/CHANGES.rst) but that
>> should have reproduced locally.
>> Maybe something is wrong with the jenkins machines, and we could perhaps
>> bring them to a clean state.
>>
>> I suspected this being related to pytest somehow (as the first 4 JIRAs
>> had pytest in the error line) but the error Chad saw is different.
>>
>> +Valentyn Tymofieiev  and +Yifan Zou
>>  could you help with looking into this?
>>
>>
>> Ahmet
>>
>>
>>
>> On Tue, Nov 26, 2019 at 9:14 AM Luke Cwik  wrote:
>>
>>> I also started to see this on PRs that I'm reviewing.
>>> BEAM-8793, BEAM-8653, BEAM-8631, BEAM-8249 mention issues with setup.py and
>>> egg_info but this looks different than all of those so I filed BEAM-8831.
>>>
>>>
>>> On Mon, Nov 25, 2019 at 10:27 PM Chad Dombrova 
>>> wrote:
>>>
 Actually, it looks like I'm getting the same error on multiple PRs:
 https://scans.gradle.com/s/ihfmrxr7evslw




 On Mon, Nov 25, 2019 at 10:26 PM Chad Dombrova 
 wrote:

> Hi all,
> The cython tests started failing on one of my PRs which were
> succeeding before.   The error is one that I've never seen before
> (separated onto different lines to make it easier to read):
>
> Caused by: org.gradle.api.GradleException:
> Could not copy file
> '/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit@2
> /src/sdks/python/.eggs/simplegeneric-0.8.1-py2.7.egg'
> to
> '/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit@2
> /src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/.eggs/simplegeneric-0.8.1-py2.7.egg'.
>
> Followed immediately by an error about could not create a directory of
> the same name.  Here's the gradle scan:
>
>
> https://scans.gradle.com/s/ihfmrxr7evslw/failure?openFailures=WzFd=WzZd#top=0
>
> Any ideas?
>
> -chad
>
>
>
>
>




Re: cython test instability

2019-11-26 Thread Chad Dombrova
Thanks for looking into this. It seems like it might be something to do
with data that is cached on the Jenkins slaves between runs, which may be
what prevents this from showing up locally?

If your theory about setuptools is correct, and it sounds likely, we should
be able to lock down the version, which we should definitely be doing for
all of our dependencies.

-chad



On Tue, Nov 26, 2019 at 1:33 PM Ahmet Altay  wrote:

> I tried to debug but did not make much progress. I cannot reproduce
> locally, however all python precommits and postcommits are failing.
>
> One guess is, setuptools released a new version that does not support eggs
> a few days ago, that might be the cause (
> https://github.com/pypa/setuptools/blob/master/CHANGES.rst) but that
> should have reproduced locally.
> Maybe something is wrong with the jenkins machines, and we could perhaps
> bring them to a clean state.
>
> I suspected this being related to pytest somehow (as the first 4 JIRAs had
> pytest in the error line) but the error Chad saw is different.
>
> +Valentyn Tymofieiev  and +Yifan Zou
>  could you help with looking into this?
>
>
> Ahmet
>
>
>
> On Tue, Nov 26, 2019 at 9:14 AM Luke Cwik  wrote:
>
>> I also started to see this on PRs that I'm reviewing.
>> BEAM-8793, BEAM-8653, BEAM-8631, BEAM-8249 mention issues with setup.py and
>> egg_info but this looks different than all of those so I filed BEAM-8831.
>>
>>
>> On Mon, Nov 25, 2019 at 10:27 PM Chad Dombrova  wrote:
>>
>>> Actually, it looks like I'm getting the same error on multiple PRs:
>>> https://scans.gradle.com/s/ihfmrxr7evslw
>>>
>>>
>>>
>>>
>>> On Mon, Nov 25, 2019 at 10:26 PM Chad Dombrova 
>>> wrote:
>>>
 Hi all,
 The cython tests started failing on one of my PRs which were succeeding
 before.   The error is one that I've never seen before (separated onto
 different lines to make it easier to read):

 Caused by: org.gradle.api.GradleException:
 Could not copy file
 '/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit@2
 /src/sdks/python/.eggs/simplegeneric-0.8.1-py2.7.egg'
 to
 '/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit@2
 /src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/.eggs/simplegeneric-0.8.1-py2.7.egg'.

 Followed immediately by an error about could not create a directory of
 the same name.  Here's the gradle scan:


 https://scans.gradle.com/s/ihfmrxr7evslw/failure?openFailures=WzFd=WzZd#top=0

 Any ideas?

 -chad







Re: cython test instability

2019-11-26 Thread Ahmet Altay
I tried to debug but did not make much progress. I cannot reproduce
locally; however, all python precommits and postcommits are failing.

One guess is that setuptools released a new version a few days ago that does
not support eggs, and that might be the cause (
https://github.com/pypa/setuptools/blob/master/CHANGES.rst), but that should
have reproduced locally.
Maybe something is wrong with the jenkins machines, and we could perhaps
bring them to a clean state.

I suspected this was related to pytest somehow (as the first 4 JIRAs had
pytest in the error line) but the error Chad saw is different.

+Valentyn Tymofieiev  and +Yifan Zou
 could you help with looking into this?

Ahmet



On Tue, Nov 26, 2019 at 9:14 AM Luke Cwik  wrote:

> I also started to see this on PRs that I'm reviewing.
> BEAM-8793, BEAM-8653, BEAM-8631, BEAM-8249 mention issues with setup.py and
> egg_info but this looks different than all of those so I filed BEAM-8831.
>
>
> On Mon, Nov 25, 2019 at 10:27 PM Chad Dombrova  wrote:
>
>> Actually, it looks like I'm getting the same error on multiple PRs:
>> https://scans.gradle.com/s/ihfmrxr7evslw
>>
>>
>>
>>
>> On Mon, Nov 25, 2019 at 10:26 PM Chad Dombrova  wrote:
>>
>>> Hi all,
>>> The cython tests started failing on one of my PRs which were succeeding
>>> before.   The error is one that I've never seen before (separated onto
>>> different lines to make it easier to read):
>>>
>>> Caused by: org.gradle.api.GradleException:
>>> Could not copy file
>>> '/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit@2
>>> /src/sdks/python/.eggs/simplegeneric-0.8.1-py2.7.egg'
>>> to
>>> '/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit@2
>>> /src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/.eggs/simplegeneric-0.8.1-py2.7.egg'.
>>>
>>> Followed immediately by an error about could not create a directory of
>>> the same name.  Here's the gradle scan:
>>>
>>>
>>> https://scans.gradle.com/s/ihfmrxr7evslw/failure?openFailures=WzFd=WzZd#top=0
>>>
>>> Any ideas?
>>>
>>> -chad
>>>
>>>
>>>
>>>
>>>


Re: [DISCUSS] AWS IOs V1 Deprecation Plan

2019-11-26 Thread Luke Cwik
I suggested the wrapper because sometimes the intent of the APIs can be
translated easily but this is not always the case.

Good to know that it is all marked @Experimental.
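
To make that concrete, a bare-bones sketch of the wrapper idea (every name below,
such as SnsIOV1Compat, V1PublishRequest, V2PublishRequest and V2SnsWrite, is a
placeholder for this example, not an actual Beam class):

import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.SimpleFunction;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PDone;

/** Sketch only: a deprecated V1-style entry point that delegates to a V2-style transform. */
@Deprecated
public class SnsIOV1Compat {

  public static Write write() {
    return new Write();
  }

  @Deprecated
  public static class Write extends PTransform<PCollection<V1PublishRequest>, PDone> {
    private String topicName;

    public Write withTopicName(String topicName) {
      this.topicName = topicName;
      return this;
    }

    @Override
    public PDone expand(PCollection<V1PublishRequest> input) {
      // The request types differ between the two SDKs, so the wrapper has to convert
      // every element before handing it to the V2 transform. This conversion layer is
      // where things get complicated once the V1 and V2 APIs diverge.
      return input
          .apply("ConvertToV2",
              MapElements.via(new SimpleFunction<V1PublishRequest, V2PublishRequest>() {
                @Override
                public V2PublishRequest apply(V1PublishRequest request) {
                  return new V2PublishRequest(request.getMessage(), request.getSubject());
                }
              }))
          .apply("WriteViaV2", new V2SnsWrite(topicName));
    }
  }
}

Where the V1 and V2 configuration surfaces map 1:1 this is mostly boilerplate;
where they don't, the wrapper ends up leaking V2 concepts, which is the "not
always the case" part.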

On Tue, Nov 26, 2019 at 12:30 PM Cam Mach  wrote:

> Thank you, Alex for sharing the information, and Luke for the questions.
> I like the idea that just depreciate the V1 IOs, and just maintain V2 IOs,
> so we can support whoever want continue with V1.
> Just as Alex said, a lot of users, including my teams :-) , use the V1 IOs
> in production for real workload. So it'll be hard to remove V1 IOs or force
> them migrate to V2. But let hear if there are any other ideas?
>
> Btw, making V1 a wrapper around V2 is not very positive, code will get
> more complicated since V2 API is very different from V1's.
>
> Thanks,
>
>
>
> On Tue, Nov 26, 2019 at 8:21 AM Alexey Romanenko 
> wrote:
>
>> AFAICT, all AWS SDK V1 IOs (SnsIO, SqsIO, DynamoDBIO, KinesisIO) are
>> marked as "Experimental". So, it should not be a problem to gracefully
>> deprecate and finally remove them. We already did the similar procedure for
>> “HadoopInputFormatIO”, which was renamed to just “HadoopFormatIO” (since it
>> started to support HadoopOutputFormatI as well). Old “HadoopInputFormatIO”
>> was deprecated and removed after *3 consecutive* Beam releases (as we
>> agreed on mailing list).
>>
>> In the same time, some users for some reasons would not be able or to
>> want to move on AWS SDK V2. So, I’d prefer to just deprecate AWS SDK V1 IOs
>> and accept new features/fixes *only* for V2 IOs.
>>
>> Talking about “Experimental” annotation. Sorry in advance If I missed
>> that and switch a subject a bit, but do we have clear rules or an agreement
>> when IO becomes stable and should not be marked as experimental anymore?
>> *Most* of our Java IOs are marked as Experimental but many of them were
>> using in production by real users under real load. Does it mean that they
>> are ready to be stable in terms of API? Perhaps, this topic deserves a new
>> discussion if there are several opinions on that.
>>
>> On 26 Nov 2019, at 00:39, Luke Cwik  wrote:
>>
>> Phase I sounds fine.
>>
>> Apache Beam follows semantic versioning and I believe removing the IOs
>> will be a backwards incompatible change unless they were marked
>> experimental which will be a problem for Phase 2.
>>
>> What is the feasibility of making the V1 transforms wrappers around V2?
>>
>> On Mon, Nov 25, 2019 at 1:46 PM Cam Mach  wrote:
>>
>>> Hello Beam Devs,
>>>
>>> I have been working on the migration of Amazon Web Services IO
>>> connectors into the new AWS SDK for Java V2. The goal is to have an updated
>>> implementation aligned with the most recent AWS improvements. So far we
>>> have already migrated the connectors for AWS SNS, SQS and  DynamoDB.
>>>
>>> In the meantime some contributions are still going on V1 IOs. So far we
>>> have dealt with those by porting (or asking contributors) to port the
>>> changes into V2 IOs too because we don’t want features of both versions to
>>> be unaligned but this may quickly become a maintenance issue, so we want to
>>> discuss a plan to stop supporting (deprecate) V1 IOs and encourage users to
>>> move to V2.
>>>
>>> Phase I (ASAP):
>>>
>>>- Mark migrated AWS V1 IOs as deprecated
>>>- Document migration path to V2
>>>
>>> Phase II (end of 2020):
>>>
>>>- Decide a date or Beam release to remove the V1 IOs
>>>- Send a notification to the community 3 months before we remove them
>>>- Completely get rid of V1 IOs
>>>
>>>
>>> Please let me know what you think or if you see any potential issues?
>>>
>>> Thanks,
>>> Cam Mach
>>>
>>>
>>


Re: [DISCUSS] AWS IOs V1 Deprecation Plan

2019-11-26 Thread Cam Mach
Thank you, Alex, for sharing the information, and Luke for the questions.
I like the idea of just deprecating the V1 IOs and maintaining only the V2 IOs,
so we can still support whoever wants to continue with V1.
Just as Alex said, a lot of users, including my teams :-), use the V1 IOs
in production for real workloads. So it'll be hard to remove the V1 IOs or force
them to migrate to V2. But let's hear if there are any other ideas.

Btw, making V1 a wrapper around V2 is not very appealing; the code will get more
complicated since the V2 API is very different from V1's.

Thanks,



On Tue, Nov 26, 2019 at 8:21 AM Alexey Romanenko 
wrote:

> AFAICT, all AWS SDK V1 IOs (SnsIO, SqsIO, DynamoDBIO, KinesisIO) are
> marked as "Experimental". So, it should not be a problem to gracefully
> deprecate and finally remove them. We already did the similar procedure for
> “HadoopInputFormatIO”, which was renamed to just “HadoopFormatIO” (since it
> started to support HadoopOutputFormatI as well). Old “HadoopInputFormatIO”
> was deprecated and removed after *3 consecutive* Beam releases (as we
> agreed on mailing list).
>
> In the same time, some users for some reasons would not be able or to want
> to move on AWS SDK V2. So, I’d prefer to just deprecate AWS SDK V1 IOs and
> accept new features/fixes *only* for V2 IOs.
>
> Talking about “Experimental” annotation. Sorry in advance If I missed that
> and switch a subject a bit, but do we have clear rules or an agreement when
> IO becomes stable and should not be marked as experimental anymore? *Most*
> of our Java IOs are marked as Experimental but many of them were using in
> production by real users under real load. Does it mean that they are ready
> to be stable in terms of API? Perhaps, this topic deserves a new discussion
> if there are several opinions on that.
>
> On 26 Nov 2019, at 00:39, Luke Cwik  wrote:
>
> Phase I sounds fine.
>
> Apache Beam follows semantic versioning and I believe removing the IOs
> will be a backwards incompatible change unless they were marked
> experimental which will be a problem for Phase 2.
>
> What is the feasibility of making the V1 transforms wrappers around V2?
>
> On Mon, Nov 25, 2019 at 1:46 PM Cam Mach  wrote:
>
>> Hello Beam Devs,
>>
>> I have been working on the migration of Amazon Web Services IO connectors
>> into the new AWS SDK for Java V2. The goal is to have an updated
>> implementation aligned with the most recent AWS improvements. So far we
>> have already migrated the connectors for AWS SNS, SQS and  DynamoDB.
>>
>> In the meantime some contributions are still going on V1 IOs. So far we
>> have dealt with those by porting (or asking contributors) to port the
>> changes into V2 IOs too because we don’t want features of both versions to
>> be unaligned but this may quickly become a maintenance issue, so we want to
>> discuss a plan to stop supporting (deprecate) V1 IOs and encourage users to
>> move to V2.
>>
>> Phase I (ASAP):
>>
>>- Mark migrated AWS V1 IOs as deprecated
>>- Document migration path to V2
>>
>> Phase II (end of 2020):
>>
>>- Decide a date or Beam release to remove the V1 IOs
>>- Send a notification to the community 3 months before we remove them
>>- Completely get rid of V1 IOs
>>
>>
>> Please let me know what you think or if you see any potential issues?
>>
>> Thanks,
>> Cam Mach
>>
>>
>


Update on push-down for SQL IOs.

2019-11-26 Thread Kirill Kozlov
Hello everyone!

I have been working on the push-down feature and would like to give a brief
update on what is done and what is still in the works.

*Things that are done*:
General API for SQL IOs to provide information about what filters/projects
they support [1]:
- *Filter* can be unsupported, supported with field reordering, and
supported without field reordering.
- *Predicate* is broken down into a conjunctive normal form (CNF) and
passed to a validator class to check what parts are supported or
unsupported by an IO.

A Calcite rule [2] that checks for push-down support, constructs a new IO
source Rel [3] with pushed-down projects and filters when applicable, and
preserves unsupported filters/projects.
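
To make the supported/unsupported split concrete, here is a tiny self-contained
toy (plain Java, not the actual Beam/Calcite types; in the real code the
conjuncts are RexNodes and the decision lives in the per-IO filter/validator
classes [1]):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy illustration only: each CNF conjunct is either pushed to the IO or kept
// in a Calc above the source.
public class PushDownSplitToy {

  static class Conjunct {
    final String expression;
    final boolean supportedBySource;

    Conjunct(String expression, boolean supportedBySource) {
      this.expression = expression;
      this.supportedBySource = supportedBySource;
    }
  }

  public static void main(String[] args) {
    List<Conjunct> cnf = Arrays.asList(
        new Conjunct("c_id > 100", true),
        new Conjunct("c_name LIKE 'A%'", true),
        // a column-to-column comparison the source cannot evaluate stays behind
        new Conjunct("c_created > c_updated", false));

    List<Conjunct> pushed = new ArrayList<>();
    List<Conjunct> residual = new ArrayList<>();
    for (Conjunct c : cnf) {
      (c.supportedBySource ? pushed : residual).add(c);
    }

    System.out.println("pushed to the IO: " + describe(pushed));
    System.out.println("kept in a Calc:   " + describe(residual));
  }

  static String describe(List<Conjunct> conjuncts) {
    StringBuilder sb = new StringBuilder();
    for (Conjunct c : conjuncts) {
      sb.append('[').append(c.expression).append("] ");
    }
    return sb.toString().trim();
  }
}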

BigQuery should perform push-down when running queries in DIRECT_READ
method [4].

MongoDB project push-down support is in a PR [5] and predicate support will
be added soon.


*Things that are in progress:*
Documenting how developers can enable push-down for IOs that support it.

Documenting certain limitations for BigQuery push-down (ex: comparing values
of 2 columns is not supported at the moment, so it is being preserved in a
Calc).

Updating google-cloud-bigquerystorage to 0.117.0-beta. Earlier versions
have a gRPC message limit set to ~11MB, which may cause some pipelines to
break when reading from a table with rows larger than the limit.

Adding some sort of performance tests to run continuously to
measure speed-up and detect regressions.

Deciding how cost should be computed for the IO source Rel with push-down
[6]. Right now the following formula is used: cost of an IO without
push-down minus the normalized (between 0.0 and 1.0) benefit of a performed
push-down.
The challenge here is to make the change to the cost small enough to not
break join reordering, but large enough to make the optimizer favor
pushed-down IO.
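
In rough pseudo-Java the heuristic is the following (a sketch of the formula
only, not the actual planner code from [6]; how the project and filter benefits
are combined here is illustrative):

class PushDownCostSketch {
  static double pushDownCost(
      double costWithoutPushDown,
      int fieldsPushedDown,
      int totalFields,
      double pushedFilterSelectivity) {
    double projectBenefit = (double) fieldsPushedDown / totalFields; // in [0.0, 1.0]
    double filterBenefit = 1.0 - pushedFilterSelectivity;            // in [0.0, 1.0]
    double normalizedBenefit = Math.max(projectBenefit, filterBenefit);
    // The delta stays below 1.0, so it is tiny next to the row-count-based cost
    // (keeping join reordering stable) while still making the pushed-down
    // alternative cheaper than the same scan without push-down.
    return costWithoutPushDown - normalizedBenefit;
  }
}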


If you have any suggestions/questions/concerns I would love to hear them.

[1]
https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/BeamSqlTable.java#L36
[2]
https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIOPushDownRule.java
[3]
https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamPushDownIOSourceRel.java
[4]
https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryTable.java#L128
[5] https://github.com/apache/beam/pull/10095
[6] https://github.com/apache/beam/pull/10060

--
Kirill


Re: Beam Testing Tools FAQ

2019-11-26 Thread Pablo Estrada
Very cool. Thanks Lukasz!

On Tue, Nov 26, 2019 at 9:41 AM Alan Myrvold  wrote:

> Nice, thanks!
>
> On Tue, Nov 26, 2019 at 8:04 AM Robert Bradshaw 
> wrote:
>
>> Thanks!
>>
>> On Tue, Nov 26, 2019 at 7:43 AM Łukasz Gajowy  wrote:
>> >
>> > Hi all,
>> >
>> > our documentation (either confluence or the website docs) describes how
>> to create various integration and performance tests - there already are
>> core operations tests, nexmark and IO test documentation pages. However, we
>> are lacking some general docs to describe what tools do we have and what is
>> the purpose of them.
>> >
>> > Therefore, I took the liberty of creating the Beam Testing Tools FAQ on
>> our confluence:
>> > https://cwiki.apache.org/confluence/display/BEAM/Beam+Testing+Tools+FAQ
>> >
>> > Hopefully, this is helpful and sheds some more light on that important
>> part of our infrastructure. If you feel that something is missing there,
>> feel free to let me know or add it yourself. :)
>> >
>> > Thanks,
>> > Łukasz
>>
>


Re: Cleaning up Approximate Algorithms in Beam

2019-11-26 Thread Robert Bradshaw
I think this thread is sufficient.

On Mon, Nov 25, 2019 at 5:59 PM Reza Rokni  wrote:

> Hi,
>
> So do we need a vote for the final list of actions? Or is this thread
> enough to go ahead and raise the PR's?
>
> Cheers
>
> Reza
>
> On Tue, 26 Nov 2019 at 06:01, Ahmet Altay  wrote:
>
>>
>>
>> On Mon, Nov 18, 2019 at 10:57 AM Robert Bradshaw 
>> wrote:
>>
>>> On Sun, Nov 17, 2019 at 5:16 PM Reza Rokni  wrote:
>>>
 *Ahmet: FWIW, There is a python implementation only for this
 version: 
 https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/stats.py#L38
 
  *
 Eventually we will be able to make use of cross language transforms to
 help with feature parity. Until then, are we ok with marking this
 deprecated in python, even though we do not have another solution. Or leave
 it as is in Python now, as it does not have sketch capability so can only
 be used for outputting results directly from the pipeline.

>>>
>> If it is our intention to add the capability eventually, IMO it makes
>> sense to mark the existing functionality deprecated in Python as well.
>>
>>
 *Reuven: I think this is the sort of thing that has been experimental
 forever, and therefore not experimental (e.g. the entire triggering API is
 experimental as are all our file-based sinks). I think that many users use
 this, and probably store the state implicitly in streaming pipelines.*
 True, I have an old action item to try and go through and PR against
 old @experimental annotations but need to find time. So for this
 discussion; I guess this should be marked as deprecated if we change it
 even though its @experimental.

>>>
>>> Agreed.
>>>
>>>
 *Rob: I'm not following this--by naming things after their
 implementation rather than their intent I think they will be harder to
 search for. *
 This is to add to the name the implementation, after the intent. For
 example ApproximateCountDistinctZetaSketch, I believe should be easy to
 search for and it is clear which implementation is used. Allowing for a
 potentially better implementation ApproximateCountDistinct.

>>>
>>> OK, if we have both I'm more OK with that. This is better than the names
>>> like HllCount, which seems to be what was suggested.
>>>
>>> Another approach would be to have a required  parameter which is an enum
 of the implementation options.
 ApproximateCountDistinct.of().usingImpl(ZETA) ?

>>>
>>> Ideally this could be an optional parameter, or possibly only required
>>> during update until we figure out a good way for the runner to plug this in
>>> appropriately.
>>>
>>> Rob/Kenn: On Combiner discussion, should we tie action items from the
 needs of this thread to this larger discussion?

 Cheers
 Reza

 On Fri, 15 Nov 2019 at 08:32, Robert Bradshaw 
 wrote:

> On Thu, Nov 14, 2019 at 1:06 AM Kenneth Knowles 
> wrote:
>
>> Wow. Nice summary, yes. Major calls to action:
>>
>> 0. Never allow a combiner that does not include the format of its
>> state clear in its name/URN. The "update compatibility" problem makes 
>> their
>> internal accumulator state essentially part of their public API. 
>> Combiners
>> named for what they do are an inherent risk, since we might have a new 
>> way
>> to do the same operation with different implementation-detail state.
>>
>
> It seems this will make for a worse user experience, motivated solely
> by limitations in our implementation. I think we can do better.
> Hypothetical idea: what if upgrade required access to the original graph
> (or at least metadata about it) during construction? In this case an
> ApproximateDistinct could look at what was used last time and try to do 
> the
> same, but be free to do something better when unconstrained. Another
> approach would be to encode several alternative expansions in the Beam
> graph and let the runner do the picking (based on prior submission).
> (Making the CombineFn, as opposed to the composite, have several
> alternatives seems harder to reason about, but maybe worth pursuing as
> well).
>
> This is not unique to Combiners, but any stateful DoFn, or composite
> operations with non-trivial internal structure (and coders). This has been
> discussed a lot, perhaps there are some ideas there we could borrow?
>
> And they will match search terms better, which is a major problem.
>>
>
> I'm not following this--by naming things after their implementation
> rather than their intent I think they will be harder to search for.
>
>
>> 1. Point users to HllCount. This seems to be the best of the three.
>> Does it have a name that is clear enough about the format of its state?

Re: [UPDATE] Preparing for Beam 2.17.0 release

2019-11-26 Thread Mikhail Gryzykhin
Hello everybody,

Got the release branch green except for the gradle build, which times out and
fails with go tests that look like a flake.

I'll go over the remaining PRs and Jiras today and do a final test validation.
Will start the RC process afterwards.

--Mikhail

On Fri, Nov 22, 2019 at 9:29 PM Jan Lukavský  wrote:

> Hi Mikhail,
> I created PR for [BEAM-8812]. It is linked in the JIRA.
> Jan
>
> On 23. 11. 2019 at 0:45, Mikhail Gryzykhin wrote:
>
> UPD:
> on current branch there's timeout on gradle build job, I'm mitigating it
> by increasing job time. Seems that this job runs most of python tests. We
> might look into adjusting the target.
>
> Second failure is https://issues.apache.org/jira/browse/BEAM-8812 . I
> would really appreciate if someone can help me debug this one.
>
> --Mikhail
>
> On Tue, Nov 19, 2019 at 10:14 PM Kenneth Knowles  wrote:
>
> I've poked through the bugs and there do seem to be a few that are
> finished and a few that may not be started that should probably be deferred
> if they can be triaged to not be blockers.
>
> Kenn
>
> On Fri, Nov 15, 2019 at 2:13 PM Mikhail Gryzykhin 
> wrote:
>
> Hi everyone,
>
> There's still an outstanding cherry-pick PR that I can't merge due to
> tests failing on it and release branch validation PR
> . Once I get tests green, I'll
> send another update and review outstanding open issues.
>
> --Mikhail
>
> On Fri, Nov 15, 2019 at 10:40 AM Thomas Weise  wrote:
>
> Any update regarding the release?
>
> The list still shows 10 open issues:
>
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20fixVersion%20%3D%202.17.0%20and%20resolution%20is%20EMPTY
>
> Is the RC blocked on those?
>
>
>
>
>
>
> On Mon, Oct 28, 2019 at 12:46 PM Ahmet Altay  wrote:
>
>
>
> On Mon, Oct 28, 2019 at 12:44 PM Gleb Kanterov  wrote:
>
> It looks like BigQueryIO DIRECT_READ is broken since 2.16.0, I've added a
> ticket describing the problem and possible fix, see BEAM-8504
>  [1].
>
>
> Should this be added to 2.16 blog post as a known issue?
>
>
>
> [1]: https://issues.apache.org/jira/browse/BEAM-8504
>
> On Wed, Oct 23, 2019 at 9:19 PM Kenneth Knowles  wrote:
>
> I opened https://github.com/apache/beam/pull/9862 to raise the
> documentation of Fix Version to the top level. It also includes the write
> up of Jira priorities, to make clear that "Blocker" priority does not refer
> to release blocking.
>
> On Wed, Oct 23, 2019 at 11:16 AM Kenneth Knowles  wrote:
>
> I've gone over the tickets and removed Fix Version from many of them that
> do not seem to be critical defects. If I removed Fix Version from a ticket
> you care about, please feel free to add it back. I am not trying to decide
> what is in/out of the release, just trying to triage the Jira data to match
> expected practices.
>
> It should probably be documented somewhere outside of the release guide.
> As far as I can tell, the fact that we triage them down to zero is the only
> place we mention that it is used to indicate release blockers and not used
> for feature targets.
>
> Kenn
>
> On Wed, Oct 23, 2019 at 10:40 AM Kenneth Knowles  wrote:
>
>  Wow, 28 release blocking tickets! That is the most I've ever seen, by
> far. Many appear to be feature requests, not release-blocking defects. I
> believe this is not according to our normal best practice. The release
> cadence should not wait for features in progress, with exceptions discussed
> on dev@. As a matter of best practice, I think we should triage feature
> requests to not have Fix Version set until it has been discussed on dev@.
>
> Kenn
>
> On Wed, Oct 23, 2019 at 9:55 AM Mikhail Gryzykhin 
> wrote:
>
> Hi all,
>
> Beam 2.17 release branch cut is scheduled today (2019/10/23) according to
> the release calendar [1].  I'll start working on the branch cutoff and
> later work on cherry picking blocker fixes.
>
> If you have release blocking issues for 2.17 please mark their "Fix
> Version" as 2.17.0 [2]. This tag is already created in JIRA in case you
> would like to move any non-blocking issues to that version.
>
> There is a decent amount of open bugs to be resolved in 2.17.0 [2] and
> only 4 [3] are marked as blockers. Please, review those if these bugs are
> actually to be resolved in 2.17.0 and prioritize fixes if possible.
>
> Any thoughts, comments, objections?
>
> Regards.
> Mikhail.
>
>
> [1]
> https://calendar.google.com/calendar/embed?src=0p73sl034k80oob7seouanigd0%40group.calendar.google.com
> [2]
> https://issues.apache.org/jira/browse/BEAM-8457?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Reopened%2C%20Open%2C%20%22In%20Progress%22%2C%20%22Under%20Discussion%22%2C%20%22In%20Implementation%22%2C%20%22Triage%20Needed%22)%20AND%20fixVersion%20%3D%202.17.0
> 

Re: Beam Testing Tools FAQ

2019-11-26 Thread Alan Myrvold
Nice, thanks!

On Tue, Nov 26, 2019 at 8:04 AM Robert Bradshaw  wrote:

> Thanks!
>
> On Tue, Nov 26, 2019 at 7:43 AM Łukasz Gajowy  wrote:
> >
> > Hi all,
> >
> > our documentation (either confluence or the website docs) describes how
> to create various integration and performance tests - there already are
> core operations tests, nexmark and IO test documentation pages. However, we
> are lacking some general docs to describe what tools do we have and what is
> the purpose of them.
> >
> > Therefore, I took the liberty of creating the Beam Testing Tools FAQ on
> our confluence:
> > https://cwiki.apache.org/confluence/display/BEAM/Beam+Testing+Tools+FAQ
> >
> > Hopefully, this is helpful and sheds some more light on that important
> part of our infrastructure. If you feel that something is missing there,
> feel free to let me know or add it yourself. :)
> >
> > Thanks,
> > Łukasz
>


Re: Contributor Permission for Beam Jira tickets

2019-11-26 Thread Pablo Estrada
I've added you as a contributor! Thanks!
-P.

On Mon, Nov 25, 2019 at 11:13 PM David Song 
wrote:

> Hi,
>
> This is David from DataPLS EngProd team (wintermelons@). I am working on
> integration tests with some Beam runners over Dataflow.
> Can someone add me as a contributor for the Beam's Jira tracker? I have an
> open bug, and would like to assign myself to it.
> My Jira username is wintermelons, and the Jira ticket is
> https://issues.apache.org/jira/browse/BEAM-8814
>
> Thanks,
> David
>
>


Re: Failed retrieving service account

2019-11-26 Thread Pablo Estrada
Great catch. Thanks Yifan!

On Tue, Nov 26, 2019 at 8:54 AM Tomo Suzuki  wrote:

> Thank you very much. Looking forward to the next dependency report email.
>
> Regards,
> Tomo
>
> On Mon, Nov 25, 2019 at 4:17 PM Yifan Zou  wrote:
>
>> Hi,
>>
>> I've looked into this issue and found that the default service account
>> was removed during the weekend for some reason log viewer
>> 
>> .
>> I restored the default service account. All workers should be backing to
>> normal and jobs start passing now.
>>
>> -yifan
>>
>> On Mon, Nov 25, 2019 at 11:17 AM Tomo Suzuki  wrote:
>>
>>> Thank you for looking into this.
>>>
>>> On Mon, Nov 25, 2019 at 12:59 PM Yifan Zou  wrote:
>>>
 Greetings,

 We're seeing some tests encountering permission issues such as *'Failed
 to retrieve
 http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/844138762903-comp...@developer.gserviceaccount.com/token
 
 from the Google Compute Engine metadata service. Status: 404
 Response:\nb\'"The service account was not found.'*

 I am looking onto it. We might need to reboot some build workers to
 restore the service account access. I'll try to make as little impact as
 possible on current running jobs.

 -yifan

>>>
>>>
>>> --
>>> Regards,
>>> Tomo
>>>
>>
>
> --
> Regards,
> Tomo
>


Re: cython test instability

2019-11-26 Thread Luke Cwik
I also started to see this on PRs that I'm reviewing.
BEAM-8793, BEAM-8653, BEAM-8631, BEAM-8249 mention issues with setup.py and
egg_info but this looks different than all of those so I filed BEAM-8831.


On Mon, Nov 25, 2019 at 10:27 PM Chad Dombrova  wrote:

> Actually, it looks like I'm getting the same error on multiple PRs:
> https://scans.gradle.com/s/ihfmrxr7evslw
>
>
>
>
> On Mon, Nov 25, 2019 at 10:26 PM Chad Dombrova  wrote:
>
>> Hi all,
>> The cython tests started failing on one of my PRs which were succeeding
>> before.   The error is one that I've never seen before (separated onto
>> different lines to make it easier to read):
>>
>> Caused by: org.gradle.api.GradleException:
>> Could not copy file
>> '/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit@2
>> /src/sdks/python/.eggs/simplegeneric-0.8.1-py2.7.egg'
>> to
>> '/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit@2
>> /src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/.eggs/simplegeneric-0.8.1-py2.7.egg'.
>>
>> Followed immediately by an error about could not create a directory of
>> the same name.  Here's the gradle scan:
>>
>>
>> https://scans.gradle.com/s/ihfmrxr7evslw/failure?openFailures=WzFd=WzZd#top=0
>>
>> Any ideas?
>>
>> -chad
>>
>>
>>
>>
>>


Re: Failed retrieving service account

2019-11-26 Thread Tomo Suzuki
Thank you very much. Looking forward to the next dependency report email.

Regards,
Tomo

On Mon, Nov 25, 2019 at 4:17 PM Yifan Zou  wrote:

> Hi,
>
> I've looked into this issue and found that the default service account was
> removed during the weekend for some reason log viewer
> 
> .
> I restored the default service account. All workers should be backing to
> normal and jobs start passing now.
>
> -yifan
>
> On Mon, Nov 25, 2019 at 11:17 AM Tomo Suzuki  wrote:
>
>> Thank you for looking into this.
>>
>> On Mon, Nov 25, 2019 at 12:59 PM Yifan Zou  wrote:
>>
>>> Greetings,
>>>
>>> We're seeing some tests encountering permission issues such as *'Failed
>>> to retrieve
>>> http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/844138762903-comp...@developer.gserviceaccount.com/token
>>> 
>>> from the Google Compute Engine metadata service. Status: 404
>>> Response:\nb\'"The service account was not found.'*
>>>
>>> I am looking onto it. We might need to reboot some build workers to
>>> restore the service account access. I'll try to make as little impact as
>>> possible on current running jobs.
>>>
>>> -yifan
>>>
>>
>>
>> --
>> Regards,
>> Tomo
>>
>

-- 
Regards,
Tomo


Re: [DISCUSS] AWS IOs V1 Deprecation Plan

2019-11-26 Thread Alexey Romanenko
AFAICT, all AWS SDK V1 IOs (SnsIO, SqsIO, DynamoDBIO, KinesisIO) are marked as 
"Experimental". So, it should not be a problem to gracefully deprecate and 
finally remove them. We already did a similar procedure for 
“HadoopInputFormatIO”, which was renamed to just “HadoopFormatIO” (since it 
started to support HadoopOutputFormatI as well). The old “HadoopInputFormatIO” was 
deprecated and removed after 3 consecutive Beam releases (as we agreed on the 
mailing list).

At the same time, some users, for various reasons, would not be able or would not 
want to move to AWS SDK V2. So, I’d prefer to just deprecate the AWS SDK V1 IOs and 
accept new features/fixes only for the V2 IOs.

Talking about the “Experimental” annotation: sorry in advance if I missed that and 
am switching the subject a bit, but do we have clear rules or an agreement on when 
an IO becomes stable and should not be marked as experimental anymore? Most of our 
Java IOs are marked as Experimental, but many of them have been used in production 
by real users under real load. Does that mean they are ready to be considered stable 
in terms of API? Perhaps this topic deserves a new discussion if there are 
several opinions on that.
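
For Phase I the code change itself would be small; roughly something like this on
each V1 IO (the class below stands in for any of the V1 connectors, and the
Javadoc wording is only an example):

import org.apache.beam.sdk.annotations.Experimental;
import org.apache.beam.sdk.annotations.Experimental.Kind;

/**
 * IO to write to SNS using the AWS SDK for Java V1.
 *
 * @deprecated superseded by the AWS SDK for Java V2 connectors in the
 *     amazon-web-services2 module; see the migration notes for the V2 equivalent.
 */
@Deprecated
@Experimental(Kind.SOURCE_SINK)
public class SnsIO {
  // existing V1 implementation stays as-is and only receives critical fixes
}

The "new features only on V2" part is then mostly a review policy rather than
anything enforced in code.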

> On 26 Nov 2019, at 00:39, Luke Cwik  wrote:
> 
> Phase I sounds fine. 
> 
> Apache Beam follows semantic versioning and I believe removing the IOs will 
> be a backwards incompatible change unless they were marked experimental which 
> will be a problem for Phase 2.
> 
> What is the feasibility of making the V1 transforms wrappers around V2?
> 
> On Mon, Nov 25, 2019 at 1:46 PM Cam Mach  > wrote:
> Hello Beam Devs,
> 
> I have been working on the migration of Amazon Web Services IO connectors 
> into the new AWS SDK for Java V2. The goal is to have an updated 
> implementation aligned with the most recent AWS improvements. So far we have 
> already migrated the connectors for AWS SNS, SQS and  DynamoDB.
> 
> In the meantime some contributions are still going on V1 IOs. So far we have 
> dealt with those by porting (or asking contributors) to port the changes into 
> V2 IOs too because we don’t want features of both versions to be unaligned 
> but this may quickly become a maintenance issue, so we want to discuss a plan 
> to stop supporting (deprecate) V1 IOs and encourage users to move to V2.
> 
> Phase I (ASAP):
> Mark migrated AWS V1 IOs as deprecated
> Document migration path to V2
> Phase II (end of 2020):
> Decide a date or Beam release to remove the V1 IOs
> Send a notification to the community 3 months before we remove them
> Completely get rid of V1 IOs
> 
> Please let me know what you think or if you see any potential issues?
> 
> Thanks,
> Cam Mach
> 



Re: concurrent PRs

2019-11-26 Thread Robert Bradshaw
On Tue, Nov 26, 2019 at 6:15 AM Etienne Chauchot  wrote:
>
> Hi guys,
>
> I wanted your opinion about something:
>
> I have 2 concurrent PRs that do the same:
>
> https://github.com/apache/beam/pull/10010
>
> https://github.com/apache/beam/pull/10025
>
> The first one is a bit better because it addresses a deprecation that
> the other does not address. Except that they are the same. The first one
> is the older (1 day before) but the second one is the one that received
> reviews.
>
> I guess the problem is that there were 3 duplicate tickets of
> Elasticsearch7 upgrade (because people do not search for existing
> tickets before opening). As a result concurrent PRs were submitted
> despite the PR link on jira. I removed the duplicates but I need to
> close one of the PRs.
>
> The question is: which one do you think should be closed?

Are there (summary) pros and cons of the two PRs that
you're looking for feedback on? Otherwise, I think you could make the
call. (It's a good reminder to try to search for issues on JIRA
before filing a new one, though.)


Re: Beam Testing Tools FAQ

2019-11-26 Thread Robert Bradshaw
Thanks!

On Tue, Nov 26, 2019 at 7:43 AM Łukasz Gajowy  wrote:
>
> Hi all,
>
> our documentation (either confluence or the website docs) describes how to 
> create various integration and performance tests - there already are core 
> operations tests, nexmark and IO test documentation pages. However, we are 
> lacking some general docs to describe what tools do we have and what is the 
> purpose of them.
>
> Therefore, I took the liberty of creating the Beam Testing Tools FAQ on our 
> confluence:
> https://cwiki.apache.org/confluence/display/BEAM/Beam+Testing+Tools+FAQ
>
> Hopefully, this is helpful and sheds some more light on that important part 
> of our infrastructure. If you feel that something is missing there, feel free 
> to let me know or add it yourself. :)
>
> Thanks,
> Łukasz


Re: concurrent PRs

2019-11-26 Thread Maximilian Michels

Hi Etienne,

That is hard to tell from the outside. Based on the activity in the PRs, 
it looks like you already chose the second PR (#10025).


You should know best which one to merge. Make a call.

Cheers,
Max

On 26.11.19 15:14, Etienne Chauchot wrote:

Hi guys,

I wanted your opinion about something:

I have 2 concurrent PRs that do the same:

https://github.com/apache/beam/pull/10010

https://github.com/apache/beam/pull/10025

The first one is a bit better because it addresses a deprecation that 
the other does not address. Except that they are the same. The first one 
is the older (1 day before) but the second one is the one that received 
reviews.


I guess the problem is that there were 3 duplicate tickets of 
Elasticsearch7 upgrade (because people do not search for existing 
tickets before opening). As a result concurrent PRs were submitted 
despite the PR link on jira. I removed the duplicates but I need to 
close one of the PRs.


The question is: which one do you think should be closed?

Thanks for you opinion guys

Etienne





Beam Testing Tools FAQ

2019-11-26 Thread Łukasz Gajowy
Hi all,

our documentation (either confluence or the website docs) describes how to
create various integration and performance tests - there already are core
operations tests, nexmark and IO test documentation pages. However, we are
lacking some general docs describing what tools we have and what their
purpose is.

Therefore, I took the liberty of creating the Beam Testing Tools FAQ on our
confluence:
https://cwiki.apache.org/confluence/display/BEAM/Beam+Testing+Tools+FAQ

Hopefully, this is helpful and sheds some more light on that important part
of our infrastructure. If you feel that something is missing there, feel
free to let me know or add it yourself. :)

Thanks,
Łukasz


concurrent PRs

2019-11-26 Thread Etienne Chauchot

Hi guys,

I wanted your opinion about something:

I have 2 concurrent PRs that do the same:

https://github.com/apache/beam/pull/10010

https://github.com/apache/beam/pull/10025

The first one is a bit better because it addresses a deprecation that 
the other does not address. Except that they are the same. The first one 
is the older (1 day before) but the second one is the one that received 
reviews.


I guess the problem is that there were 3 duplicate tickets of 
Elasticsearch7 upgrade (because people do not search for existing 
tickets before opening). As a result concurrent PRs were submitted 
despite the PR link on jira. I removed the duplicates but I need to 
close one of the PRs.


The question is: which one do you think should be closed?

Thanks for you opinion guys

Etienne





Re: Full stream-stream join semantics

2019-11-26 Thread David Morávek
Yes, in the batch case with long-term historical data, this would be O(n^2) as
it is basically a bubble sort. If you have a large # of updates for a single
key, this would be super expensive.

Kenn, can this be re-implemented with your solution?
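
(For context, the buffer-and-flush approach being discussed would look roughly
like the sketch below in user code. It is simplified: it flushes once at the end
of the window, ignores allowed lateness, and may need an explicit coder for the
buffered values.)

import java.util.ArrayList;
import java.util.List;
import org.apache.beam.sdk.state.BagState;
import org.apache.beam.sdk.state.StateSpec;
import org.apache.beam.sdk.state.StateSpecs;
import org.apache.beam.sdk.state.TimeDomain;
import org.apache.beam.sdk.state.Timer;
import org.apache.beam.sdk.state.TimerSpec;
import org.apache.beam.sdk.state.TimerSpecs;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.TimestampedValue;

// Buffer elements per key in state and emit them time-sorted once the watermark
// reaches the end of the window.
class TimeSortedFlushFn<K, V> extends DoFn<KV<K, V>, KV<K, V>> {

  @StateId("buffer")
  private final StateSpec<BagState<TimestampedValue<KV<K, V>>>> bufferSpec = StateSpecs.bag();

  @TimerId("flush")
  private final TimerSpec flushSpec = TimerSpecs.timer(TimeDomain.EVENT_TIME);

  @ProcessElement
  public void process(
      ProcessContext ctx,
      BoundedWindow window,
      @StateId("buffer") BagState<TimestampedValue<KV<K, V>>> buffer,
      @TimerId("flush") Timer flush) {
    buffer.add(TimestampedValue.of(ctx.element(), ctx.timestamp()));
    flush.set(window.maxTimestamp());
  }

  @OnTimer("flush")
  public void flush(
      OnTimerContext ctx, @StateId("buffer") BagState<TimestampedValue<KV<K, V>>> buffer) {
    List<TimestampedValue<KV<K, V>>> sorted = new ArrayList<>();
    buffer.read().forEach(sorted::add);
    sorted.sort((a, b) -> a.getTimestamp().compareTo(b.getTimestamp()));
    for (TimestampedValue<KV<K, V>> element : sorted) {
      ctx.output(element.getValue()); // downstream logic then sees values in event-time order
    }
    buffer.clear();
  }
}

Re-reading and re-sorting this buffer on every incremental flush is what makes
the long-history batch case so expensive.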

On Tue, Nov 26, 2019 at 1:10 PM Jan Lukavský  wrote:

> Functionally yes. But this straightforward solution is not working for me
> for two main reasons:
>
>  - it either blows state in batch case or the time complexity of the sort
> would be O(n^2) (and reprocessing several years of dense time-series data
> makes it a no go)
>
>  - it is not reusable for different time-ordering needs, because the logic
> implemented purely in user-space cannot be transferred to different problem
> (there are two states needed, one for buffer, the other for user-state) and
> extending DoFns does not work (cannot create abstract SortedDoFn, because
> of the state annotation definitions)
>
> Jan
> On 11/26/19 12:56 PM, David Morávek wrote:
>
> Hi,
>
> I think what Jan has in mind would look something like this
> , if
> implemented in user code. Am I right?
>
> D.
>
>
> On Tue, Nov 26, 2019 at 10:23 AM Jan Lukavský  wrote:
>
>>
>> On 11/25/19 11:45 PM, Kenneth Knowles wrote:
>>
>>
>>
>> On Mon, Nov 25, 2019 at 1:56 PM Jan Lukavský  wrote:
>>
>>> Hi Rui,
>>>
>>> > Hi Kenn, you think stateful DoFn based join can emit joined rows that
>>> never to be retracted because in stateful DoFn case joined rows will be
>>> controlled by timers and emit will be only once? If so I will agree with
>>> it. Generally speaking, if only emit once is the factor of needing
>>> retraction or not.
>>>
>>> that would imply buffering elements up until watermark, then sorting and
>>> so reduces to the option a) again, is that true? This also has to deal with
>>> allowed lateness, that would mean, that with allowed lateness greater than
>>> zero, there can still be multiple firings and so retractions are needed.
>>>
>> Specifically, when I say "bi-temporal join" I mean unbounded-to-unbounded
>> join where one of the join conditions is that elements are within event
>> time distance d of one another. An element at time t will be saved until
>> time t + 2d and then garbage collected. Every matching pair can be emitted
>> immediately.
>>
>> OK, this might simplify things a little. Is there a design doc for that?
>> If there are multiple LHS elements within event time distance from RHS
>> element, which one should be joined? I suppose all of them, but that is not
>> "(time-varying-)relational" join semantics. In that semantics only the last
>> element must be joined, because that is how a (classical) relational
>> database would see the relation at time T (the old record would have been
>> overwritten and not be part of the output). Because of the time distance
>> constraint this is different from the join I have in mind, because that
>> simply joins every LHS element(s) to most recent RHS element(s) and vice
>> versa, without any additional time constraints (that is the RHS "update"
>> can happen arbitrarily far in past).
>>
>> Jan
>>
>>
>> In the triggered CoGBK + join-product implementation, you do need
>> retractions as a model concept. But you don't need full support, since they
>> only need to be shipped as deltas and only from the CoGBK to the
>> join-product transform where they are all consumed to create only positive
>> elements. Again a delay is not required; this yields correct results with
>> the "always" trigger.
>>
>> Neither case requires waiting or time sorting a whole buffer. The
>> bi-temporal join requires something more, in a way, since you need to query
>> by time range and GC time prefixes.
>>
>> Kenn
>>
>> Jan
>>> On 11/25/19 10:17 PM, Rui Wang wrote:
>>>
>>>
>>>
>>> On Mon, Nov 25, 2019 at 11:29 AM Jan Lukavský  wrote:
>>>

 On 11/25/19 7:47 PM, Kenneth Knowles wrote:



 On Sun, Nov 24, 2019 at 12:57 AM Jan Lukavský  wrote:

> I can put down a design document, but before that I need to clarify
> some things for me. I'm struggling to put all of this into a bigger
> picture. Sorry if the arguments are circulating, but I didn't notice any
> proposal of how to solve these. If anyone can disprove any of this logic 
> it
> would be very much appreciated as I might be able to get from a dead end:
>
>  a) in the bi-temporal join you can either buffer until watermark, or
> emit false data that has to be retracted
>
 This is not the case. A stateful DoFn based join can emit immediately
 joined rows that will never need to be retracted. The need for retractions
 has to do with CoGBK-based implementation of a join.

 I fail to see how this could work. If I emit joined rows immediately
 without waiting for watermark to pass, I can join two elements, that don't
 belong to each other, because later can arrive element with lower time
 distance, that 

Re: Full stream-stream join semantics

2019-11-26 Thread Jan Lukavský
Functionally yes. But this straightforward solution is not working for 
me for two main reasons:


 - it either blows up state in the batch case or the time complexity of the 
sort would be O(n^2) (and reprocessing several years of dense 
time-series data makes it a no-go)


 - it is not reusable for different time-ordering needs, because the 
logic implemented purely in user-space cannot be transferred to a 
different problem (there are two states needed, one for the buffer, the 
other for user state) and extending DoFns does not work (cannot create an 
abstract SortedDoFn, because of the state annotation definitions)


Jan

On 11/26/19 12:56 PM, David Morávek wrote:

Hi,

I think what Jan has in mind would look something like this 
, if 
implemented in user code. Am I right?


D.


On Tue, Nov 26, 2019 at 10:23 AM Jan Lukavský > wrote:



On 11/25/19 11:45 PM, Kenneth Knowles wrote:



On Mon, Nov 25, 2019 at 1:56 PM Jan Lukavský <je...@seznam.cz> wrote:

Hi Rui,

> Hi Kenn, you think stateful DoFn based join can emit joined
rows that never to be retracted because in stateful DoFn case
joined rows will be controlled by timers and emit will be
only once? If so I will agree with it. Generally speaking, if
only emit once is the factor of needing retraction or not.

that would imply buffering elements up until watermark, then
sorting and so reduces to the option a) again, is that true?
This also has to deal with allowed lateness, that would mean,
that with allowed lateness greater than zero, there can still
be multiple firings and so retractions are needed.

Specifically, when I say "bi-temporal join" I mean
unbounded-to-unbounded join where one of the join conditions is
that elements are within event time distance d of one another. An
element at time t will be saved until time t + 2d and then
garbage collected. Every matching pair can be emitted immediately.


OK, this might simplify things a little. Is there a design doc for
that? If there are multiple LHS elements within event time
distance from RHS element, which one should be joined? I suppose
all of them, but that is not "(time-varying-)relational" join
semantics. In that semantics only the last element must be joined,
because that is how a (classical) relational database would see
the relation at time T (the old record would have been overwritten
and not be part of the output). Because of the time distance
constraint this is different from the join I have in mind, because
that simply joins every LHS element(s) to most recent RHS
element(s) and vice versa, without any additional time constraints
(that is the RHS "update" can happen arbitrarily far in past).

Jan



In the triggered CoGBK + join-product implementation, you do need
retractions as a model concept. But you don't need full support,
since they only need to be shipped as deltas and only from the
CoGBK to the join-product transform where they are all consumed
to create only positive elements. Again a delay is not required;
this yields correct results with the "always" trigger.

Neither case requires waiting or time sorting a whole buffer. The
bi-temporal join requires something more, in a way, since you
need to query by time range and GC time prefixes.

Kenn

Jan

On 11/25/19 10:17 PM, Rui Wang wrote:



On Mon, Nov 25, 2019 at 11:29 AM Jan Lukavský <je...@seznam.cz> wrote:


On 11/25/19 7:47 PM, Kenneth Knowles wrote:



On Sun, Nov 24, 2019 at 12:57 AM Jan Lukavský <je...@seznam.cz> wrote:

I can put down a design document, but before that I
need to clarify some things for me. I'm struggling
to put all of this into a bigger picture. Sorry if
the arguments are circulating, but I didn't notice
any proposal of how to solve these. If anyone can
disprove any of this logic it would be very much
appreciated as I might be able to get from a dead end:

 a) in the bi-temporal join you can either buffer
until watermark, or emit false data that has to be
retracted

This is not the case. A stateful DoFn based join can
emit immediately joined rows that will never need to be
retracted. The need for retractions has to do with
CoGBK-based implementation of a join.

I fail to see how this could work. If I emit joined rows
immediately without waiting for watermark to pass, I can
join two elements, that don't belong to each other,
because later can arrive element with lower time

Re: Full stream-stream join semantics

2019-11-26 Thread David Morávek
Hi,

I think what Jan has in mind would look something like this, if
implemented in user code. Am I right?

D.


On Tue, Nov 26, 2019 at 10:23 AM Jan Lukavský  wrote:

>
> On 11/25/19 11:45 PM, Kenneth Knowles wrote:
>
>
>
> On Mon, Nov 25, 2019 at 1:56 PM Jan Lukavský  wrote:
>
>> Hi Rui,
>>
>> > Hi Kenn, you think stateful DoFn based join can emit joined rows that
>> never to be retracted because in stateful DoFn case joined rows will be
>> controlled by timers and emit will be only once? If so I will agree with
>> it. Generally speaking, if only emit once is the factor of needing
>> retraction or not.
>>
>> that would imply buffering elements up until watermark, then sorting and
>> so reduces to the option a) again, is that true? This also has to deal with
>> allowed lateness, that would mean, that with allowed lateness greater than
>> zero, there can still be multiple firings and so retractions are needed.
>>
> Specifically, when I say "bi-temporal join" I mean unbounded-to-unbounded
> join where one of the join conditions is that elements are within event
> time distance d of one another. An element at time t will be saved until
> time t + 2d and then garbage collected. Every matching pair can be emitted
> immediately.
>
> OK, this might simplify things a little. Is there a design doc for that?
> If there are multiple LHS elements within event time distance from RHS
> element, which one should be joined? I suppose all of them, but that is not
> "(time-varying-)relational" join semantics. In that semantics only the last
> element must be joined, because that is how a (classical) relational
> database would see the relation at time T (the old record would have been
> overwritten and not be part of the output). Because of the time distance
> constraint this is different from the join I have in mind, because that
> simply joins every LHS element(s) to most recent RHS element(s) and vice
> versa, without any additional time constraints (that is the RHS "update"
> can happen arbitrarily far in past).
>
> Jan
>
>
> In the triggered CoGBK + join-product implementation, you do need
> retractions as a model concept. But you don't need full support, since they
> only need to be shipped as deltas and only from the CoGBK to the
> join-product transform where they are all consumed to create only positive
> elements. Again a delay is not required; this yields correct results with
> the "always" trigger.
>
> Neither case requires waiting or time sorting a whole buffer. The
> bi-temporal join requires something more, in a way, since you need to query
> by time range and GC time prefixes.
>
> Kenn
>
> Jan
>> On 11/25/19 10:17 PM, Rui Wang wrote:
>>
>>
>>
>> On Mon, Nov 25, 2019 at 11:29 AM Jan Lukavský  wrote:
>>
>>>
>>> On 11/25/19 7:47 PM, Kenneth Knowles wrote:
>>>
>>>
>>>
>>> On Sun, Nov 24, 2019 at 12:57 AM Jan Lukavský  wrote:
>>>
 I can put down a design document, but before that I need to clarify
 some things for me. I'm struggling to put all of this into a bigger
 picture. Sorry if the arguments are circulating, but I didn't notice any
 proposal of how to solve these. If anyone can disprove any of this logic it
 would be very much appreciated as I might be able to get from a dead end:

  a) in the bi-temporal join you can either buffer until watermark, or
 emit false data that has to be retracted

>>> This is not the case. A stateful DoFn based join can emit immediately
>>> joined rows that will never need to be retracted. The need for retractions
>>> has to do with CoGBK-based implementation of a join.
>>>
>>> I fail to see how this could work. If I emit joined rows immediately
>>> without waiting for watermark to pass, I can join two elements, that don't
>>> belong to each other, because later can arrive element with lower time
>>> distance, that should have been joint in the place of the previously
>>> emitted one. This is wrong result that has to be retracted. Or what I'm
>>> missing?
>>>
>>
>> Hi Kenn, you think stateful DoFn based join can emit joined rows that
>> never to be retracted because in stateful DoFn case joined rows will be
>> controlled by timers and emit will be only once? If so I will agree with
>> it. Generally speaking, if only emit once is the factor of needing
>> retraction or not.
>>
>> In the past brainstorming, even having retractions ready, streaming join
>> with windowing are likely be implemented by a style of CoGBK + stateful
>> DoFn.
>>
>>
>>
>> I suggest that you work out the definition of the join you are interested
>>> in, with a good amount of mathematical rigor, and then consider the ways
>>> you can implement it. That is where a design doc will probably clarify
>>> things.
>>>
>>> Kenn
>>>
>>>  b) until retractions are 100% functional (and that is sort of holy
 grail for now), then the only solution is using a buffer holding data up to
 watermark *and then sort by 

Re: Artifact staging in cross-language pipelines

2019-11-26 Thread Maximilian Michels

Hey Heejong,

I don't think so. It would be great to push this forward.

Thanks,
Max

On 26.11.19 02:49, Heejong Lee wrote:

Hi,

Is anyone actively working on artifact staging extension for 
cross-language pipelines? I'm thinking I can contribute to it in coming 
Dec. If anyone has any progress on this and needs help, please let me know.


Thanks,

On Wed, Jun 12, 2019 at 2:42 AM Ismaël Mejía wrote:


Can you please add this to the design documents webpage.
https://beam.apache.org/contribute/design-documents/

On Wed, May 8, 2019 at 7:29 PM Chamikara Jayalath <chamik...@google.com> wrote:
 >
 >
 >
 > On Tue, May 7, 2019 at 10:21 AM Maximilian Michels <m...@apache.org> wrote:
 >>
 >> Here's the first draft:
 >>

https://docs.google.com/document/d/1XaiNekAY2sptuQRIXpjGAyaYdSc-wlJ-VKjl04c8N48/edit?usp=sharing
 >>
 >> It's rather high-level. We may want to add more details once we have
 >> finalized the design. Feel free to make comments and edits.
 >
 >
 > Thanks Max. Added some comments.
 >
 >>
 >>
 >> > All of this goes back to the idea that I think the listing of
 >> > artifacts (or more general dependencies) should be a property of the
 >> > environments themselves.
 >>
 >> +1 I came to the same conclusion while thinking about how to store
 >> artifact information for deferred execution of the pipeline.
 >>
 >> -Max
 >>
 >> On 07.05.19 18:10, Robert Bradshaw wrote:
 >> > Looking forward to your writeup, Max. In the meantime, some
comments below.
 >> >
 >> >
 >> > From: Lukasz Cwik <lc...@google.com>
 >> > Date: Thu, May 2, 2019 at 6:45 PM
 >> > To: dev
 >> >
 >> >>
 >> >>
 >> >>> On Thu, May 2, 2019 at 7:20 AM Robert Bradshaw <rober...@google.com> wrote:
 >> >>>
 >>  On Sat, Apr 27, 2019 at 1:14 AM Lukasz Cwik <lc...@google.com> wrote:
 >> 
 >>  We should stick with URN + payload + artifact metadata[1]
where the only mandatory one that all SDKs and expansion services
understand is the "bytes" artifact type. This allows us to add
optional URNs for file://, http://, Maven, PyPi, ... in the future.
I would make the artifact staging service use the same URN + payload
mechanism to get compatibility of artifacts across the different
services and also have the artifact staging service be able to be
queried for the list of artifact types it supports.
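
As a purely illustrative aside, the "URN + payload" shape being discussed
could look roughly like the following sketch; the class and field names, and
the example URN strings, are assumptions of mine, not Beam's actual protos.

  // Illustrative only: every artifact is described by a type URN plus an
  // opaque payload, with a "bytes" type as the only universally required one.
  final class ArtifactDescriptor {
    final String typeUrn;     // e.g. a hypothetical "beam:artifact:type:bytes:v1",
                              // or optional extensions for file://, Maven, PyPI, ...
    final byte[] typePayload; // type-specific data: raw bytes, a serialized path, etc.

    ArtifactDescriptor(String typeUrn, byte[] typePayload) {
      this.typeUrn = typeUrn;
      this.typePayload = typePayload;
    }
  }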
 >> >>>
 >> >>> +1
 >> >>>
 >>  Finally, we would need to have environments enumerate the
artifact types that they support.
 >> >>>
 >> >>> Meaning at runtime, or as another field statically set in
the proto?
 >> >>
 >> >>
 >> >> I don't believe runners/SDKs should have to know what
artifacts each environment supports at runtime and instead have
environments enumerate them explicitly in the proto. I have been
thinking about a more general "capabilities" block on environments
which allows them to enumerate URNs that the environment understands.
This would include artifact type URNs, PTransform URNs, coder URNs,
... I haven't proposed anything specific down this line yet because
I was wondering how environment resources (CPU, min memory, hardware
like GPU, AWS/GCP/Azure/... machine types) should/could tie into this.
 >> >>
 >> >>>
 >>  Having everyone have the same "artifact" representation
would be beneficial since:
 >>  a) Python environments could install dependencies from a
requirements.txt file (something that the Google Cloud Dataflow
Python docker container allows for today)
 >>  b) It provides an extensible and versioned mechanism for
SDKs, environments, and artifact staging/retrieval services to
support additional artifact types
 >>  c) Allow for expressing a canonical representation of an
artifact like a Maven package so a runner could merge environments
that the runner deems compatible.
 >> 
 >>  The flow I could see is:
 >>  1) (optional) query artifact staging service for supported
artifact types
 >>  2) SDK request expansion service to expand transform
passing in a list of artifact types the SDK and artifact staging
service support, the expansion service returns a list of artifact
types limited to those supported types + any supported by the
environment
 >> >>>
 >> >>> The crux of the issue seems to be how the expansion service
returns
 >> >>> the artifacts themselves. Is this going with the approach
that the
 >> >>> caller of the expansion service must host an artifact
staging service?
 >> >>
 >> >>
 >> >> The caller would not need to host an artifact staging service
(but would become effectively a proxy service, see 

Re: Full stream-stream join semantics

2019-11-26 Thread Jan Lukavský


On 11/25/19 11:45 PM, Kenneth Knowles wrote:



On Mon, Nov 25, 2019 at 1:56 PM Jan Lukavský wrote:


Hi Rui,

> Hi Kenn, do you think a stateful DoFn based join can emit joined rows
that never need to be retracted because, in the stateful DoFn case, joined
rows will be controlled by timers and emitted only once? If so I will
agree with it. Generally speaking, emitting only once is the deciding
factor for whether retractions are needed or not.

that would imply buffering elements up until the watermark, then
sorting, and so it reduces to option a) again; is that true? This
also has to deal with allowed lateness, which would mean that with
allowed lateness greater than zero, there can still be multiple
firings and so retractions are needed.

Specifically, when I say "bi-temporal join" I mean 
unbounded-to-unbounded join where one of the join conditions is that 
elements are within event time distance d of one another. An element 
at time t will be saved until time t + 2d and then garbage collected. 
Every matching pair can be emitted immediately.
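
For concreteness, a rough sketch of that pattern as a Beam Java stateful
DoFn follows. It is not an existing Beam transform: the LEFT/RIGHT tagging
of inputs, the String element types, and the single per-key GC timer are
simplifying assumptions; a production version would track per-element
expiry (for example with sorted or map state) instead of rewriting bags.

  import java.util.ArrayList;
  import java.util.List;
  import org.apache.beam.sdk.coders.StringUtf8Coder;
  import org.apache.beam.sdk.state.BagState;
  import org.apache.beam.sdk.state.StateSpec;
  import org.apache.beam.sdk.state.StateSpecs;
  import org.apache.beam.sdk.state.TimeDomain;
  import org.apache.beam.sdk.state.Timer;
  import org.apache.beam.sdk.state.TimerSpec;
  import org.apache.beam.sdk.state.TimerSpecs;
  import org.apache.beam.sdk.transforms.DoFn;
  import org.apache.beam.sdk.values.KV;
  import org.apache.beam.sdk.values.TimestampedValue;
  import org.joda.time.Duration;
  import org.joda.time.Instant;

  // Sketch: inputs are KV<key, KV<"LEFT"|"RIGHT", payload>>. Both sides are
  // buffered in state, every pair within maxDistance is emitted immediately
  // (and never retracted), and elements are GC'd ~2*maxDistance later.
  class BiTemporalJoinFn extends DoFn<KV<String, KV<String, String>>, KV<String, String>> {
    private final Duration maxDistance;

    BiTemporalJoinFn(Duration maxDistance) { this.maxDistance = maxDistance; }

    @StateId("left")
    private final StateSpec<BagState<TimestampedValue<String>>> leftSpec =
        StateSpecs.bag(TimestampedValue.TimestampedValueCoder.of(StringUtf8Coder.of()));

    @StateId("right")
    private final StateSpec<BagState<TimestampedValue<String>>> rightSpec =
        StateSpecs.bag(TimestampedValue.TimestampedValueCoder.of(StringUtf8Coder.of()));

    @TimerId("gc")
    private final TimerSpec gcSpec = TimerSpecs.timer(TimeDomain.EVENT_TIME);

    @ProcessElement
    public void process(
        @Element KV<String, KV<String, String>> e,
        @Timestamp Instant ts,
        @StateId("left") BagState<TimestampedValue<String>> left,
        @StateId("right") BagState<TimestampedValue<String>> right,
        @TimerId("gc") Timer gc,
        OutputReceiver<KV<String, String>> out) {
      boolean isLeft = "LEFT".equals(e.getValue().getKey());
      String value = e.getValue().getValue();
      BagState<TimestampedValue<String>> mine = isLeft ? left : right;
      BagState<TimestampedValue<String>> other = isLeft ? right : left;

      // Emit every match within the allowed event-time distance right away.
      for (TimestampedValue<String> o : other.read()) {
        long distanceMillis = Math.abs(new Duration(o.getTimestamp(), ts).getMillis());
        if (distanceMillis <= maxDistance.getMillis()) {
          String joined = isLeft ? value + "," + o.getValue() : o.getValue() + "," + value;
          out.output(KV.of(e.getKey(), joined));
        }
      }
      mine.add(TimestampedValue.of(value, ts));
      // Simplification: one GC timer per key, pushed to 2*d past this element.
      gc.set(ts.plus(maxDistance).plus(maxDistance));
    }

    @OnTimer("gc")
    public void onGc(
        OnTimerContext ctx,
        @StateId("left") BagState<TimestampedValue<String>> left,
        @StateId("right") BagState<TimestampedValue<String>> right) {
      Instant horizon = ctx.timestamp().minus(maxDistance).minus(maxDistance);
      retainNewerThan(left, horizon);
      retainNewerThan(right, horizon);
    }

    private static void retainNewerThan(BagState<TimestampedValue<String>> bag, Instant horizon) {
      List<TimestampedValue<String>> keep = new ArrayList<>();
      for (TimestampedValue<String> v : bag.read()) {
        if (!v.getTimestamp().isBefore(horizon)) { keep.add(v); }
      }
      bag.clear();
      for (TimestampedValue<String> v : keep) { bag.add(v); }
    }
  }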


OK, this might simplify things a little. Is there a design doc for that? 
If there are multiple LHS elements within the event time distance from an RHS 
element, which one should be joined? I suppose all of them, but that is 
not "(time-varying-)relational" join semantics. In that semantics only 
the last element must be joined, because that is how a (classical) 
relational database would see the relation at time T (the old record 
would have been overwritten and not be part of the output). Because of 
the time distance constraint this is different from the join I have in 
mind, because that simply joins every LHS element to the most recent RHS 
element(s) and vice versa, without any additional time constraints (that 
is, the RHS "update" can happen arbitrarily far in the past).


Jan



In the triggered CoGBK + join-product implementation, you do need 
retractions as a model concept. But you don't need full support, since 
they only need to be shipped as deltas and only from the CoGBK to the 
join-product transform where they are all consumed to create only 
positive elements. Again a delay is not required; this yields correct 
results with the "always" trigger.


Neither case requires waiting or time sorting a whole buffer. The 
bi-temporal join requires something more, in a way, since you need to 
query by time range and GC time prefixes.


Kenn

Jan

On 11/25/19 10:17 PM, Rui Wang wrote:



On Mon, Nov 25, 2019 at 11:29 AM Jan Lukavský <je...@seznam.cz> wrote:


On 11/25/19 7:47 PM, Kenneth Knowles wrote:



On Sun, Nov 24, 2019 at 12:57 AM Jan Lukavský <je...@seznam.cz> wrote:

I can put down a design document, but before that I need
to clarify some things for myself. I'm struggling to put all
of this into a bigger picture. Sorry if the arguments
are going in circles, but I didn't notice any proposal of how
to solve these. If anyone can disprove any of this logic
it would be very much appreciated, as I might be able to
get out of a dead end:

 a) in the bi-temporal join you can either buffer until
watermark, or emit false data that has to be retracted

This is not the case. A stateful DoFn based join can emit
immediately joined rows that will never need to be
retracted. The need for retractions has to do with
CoGBK-based implementation of a join.

I fail to see how this could work. If I emit joined rows
immediately, without waiting for the watermark to pass, I can join
two elements that don't belong to each other, because an element
with a lower time distance can arrive later and should have
been joined in place of the previously emitted one. This
is a wrong result that has to be retracted. Or what am I missing?


Hi Kenn, do you think a stateful DoFn based join can emit joined rows
that never need to be retracted because, in the stateful DoFn case,
joined rows will be controlled by timers and emitted only once? If
so I will agree with it. Generally speaking, emitting only once is
the deciding factor for whether retractions are needed or not.

In the past brainstorming, even having retractions ready,
streaming joins with windowing are likely to be implemented in the
style of CoGBK + stateful DoFn.



I suggest that you work out the definition of the join you
are interested in, with a good amount of mathematical rigor,
and then consider the ways you can implement it. That is
where a design doc will probably clarify things.

Kenn

 b) until retractions are 100% functional (and that is
sort of holy grail for now), then the only solution is
using a buffer holding data up to watermark *and then
sort by event time*

 c) even if 

Re: real real-time beam

2019-11-26 Thread Jan Lukavský
> I will not try to formalize this notion in this email. But I will 
note that since it is universally assured, it would be zero cost and 
significantly safer to formalize it and add an annotation noting it was 
required. It has nothing to do with event time ordering, only trigger 
firing ordering.


I cannot agree with the last sentence (and I'm really not doing this on 
purpose :-)). Panes generally arrive out of order, as mentioned several 
times in the discussions linked from this thread. If we want to ensure 
"trigger firing ordering", we can use the pane index, that is correct. 
But - that is actually equivalent to sorting by event time, because pane 
index order will be (nearly) the same as event time order. This is due 
to the fact that pane index and event time correlate (both are 
monotonic). The pane index "only" solves the issue of preserving 
ordering even in cases where there are multiple firings within the same 
timestamp (regardless of granularity). This was mentioned in the initial 
discussion about event time ordering, and is part of the design doc - 
users should be allowed to provide UDF for extracting time-correlated 
ordering field (which means ability to choose a preferred, or 
authoritative, observer which assigns unambiguous ordering to events). 
Example of this might include Kafka offsets as well, or any queue index 
for that matter. This is not yet implemented, but could (should) be in 
the future.


The only case where these two things are (somewhat) different is the 
case mentioned by @Steve - if the output is a stateless ParDo, which will 
get fused. But that is only because the processing is single-threaded 
per key, and therefore the ordering is implied by timer ordering (and be 
careful here, many runners don't have this ordering 100% correct as of 
now - this problem luckily appears only when there are multiple timers 
per key). Moreover, if there is a failure, then the output might 
(in fact, would) go back in time anyway. If there were a shuffle operation 
after GBK/Combine, then the ordering would no longer be guaranteed and 
would have to be explicitly taken care of.
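
One hedged sketch of taking care of it explicitly, using the pane index as
the ordering field: keep the highest pane index already forwarded per key
(and window) in state, and drop any firing that arrives out of order after
the shuffle. The element type and the drop-stale-panes policy are my
assumptions, not a built-in Beam utility.

  import org.apache.beam.sdk.coders.VarLongCoder;
  import org.apache.beam.sdk.state.StateSpec;
  import org.apache.beam.sdk.state.StateSpecs;
  import org.apache.beam.sdk.state.ValueState;
  import org.apache.beam.sdk.transforms.DoFn;
  import org.apache.beam.sdk.transforms.windowing.PaneInfo;
  import org.apache.beam.sdk.values.KV;

  // Sketch: forward only panes that are newer than anything seen so far
  // for this key+window; ignore firings reordered by the shuffle.
  class DropStalePanesFn extends DoFn<KV<String, Long>, KV<String, Long>> {
    @StateId("maxPane")
    private final StateSpec<ValueState<Long>> maxPaneSpec = StateSpecs.value(VarLongCoder.of());

    @ProcessElement
    public void process(
        @Element KV<String, Long> element,
        PaneInfo pane,
        @StateId("maxPane") ValueState<Long> maxPane,
        OutputReceiver<KV<String, Long>> out) {
      Long newest = maxPane.read();
      if (newest == null || pane.getIndex() > newest) {
        maxPane.write(pane.getIndex());
        out.output(element);  // newest firing so far for this key: forward it
      }
      // otherwise an older firing arrived late after the shuffle; drop it
    }
  }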


Last note, I must agree with @Rui that all these discussions are very 
much related to retractions (precisely the ability to implement them).


Jan

On 11/26/19 7:34 AM, Kenneth Knowles wrote:

Hi Aaron,

Another insightful observation.

Whenever an aggregation (GBK / Combine per key) has a trigger firing, 
there is a per-key sequence number attached. It is included in 
metadata known as "PaneInfo" [1]. The value of PaneInfo.getIndex() is 
colloquially referred to as the "pane index". You can also make use of 
the "on time index" if you like. The best way to access this metadata 
is to add a parameter of type PaneInfo to your DoFn's @ProcessElement 
method. This works for stateful or stateless DoFn.
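
For concreteness, a minimal sketch of that; the KV<String, Long> element
type is just an assumption for illustration.

  import org.apache.beam.sdk.transforms.DoFn;
  import org.apache.beam.sdk.transforms.windowing.PaneInfo;
  import org.apache.beam.sdk.values.KV;

  // Observe the per-key trigger-firing metadata downstream of a GBK/Combine
  // by asking for PaneInfo in @ProcessElement.
  class ObservePanesFn extends DoFn<KV<String, Long>, String> {
    @ProcessElement
    public void process(@Element KV<String, Long> element, PaneInfo pane, OutputReceiver<String> out) {
      // getIndex() is the 0-based index of this firing for the key+window;
      // getTiming() reports EARLY / ON_TIME / LATE relative to the watermark.
      out.output(element.getKey() + " pane=" + pane.getIndex() + " timing=" + pane.getTiming());
    }
  }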


Most of Beam's IO connectors do not explicitly enforce that outputs 
occur in pane index order but instead rely on the hope that the runner 
delivers panes in order to the sink. IMO this is dangerous but it has 
not yet caused a known issue. In practice, each "input key to output 
key 'path' " through a pipeline's logic does preserve order for all 
existing runners AFAIK and it is the formalization that is missing. It 
is related to an observation by +Rui Wang that processing retractions requires the 
same key-to-key ordering.


I will not try to formalize this notion in this email. But I will note 
that since it is universally assured, it would be zero cost and 
significantly safer to formalize it and add an annotation noting it 
was required. It has nothing to do with event time ordering, only 
trigger firing ordering.


Kenn

[1] 
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/PaneInfo.java
[2] 
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java#L557



On Mon, Nov 25, 2019 at 4:06 PM Pablo Estrada wrote:


The blog posts on stateful and timely computation with Beam should
help clarify a lot about how to use state and timers to do this:
https://beam.apache.org/blog/2017/02/13/stateful-processing.html
https://beam.apache.org/blog/2017/08/28/timely-processing.html

You'll see there how there's an implicit per-single-element
grouping for each key, so state and timers should support your use
case very well.
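
As a rough illustration of that per-key state + timer pattern (the element
type and the "emit the latest value after a quiet period" policy are
assumptions for the sketch, not taken from the original question):

  import org.apache.beam.sdk.coders.KvCoder;
  import org.apache.beam.sdk.coders.StringUtf8Coder;
  import org.apache.beam.sdk.coders.VarLongCoder;
  import org.apache.beam.sdk.state.StateSpec;
  import org.apache.beam.sdk.state.StateSpecs;
  import org.apache.beam.sdk.state.TimeDomain;
  import org.apache.beam.sdk.state.Timer;
  import org.apache.beam.sdk.state.TimerSpec;
  import org.apache.beam.sdk.state.TimerSpecs;
  import org.apache.beam.sdk.state.ValueState;
  import org.apache.beam.sdk.transforms.DoFn;
  import org.apache.beam.sdk.values.KV;
  import org.joda.time.Duration;
  import org.joda.time.Instant;

  // Sketch: remember the latest element per key and emit it once no newer
  // element has arrived for a minute (an illustrative flush policy).
  class EmitLatestAfterQuietPeriodFn extends DoFn<KV<String, Long>, KV<String, Long>> {
    private static final Duration QUIET_PERIOD = Duration.standardMinutes(1);

    @StateId("latest")
    private final StateSpec<ValueState<KV<String, Long>>> latestSpec =
        StateSpecs.value(KvCoder.of(StringUtf8Coder.of(), VarLongCoder.of()));

    @TimerId("flush")
    private final TimerSpec flushSpec = TimerSpecs.timer(TimeDomain.EVENT_TIME);

    @ProcessElement
    public void process(
        @Element KV<String, Long> element,
        @Timestamp Instant ts,
        @StateId("latest") ValueState<KV<String, Long>> latest,
        @TimerId("flush") Timer flush) {
      latest.write(element);             // keep only the newest value per key
      flush.set(ts.plus(QUIET_PERIOD));  // (re)arm the event-time flush timer
    }

    @OnTimer("flush")
    public void onFlush(
        @StateId("latest") ValueState<KV<String, Long>> latest,
        OutputReceiver<KV<String, Long>> out) {
      KV<String, Long> value = latest.read();
      if (value != null) {
        out.output(value);
        latest.clear();
      }
    }
  }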

Best
-P.

On Mon, Nov 25, 2019 at 3:47 PM Steve Niemitz <sniem...@apache.org> wrote:

If you have a pipeline that looks like Input -> GroupByKey ->
ParDo, while it is not guaranteed, in practice the sink will
observe the trigger firings in order (per key), since it'll be
fused to the output of the GBK operation (in all runners I
know of).

There have been a couple threads about trigger ordering as
well on the list recently that might