Re: gradle clean causes long-running python installs

2019-01-18 Thread Udi Meiri
grpcio-tools could probably be moved under the "test" tag in setup.py. Not
sure why it has to be specified in gradle configs.
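
For illustration, a rough sketch of what that could look like in a setup.py; the package name and version pins below are made up and not taken from Beam's actual setup.py:

```python
# Hypothetical setup.py fragment: grpcio-tools declared as a test-only extra,
# so `pip install -e .[test]` pulls it in but a plain install does not.
from setuptools import setup

setup(
    name='example-package',          # placeholder name
    version='0.0.1',
    install_requires=['grpcio>=1.8'],
    extras_require={
        'test': ['grpcio-tools>=1.3.5'],
    },
)
```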

On Fri, Jan 18, 2019 at 11:43 AM Kenneth Knowles  wrote:

> Can you `setupVirtualEnv` just enough to run `setup.py clean` without
> installing grpcio-tools, etc.?
>
> Kenn
>
> On Fri, Jan 18, 2019 at 11:20 AM Udi Meiri  wrote:
>
>> setup.py has requirements like setuptools, which are installed in the
>> virtual environment.
>> So even running the clean command requires the virtualenv to be set up.
>>
>> A possible fix could be to skip :beam-sdks-python:cleanPython if
>> setupVirtualenv has not been run. (perhaps by checking for the existence of
>> its output directory
>> <https://github.com/apache/beam/blob/94322f3b138d9b4d5ca69b3d18645e5eb3267b23/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L1565>
>> )
>>
>> On Wed, Jan 16, 2019 at 7:03 PM Kenneth Knowles  wrote:
>>
>>> Filed https://issues.apache.org/jira/browse/BEAM-6459 to record the
>>> conclusion. Doesn't require Beam knowledge so I labeled "starter".
>>>
>>> Kenn
>>>
>>> On Wed, Jan 16, 2019 at 12:14 AM Michael Luckey 
>>> wrote:
>>>
>>>> This seems to be on purpose [1]
>>>>
>>>> AFAIU setup is done to be able to call into setup.py clean. We probably
>>>> should work around that.
>>>>
>>>> [1]
>>>> https://github.com/apache/beam/blob/master/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L1600-L1610
>>>>
>>>> On Wed, Jan 16, 2019 at 7:01 AM Manu Zhang 
>>>> wrote:
>>>>
>>>>> I have the same question. Sometimes even `./gradlew clean` fails due
>>>>> to failure of `setupVirtualEnv` tasks.
>>>>>
>>>>> Manu Zhang
>>>>> On Jan 16, 2019, 12:22 PM +0800, Kenneth Knowles ,
>>>>> wrote:
>>>>>
>>>>> A global `./gradlew clean` runs various `setupVirtualEnv` tasks that
>>>>> invoke things such as `setup.py bdist_wheel for grpcio-tools`. Overall it
>>>>> took 4 minutes. Is this intended?
>>>>>
>>>>> Kenn
>>>>>
>>>>>




Re: gradle clean causes long-running python installs

2019-01-18 Thread Udi Meiri
setup.py has requirements like setuptools, which are installed in the
virtual environment.
So even running the clean command requires the virtualenv to be set up.

A possible fix could be to skip :beam-sdks-python:cleanPython if
setupVirtualenv has not been run (perhaps by checking for the existence of
its output directory).
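
A rough sketch of the kind of guard this could be, assuming the virtualenv directory name used below; this is not the actual BeamModulePlugin code:

```groovy
// Hypothetical build.gradle fragment: skip the Python clean task entirely
// when setupVirtualenv has never produced its output directory.
def envdir = "${project.buildDir}/gradleenv"  // assumed location

task cleanPython {
  onlyIf { project.file(envdir).exists() }
  doLast {
    exec {
      executable 'sh'
      args '-c', ". ${envdir}/bin/activate && python setup.py clean"
    }
  }
}
```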

On Wed, Jan 16, 2019 at 7:03 PM Kenneth Knowles  wrote:

> Filed https://issues.apache.org/jira/browse/BEAM-6459 to record the
> conclusion. Doesn't require Beam knowledge so I labeled "starter".
>
> Kenn
>
> On Wed, Jan 16, 2019 at 12:14 AM Michael Luckey 
> wrote:
>
>> This seems to be on purpose [1]
>>
>> AFAIU setup is done to be able to call into setup.py clean. We probably
>> should work around that.
>>
>> [1]
>> https://github.com/apache/beam/blob/master/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L1600-L1610
>>
>> On Wed, Jan 16, 2019 at 7:01 AM Manu Zhang 
>> wrote:
>>
>>> I have the same question. Sometimes even `./gradlew clean` fails due to
>>> failure of `setupVirtualEnv` tasks.
>>>
>>> Manu Zhang
>>> On Jan 16, 2019, 12:22 PM +0800, Kenneth Knowles ,
>>> wrote:
>>>
>>> A global `./gradlew clean` runs various `setupVirtualEnv` tasks that
>>> invoke things such as `setup.py bdist_wheel for grpcio-tools`. Overall it
>>> took 4 minutes. Is this intended?
>>>
>>> Kenn
>>>
>>>




Re: Adding KMS support to generic filesystem interface

2019-01-18 Thread Udi Meiri
Hi Ismaël,
I'd like your feedback, especially from the AWS perspective.
I wasn't aware of BEAM-3821, but I did create a JIRA for Cloud KMS support
on GCS: https://issues.apache.org/jira/browse/BEAM-5959

Some details of my plan for KMS support:
1. Add KMS settings to sources and sinks.
2. Add a --kmsKey flag that is passed to the runner and applies to pipeline
state.
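
As a sketch of what (2) could look like from the SDK side, the option class and flag spelling below are made up for illustration and are not the API proposed in the PR:

```python
# Hypothetical example: defining a pipeline-level KMS key option the same way
# any custom option is added via Beam's Python PipelineOptions machinery.
from apache_beam.options.pipeline_options import PipelineOptions


class KmsOptions(PipelineOptions):  # made-up name
  @classmethod
  def _add_argparse_args(cls, parser):
    parser.add_argument(
        '--kms_key',  # made-up flag spelling; the plan above calls it --kmsKey
        default=None,
        help='Cloud KMS key resource name applied to pipeline state and to '
             'sinks that support it.')


options = PipelineOptions(
    ['--kms_key=projects/p/locations/global/keyRings/r/cryptoKeys/k'])
print(options.view_as(KmsOptions).kms_key)
```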

On Fri, Jan 18, 2019 at 8:24 AM Ismaël Mejía  wrote:

> Hello Udi,
>
> I implemented the support for KMS in Amazon and I am really interested
> in checking your PR. However I won't have time to do it until next
> Monday. I hope waiting a bit is OK with you if you want some feedback
> from me.
>
> I am curious if you considered or are aware of this issue:
> BEAM-3821 Support a pluggable key management system (KMS)
> https://issues.apache.org/jira/browse/BEAM-3821
>
>
> On Fri, Jan 18, 2019 at 1:51 AM Udi Meiri  wrote:
> >
> > Hi,
> > I'd like to add support for creating files using a cloud Key Management
> System.
> > A KMS allows you to audit, create, rotate, and disable encryption keys.
> Both AWS and GCP have such a service.
> >
> > I wanted to show the community what I've been working on and see if
> there are any comments or objections before submitting a PR.
> >
> https://github.com/udim/beam/commit/d29f1ef26c58489416a2d413eb029596d96e1f25
> >
> > Reference docs:
> > AWS S3:
> https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingKMSEncryption.html
> > GCP GCS:
> https://cloud.google.com/storage/docs/encryption/using-customer-managed-keys#add-object-key
>




Confluence wiki edit access request

2019-01-18 Thread Udi Meiri
username: udim

Thanks!




TestDirectRunner for Java?

2019-01-15 Thread Udi Meiri
Hi,
I want to use DirectRunner for a new IT I'm writing, since it's testing I/O
code that's runner agnostic. The problem is that DirectRunner doesn't have
a TestDataflowRunner analog, so features like OnSuccessMatcher aren't
available.

Any objections to adding a TestDirectRunner class?
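
For context, here is roughly how the matcher is wired up today with TestDataflowRunner; the matcher class below is hypothetical, and the ask is an equivalent path for DirectRunner:

```java
import org.apache.beam.runners.dataflow.TestDataflowRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.testing.TestPipeline;
import org.apache.beam.sdk.testing.TestPipelineOptions;
import org.junit.Test;

public class ExampleIT {
  @Test
  public void runsWithOnSuccessMatcher() {
    TestPipelineOptions options =
        TestPipeline.testingPipelineOptions().as(TestPipelineOptions.class);
    // MyOutputExistsMatcher is a hypothetical SerializableMatcher<PipelineResult>.
    options.setOnSuccessMatcher(new MyOutputExistsMatcher());
    options.setRunner(TestDataflowRunner.class);
    Pipeline p = Pipeline.create(options);
    // ... apply transforms, then run; the test runner checks the matcher after
    // the job succeeds.
    p.run().waitUntilFinish();
  }
}
```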




Re: Confluence wiki edit access request

2019-01-22 Thread Udi Meiri
bump

On Fri, Jan 18, 2019 at 1:57 PM Udi Meiri  wrote:

> username: udim
>
> Thanks!
>




Re: FileIOTest.testMatchWatchForNewFiles flakey in java presubmit

2019-01-22 Thread Udi Meiri
Some options:
- You could wait to assert until after p.run().waitUntilFinish().
- You could PAssert using SerializableMatcher and allow any
lastModifiedTime.

On Tue, Jan 22, 2019 at 3:56 PM Alex Amato  wrote:

> +Jeff, Eugene,
>
> Hi Jeff and Eugene,
>
> I've noticed that Jeff's PR
> 
>  introduced
> a race condition in this test, but it's not clear exactly how to add Jeff's
> test check in a thread-safe way. I believe this to be the source of the
> flakiness. Do you have any suggestions, Eugene (since you authored this
> test)?
>
> I added some details to this JIRA issue explaining in full
> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>
>
> On Tue, Jan 22, 2019 at 3:34 PM Alex Amato  wrote:
>
>> I've seen this fail in a few different PRs for different contributors,
>> and it's causing some issues during the presubmit process. This is a
>> multithreaded test with a lot of sleeps, so it looks a bit suspicious as
>> the source of the problem.
>>
>> https://builds.apache.org/job/beam_PreCommit_Java_Commit/3688/testReport/org.apache.beam.sdk.io/FileIOTest/testMatchWatchForNewFiles/
>>
>> I filed a JIRA for this issue:
>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>>
>>
>>




Re: FileIOTest.testMatchWatchForNewFiles flakey in java presubmit

2019-01-22 Thread Udi Meiri
Alex, the only way to implement my suggestion #1 (that I know of) would be
to write to a file and read it back.
I don't have a good example for #2.

Eugene's suggestion no. 1 seems like a good idea. There are some examples
<https://github.com/apache/beam/blob/324a1bcc820945731ccce7dd7e5354247b841356/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIOWriteTest.java#L335-L340>
in the codebase.
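
A rough sketch of what that could look like here, assuming a PCollection<MatchResult.Metadata> named `matches` and illustrative file names (this is not the actual test code):

```java
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.containsInAnyOrder;

import java.util.ArrayList;
import java.util.List;
import org.apache.beam.sdk.io.fs.MatchResult;
import org.apache.beam.sdk.testing.PAssert;

// Inside the test, given a PCollection<MatchResult.Metadata> named "matches":
PAssert.that(matches)
    .satisfies(
        contents -> {
          List<String> actualNames = new ArrayList<>();
          for (MatchResult.Metadata metadata : contents) {
            actualNames.add(metadata.resourceId().getFilename());
          }
          // Assert only on stable fields; lastModifiedTime is deliberately ignored.
          assertThat(actualNames, containsInAnyOrder("file1.txt", "file2.txt"));
          return null;
        });
```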

On Tue, Jan 22, 2019 at 5:16 PM Eugene Kirpichov 
wrote:

> Yeah, the "List expected" is constructed
> from Files.getLastModifiedTime() calls before the files are actually
> modified, so the code is basically unconditionally broken rather than merely
> flaky.
>
> There's several easy options:
> 1) Use PAssert.that().satisfies() instead of .contains(), and use
> assertThat().contains() inside that, with the list constructed at time the
> assertion is applied rather than declared.
> 2) Implement a Matcher that ignores last modified time and use
> that
>
> Jeff - your option #3 is unfortunately also race-prone, because the code
> may match the files after they have been written but before
> setLastModifiedTime was called.
>
> On Tue, Jan 22, 2019 at 5:08 PM Jeff Klukas  wrote:
>
>> Another option:
>>
>> #3 Have the writer thread call Files.setLastModifiedTime explicitly after
>> each File.write. Then the lastModifiedMillis can be a stable value for each
>> file and we can use those same static values in our expected result. I
>> think that would also eliminate the race condition.
>>
>> On Tue, Jan 22, 2019 at 7:48 PM Alex Amato  wrote:
>>
>>> Thanks Udi, is there a good example for either of these?
>>> #1 - seems like you have to rewrite your assertion logic without the
>>> PAssert? Is there some way to capture the pipeline output and iterate over
>>> it? The pattern I have seen for this in the past also has thread safety
>>> issues (using a DoFn at the end of the pipeline to add the output to a
>>> collection is not safe since the DoFn can be executed concurrently).
>>> #2 - Would BigqueryMatcher, which is used in BigQueryTornadoesIT.java, be a
>>> good example for this? Or is there another example you would suggest
>>> looking at for reference?
>>>
>>>- I guess to this you need to implement the SerializableMatcher
>>>interface and use the matcher as an option in the pipeline options.
>>>
>>>
>>> On Tue, Jan 22, 2019 at 4:28 PM Udi Meiri  wrote:
>>>
>>>> Some options:
>>>> - You could wait to assert until after p.run().waitUntilFinish().
>>>> - You could PAssert using SerializableMatcher and allow any
>>>> lastModifiedTime.
>>>>
>>>> On Tue, Jan 22, 2019 at 3:56 PM Alex Amato  wrote:
>>>>
>>>>> +Jeff, Eugene,
>>>>>
>>>>> Hi Jeff and Eugene,
>>>>>
>>>>> I've noticed that Jeff's PR
>>>>> <https://github.com/apache/beam/commit/410d6c7b5f933dcb0280894553c1e576ee4e4884>
>>>>>  introduced
>>>>> a race condition in this test, but it's not clear exactly how to add Jeff's
>>>>> test check in a thread-safe way. I believe this to be the source of the
>>>>> flakiness. Do you have any suggestions, Eugene (since you authored this
>>>>> test)?
>>>>>
>>>>> I added some details to this JIRA issue explaining in full
>>>>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>>>>>
>>>>>
>>>>> On Tue, Jan 22, 2019 at 3:34 PM Alex Amato  wrote:
>>>>>
>>>>>> I've seen this fail in a few different PRs for different
>>>>>> contributors, and it's causing some issues during the presubmit process.
>>>>>> This is a multithreaded test with a lot of sleeps, so it looks a bit
>>>>>> suspicious as the source of the problem.
>>>>>>
>>>>>> https://builds.apache.org/job/beam_PreCommit_Java_Commit/3688/testReport/org.apache.beam.sdk.io/FileIOTest/testMatchWatchForNewFiles/
>>>>>>
>>>>>> I filed a JIRA for this issue:
>>>>>> https://jira.apache.org/jira/browse/BEAM-6491?filter=-2
>>>>>>
>>>>>>
>>>>>>




Re: Confluence wiki edit access request

2019-01-23 Thread Udi Meiri
Thank you!

On Wed, Jan 23, 2019 at 12:49 PM Ismaël Mejía  wrote:

>  Done.
>
> On Tue, Jan 22, 2019 at 8:53 PM Udi Meiri  wrote:
> >
> > bump
> >
> > On Fri, Jan 18, 2019 at 1:57 PM Udi Meiri  wrote:
> >>
> >> username: udim
> >>
> >> Thanks!
>




Adding KMS support to generic filesystem interface

2019-01-17 Thread Udi Meiri
Hi,
I'd like to add support for creating files using a cloud Key Management
System.
A KMS allows you to audit, create, rotate, and disable encryption keys.
Both AWS and GCP have such a service.

I wanted to show the community what I've been working on and see if there
are any comments or objections before submitting a PR.
https://github.com/udim/beam/commit/d29f1ef26c58489416a2d413eb029596d96e1f25

Reference docs:
AWS S3:
https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingKMSEncryption.html
GCP GCS:
https://cloud.google.com/storage/docs/encryption/using-customer-managed-keys#add-object-key




Re: Issue with publishing maven artefacts locally

2018-12-12 Thread Udi Meiri
On Wed, Dec 12, 2018 at 11:00 AM Scott Wegner  wrote:

> Thanks for pointing this out Alexy. This seems like we unintentionally
> broke something in PR#7197 [1]
>
> +Garrett Jones , who authored the change.
> Garrett can you help investigate?
>
> I went to check to see if we have any existing Jenkins jobs that would've
> caught this break. It seems the beam_Release_Gradle_NightlySnapshot job [2]
> has been failing for the last 10 days. Has anybody looked into this?
>
See "Beam snapshots broken" thread.


>
> [1] https://github.com/apache/beam/pull/7197
> [2] https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/
>
> On Wed, Dec 12, 2018 at 5:57 AM Alexey Romanenko 
> wrote:
>
>> Hi all,
>>
>> I used to publish maven artefacts into local repository using this kind
>> of command for example:
>>
>> ./gradlew -Ppublishing --no-parallel
>> -PdistMgmtSnapshotsUrl=file:///path/to/.m2/repository/
>> -p sdks/java/io/kafka/ publish
>>
>> It worked fine till today. Seems like (according to "git bisect") this
>> recent commit [1] introduced new functionality and now it fails with an
>> error:
>>
>>
>>
>>
>> * What went wrong:
>> A problem occurred configuring project ':beam-sdks-java-io-kafka'.
>> > Exception thrown while executing model rule:
>>   PublishingPluginRules#publishing(ExtensionContainer)
>>    > Cannot set the value of read-only property 'repositories' for object of
>>      type org.gradle.api.publish.internal.DeferredConfigurablePublishingExtension.
>>
>> Does anyone know if this is a bug or I should use another command for the
>> same purposes?
>>
>>
>> [1]
>> https://github.com/apache/beam/commit/bfd1be9ae22d1ae7e732f590c448e9e5ed2894b9
>>
>>
>
>
> --
>
>
>
>
> Got feedback? tinyurl.com/swegner-feedback
>




Java performance tests dashboard

2018-12-12 Thread Udi Meiri
Hi Lukasz,
I was looking for statistics on I/O performance for writes of many files
(~10k) on GCS.

I found this dashboard and I have some questions.
1. The tests that are "local filesystem" seem to be running on Dataflow and
writing to GCS - is it okay to rename them to be officially GCS tests?
2. Is it okay if I add additional GCS tests to this dashboard?




excessive java precommit logging

2018-12-19 Thread Udi Meiri
Hi all,
I'd like to reduce precommit log sizes on Jenkins. For example:
https://builds.apache.org/job/beam_PreCommit_Java_Commit/3181/consoleFull
is 79M, which makes Chrome sluggish to use on it (tab is constantly using a
whole cpu core).

I know this might be controversial, but I'd like to propose to remove the
--info flag from the gradlew command line.




Re: excessive java precommit logging

2018-12-19 Thread Udi Meiri
The gradle scan doesn't pinpoint the error message, and it doesn't contain
all the lines: https://scans.gradle.com/s/ckhjrjdexpuzm/console-log

The logs might be useful, but usually not from passing tests. Doesn't
gradle log output from failed tests by default?

On Wed, Dec 19, 2018 at 1:22 PM Thomas Weise  wrote:

> I usually follow the download procedure outlined by Scott to look at the
> logs.
>
> These logs are big, but when there is a problem it is sometimes essential
> to have the extra output, especially for less frequent flakes.
>
> Reducing logs would then require the author to add extra logging to the PR
> (and attempt to reproduce), which is also not nice.
>
> Thomas
>
>
> On Wed, Dec 19, 2018 at 11:47 AM Scott Wegner  wrote:
>
>> I'm not sure what we lose by dropping the --info flag, but I generally
>> worry about reducing log output since logs are the main resource for
>> diagnosing Jenkins build errors.
>>
>> It seems the issue is that Chrome doesn't scale well to large log files.
>> A few alternative solutions:
>>
>> 1. Use the produced Build Scan (example: [1]) instead of the raw console
>> log. The build scan is quite useful at pointing to what actually failed,
>> and filtering log output for only that task.
>> 2. Instead of consoleFull, use consoleText ("View as plain text" link in
>> Jenkins), which seems to be much easier on Chrome
>> 3. Download the consoleText output locally and use your favorite log
>> viewer that can scale to large files.
>>
>> [1] https://gradle.com/s/ckhjrjdexpuzm
>>
>> On Wed, Dec 19, 2018 at 10:42 AM Udi Meiri  wrote:
>>
>>> Hi all,
>>> I'd like to reduce precommit log sizes on Jenkins. For example:
>>> https://builds.apache.org/job/beam_PreCommit_Java_Commit/3181/consoleFull
>>> is 79M, which makes Chrome sluggish to use on it (tab is constantly
>>> using a whole cpu core).
>>>
>>> I know this might be controversial, but I'd like to propose to remove
>>> the --info flag from the gradlew command line.
>>>
>>>
>>
>> --
>>
>>
>>
>>
>> Got feedback? tinyurl.com/swegner-feedback
>>
>




Re: [VOTE] Release 2.9.0, release candidate #1

2018-12-06 Thread Udi Meiri
For DirectRunner, there are regressions in query 7 SQL batch mode
<https://apache-beam-testing.appspot.com/explore?dashboard=5084698770407424=732741424=411089194>
(2x) and streaming mode (5x).


On Thu, Dec 6, 2018 at 5:59 PM Udi Meiri  wrote:

> I see a regression for query 7 Spark runner batch mode
> <https://apache-beam-testing.appspot.com/explore?dashboard=5138380291571712=1782465104=462502368>
> on or about 2018-11-13.
> [image: image.png]
>
> On Thu, Dec 6, 2018 at 2:46 AM Chamikara Jayalath 
> wrote:
>
>> Hi everyone,
>>
>> Please review and vote on the release candidate #1 for the version 2.9.0,
>> as follows:
>> [ ] +1, Approve the release
>> [ ] -1, Do not approve the release (please provide specific comments)
>>
>>
>> The complete staging area is available for your review, which includes:
>> * JIRA release notes [1],
>> * the official Apache source release to be deployed to dist.apache.org
>> [2], which is signed with the key with fingerprint EEAC70DF3D0BC23B [3],
>> * all artifacts to be deployed to the Maven Central Repository [4],
>> * source code tag "v2.9.0-RC1" [5],
>> * website pull request listing the release [6] and publishing the API
>> reference manual [7].
>> * Python artifacts are deployed along with the source release to the
>> dist.apache.org [2].
>> * Validation sheet with a tab for 2.9.0 release to help with validation
>> [7].
>>
>> The vote will be open for at least 72 hours. It is adopted by majority
>> approval, with at least 3 PMC affirmative votes.
>>
>> Thanks,
>> Cham
>>
>> [1]
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12344258
>> [2] https://dist.apache.org/repos/dist/dev/beam/2.9.0/
>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>> [4]
>> https://repository.apache.org/content/repositories/orgapachebeam-1054/
>> [5] https://github.com/apache/beam/tree/v2.9.0-RC1
>> [6] https://github.com/apache/beam/pull/7215
>> [7] https://github.com/apache/beam-site/pull/584
>> [8]
>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=2053422529
>>
>




Re: [VOTE] Release 2.9.0, release candidate #1

2018-12-06 Thread Udi Meiri
I see a regression for query 7 Spark runner batch mode on or about 2018-11-13.
[image: image.png]

On Thu, Dec 6, 2018 at 2:46 AM Chamikara Jayalath 
wrote:

> Hi everyone,
>
> Please review and vote on the release candidate #1 for the version 2.9.0,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
> [2], which is signed with the key with fingerprint EEAC70DF3D0BC23B [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v2.9.0-RC1" [5],
> * website pull request listing the release [6] and publishing the API
> reference manual [7].
> * Python artifacts are deployed along with the source release to the
> dist.apache.org [2].
> * Validation sheet with a tab for 2.9.0 release to help with validation
> [7].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Cham
>
> [1]
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12344258
> [2] https://dist.apache.org/repos/dist/dev/beam/2.9.0/
> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> [4] https://repository.apache.org/content/repositories/orgapachebeam-1054/
> [5] https://github.com/apache/beam/tree/v2.9.0-RC1
> [6] https://github.com/apache/beam/pull/7215
> [7] https://github.com/apache/beam-site/pull/584
> [8]
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=2053422529
>




Re: Need help regarding memory leak issue

2018-11-16 Thread Udi Meiri
If you're working with Dataflow, it supports this flag:
https://github.com/apache/beam/blob/75e9f645c7bec940b87b93f416823b020e4c5f69/sdks/python/apache_beam/options/pipeline_options.py#L602
which uses guppy for heap profiling.
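
For example, something along these lines; the flag names are from ProfilingOptions, while the job script, project, and bucket paths are placeholders:

```sh
# Hypothetical invocation: enable memory profiling on Dataflow workers and
# write the profiles to a GCS location for later inspection.
python my_pipeline.py \
  --runner=DataflowRunner \
  --project=my-project \
  --temp_location=gs://my-bucket/tmp \
  --profile_memory \
  --profile_location=gs://my-bucket/profiles/
```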

On Fri, Nov 16, 2018 at 3:08 PM Ruoyun Huang  wrote:

> Even though the algorithm works on your batch system, did you verify
> anything that can rule out the possibility that it is the underlying ML
> package causing the memory leak?
>
> If not, maybe replace your prediction with a dummy function which does not
> load any model at all, and always just gives the same prediction. Then do
> the same plotting and let us see what it looks like. And a plus with version
> two: still a dummy prediction, but with the model loaded. Given we don't
> have much of a clue at this stage, at least this can probably give us more
> confidence in whether the issue is caused by the underlying ML package or
> by the Beam SDK. Just my 2 cents.
>
>
> On Thu, Nov 15, 2018 at 4:54 PM Rakesh Kumar  wrote:
>
>> Thanks for responding Ruoyun,
>>
>> We are not sure yet what is causing the leak, but once we run out of
>> memory the SDK worker crashes and the pipeline is forced to restart. Check the
>> memory usage patterns in the attached image. Each line in that graph is
>> representing one task manager host.
>>  You are right we are running the models for predictions.
>>
>> Here are a few observations:
>>
>> 1. All the tasks manager memory usage climb over time but some of the
>> task managers' memory climb really fast because they are running the ML
>> models. These models are definitely using memory intensive data structure
>> (pandas data frame etc) hence their memory usage climb really fast.
>> 2. We had almost the same code running in different infrastructure
>> (non-streaming) that doesn't cause any memory issue.
>> 3. Even when the pipeline has restarted, the memory is not released. It
>> is still hogged by something. You can notice in the attached image that the
>> pipeline restarted around 13:30. At that time it definitely released
>> some portion of the memory but didn't completely release all of it.
>> Notice that when the pipeline was originally started, it started with 30%
>> of the memory, but when it got restarted by the job manager it started with
>> 60% of the memory.
>>
>>
>>
>> On Thu, Nov 15, 2018 at 3:31 PM Ruoyun Huang  wrote:
>>
>>> trying to understand the situation you are having.
>>>
>>> By saying 'kills the application', is that a leak in the application
>>> itself, or the workers being the root cause?  Also are you running ML
>>> models inside Python SDK DoFn's?  Then I suppose it is running some
>>> predictions rather than model training?
>>>
>>> On Thu, Nov 15, 2018 at 1:08 PM Rakesh Kumar 
>>> wrote:
>>>
 I am using the Beam Python SDK to run my app in production. The app is
 running machine learning models. I am noticing a memory leak which
 eventually kills the application. I am not sure of the source of the memory
 leak. Currently, I am using object graph to dump the memory stats. I hope I
 will get some useful information out of this. I have also looked into the
 Guppy library, and they are almost the same.

 Do you guys have any recommendations for debugging this issue? Do we
 have any tooling in the SDK that can help to debug it?
 Please feel free to share your experience if you have debugged similar
 issues in the past.

 Thank you,
 Rakesh

>>>
>>>
>>> --
>>> 
>>> Ruoyun  Huang
>>>
>>>
>
> --
> 
> Ruoyun  Huang
>
>




Re: Enforce javadoc comments in public methods?

2019-01-07 Thread Udi Meiri
+1

On Mon, Jan 7, 2019 at 4:49 PM Daniel Oliveira 
wrote:

> +1
>
> I like this idea, especially with the line number requirement. The exact
> number of lines is debatable, but you could go as low as 10 lines and that
> would exclude any trivial setters and getters. Even better might be if it's
> possible to configure checkstyle to ignore this for getters and setters (I
> don't know if checkstyle supports this, but I know that other tools are
> able to auto-detect getters and setters).
>
> I'm not dead-set against having an annotation to suppress the comment, but it
> carries the risk that code will be left un-commented because both the dev
> and reviewer think it's self-explanatory, and then someone new to the
> codebase finds it confusing.
>
> On Mon, Jan 7, 2019 at 11:31 AM Ankur Goenka  wrote:
>
>> I think it makes sense.
>> Having an annotation to suppress this check for a method/class instead of
>> adding a trivial comment would be useful.
>>
>> On Mon, Jan 7, 2019 at 9:53 AM Ruoyun Huang  wrote:
>>
>>> Yeah. Agree there is no reason to enforce anything for trivial methods
>>> like setter/getter.
>>>
>>> What I meant is to enforce only for a method that is *BOTH* 1) a public
>>> method and 2) longer than N lines.
>>>
>>> sorry for not making the proposal clear enough in the original message,
>>> it should've better titled "enforce ... on non-trivial public methods".
>>>
>>>
>>>
>>> On Mon, Jan 7, 2019 at 1:31 AM Robert Bradshaw 
>>> wrote:
>>>
 IMHO, requiring comments on trivial methods like setters and getters
 is often a net negative, but setting some standard could be useful.

 On Mon, Jan 7, 2019 at 7:35 AM Jean-Baptiste Onofré 
 wrote:
 >
 > Hi,
 >
> for the presence of a comment on public methods, it's a good idea. Now,
> about the number of lines, I'm not sure it's a good idea. I'm thinking about
> the getters/setters which are public. Most of the time, the comment is
> pretty simple (and useless ;)).
 >
 > Regards
 > JB
 >
 > On 07/01/2019 04:35, Ruoyun Huang wrote:
 > > Hi, everyone,
 > >
 > >
> > We were wondering whether it is a good idea to make checkstyle
> > enforce public method comments. Our current behavior of the JavaDoc check is:
> >
> > 1. A missing class javadoc comment is reported as an error.
> > 2. A missing method comment is explicitly allowed, see [1]. It is not
> > even shown as a warning.
> > 3. The actual javadoc target gives a warning when certain tags are
> > missing in the javadoc, but not if the whole comment is missing.
> >
> > How about we enforce method comments for **1) public methods and 2)
> > methods that are longer than N lines**. (N=~30 seems a good number,
> > leading to ~50 violations in the current repository.) I can find out the
> > corresponding contributors to fill in the missing comments before we
> > turn the check fully on.
> >
> > One caveat though is that we might want to skip this check on test code,
> > but I am not sure yet if our current setup can easily handle separate
> > rules for main code versus test code.
> >
> > Is this a good idea? Thoughts and suggestions?
 > >
 > >
> > [1]
> > https://github.com/apache/beam/blame/5ceffb246c0c38ad68dd208e951a1f39c90ef85c/sdks/java/build-tools/src/main/resources/beam/checkstyle.xml#L111
 > >
 > >
 > > Cheers,
 > >
 >
 > --
 > Jean-Baptiste Onofré
 > jbono...@apache.org
 > http://blog.nanthrax.net
 > Talend - http://www.talend.com

>>>
>>>
>>> --
>>> 
>>> Ruoyun  Huang
>>>
>>>




Re: excessive java precommit logging

2019-01-04 Thread Udi Meiri
To follow up, I did some research yesterday on removing --info and my
findings are:
- Gradle Test tasks generate HTML and JUnit XML reports. Both contain the
stacktrace, STDOUT, and STDERR of a failed test (example
<https://builds.apache.org/job/beam_PreCommit_Java_Phrase/515/testReport/junit/org.apache.beam.runners.samza.runtime/SamzaStoreStateInternalsTest/testSetStateIterator/>
).
So even though --info wasn't specified, the output is not lost.
- Python SDK tests don't use Test tasks (they exec tox), and thus are not
affected by --info. Python tests aren't excessively verbose however.
- Go tests should also generate reports (via gogradle), but I haven't found
any and I can't seem to run ./gradlew :beam-sdks-go:test on my workstation.

Suggestion:
- Remove --info (https://github.com/apache/beam/pull/7409)
- If we find Gradle tasks that aren't somehow reporting or logging to
console on failure, that's a bug and the task should be fixed.
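
If a task turns out not to surface failures, this is the kind of per-task fix I have in mind; a generic Gradle sketch, not something already in Beam's build:

```groovy
// Hypothetical snippet: make failed tests print their stack trace and full
// exception details on the console even without --info.
tasks.withType(Test) {
  testLogging {
    events 'failed'
    exceptionFormat 'full'
    showStandardStreams = false  // keep passing-test output quiet
  }
}
```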

On Thu, Dec 20, 2018 at 6:09 AM Kenneth Knowles  wrote:

> I support lowering the log level. The default is `lifecycle`.
>
> For reference, here's where it was increased to `info`:
> https://github.com/apache/beam/pull/4644
>
> The reason was to see more details about Gradle's dependency management.
> We were seeing dependency download flakes on things that should not require
> re-downloading. No longer an issue.
>
> To easily tweak  it on a one-off basis without having to change a Jenkins
> job, you can edit gradle.properties in a commit on your PR:
>
> org.gradle.logging.level=(quiet,warn,lifecycle,info,debug)
> org.gradle.warning.mode=(all,none,summary)
> org.gradle.console=(auto,plain,rich,verbose)
> org.gradle.caching.debug=(true,false)
>
> Kenn
>
> On Thu, Dec 20, 2018 at 6:49 AM Robert Bradshaw 
> wrote:
>
>> Interestingly, I was thinking exactly the same thing the other day.
>>
>> If we could drop the info logs for passing tests, that would be ideal.
>> Regardless, tests should fail (when possible) with actionable
>> messages. I think the rare case of not being able to reproduce the
>> error locally if info logs are needed makes it OK to go and add
>> logging to jenkins as a one-off. (If it's about jenkins build errors,
>> perhaps we could build with higher verbosity before testing with a
>> lower one.)
>> On Thu, Dec 20, 2018 at 11:24 AM Maximilian Michels 
>> wrote:
>> >
>> > Thanks Udi for bringing this up. I'm also for dropping INFO. It's just
>> > incredibly verbose. More importantly, from my experience the INFO level
>> > doesn't help with debugging problems, but it makes finding error messages
>> > or warnings harder.
>> >
>> > That said, here's what I do to search through the log:
>> >
>> > 1) curl /consoleText | less
>> >
>> > This is when I just want to quickly look for something.
>> >
>> > 2) curl /consoleText > log.txt
>> > less log.txt
>> >
>> > Here we store the log to a file first, then use 'less' or 'grep' to
>> search it.
>> >
>> > When in 'less', I use '/' to grep through the lines. Pressing 'n' or
>> 'N' gets
>> > you forward and back in the search results.
>> >
>> > That works pretty well, but I think we would do us a favor by dropping
>> the log
>> > level. Shall we try it out?
>> >
>> > -Max
>> >
>> > On 19.12.18 23:27, Udi Meiri wrote:
>> > > The gradle scan doesn't pinpoint the error message, and it doesn't
>> contain all
>> > > the lines: https://scans.gradle.com/s/ckhjrjdexpuzm/console-log
>> > >
>> > > The logs might be useful, but usually not from passing tests. Doesn't
>> gradle log
>> > > output from failed tests by default?
>> > >
>> > > On Wed, Dec 19, 2018 at 1:22 PM Thomas Weise > > > <mailto:t...@apache.org>> wrote:
>> > >
>> > > I usually follow the download procedure outlined by Scott to look
>> at the logs.
>> > >
>> > > These logs are big, but when there is a problem it is sometimes
>> essential to
>> > > have the extra output, especially for less frequent flakes.
>> > >
>> > > Reducing logs would then require the author to add extra logging
>> to the PR
>> > > (and attempt to reproduce), which is also not nice.
>> > >
>> > > Thomas
>> > >
>> > >
>> > > On Wed, Dec 19, 2018 at 11:47 AM Scott Wegner > > > <mailto:sc...@apache.org>> wrote:
>> > >
>> > > I

Re: Add code quality checks to pre-commits.

2019-01-03 Thread Udi Meiri
+1 for adding more code quality signals. Could we add them in an
advisory-only mode at first? (a warning and not an error)

I'm curious how the "technical debt" metric is determined.

I'm not familiar with SonarQube. What languages does it support?

On Thu, Jan 3, 2019 at 10:19 AM Mikhail Gryzykhin  wrote:

> Hi everyone,
>
> In our current builds we (can) run multiple code quality check tools like
> checkstyle, findbugs, and code test coverage via Cobertura. However, we do not
> utilize many of those signals.
>
> I suggest to add requirements to code based on those tools. Specifically,
> I suggest to add pre-commit checks that will require PRs to conform to some
> quality checks.
>
> We can see a good example of thresholds to add in the Apache SonarQube provided
> default quality gate config:
> 80% test coverage on new code,
> 5% technical debt on new code,
> no bugs/vulnerabilities added.
>
> As another part of this proposal, I want to suggest the use of SonarQube
> for tracking code statistics and as agent for enforcing code quality
> thresholds. It is Apache provided tool that has integration with Jenkins or
> Gradle via plugins.
>
> I believe some reporting to SonarQube was configured for mvn builds of
> some of Beam sub-projects, but was lost during migration to gradle.
>
> I was looking for other options, but so far found only general configs to
> gradle builds that will fail build if code coverage for project is too low.
> Such approach will force us to backfill tests for all existing code that
> can be tedious and demand learning of all legacy code that might not be
> part of current work.
>
> I suggest to discuss and come to conclusion on two points in this tread:
> 1. Do we want to add code quality checks to our pre-commit jobs and
> require them to pass before PR is merged?
>
> Suggested: Add code quality checks listed above at first, adjust them as
> we see fit in the future.
>
> 2. What tools do we want to utilize for analyzing code quality?
>
> Under discussion. Suggested: SonarQube, but will depend on functionality
> level we want to achieve.
>
>
> Regards,
> --Mikhail
>




Re: workspace cleanups needed on jenkins master

2019-01-03 Thread Udi Meiri
On Thu, Dec 27, 2018 at 11:02 AM Ismaël Mejía  wrote:

> Bringing this subject for awareness to dev@
> We are sadly part of this top.
> Does somebody know what this data is? And if we can clean it periodically?
> Can somebody with more sysadmin super powers take a look and act on this.
>
> -- Forwarded message -
> From: Chris Lambertus 
> Date: Thu, Dec 27, 2018 at 1:36 AM
> Subject: workspace cleanups needed on jenkins master
> To: 
>
>
> All,
>
> The Jenkins master needs to be cleaned up. Could the following
> projects please reduce your usage significantly by 5 January. After 5
> Jan Infra will be purging more aggressively and updating job
> configurations as needed. As a rule of thumb, we’d like to see
> projects retain no more than 1 week or 7 builds worth of historical
> data at the absolute maximum. Larger projects should retain less to
> avoid using up a disproportionate amount of space on the master.
>
> Some workspaces without any identifiable associated Project will be
> removed.
>
>
>
> 3911 GB .
> 275 GB ./incubator-netbeans-linux
> 270 GB ./pulsar-website-build
> 249 GB ./pulsar-master
> 199 GB ./Packaging
> 127 GB ./HBase
> 121 GB ./PreCommit-ZOOKEEPER-github-pr-build
> 107 GB ./Any23-trunk
> 102 GB ./incubator-netbeans-release
> 79 GB ./incubator-netbeans-linux-experiment
> 77 GB ./beam_PostCommit_Java_PVR_Flink_Batch
>

Wow, this job has huge logs (400MB+).
https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_Java_PVR_Flink_Batch/330/console
A few weeks back I suggested removing the --info flag passed to Gradle.
I haven't done that yet, but it might help reduce the size of logs. (which
I assume are all stored on master?)

Short-term, we could reduce retention back down to 14 instead of the current
30
<https://github.com/apache/beam/blob/5716dba6d10f32dfa8ab807ffacc2e68f9267225/.test-infra/jenkins/CommonJobProperties.groovy#L46>
.
+Alan Myrvold  +Mikhail Gryzykhin   we
don't need 30 days retention any longer for test dashboards, right?
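
For reference, this would be roughly the following kind of change in the job DSL; a generic sketch with an illustrative job name, while the actual property lives in CommonJobProperties.groovy linked above:

```groovy
// Hypothetical Jenkins Job DSL fragment: keep two weeks of build history
// instead of 30 days.
job('beam_ExampleJob') {
  logRotator {
    daysToKeep(14)   // was 30
    numToKeep(100)   // illustrative cap on stored builds
  }
}
```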


> 76 GB ./HBase-1.3-JDK8
> 70 GB ./Jackrabbit-Oak-Windows
> 70 GB ./stanbol-0.12
> 59 GB ./HBase-Find-Flaky-Tests
> 54 GB ./CouchDB
> 51 GB ./beam_PostCommit_Java_PVR_Flink_Streaming
> 48 GB ./incubator-netbeans-windows
> 47 GB ./FlexJS
> 42 GB ./HBase
> 41 GB ./pulsar-pull-request
> 37 GB ./ZooKeeper_branch35_jdk8
> 32 GB ./HBase-Flaky-Tests
> 32 GB ./Atlas-master-NoTests
> 31 GB ./Atlas-1.0-NoTests
> 31 GB ./beam_PreCommit_Java_Commit
> 30 GB ./Zookeeper_UT_Java18
> 29 GB ./Phoenix-4.x-HBase-1.4
> 28 GB ./HBase-2.0-hadoop3-tests
> 27 GB ./flink-github-ci
> 27 GB ./ZooKeeper_branch35_jdk7
> 27 GB ./oodt-trunk
> 25 GB ./opennlp
> 25 GB ./Trinidad
> 22 GB ./Phoenix-4.x-HBase-1.3
> 21 GB ./ZooKeeper_Flaky_StressTest
> 21 GB ./Atlas-master-AllTests
> 21 GB ./beam_PostCommit_Java_ValidatesRunner_Flink
> 20 GB ./HBase-1.3-JDK7
> 20 GB ./PreCommit-HBASE-Build
> 18 GB ./hadoop-trunk-win
> 18 GB ./HBase-1.2-JDK7
> 18 GB ./HBASE-14070.HLC
> 18 GB ./maven-box
> 17 GB ./Atlas-1.0-AllTests
> 17 GB ./Archiva-TLP-Gitbox
> 17 GB ./Apache
> 17 GB ./Phoenix-5.x-HBase-2.0
> 17 GB ./Phoenix-omid2
> 16 GB ./Lucene-Solr-BadApples-NightlyTests-7.x
> 15 GB ./HBase-2.0
> 14 GB ./flume-trunk
> 14 GB ./beam_PostCommit_Java_ValidatesRunner_Samza
> 14 GB ./HBase-Trunk_matrix
> 13 GB ./commons-csv
> 13 GB ./HBase-Flaky-Tests-old-just-master
> 13 GB ./oodt-coverage
> 12 GB ./incubator-rya-master-with-optionals
> 12 GB ./Syncope-master-deploy
> 11 GB ./PreCommit-PHOENIX-Build
> 11 GB ./Stratos-Master-Nightly-Build
> 11 GB ./Phoenix-master
> 11 GB ./Hadoop-trunk-JACC
> 10 GB ./ctakes-trunk-package
> 10 GB ./FlexJS
> 10 GB ./Atlas-1.0-IntegrationTests
> 9 GB ./incubator-rya-master
> 9 GB ./Atlas-master-IntegrationTests
> 9 GB ./beam_PostCommit_Java_ValidatesRunner_Spark
> 9 GB ./ZooKeeper_UT_Java7
> 9 GB ./Qpid-Broker-J-7.0.x-TestMatrix
> 9 GB ./oodt-dependency-update
> 9 GB ./Apache
> 8 GB ./Struts-examples-JDK8-master
> 8 GB ./Phoenix-4.x-HBase-1.2
> 8 GB ./flume-github-pull-request
> 8 GB ./HBase-HBASE-14614
> 8 GB ./tika-trunk-jdk1.7
> 8 GB ./HBase-1.2-JDK8
> 8 GB ./HBase-1.5
> 7 GB ./Atlas-master-UnitTests
> 7 GB ./tika-2.x-windows
> 7 GB ./incubator-rya-master-with-optionals-pull-requests
> 7 GB ./Hive-trunk
> 7 GB ./beam_PreCommit_Java_Cron
> 7 GB ./Atlas-1.0-UnitTests
> 6 GB ./Jackrabbit
> 6 GB ./beam_PostCommit_Java_PVR_Flink_PR
> 6 GB ./Lucene-Solr-Clover-master
> 6 GB ./Syncope-2_0_X-deploy
> 6 GB ./beam_PostCommit_Java_ValidatesRunner_Apex
> 6 GB ./Tika-trunk
> 6 GB ./pirk
> 6 GB ./Syncope-2_1_X-deploy
> 6 GB ./PLC4X
> 6 GB ./myfaces-current-2.0-integration-tests
> 5 GB ./commons-lang
> 5 GB ./Nemo
> 5 GB ./Mesos-Buildbot
> 5 GB ./Qpid-Broker-J-7.1.x-TestMatrix
> 5 GB ./beam_PostCommit_Java_Nexmark_Flink
> 5 GB ./Qpid-Broker-J-TestMatrix
> 5 GB ./ZooKeeper-Hammer
> 5 GB ./Camel
> 5 GB ./Royale
> 5 GB ./tika-branch-1x
> 5 GB ./ManifoldCF-ant
> 5 GB ./PreCommit-SQOOP-Build
> 5 GB ./HBase-1.4
> 5 GB ./ZooKeeper_UT_Stress
> 4 GB 

Re: workspace cleanups needed on jenkins master

2019-01-03 Thread Udi Meiri
Alan, Mikhail, feel free to merge https://github.com/apache/beam/pull/7410
when ready.

On Thu, Jan 3, 2019 at 5:39 PM Mikhail Gryzykhin  wrote:

> It is not required for https://s.apache.org/beam-community-metrics .
>
> I believe that's main dash we have atm.
>
> @Alan Myrvold  Can you confirm?
>
> --Mikhail
>
> Have feedback <http://go/migryz-feedback>?
>
>
> On Thu, Jan 3, 2019 at 1:59 PM Udi Meiri  wrote:
>
>>
>>
>> On Thu, Dec 27, 2018 at 11:02 AM Ismaël Mejía  wrote:
>>
>>> Bringing this subject for awareness to dev@
>>> We are sadly part of this top.
>>> Does somebody know what this data is? And if we can clean it
>>> periodically?
>>> Can somebody with more sysadmin super powers take a look and act on this.
>>>
>>> -- Forwarded message -
>>> From: Chris Lambertus 
>>> Date: Thu, Dec 27, 2018 at 1:36 AM
>>> Subject: workspace cleanups needed on jenkins master
>>> To: 
>>>
>>>
>>> All,
>>>
>>> The Jenkins master needs to be cleaned up. Could the following
>>> projects please reduce your usage significantly by 5 January. After 5
>>> Jan Infra will be purging more aggressively and updating job
>>> configurations as needed. As a rule of thumb, we’d like to see
>>> projects retain no more than 1 week or 7 builds worth of historical
>>> data at the absolute maximum. Larger projects should retain less to
>>> avoid using up a disproportionate amount of space on the master.
>>>
>>> Some workspaces without any identifiable associated Project will be
>>> removed.
>>>
>>>
>>>
>>> 3911 GB .
>>> 275 GB ./incubator-netbeans-linux
>>> 270 GB ./pulsar-website-build
>>> 249 GB ./pulsar-master
>>> 199 GB ./Packaging
>>> 127 GB ./HBase
>>> 121 GB ./PreCommit-ZOOKEEPER-github-pr-build
>>> 107 GB ./Any23-trunk
>>> 102 GB ./incubator-netbeans-release
>>> 79 GB ./incubator-netbeans-linux-experiment
>>> 77 GB ./beam_PostCommit_Java_PVR_Flink_Batch
>>>
>>
>> Wow, this job has huge logs (400MB+).
>> https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_Java_PVR_Flink_Batch/330/console
>> A few weeks back I suggested removing the --info flag passed to Gradle.
>> I haven't done that yet, but it might help reduce the size of logs.
>> (which I assume are all stored on master?)
>>
>> Short-term, we could reduce retention back down to 14 instead of the current
>> 30
>> <https://github.com/apache/beam/blob/5716dba6d10f32dfa8ab807ffacc2e68f9267225/.test-infra/jenkins/CommonJobProperties.groovy#L46>
>> .
>> +Alan Myrvold  +Mikhail Gryzykhin
>>   we don't need 30 days retention any longer for test
>> dashboards, right?
>>
>>
>>> 76 GB ./HBase-1.3-JDK8
>>> 70 GB ./Jackrabbit-Oak-Windows
>>> 70 GB ./stanbol-0.12
>>> 59 GB ./HBase-Find-Flaky-Tests
>>> 54 GB ./CouchDB
>>> 51 GB ./beam_PostCommit_Java_PVR_Flink_Streaming
>>> 48 GB ./incubator-netbeans-windows
>>> 47 GB ./FlexJS
>>> 42 GB ./HBase
>>> 41 GB ./pulsar-pull-request
>>> 37 GB ./ZooKeeper_branch35_jdk8
>>> 32 GB ./HBase-Flaky-Tests
>>> 32 GB ./Atlas-master-NoTests
>>> 31 GB ./Atlas-1.0-NoTests
>>> 31 GB ./beam_PreCommit_Java_Commit
>>> 30 GB ./Zookeeper_UT_Java18
>>> 29 GB ./Phoenix-4.x-HBase-1.4
>>> 28 GB ./HBase-2.0-hadoop3-tests
>>> 27 GB ./flink-github-ci
>>> 27 GB ./ZooKeeper_branch35_jdk7
>>> 27 GB ./oodt-trunk
>>> 25 GB ./opennlp
>>> 25 GB ./Trinidad
>>> 22 GB ./Phoenix-4.x-HBase-1.3
>>> 21 GB ./ZooKeeper_Flaky_StressTest
>>> 21 GB ./Atlas-master-AllTests
>>> 21 GB ./beam_PostCommit_Java_ValidatesRunner_Flink
>>> 20 GB ./HBase-1.3-JDK7
>>> 20 GB ./PreCommit-HBASE-Build
>>> 18 GB ./hadoop-trunk-win
>>> 18 GB ./HBase-1.2-JDK7
>>> 18 GB ./HBASE-14070.HLC
>>> 18 GB ./maven-box
>>> 17 GB ./Atlas-1.0-AllTests
>>> 17 GB ./Archiva-TLP-Gitbox
>>> 17 GB ./Apache
>>> 17 GB ./Phoenix-5.x-HBase-2.0
>>> 17 GB ./Phoenix-omid2
>>> 16 GB ./Lucene-Solr-BadApples-NightlyTests-7.x
>>> 15 GB ./HBase-2.0
>>> 14 GB ./flume-trunk
>>> 14 GB ./beam_PostCommit_Java_ValidatesRunner_Samza
>>> 14 GB ./HBase-Trunk_matrix
>>> 13 GB ./commons-csv
>>> 13 GB ./HBase-Flaky-Tests-old-just-master
>>> 13 GB ./oodt-coverage
>>> 12 

Re: Cross-language pipelines

2019-01-22 Thread Udi Meiri
Also debuggability: collecting logs from each of these systems.

On Tue, Jan 22, 2019 at 10:53 AM Chamikara Jayalath 
wrote:

> Thanks Robert.
>
> On Tue, Jan 22, 2019 at 4:39 AM Robert Bradshaw 
> wrote:
>
>> Now that we have the FnAPI, I started playing around with support for
>> cross-language pipelines. This will allow things like IOs to be shared
>> across all languages, SQL to be invoked from non-Java, TFX tensorflow
>> transforms to be invoked from non-Python, etc. and I think is the next
>> step in extending (and taking advantage of) the portability layer
>> we've developed. These are often composite transforms whose inner
>> structure depends in non-trivial ways on their configuration.
>>
>
> Some additional benefits of cross-language transforms are given below.
>
> (1) The current large collection of Java IO connectors will become
> available to other languages.
> (2) Current Java and Python transforms will be available for Go and any
> other future SDKs.
> (3) New transform authors will be able to pick their language of choice
> and make their transform available to all Beam SDKs. For example, this can
> be the language the transform author is most familiar with or the only
> language for which a client library is available for connecting to an
> external data store.
>
>
>> I created a PR [1] that basically follows the "expand via an external
>> process" over RPC alternative from the proposals we came up with when
>> we were discussing this last time [2]. There are still some unknowns,
>> e.g. how to handle artifacts supplied by an alternative SDK (they
>> currently must be provided by the environment), but I think this is a
>> good incremental step forward that will already be useful in a large
>> number of cases. It would be good to validate the general direction
>> and I would be interested in any feedback others may have on it.
>>
>
> I think there are multiple semi-dependent problems we have to tackle to
> reach the final goal of supporting fully-fledged cross-language transforms
> in Beam. I agree with taking an incremental approach here with the overall
> vision in mind. Some other problems we have to tackle involve the following.
>
> * Defining a user API that will allow pipelines defined in a SDK X to use
> transforms defined in SDK Y.
> * Update various runners to use URN/payload based environment definition
> [1]
> * Updating various runners to support starting containers for multiple
> environments/languages for the same pipeline and supporting executing
> pipeline steps in containers started for multiple environments.
>
> Thanks,
> Cham
>
> [1]
> https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L952
>
>
>
>
>
>
>
>
>>
>> - Robert
>>
>> [1] https://github.com/apache/beam/pull/7316
>> [2] https://s.apache.org/beam-mixed-language-pipelines
>>
>




Re: [PROPOSAL] Standardize Gradle structure in Python SDK

2019-03-29 Thread Udi Meiri
I don't use gradle commands for Python development either, because they are
slow (no incremental testing).



On Fri, Mar 29, 2019 at 9:16 AM Michael Luckey  wrote:

>
>
> On Fri, Mar 29, 2019 at 2:31 PM Robert Bradshaw 
> wrote:
>
>> On Fri, Mar 29, 2019 at 12:54 PM Michael Luckey 
>> wrote:
>> >
>> > Really like the idea of improving here.
>> >
>> > Unfortunately, I haven't worked with python on that scale yet, so bear
>> with my naive understandings in this regard. If I understand correctly, the
>> suggestion will result in a couple of projects consisting only of a
>> build.gradle file, to kind of work around Gradle's decision not to
>> parallelize within projects, right? In consequence, this also kind of
>> decouples projects from their content - the stuff which constitutes the
>> project - and forces the build file to 'somehow reach out to content of
>> other (only python root?) projects, e.g. it couples projects. This somehow
>> 'feels non natural' to me. But, of course, might be the path to go. As I
>> said before, never worked on python on that scale.
>>
>> It feels a bit odd to me as well. Is it possible to have multiple
>> projects per directory (e.g. a suite of testing ones) rather than
>> having to break things up like this, especially if the goal is
>> primarily to get parallel running of tests? Especially if we could
>> automatically create the cross-product rather than manually? There
>> also seems to be some redundancy with what tox is doing here.
>>
>
> Not sure, whether I understand correctly. But I do not think that's
> possible. If we are going to do some cross-product, we are probably better
> of doing that on tasks, e.g. by leveraging task rules or programmatically
> adding tasks (which is already done in parts). Of course, this will not
> help with parallelisation (but might enable that, see below).
>
>
>>
>> > But I believe to remember Robert talking about using in project
>> parallelisation for his development. Is this something which could also
>> work on CI? Of course, that will not help with different python versions,
>> but maybe that could be solved also by gradles variants which are
>> introduced in 5.3 - definitely need some time to investigate the
>> possibilities here. On first sight it feels like lots of duplication to
>> create 'builds' for any python version. Or wouldn't that be the case?
>> >
>> > And another naive thought on my side, isn't that non parallelizability
>> also caused by the monolithic setup of the python code base? E.g. if I
>> understand correctly, java sdk is split into core/runners/ios etc, each
>> encapsulate into full blown projects, i.e. buckets of sources, tests and
>> build file. Would it technically possible to do something similar with
>> python? I assume that being discussed before and teared apart, but couldn't
>> find on mailing list.
>>
>> Neither the culture nor the tooling of Python supports lots of
>> interdependent "sub-packages" for a single project--at least not
>> something smaller than one would want to deploy to Pypi. So while one
>> could do this, it'd be going against the grain. There are also much
>> lower-hanging opportunities for parallelization (e.g. running the test
>> suites for separate python versions in parallel).
>>
>> It's not very natural (as I understand it) with Go either. If we're
>> talking directory re-organization, I think it would make sense to
>> consider having top-level java, python, go, ... next to model,
>> website, etc.
>>
>
> Yes. We shouldn't work against common culture/practices, but try to
> embrace native tooling and add support where required.
>
> To reiterate on parallelisation, there are (at least) three opportunities:
>
> 1. Parallelise on test level. For python, this is detox?
>

This is actually 2 levels. :)
1a. Parallelise at the nosetest level - run unit tests in parallel in a
single tox environment. (I have a PR in progress to migrate to pytest, and
we should be able to do file-level parallelism provided we solve pickling
issues.)
1b. Parallelise at the tox environment level, e.g., somehow running the
multiple tox environments (py27,py27-cython,py35,...) in parallel.
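
As an illustration of 1a, a minimal tox environment running tests in parallel with pytest-xdist; the environment name, deps, and paths here are assumptions, not Beam's actual tox.ini:

```ini
# Hypothetical tox.ini fragment: one environment whose tests run in parallel
# across CPU cores via pytest-xdist.
[testenv:py35]
deps =
    pytest
    pytest-xdist
commands =
    pytest -n auto apache_beam
```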


> 2. Parallelise on Gradle project level (--parallel option)
> 3. Parallelise on CI level
>
> So what I ve done before, if 1. does not help, nor 2. cause project is
> 'just to big' was 3, i.e. splitting on CI level. So the simplest thing I
> could imagine right now would be - as suggested above - to split
> pythonPreCommit into something like pythonX_YPrecommit, which then runs
> those different python versions in parallel. Of course, that could be done
> ad infinitum by splitting further into runners, IOs whatever.
>
> OTOH, we do already have tons of jobs running as we seem to map Gradle
> tasks to Jenkins jobs. So it might be more appropriate to leverage CI
> parallelization on jenkins pipeline level. Something like creating
> pythonPrecommit as a pipeline, which itself runs several steps in parallel.
> Did not contemplate on 

Re: Build blocking on

2019-03-25 Thread Udi Meiri
It shouldn't stall. That's a bug.
OTOH, I never use the `build` target.
I'll try running that myself.

On Mon, Mar 25, 2019, 07:24 Michael Luckey  wrote:

> Hi,
>
> trying to run './gradlew build' on vanilla setup, my build consistently
> stalls during execution of python gcp tests, e.g. on both of
> - > :beam-sdks-python:testPy2Gcp
> - > :beam-sdks-python-test-suites-tox-py35:testPy35Gcp
>
> Console output:
>  snip 
> test_big_query_standard_sql
> (apache_beam.io.gcp.big_query_query_to_table_it_test.BigQueryQueryToTableIT)
> ... SKIP: IT is skipped because --test-pipeline-options is not specified
> test_big_query_standard_sql_kms_key
> (apache_beam.io.gcp.big_query_query_to_table_it_test.BigQueryQueryToTableIT)
> ... SKIP: This test requires BQ Dataflow native source support for KMS,
> which is not available yet.
> test_multiple_destinations_transform
> (apache_beam.io.gcp.bigquery_file_loads_test.BigQueryFileLoadsIT) ... SKIP:
> IT is skipped because --test-pipeline-options is not specified
> test_one_job_fails_all_jobs_fail
> (apache_beam.io.gcp.bigquery_file_loads_test.BigQueryFileLoadsIT) ... SKIP:
> IT is skipped because --test-pipeline-options is not specified
> test_records_traverse_transform_with_mocks
> (apache_beam.io.gcp.bigquery_file_loads_test.TestBigQueryFileLoads) ...
>
> output ends here, would expect a failed or ok here.
>
>
> Afterwards no progress - even waiting for hours. Any idea, what might be
> causing this? Do I need to add some GCP properties for this task ?
>
> Any ideas, what I am doing wrong?
>
> best,
>
> michel
>
>




Re: Build blocking on

2019-03-25 Thread Udi Meiri
Okay, `./gradlew build` failed pretty quickly for me:

> Task :beam-sdks-go:resolveBuildDependencies FAILED
cloud.google.com/go: commit='4f6c921ec566a33844f4e7879b31cd8575a6982d',
urls=[https://code.googlesource.com/gocloud] does not exist in
/usr/local/google/home/ehudm/.gradle/go/repo/
cloud.google.com/go/625660c387d9403fde4d73cacaf2d2ac, updating will be
performed.

https://gradle.com/s/x5zqbc5zwd3bg

(Now I remember why I stopped using `build` :/)

On Mon, Mar 25, 2019 at 5:30 PM Udi Meiri  wrote:

> It shouldn't stall. That's a bug.
> OTOH, I never use the `build` target.
> I'll try running that myself.
>
> On Mon, Mar 25, 2019, 07:24 Michael Luckey  wrote:
>
>> Hi,
>>
>> trying to run './gradlew build' on vanilla setup, my build consistently
>> stalls during execution of python gcp tests, e.g. on both of
>> - > :beam-sdks-python:testPy2Gcp
>> - > :beam-sdks-python-test-suites-tox-py35:testPy35Gcp
>>
>> Console output:
>>  snip 
>> test_big_query_standard_sql
>> (apache_beam.io.gcp.big_query_query_to_table_it_test.BigQueryQueryToTableIT)
>> ... SKIP: IT is skipped because --test-pipeline-options is not specified
>> test_big_query_standard_sql_kms_key
>> (apache_beam.io.gcp.big_query_query_to_table_it_test.BigQueryQueryToTableIT)
>> ... SKIP: This test requires BQ Dataflow native source support for KMS,
>> which is not available yet.
>> test_multiple_destinations_transform
>> (apache_beam.io.gcp.bigquery_file_loads_test.BigQueryFileLoadsIT) ... SKIP:
>> IT is skipped because --test-pipeline-options is not specified
>> test_one_job_fails_all_jobs_fail
>> (apache_beam.io.gcp.bigquery_file_loads_test.BigQueryFileLoadsIT) ... SKIP:
>> IT is skipped because --test-pipeline-options is not specified
>> test_records_traverse_transform_with_mocks
>> (apache_beam.io.gcp.bigquery_file_loads_test.TestBigQueryFileLoads) ...
>>
>> output ends here, would expect a failed or ok here.
>>
>>
>> Afterwards no progress - even waiting for hours. Any idea, what might be
>> causing this? Do I need to add some GCP properties for this task ?
>>
>> Any ideas, what I am doing wrong?
>>
>> best,
>>
>> michel
>>
>>




Re: Build blocking on

2019-03-26 Thread Udi Meiri
Robert, from what I recall it's not flaky for me - it consistently fails.
Let me know if there's a way to get more logging about this error.

On Mon, Mar 25, 2019, 19:50 Robert Burke  wrote:

> It's concerning to me that 1) the Go dependency resolution via gogradle is
> flaky, and 2) that it can block other languages.
>
> I suppose 2) makes sense since it's part of the container bootstrapping
> code, but that makes 1) a serious problem, of which I wasn't aware.
> I should have time to investigate this in the next two weeks.
>
> On Mon, 25 Mar 2019 at 18:08, Michael Luckey  wrote:
>
>> Just for the record,
>>
>> using a vm here, because did not yet get all task running on my mac, and
>> did not want to mess with my setup.
>>
>> So installed vanilla ubuntu-18.04 LTS on virtual box, 26GB ram, 6 cores
>> and further
>>
>> sudo apt update
>>
>> sudo apt install gcc
>>
>> sudo apt install make
>>
>> sudo apt install perl
>>
>> sudo apt install curl
>>
>> sudo apt install openjdk-8-jdk
>>
>> sudo apt install python
>>
>> sudo apt install -y software-properties-common
>>
>> sudo add-apt-repository ppa:deadsnakes/ppa
>>
>> sudo apt update
>>
>> sudo apt install python3.5
>>
>> sudo apt-get install apt-transport-https ca-certificates curl gnupg-agent
>> software-properties-common
>>
>> curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key
>> add -
>>
>> sudo apt-key fingerprint 0EBFCD88
>>
>> sudo add-apt-repository "deb [arch=amd64]
>> https://download.docker.com/linux/ubuntu \
>>
>> $(lsb_release -cs) \
>>
>> stable"
>>
>> sudo apt-get update
>>
>> sudo apt-get install docker-ce docker-ce-cli containerd.io
>>
>> sudo groupadd docker
>>
>> sudo usermod -aG docker $USER
>>
>> git config --global user.email "d...@spam.me"
>>
>> git config --global user.name "Some Guy"
>>
>> curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
>>
>> sudo python get-pip.py
>>
>> rm get-pip.py
>>
>> sudo pip install --upgrade virtualenv
>>
>> sudo pip install cython
>>
>> sudo apt-get install python-dev
>>
>> sudo apt-get install python3-distutils
>>
>> sudo apt-get install python3-dev # for python3.x installs
>>
>>
>> git clone https://github.com/apache/beam.git cd beam/ ./gradlew build
>>
>> Nothing else changed/added. (hopefully, need to reassure myself here)
>>
>> Unfortunately, this is failing. Need to exclude those python tests (and
>> of course website, which usually fails on lira links)
>>
>> So I might be missing some env settings for gap, dunno. Probably missed
>> some docs.
>>
>>
>>
>> On Tue, Mar 26, 2019 at 1:46 AM Michael Luckey 
>> wrote:
>>
>>> Thanks Udi for trying that!
>>>
>>> In fact, the go dependency resolution is flaky. Did not look into that,
>>> but just rerunning usually works. Of course, less than optimal, but,
>>> well...
>>>
>>> Running build target is of course just an aggregation of task to run.
>>> And unfortunately just running that
>>>
>>> ./gradlew  :beam-sdks-python:testPy2Gcp
>>>
>>> stalls on my (virtual) machine.
>>>
>>> On Tue, Mar 26, 2019 at 1:35 AM Udi Meiri  wrote:
>>>
>>>> Okay, `./gradlew build` failed pretty quickly for me:
>>>>
>>>> > Task :beam-sdks-go:resolveBuildDependencies FAILED
>>>> cloud.google.com/go:
>>>> commit='4f6c921ec566a33844f4e7879b31cd8575a6982d', urls=[
>>>> https://code.googlesource.com/gocloud] does not exist in
>>>> /usr/local/google/home/ehudm/.gradle/go/repo/
>>>> cloud.google.com/go/625660c387d9403fde4d73cacaf2d2ac, updating will be
>>>> performed.
>>>>
>>>> https://gradle.com/s/x5zqbc5zwd3bg
>>>>
>>>> (Now I remember why I stopped using `build` :/)
>>>>
>>>> On Mon, Mar 25, 2019 at 5:30 PM Udi Meiri  wrote:
>>>>
>>>>> It shouldn't stall. That's a bug.
>>>>> OTOH, I never use the `build` target.
>>>>> I'll try running that myself.
>>>>>
>>>>> On Mon, Mar 25, 2019, 07:24 Michael Luckey 
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> trying to run './gradlew build' on vanilla

Re: [ANNOUNCE] New committer announcement: Mark Liu

2019-03-25 Thread Udi Meiri
Congrats Mark!

On Mon, Mar 25, 2019 at 9:24 AM Ahmet Altay  wrote:

> Congratulations, Mark! 
>
> On Mon, Mar 25, 2019 at 7:24 AM Tim Robertson 
> wrote:
>
>> Congratulations Mark!
>>
>>
>> On Mon, Mar 25, 2019 at 3:18 PM Michael Luckey 
>> wrote:
>>
>>> Nice! Congratulations, Mark.
>>>
>>> On Mon, Mar 25, 2019 at 2:42 PM Katarzyna Kucharczyk <
>>> ka.kucharc...@gmail.com> wrote:
>>>
 Congratulations, Mark! 

 On Mon, Mar 25, 2019 at 11:24 AM Gleb Kanterov 
 wrote:

> Congratulations!
>
> On Mon, Mar 25, 2019 at 10:23 AM Łukasz Gajowy 
> wrote:
>
>> Congrats! :)
>>
>>
>>
>> On Mon, Mar 25, 2019 at 08:11 Aizhamal Nurmamat kyzy
>> wrote:
>>
>>> Congratulations, Mark!
>>>
>>> On Sun, Mar 24, 2019 at 23:18 Pablo Estrada 
>>> wrote:
>>>
 Yeaah  Mark! : ) Congrats : D

 On Sun, Mar 24, 2019 at 10:32 PM Yifan Zou 
 wrote:

> Congratulations Mark!
>
> On Sun, Mar 24, 2019 at 10:25 PM Connell O'Callaghan <
> conne...@google.com> wrote:
>
>> Well done congratulations Mark!!!
>>
>> On Sun, Mar 24, 2019 at 10:17 PM Robert Burke 
>> wrote:
>>
>>> Congratulations Mark! 
>>>
>>> On Sun, Mar 24, 2019, 10:08 PM Valentyn Tymofieiev <
>>> valen...@google.com> wrote:
>>>
 Congratulations, Mark!

 Thanks for your contributions, in particular for your efforts
 to parallelize test execution for Python SDK and increase the 
 speed of
 Python precommit checks.

 On Sun, Mar 24, 2019 at 9:40 PM Kenneth Knowles <
 k...@apache.org> wrote:

> Hi all,
>
> Please join me and the rest of the Beam PMC in welcoming a new
>  committer: Mark Liu.
>
> Mark has been contributing to Beam since late 2016! He has
> proposed 100+ pull requests. Mark was instrumental in expanding 
> test and
> infrastructure coverage, especially for Python. In
> consideration of Mark's contributions, the Beam PMC trusts Mark 
> with the
> responsibilities of a Beam committer [1].
>
> Thank you, Mark, for your contributions.
>
> Kenn
>
> [1] https://beam.apache.org/contribute/become-a-committer/
> #an-apache-beam-committer
>
 --
>>>
>>> *Aizhamal Nurmamat kyzy*
>>>
>>> Open Source Program Manager
>>>
>>> 646-355-9740 Mobile
>>>
>>> 601 North 34th Street, Seattle, WA 98103
>>>
>>>
>>>
>
> --
> Cheers,
> Gleb
>





Re: Build blocking on

2019-03-26 Thread Udi Meiri
"rm -r ~/.gradle/go/repo/" worked for me (there was more than one package
with issues).
My ~/.bashrc has
  export GOPATH=$HOME/go
so maybe that's making the difference in my setup.

On Tue, Mar 26, 2019 at 11:28 AM Thomas Weise  wrote:

> Can this be addressed by having "clean" remove all state that gogradle
> leaves behind? This staleness issue has bitten me a few times also and it
> would be good to have a reliable way to deal with it, even if it involves
> an extra clean.
>
>
> On Tue, Mar 26, 2019 at 11:14 AM Michael Luckey 
> wrote:
>
>> @Udi
>> Did you try to just delete the
>> '/usr/local/google/home/ehudm/.gradle/go/repo/cloud.google.com' folder?
>>
>> @Robert
>> As said before, I am a bit scared about the implications. Shelling out is
>> done by python, and from build perspective, this does not work very well,
>> unfortunately. I.e. no caching, up-to-date checks etc...
>>
>> But of course, we need to play with this a bit more.
>>
>> On Tue, Mar 26, 2019 at 6:24 PM Robert Burke  wrote:
>>
>>> Reading the error from the gradle scan, it largely looks like some part
>>> of the GCP dependencies for the build depends on a package, where the
>>> commit version is no longer around. The main issue with gogradle is that
>>> it's entirely distinct from the usual Go workflow, which means deps users
>>> use are likely to be different to what's in the lock file.
>>>
>>> This work will be tracked in
>>> https://issues.apache.org/jira/browse/BEAM-5379
>>> GoGradle hasn't moved to support the new-go way of handling deps, so my
>>> inclination is to simplify to simple scripts for Gradle that shell out the
>>> to Go tool for handling Go dep management, over trying to fix GoGradle.
>>>
>>> On Tue, 26 Mar 2019 at 09:43, Udi Meiri  wrote:
>>>
>>>> Robert, from what I recall it's not flaky for me - it consistently
>>>> fails. Let me know if there's a way to get more logging about this error.
>>>>
>>>> On Mon, Mar 25, 2019, 19:50 Robert Burke  wrote:
>>>>
>>>>> It's concerning to me that 1) the Go dependency resolution via
>>>>> gogradle is flaky, and 2) that it can block other languages.
>>>>>
>>>>> I suppose 2) makes sense since it's part of the container
>>>>> bootstrapping code, but that makes 1) a serious problem, of which I wasn't
>>>>> aware.
>>>>> I should have time to investigate this in the next two weeks.
>>>>>
>>>>> On Mon, 25 Mar 2019 at 18:08, Michael Luckey 
>>>>> wrote:
>>>>>
>>>>>> Just for the record,
>>>>>>
>>>>>> using a vm here, because did not yet get all task running on my mac,
>>>>>> and did not want to mess with my setup.
>>>>>>
>>>>>> So installed vanilla ubuntu-18.04 LTS on virtual box, 26GB ram, 6
>>>>>> cores and further
>>>>>>
>>>>>> sudo apt update
>>>>>>
>>>>>> sudo apt install gcc
>>>>>>
>>>>>> sudo apt install make
>>>>>>
>>>>>> sudo apt install perl
>>>>>>
>>>>>> sudo apt install curl
>>>>>>
>>>>>> sudo apt install openjdk-8-jdk
>>>>>>
>>>>>> sudo apt install python
>>>>>>
>>>>>> sudo apt install -y software-properties-common
>>>>>>
>>>>>> sudo add-apt-repository ppa:deadsnakes/ppa
>>>>>>
>>>>>> sudo apt update
>>>>>>
>>>>>> sudo apt install python3.5
>>>>>>
>>>>>> sudo apt-get install apt-transport-https ca-certificates curl
>>>>>> gnupg-agent software-properties-common
>>>>>>
>>>>>> curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo
>>>>>> apt-key add -
>>>>>>
>>>>>> sudo apt-key fingerprint 0EBFCD88
>>>>>>
>>>>>> sudo add-apt-repository "deb [arch=amd64]
>>>>>> https://download.docker.com/linux/ubuntu \
>>>>>>
>>>>>> $(lsb_release -cs) \
>>>>>>
>>>>>> stable"
>>>>>>
>>>>>> sudo apt-get update
>>>>>>
>>>>>> sudo apt-get install do

Re: Build blocking on

2019-03-26 Thread Udi Meiri
Luckey, I couldn't recreate your issue, but I still haven't done a full
build.
I created a new GCE VM using the ubuntu-1804-bionic-v20190212a image
(n1-standard-4 machine type).

Ran the following:
sudo apt-get update
sudo apt-get install python-pip
sudo apt-get install python-virtualenv
git clone https://github.com/apache/beam.git
cd beam
./gradlew :beam-sdks-python:testPy2Gcp
[failed: no JAVA_HOME]
sudo apt-get install openjdk-8-jdk
./gradlew :beam-sdks-python:testPy2Gcp

Got: BUILD SUCCESSFUL in 7m 52s

Then I tried:
./gradlew build

And ran out of disk space. :) (beam/ is taking 4.5G and the VM boot disk is
10G total)

On Tue, Mar 26, 2019 at 1:35 PM Robert Burke  wrote:

> Michael, your concern is reasonable, especially with the experience with
> python, though that does help me bootstrap this work. :)
>
> The go tools provide caching and avoid redoing work if the source files
> haven't changed. This applies most particularly for `go build` and `go
> test`. As long as the go code isn't changing at every invocation, this
> should be fine. I'm not aware of the same being the case for the usual
> python tools.
>
>  The real trick is ensuring a valid and consistent environment for the go
> code.
>
> The environment question becomes easier for everyone by moving to go
> modules, which were designed to provide these kinds of consistent builds.
> It also avoids needing a GOPATH set. Any directory is permitted, as long as
> the go.mod is present.
>
> (The Go SDK doesn't yet us go modules, so go.mod and go.sum aren't yet in
> the repo.)
>
> The main blocker is see is updating the Jenkins machines to have the
> latest version of Go (1.12) instead of 1.10, which doesn't support modules.
> This only blocks a final submission, rather than the work fortunately.
>
> On Tue, Mar 26, 2019, 1:08 PM Udi Meiri  wrote:
>
>> "rm -r ~/.gradle/go/repo/" worked for me (there was more than one package
>> with issues).
>> My ~/.bashrc has
>>   export GOPATH=$HOME/go
>> so maybe that's making the difference in my setup.
>>
>> On Tue, Mar 26, 2019 at 11:28 AM Thomas Weise  wrote:
>>
>>> Can this be addressed by having "clean" remove all state that gogradle
>>> leaves behind? This staleness issue has bitten me a few times also and it
>>> would be good to have a reliable way to deal with it, even if it involves
>>> an extra clean.
>>>
>>>
>>> On Tue, Mar 26, 2019 at 11:14 AM Michael Luckey 
>>> wrote:
>>>
>>>> @Udi
>>>> Did you try to just delete the
>>>> '/usr/local/google/home/ehudm/.gradle/go/repo/cloud.google.com' folder?
>>>>
>>>> @Robert
>>>> As said before, I am a bit scared about the implications. Shelling out
>>>> is done by python, and from build perspective, this does not work very
>>>> well, unfortunately. I.e. no caching, up-to-date checks etc...
>>>>
>>>> But of course, we need to play with this a bit more.
>>>>
>>>> On Tue, Mar 26, 2019 at 6:24 PM Robert Burke 
>>>> wrote:
>>>>
>>>>> Reading the error from the gradle scan, it largely looks like some
>>>>> part of the GCP dependencies for the build depends on a package, where the
>>>>> commit version is no longer around. The main issue with gogradle is that
>>>>> it's entirely distinct from the usual Go workflow, which means deps users
>>>>> use are likely to be different to what's in the lock file.
>>>>>
>>>>> This work will be tracked in
>>>>> https://issues.apache.org/jira/browse/BEAM-5379
>>>>> GoGradle hasn't moved to support the new-go way of handling deps, so
>>>>> my inclination is to simplify to simple scripts for Gradle that shell out
>>>>> the to Go tool for handling Go dep management, over trying to fix 
>>>>> GoGradle.
>>>>>
>>>>> On Tue, 26 Mar 2019 at 09:43, Udi Meiri  wrote:
>>>>>
>>>>>> Robert, from what I recall it's not flaky for me - it consistently
>>>>>> fails. Let me know if there's a way to get more logging about this error.
>>>>>>
>>>>>> On Mon, Mar 25, 2019, 19:50 Robert Burke  wrote:
>>>>>>
>>>>>>> It's concerning to me that 1) the Go dependency resolution via
>>>>>>> gogradle is flaky, and 2) that it can block other languages.
>>>>>>>
>>>>>>> I suppose 2) makes sense since it's part of the container
>>>>>>> bootstrapping code, but th

Re: What quick command to catch common issues before pushing a python PR?

2019-02-25 Thread Udi Meiri
Talking about Python:
I only know of "./gradlew lint", which includes style and some py3
compliance checking.
There is no auto-fix like spotlessApply AFAIK.

As a side note, I really dislike our Python line-continuation indent rule,
since PyCharm can't be configured to adhere to it and I find myself
manually adjusting whitespace all the time.
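
To make that concrete, here's roughly the shape of the disagreement - a
made-up snippet (some_function and the strings are placeholders, and this is
my paraphrase of the rule, not its exact text):

```
# What lint accepts, as far as I can tell: a hanging 4-space continuation.
def some_function(a, b, c):
  return a + b + c

result = some_function(
    'first argument', 'second argument',
    'third argument')

# What PyCharm's default "align when multiline" reformat produces instead,
# which I then have to undo by hand:
result = some_function('first argument', 'second argument',
                       'third argument')
print(result)
```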


On Mon, Feb 25, 2019 at 10:22 AM Kenneth Knowles  wrote:

> FWIW gradle is a depgraph-based build system. You can gain a few seconds
> by putting all but spotlessApply in one command.
>
> ./gradlew spotlessApply && ./gradlew checkstyleMain checkstyleTest javadoc
> findbugsMain compileTestJava compileJava
>
> It might be clever to define a meta-task. Gradle "base plugin" has the
> notable check (build and run tests), assemble (make artifacts), and build
> (assemble + check, badly named!)
>
> I think something like "everything except running tests and building
> artifacts" might be helpful.
>
> Kenn
>
> On Mon, Feb 25, 2019 at 10:13 AM Alex Amato  wrote:
>
>> I made a thread about this a while back for java, but I don't think the
>> same commands like sptoless work for python.
>>
>> auto fixing lint issues
>> running and quick checks which would fail the PR (without running the
>> whole precommit?)
>> Something like findbugs to detect common issues (i.e. py3 compliance)
>>
>> FWIW, this is what I have been using for java. It will catch pretty much
>> everything except presubmit test failures.
>>
>> ./gradlew spotlessApply && ./gradlew checkstyleMain && ./gradlew
>> checkstyleTest && ./gradlew javadoc && ./gradlew findbugsMain && ./gradlew
>> compileTestJava && ./gradlew compileJava
>>
>




Re: Virtualenv setup issues on new machine

2019-02-28 Thread Udi Meiri
I think Gradle is complaining that the path can't be found.
Is there more information if you run it with --info?

On Thu, Feb 28, 2019, 14:35 Ankur Goenka  wrote:

> Hi Beamers,
>
> I am trying build python sdk from a fresh git checkout on a new linux
> machine but the setupVirtualEnv task is failing with the error below. The
> complete build scan is at
> https://scans.gradle.com/s/h3jwzeg5aralk/failure?openFailures=WzBd=WzQsM10#top=0
>
> From the error it seems that gradle is trying to find the virtualenv
> command in beam/python folder.
> I am able to run virtualenv from the bash directly and PATH seems to be
> setup correctly.
>
> Anypointers about what might be happening?
>
>
> org.gradle.api.tasks.TaskExecutionException
> :
> Execution failed for task ':beam-sdks-python:setupVirtualenv'.
> Open stacktrace
> Caused by:
> org.gradle.process.internal.ExecException
> :
> A problem occurred starting process 'command 'virtualenv''
> Open stacktrace
> Caused by:
> net.rubygrapefruit.platform.NativeException
> :
> Could not start 'virtualenv'
> Open stacktrace
> Caused by:
> java.io.IOException
> :
> Cannot run program "virtualenv" (in directory
> "/tmp/beam/beam/sdks/python"): error=2, No such file or directory
> Close stacktrace
> at net.rubygrapefruit.platform.internal.DefaultProcessLauncher.start
> (DefaultProcessLauncher.java:25)
> at net.rubygrapefruit.platform.internal.WrapperProcessLauncher.start
> (WrapperProcessLauncher.java:36)
> at org.gradle.process.internal.ExecHandleRunner.run
> (ExecHandleRunner.java:67)
> at org.gradle.internal.operations.CurrentBuildOperationPreservingRunnable.
> run(CurrentBuildOperationPreservingRunnable.java:42)
> at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.
> onExecute(ExecutorPolicy.java:63)
> at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run
> (ManagedExecutorImpl.java:46)
> at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.
> run(ThreadFactoryImpl.java:55)
> Caused by:
> java.io.IOException
> :
> error=2, No such file or directory
> Close stacktrace
> at net.rubygrapefruit.platform.internal.DefaultProcessLauncher.start
> (DefaultProcessLauncher.java:25)
> at net.rubygrapefruit.platform.internal.WrapperProcessLauncher.start
> (WrapperProcessLauncher.java:36)
> at org.gradle.process.internal.ExecHandleRunner.run
> (ExecHandleRunner.java:67)
> at org.gradle.internal.operations.CurrentBuildOperationPreservingRunnable.
> run(CurrentBuildOperationPreservingRunnable.java:42)
> at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.
> onExecute(ExecutorPolicy.java:63)
> at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run
> (ManagedExecutorImpl.java:46)
> at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.
> run(ThreadFactoryImpl.java:55)
>




Re: precommit tests: please ignore Python_pytest

2019-02-27 Thread Udi Meiri
I've added a flag in my latest push so that a commit-triggered job is not
created. It works :)

On Wed, Feb 27, 2019, 11:30 Pablo Estrada  wrote:

> Currently it's not possible, because we run a Seed job that sets the
> configuration globally. Lukazs Gajowy had talked to me about improvements
> to this process, but they've not been pursued further by anyone so far.
>
> TLDR: Currently it's not possible : (
> -P.
>
> On Wed, Feb 27, 2019 at 3:29 AM Maximilian Michels  wrote:
>
>> Thanks for the heads-up Udi. I noticed the GitHub check and figured that
>> it might not be fully-functional yet.
>>
>> It would be nice if there was a way to enable those hooks only for a
>> testing PR, such that they do not interfere with other PRs.
>>
>> Perhaps somebody has an idea how to do that?
>>
>> -Max
>>
>> On 27.02.19 01:13, Udi Meiri wrote:
>> > Hi all,
>> >
>> > I'm testing running Python tests using pytest, and I've added a
>> > temporary Jenkins jobs that seems to be triggering for PRs, even though
>> > I set triggerPathPatterns to an empty list. (file
>> > <
>> https://github.com/apache/beam/pull/7949/files#diff-1eadfdfe334e9d500efa54b427882c84R27
>> >)
>> >
>> > Please ignore any failures for this test.
>>
>




Re: Virtualenv setup issues on new machine

2019-02-28 Thread Udi Meiri
Weird. Is that a known bug?

On Thu, Feb 28, 2019 at 3:19 PM Ankur Goenka  wrote:

> The issue seems to be with "." in the virtualenv path.
> virtualenv works after moving from
> "/usr/local/google/home/goenka/.local/bin" to "/usr/bin"
>
> On Thu, Feb 28, 2019 at 2:57 PM Udi Meiri  wrote:
>
>> I think gradle is complaining that the path can't be found.
>> Is there more information if you run it with --info?
>>
>> On Thu, Feb 28, 2019, 14:35 Ankur Goenka  wrote:
>>
>>> Hi Beamers,
>>>
>>> I am trying build python sdk from a fresh git checkout on a new linux
>>> machine but the setupVirtualEnv task is failing with the error below. The
>>> complete build scan is at
>>> https://scans.gradle.com/s/h3jwzeg5aralk/failure?openFailures=WzBd=WzQsM10#top=0
>>>
>>> From the error it seems that gradle is trying to find the virtualenv
>>> command in beam/python folder.
>>> I am able to run virtualenv from the bash directly and PATH seems to be
>>> setup correctly.
>>>
>>> Anypointers about what might be happening?
>>>
>>>
>>> org.gradle.api.tasks.TaskExecutionException
>>> :
>>> Execution failed for task ':beam-sdks-python:setupVirtualenv'.
>>> Open stacktrace
>>> Caused by:
>>> org.gradle.process.internal.ExecException
>>> :
>>> A problem occurred starting process 'command 'virtualenv''
>>> Open stacktrace
>>> Caused by:
>>> net.rubygrapefruit.platform.NativeException
>>> :
>>> Could not start 'virtualenv'
>>> Open stacktrace
>>> Caused by:
>>> java.io.IOException
>>> :
>>> Cannot run program "virtualenv" (in directory
>>> "/tmp/beam/beam/sdks/python"): error=2, No such file or directory
>>> Close stacktrace
>>> at net.rubygrapefruit.platform.internal.DefaultProcessLauncher.start
>>> (DefaultProcessLauncher.java:25)
>>> at net.rubygrapefruit.platform.internal.WrapperProcessLauncher.start
>>> (WrapperProcessLauncher.java:36)
>>> at org.gradle.process.internal.ExecHandleRunner.run
>>> (ExecHandleRunner.java:67)
>>> at
>>> org.gradle.internal.operations.CurrentBuildOperationPreservingRunnable.
>>> run(CurrentBuildOperationPreservingRunnable.java:42)
>>> at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.
>>> onExecute(ExecutorPolicy.java:63)
>>> at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run
>>> (ManagedExecutorImpl.java:46)
>>> at
>>> org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.
>>> run(ThreadFactoryImpl.java:55)
>>> Caused by:
>>> java.io.IOException
>>> :
>>> error=2, No such file or directory
>>> Close stacktrace
>>> at net.rubygrapefruit.platform.internal.DefaultProcessLauncher.start
>>> (DefaultProcessLauncher.java:25)
>>> at net.rubygrapefruit.platform.internal.WrapperProcessLauncher.start
>>> (WrapperProcessLauncher.java:36)
>>> at org.gradle.process.internal.ExecHandleRunner.run
>>> (ExecHandleRunner.java:67)
>>> at
>>> org.gradle.internal.operations.CurrentBuildOperationPreservingRunnable.
>>> run(CurrentBuildOperationPreservingRunnable.java:42)
>>> at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.
>>> onExecute(ExecutorPolicy.java:63)
>>> at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run
>>> (ManagedExecutorImpl.java:46)
>>> at
>>> org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.
>>> run(ThreadFactoryImpl.java:55)
>>>
>>




Re: [BEAM-6761] Pydoc is giving cryptic error messages, blocking my PR :(

2019-03-01 Thread Udi Meiri
I think it's referring to the big comment at the top of
sdks/python/apache_beam/testing/metric_result_matchers.py.
The line numbers are relative to the beginning of that docstring block, not
to the file.
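
In general these mean reST wants blank lines around indented blocks inside
the docstring. A made-up sketch (not the actual contents of that file) of the
kind of formatting that tends to trigger those warnings, and the fix:

```
# Sketch only; the docstring text below is illustrative, not the real module.
def bad():
  """Matches a MetricResult.

  Example::
      MetricResultMatcher(namespace='ns')
  More prose here.
  """
  # The indented line right after "Example::" (with no blank line above it)
  # tends to produce "Unexpected indentation."; dedenting straight into the
  # next sentence tends to produce "Block quote ends without a blank line".


def good():
  """Matches a MetricResult.

  Example::

      MetricResultMatcher(namespace='ns')

  More prose here.
  """
```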

On Fri, Mar 1, 2019 at 2:21 PM Alex Amato  wrote:

> BEAM-6761 
>
> This is blocking my PR at the moment, the output doesn't seem to match the
> file and I am not sure how to proceed
>
> pydoc Output
>
> https://scans.gradle.com/s/im6t66hhy4bdq/console-log?task=:beam-sdks-python:docs#L3
> 
>
> Files
> https://github.com/apache/beam/pull/7936/files
> 
>
>
>
> /usr/local/google/home/ajamato/go/src/
> github.com/apache/beam/sdks/python/apache_beam/testing/metric_result_matchers.py:docstring
> of apache_beam.testing.metric_result_matchers:13: WARNING: Unexpected
> indentation.
> /usr/local/google/home/ajamato/go/src/
> github.com/apache/beam/sdks/python/apache_beam/testing/metric_result_matchers.py:docstring
> of apache_beam.testing.metric_result_matchers:15: WARNING: Block quote ends
> without a blank line; unexpected unindent.
> /usr/local/google/home/ajamato/go/src/
> github.com/apache/beam/sdks/python/apache_beam/testing/metric_result_matchers.py:docstring
> of apache_beam.testing.metric_result_matchers:18: WARNING: Definition list
> ends without a blank line; unexpected unindent.
> /usr/local/google/home/ajamato/go/src/
> github.com/apache/beam/sdks/python/apache_beam/testing/metric_result_matchers.py:docstring
> of apache_beam.testing.metric_result_matchers:19: WARNING: Definition list
> ends without a blank line; unexpected unindent.
> /usr/local/google/home/ajamato/go/src/
> github.com/apache/beam/sdks/python/apache_beam/testing/metric_result_matchers.py:docstring
> of apache_beam.testing.metric_result_matchers:21: WARNING: Unexpected
> indentation.
> /usr/local/google/home/ajamato/go/src/
> github.com/apache/beam/sdks/python/apache_beam/testing/metric_result_matchers.py:docstring
> of apache_beam.testing.metric_result_matchers:22: WARNING: Block quote ends
> without a blank line; unexpected unindent.
>
>
>
> = copy of the file in its current state (I will probably modify the PR
> 
>
> https://pastebin.com/8bWrPZVJ
>
>
>
>




Re: gradle clean causes long-running python installs

2019-02-20 Thread Udi Meiri
The dependency of clean on setupVirtualenv has been removed.

I use Gradle for daily development on Python, but only to run the "lint"
task.
There are a lot of Python build artifacts that "clean" doesn't delete.

On Wed, Feb 20, 2019 at 1:17 AM Michael Luckey  wrote:

> Thats said, removing this dependency on setupVirtualEnv is definitely
> a good idea.
>
> On Wed, Feb 20, 2019 at 10:14 AM Michael Luckey 
> wrote:
>
>> Is anyone using gradle for daily development of python parts? Or is this
>> just used as integration for Jenkins build? In case of the latter, clean
>> isn't called anyway, right?
>>
>> If the former, its probably worth to put more effort into python setup
>> anyway, e.g. declaring proper input/outputs... If done correctly (if thats
>> possible at all) clean will work out of the box anyway.
>>
>> On Wed, Feb 20, 2019 at 1:01 AM Udi Meiri  wrote:
>>
>>> I think I can solve this issue by removing the dependency and adding a
>>> check to see if the virtualenv was created.
>>> Otherwise, there shouldn't be anything to cleanup anyway.
>>>
>>> On Sat, Feb 16, 2019 at 8:04 PM Ryan Williams 
>>> wrote:
>>>
>>>> Thanks Michael, your assessment was correct. I needed python3.5 on my
>>>> $PATH.
>>>>
>>>> For completeness, I needed this
>>>> <https://github.com/pyenv/pyenv/wiki/Common-build-problems#build-failed-error-the-python-zlib-extension-was-not-compiled-missing-the-zlib>
>>>> to get pyenv to install python 3.5.6 on macOS:
>>>>
>>>> ```
>>>> CPPFLAGS="-I$(brew --prefix zlib)/include" pyenv install -v 3.5.6
>>>> ```
>>>>
>>>> `./gradlew clean` worked after that.
>>>>
>>>>
>>>> On Sat, Feb 16, 2019 at 7:28 PM Michael Luckey 
>>>> wrote:
>>>>
>>>>> As far as I understand, the build got bound to 3.5 [1].
>>>>>
>>>>> Could it be that you do not have python3.5 on your path? e.g. try  
>>>>> python3.5
>>>>> --version
>>>>>
>>>>> If that is missing, you will not be able to run any py3 task, I
>>>>> guess...
>>>>>
>>>>> So, to get out of this state, you have to get python3.5 command
>>>>> working.
>>>>>
>>>>> At least on my machine, './gradlew clean' is working iff python3.5 is
>>>>> on my path.
>>>>>
>>>>> michel
>>>>>
>>>>> [1]
>>>>> https://github.com/apache/beam/blob/master/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L1591
>>>>>
>>>>>
>>>>> On Sat, Feb 16, 2019 at 9:54 PM Ryan Williams 
>>>>> wrote:
>>>>>
>>>>>> I'm seeing the same thing as Thomas above: `./gradlew clean` fails on
>>>>>> two py3 setupVirtualEnv tasks,
>>>>>> :beam-sdks-python-test-suites-direct-py3:setupVirtualenv and
>>>>>> :beam-sdks-python-test-suites-dataflow-py3:setupVirtualenv. Here's
>>>>>> full output
>>>>>> <https://gist.github.com/ryan-williams/402e20131d23905163de3a4e2b178f39>
>>>>>> .
>>>>>>
>>>>>> Any tips how to get out of this state, or what is causing it?
>>>>>>
>>>>>> On Fri, Feb 8, 2019 at 7:17 PM Kenneth Knowles 
>>>>>> wrote:
>>>>>>
>>>>>>> Maybe add that case to
>>>>>>> https://issues.apache.org/jira/browse/BEAM-6459.
>>>>>>>
>>>>>>> Kenn
>>>>>>>
>>>>>>> On Fri, Feb 8, 2019 at 9:09 AM Thomas Weise  wrote:
>>>>>>>
>>>>>>>> Probably related, a top level ./gradlew clean fails with the
>>>>>>>> following:
>>>>>>>>
>>>>>>>> > Task :beam-sdks-python-precommit-direct-py3:setupVirtualenv FAILED
>>>>>>>> The path python3.5 (from --python=python3.5) does not exist
>>>>>>>>
>>>>>>>> Can we just limit clean to do cleanup?!
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Jan 21, 2019 at 7:58 AM Robert Bradshaw <
>>>>>>>> rober...@google.com> wrote:
>>>>>>>>
>>>>>>>>> J

Re: Added a Jira beginner's guide to the wiki.

2019-02-27 Thread Udi Meiri
My favorite way to navigate JIRA is with a Chrome custom search engine.
You configure it like this:
[image: Screenshot from 2019-02-27 17-11-26.png]
(URL is:
https://issues.apache.org/jira/secure/QuickSearch.jspa?searchString=%s)

And search by writing in the location bar:
"j BEAM-1234" will take you to that specific issue
"j beam unresolved udim" will show all unresolved issues assigned to udim


On Tue, Feb 26, 2019 at 9:22 PM Ahmet Altay  wrote:

> Thank you Daniel, this is great information.
>
> On Fri, Feb 22, 2019 at 11:47 AM Daniel Oliveira 
> wrote:
>
>> Hi everyone,
>>
>> In a recent thread in this list I mentioned that it might be nice to have
>> a short guide for our Jira on the wiki since there were some aspects of
>> Jira that I found a bit unintuitive or not discover-able when I was getting
>> into the project. I went ahead and wrote one up and would appreciate some
>> feedback, especially from any contributors that may be new to Beam and/or
>> Jira.
>>
>>
>> https://cwiki.apache.org/confluence/display/BEAM/Beam+Jira+Beginner%27s+Guide
>>
>> The main two aspects that I want to make sure I got right are:
>>
>> 1. Covering details that are often confusing for new contributors, such
>> as ways Beam uses Jira that might be unique, or just unintuitive features.
>>
>> 2. Keeping it very brief and duplicating as little documentation as
>> possible. I don't want this to get outdated, so I'd much rather link to a
>> source of truth when possible.
>>
>> If anyone has any details I missed that they'd like to add, or feel that
>> they could edit the guide a bit to keep it brief and cut out unnecessary
>> info, please go ahead. Also, I'm hoping that this guide could be linked
>> from the Contribution Guide  on the
>> website if people find it useful, so feedback on that front would be great
>> too.
>>
>


smime.p7s
Description: S/MIME Cryptographic Signature


precommit tests: please ignore Python_pytest

2019-02-26 Thread Udi Meiri
Hi all,

I'm testing running Python tests using pytest, and I've added a temporary
Jenkins job that seems to be triggering for PRs, even though I set
triggerPathPatterns to an empty list. (file:
https://github.com/apache/beam/pull/7949/files#diff-1eadfdfe334e9d500efa54b427882c84R27
)

Please ignore any failures for this test.




Re: Selectively running tests?

2019-03-19 Thread Udi Meiri
At least for Python, any changes under runners/ trigger a precommit:
https://github.com/apache/beam/blob/34b071c0b0f08fceb451ce9c4a4bcaab6160abcd/.test-infra/jenkins/job_PreCommit_Python.groovy#L27

On Tue, Mar 19, 2019 at 10:38 AM Pablo Estrada  wrote:

> Perhaps it's just that the rules are more extensive for Java worker
> changes. I noticed that my change runs Go, Python and Java precommits:
> https://github.com/apache/beam/pull/8080
>
> On Tue, Mar 19, 2019 at 10:27 AM Udi Meiri  wrote:
>
>> Do you have an example PR Pablo?
>>
>>
>> On Mon, Mar 18, 2019, 18:23 Alan Myrvold  wrote:
>>
>>> The includedRegions was set up as part of
>>> https://issues.apache.org/jira/browse/BEAM-4445 and there are
>>> additional paths added from
>>> https://github.com/apache/beam/blob/6bb4b2332b11bd8295ac6965be8426b9c38fa454/.test-infra/jenkins/PrecommitJobBuilder.groovy#L65
>>>
>>> Not sure why they are not working, but it would be good to get this
>>> going again. Might have stopped at the same time Jenkins was updated.
>>>
>>> On Mon, Mar 18, 2019 at 3:28 PM Pablo Estrada 
>>> wrote:
>>>
>>>> We used to have tests run selectively depending on which directories
>>>> were changes. I've just noticed that this is not the case anymore.
>>>>
>>>> Did we stop doing that? Or maybe the selector is faulty? Anyone know
>>>> what happened here?
>>>> Thanks!
>>>> -P.
>>>>
>>>




Re: Python Datastore client upgrade plan

2019-03-19 Thread Udi Meiri
Update: I'm back to working on this.
To allow a smoother migration, I'm planning on having apache-beam depend on
both googledatastore and google-cloud-datastore and having 2 Beam modules.
The newer client is a bit more limited in expressing queries (only ANDs for
composite filtering).
OTOH it supports transactions so we could add inserts of incomplete
entities.
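
To give a flavor of the newer client (a sketch based on my reading of
google-cloud-datastore, not the Beam API we'd expose; 'my-project' and 'Task'
are placeholders):

```
from google.cloud import datastore

client = datastore.Client(project='my-project')  # placeholder project id

# Composite filters are implicitly ANDed; there is no OR composite, which is
# the query limitation mentioned above.
query = client.query(kind='Task')
query.add_filter('done', '=', False)
query.add_filter('category', '=', 'work')
for entity in query.fetch(limit=10):
  print(entity.key, dict(entity))

# Transactions let us insert entities with incomplete keys; Datastore
# allocates the ids at commit time.
with client.transaction():
  task = datastore.Entity(key=client.key('Task'))  # incomplete key
  task.update({'done': False})
  client.put(task)
```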

Updated plan here:
https://docs.google.com/document/d/1sL9p7NE5Z0p-5SB5uwpxWrddj_UCESKSrsvDTWNKqb4/edit

On Wed, Oct 17, 2018 at 12:49 PM Ahmet Altay  wrote:

>
>
> On Wed, Oct 17, 2018 at 11:49 AM, Chamikara Jayalath  > wrote:
>
>> Thanks Udi. Added some comments.
>>
>> On Wed, Oct 17, 2018 at 10:50 AM Ahmet Altay  wrote:
>>
>>> Udi thank you for the proposal and thank you for sharing it in plain
>>> email. My comments are below.
>>>
>>> Overall, this is a good plan to get us out of a tough situation with an
>>> old dependency.
>>>
>>> On Tue, Oct 16, 2018 at 6:59 PM, Udi Meiri  wrote:
>>>
>>>> Hi,
>>>> Sadly upgrading googledatastore -> google-cloud-datastore is
>>>> non-trivial (https://issues.apache.org/jira/browse/BEAM-4543). I wrote
>>>> a doc to summarize the plan:
>>>>
>>>> https://docs.google.com/document/d/1sL9p7NE5Z0p-5SB5uwpxWrddj_UCESKSrsvDTWNKqb4/edit?usp=sharing
>>>>
>>>> Contents pasted below:
>>>> Beam Python SDK: Datastore Client Upgrade
>>>>
>>>> eh...@google.com
>>>>
>>>> public, draft, 2018-10-16
>>>> Objective
>>>>
>>>> Upgrade Beam's Python SDK dependency to use google-cloud-datastore
>>>> v1.70 (or later), replacing googledatastore v7.0.1, providing Beam users a
>>>> migration path to a new Datastore PTransform API.
>>>> Background
>>>>
>>>> Beam currently uses the googledatastore package to provide access to
>>>> Google Cloud Datastore, however that package doesn't seem to be getting
>>>> regular releases (last release in 2017-04
>>>> <https://pypi.org/project/googledatastore/>) and it doesn't officially
>>>> support Python 3 <https://issues.apache.org/jira/browse/BEAM-4543>.
>>>>
>>>> The current Beam API for Datastore queries exposes googledatastore
>>>> types, such as using a protobuf to define a query (wordcount example
>>>> <https://github.com/apache/beam/blob/79049b02949affe5aa2390dec9b890a04e1fde89/sdks/python/apache_beam/examples/cookbook/datastore_wordcount.py#L159>).
>>>> Conversely, google-cloud-datastore hides this implementation detail (query
>>>> API
>>>> <https://googleapis.github.io/google-cloud-python/latest/datastore/queries.html>).
>>>> Since Beam API has to change the data types it accepts, it forces users to
>>>> change their code. This makes the migration to google-cloud-datastore
>>>> non-trivial.
>>>> Proposal
>>>>
>>>> This proposal includes a period in which two Beam APIs are available to
>>>> access Datastore.
>>>>
>>>>
>>>>-
>>>>
>>>>Add a new PTransforms that use google-cloud-datastore and mark as
>>>>deprecated the existing API (ReadFromDatastore, WriteToDatastore,
>>>>DeleteFromDatastore).
>>>>-
>>>>
>>>>Implement apache_beam/io/datastore.py using google-cloud-datastore,
>>>>taking care to not expose Datastore client internals.
>>>>-
>>>>
>>>>(optional) Remove googledatastore from GCP_REQUIREMENTS
>>>>
>>>> <https://github.com/apache/beam/blob/79049b02949affe5aa2390dec9b890a04e1fde89/sdks/python/setup.py#L139>
>>>>package list, and add it to a separate list, e.g., pip install
>>>>apache-beam[gcp,googledatastore].
>>>>
>>>>
>>> I would like to avoid defining new sets of extra packages. Assuming that
>>> these two packages are not incompatible together, we could keep them both
>>> in [gcp].
>>>
>>
>> I think we might need this since googleclouddatastore package (1) does
>> not seems to be getting upgraded (2) depends on older versions of packages
>> (for example, httplib2).
>>
>> This conflicts with more recent releases of other tools (for example,
>> gsutil).
>>
>
> This is fine, if it is the only viable option. But note that it is also a
> breaking change in the way people install beam in order to use old
> datastore APIs.
>
>
>>
>>
>>>
>>>
>>>>
>>>>-
>>>>
>>>>Remove googledatastore-based API from Beam after 2 releases.
>>>>
>>>>
>>> The removal needs to wait until next major version by default. Unless,
>>> we have a way of asking our users and ensuring that nobody is really using
>>> the existing API. Removing a current API in 2 releases (~3 months period)
>>> will hurt some users.
>>>
>> +1
>>
>>>
>>>
>>>
>




Re: Selectively running tests?

2019-03-19 Thread Udi Meiri
Do you have an example PR Pablo?


On Mon, Mar 18, 2019, 18:23 Alan Myrvold  wrote:

> The includedRegions was set up as part of
> https://issues.apache.org/jira/browse/BEAM-4445 and there are additional
> paths added from
> https://github.com/apache/beam/blob/6bb4b2332b11bd8295ac6965be8426b9c38fa454/.test-infra/jenkins/PrecommitJobBuilder.groovy#L65
>
> Not sure why they are not working, but it would be good to get this going
> again. Might have stopped at the same time Jenkins was updated.
>
> On Mon, Mar 18, 2019 at 3:28 PM Pablo Estrada  wrote:
>
>> We used to have tests run selectively depending on which directories were
>> changes. I've just noticed that this is not the case anymore.
>>
>> Did we stop doing that? Or maybe the selector is faulty? Anyone know what
>> happened here?
>> Thanks!
>> -P.
>>
>




Re: Jenkins slowness

2019-02-07 Thread Udi Meiri
Precommit times for Python and Java have been slowly climbing:
http://104.154.241.245/d/_TNndF2iz/pre-commit-test-latency?orgId=1=1546974237153=1549566237153=light

On Thu, Feb 7, 2019 at 10:54 AM Udi Meiri  wrote:

> If anyone has done any investigation/is working on this please share.
>
> I'm investigating Jenkins slowness. I've noticed it happening since
> yesterday: precommits taking 3 hours to start, phrase commands similarly
> taking as much time to register.
>
> My current theory is that we have a job that's are taking much longer than
> usual to run.
>




Jenkins slowness

2019-02-07 Thread Udi Meiri
If anyone has done any investigation or is working on this, please share.

I'm investigating Jenkins slowness. I've noticed it happening since
yesterday: precommits taking 3 hours to start, phrase commands similarly
taking as much time to register.

My current theory is that we have a job that's taking much longer than
usual to run.




Re: Jenkins slowness

2019-02-07 Thread Udi Meiri
There is also excessive python test logging tracked here:
https://issues.apache.org/jira/browse/BEAM-6603

On Thu, Feb 7, 2019, 12:23 Udi Meiri  wrote:

> I suggest disabling Jacoco and re-enabling the build cache until we can
> migrate to Gradle 5. I imagine the migration to v5 is not a simple change.
> Meanwhile, I can't run postcommits on PRs on Jenkins (run seed job + run
> postcommit).
>
> On Thu, Feb 7, 2019, 12:00 Chamikara Jayalath 
>> Seems like there was a spike for all build times yesterday probably added
>> up to give slow Jenkins scheduling times for triggers. Also, seems like we
>> had three spikes that are about a week apart recently.
>>
>>
>> On Thu, Feb 7, 2019 at 11:46 AM Michael Luckey 
>> wrote:
>>
>>> What might have some influence is the implicit disabling of the build
>>> cache by activating Jacoco report. There seems to be a increase of
>>> beam_PreCommit_Java_Cron with
>>> https://builds.apache.org/job/beam_PreCommit_Java_Cron/914/ and looking
>>> into cacheable task there seems to be lots of work done now which
>>> previously was cacheable.
>>>
>>> Not sure, whether this is the culprit- or part of it -, but I d suggest
>>> to upgrade to gradle 5 pretty fast.
>>>
>>> On Thu, Feb 7, 2019 at 8:18 PM Udi Meiri  wrote:
>>>
>>>> If anyone has done any investigation/is working on this please share.
>>>>
>>>> I'm investigating Jenkins slowness. I've noticed it happening since
>>>> yesterday: precommits taking 3 hours to start, phrase commands similarly
>>>> taking as much time to register.
>>>>
>>>> My current theory is that we have a job that's are taking much longer
>>>> than usual to run.
>>>>
>>>




Re: Jenkins slowness

2019-02-07 Thread Udi Meiri
I suggest disabling Jacoco and re-enabling the build cache until we can
migrate to Gradle 5. I imagine the migration to v5 is not a simple change.
Meanwhile, I can't run postcommits on PRs on Jenkins (run seed job + run
postcommit).

On Thu, Feb 7, 2019, 12:00 Chamikara Jayalath  wrote:

> Seems like there was a spike for all build times yesterday probably added
> up to give slow Jenkins scheduling times for triggers. Also, seems like we
> had three spikes that are about a week apart recently.
>
>
> On Thu, Feb 7, 2019 at 11:46 AM Michael Luckey 
> wrote:
>
>> What might have some influence is the implicit disabling of the build
>> cache by activating Jacoco report. There seems to be a increase of
>> beam_PreCommit_Java_Cron with
>> https://builds.apache.org/job/beam_PreCommit_Java_Cron/914/ and looking
>> into cacheable task there seems to be lots of work done now which
>> previously was cacheable.
>>
>> Not sure, whether this is the culprit- or part of it -, but I d suggest
>> to upgrade to gradle 5 pretty fast.
>>
>> On Thu, Feb 7, 2019 at 8:18 PM Udi Meiri  wrote:
>>
>>> If anyone has done any investigation/is working on this please share.
>>>
>>> I'm investigating Jenkins slowness. I've noticed it happening since
>>> yesterday: precommits taking 3 hours to start, phrase commands similarly
>>> taking as much time to register.
>>>
>>> My current theory is that we have a job that's are taking much longer
>>> than usual to run.
>>>
>>




Re: Signing off

2019-02-15 Thread Udi Meiri
Good luck Scott!

On Fri, Feb 15, 2019 at 9:32 AM Alex Amato  wrote:

> Thanks's for your contributions Scott. We will miss you.
>
> On Fri, Feb 15, 2019 at 7:08 AM Etienne Chauchot 
> wrote:
>
>> Thank you for your contributions Scott ! Your new project seems very fun.
>> Enjoy !
>>
>> Etienne
>>
>> Le vendredi 15 février 2019 à 15:01 +0100, Ismaël Mejía a écrit :
>>
>> Your work and willingness to make Beam better will be missed.
>>
>> Good luck for the next phase!
>>
>>
>> On Fri, Feb 15, 2019 at 1:39 PM Łukasz Gajowy  wrote:
>>
>>
>> Good luck!
>>
>>
>> pt., 15 lut 2019 o 11:24 Alexey Romanenko  
>> napisał(a):
>>
>>
>> Good luck, Scott, with your new adventure!
>>
>>
>> On 15 Feb 2019, at 11:22, Maximilian Michels  wrote:
>>
>>
>> Thank you for your contributions Scott. Best of luck!
>>
>>
>> On 15.02.19 10:48, Michael Luckey wrote:
>>
>>
>> Hi Scott,
>>
>> yes, thanks for all your time and all the best!
>>
>> michel
>>
>> On Fri, Feb 15, 2019 at 5:47 AM Kenneth Knowles > > wrote:
>>
>>+1
>>
>>Thanks for the contributions to community & code, and enjoy the new
>>
>>chapter!
>>
>>Kenn
>>
>>On Thu, Feb 14, 2019 at 3:25 PM Thomas Weise >
>>> wrote:
>>
>>Hi Scott,
>>
>>Thank you for the many contributions to Beam and best of luck
>>
>>with the new endeavor!
>>
>>Thomas
>>
>>On Thu, Feb 14, 2019 at 10:37 AM Scott Wegner >
>>> wrote:
>>
>>I wanted to let you all know that I've decided to pursue a
>>
>>new adventure in my career, which will take me away from
>>
>>Apache Beam development.
>>
>>It's been a fun and fulfilling journey. Apache Beam has been
>>
>>my first significant experience working in open source. I'm
>>
>>inspired observing how the community has come together to
>>
>>deliver something great.
>>
>>Thanks for everything. If you're curious what's next: I'll
>>
>>be working on Federated Learning at Google:
>>
>>
>> https://ai.googleblog.com/2017/04/federated-learning-collaborative.html
>>
>>Take care,
>>
>>Scott
>>
>>Got feedback? tinyurl.com/swegner-feedback
>>
>>
>>
>>
>>
>>




Re: gradle clean causes long-running python installs

2019-02-19 Thread Udi Meiri
I think I can solve this issue by removing the dependency and adding a
check to see if the virtualenv was created.
Otherwise, there shouldn't be anything to clean up anyway.

On Sat, Feb 16, 2019 at 8:04 PM Ryan Williams  wrote:

> Thanks Michael, your assessment was correct. I needed python3.5 on my
> $PATH.
>
> For completeness, I needed this
> <https://github.com/pyenv/pyenv/wiki/Common-build-problems#build-failed-error-the-python-zlib-extension-was-not-compiled-missing-the-zlib>
> to get pyenv to install python 3.5.6 on macOS:
>
> ```
> CPPFLAGS="-I$(brew --prefix zlib)/include" pyenv install -v 3.5.6
> ```
>
> `./gradlew clean` worked after that.
>
>
> On Sat, Feb 16, 2019 at 7:28 PM Michael Luckey 
> wrote:
>
>> As far as I understand, the build got bound to 3.5 [1].
>>
>> Could it be that you do not have python3.5 on your path? e.g. try  python3.5
>> --version
>>
>> If that is missing, you will not be able to run any py3 task, I guess...
>>
>> So, to get out of this state, you have to get python3.5 command working.
>>
>> At least on my machine, './gradlew clean' is working iff python3.5 is on
>> my path.
>>
>> michel
>>
>> [1]
>> https://github.com/apache/beam/blob/master/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L1591
>>
>>
>> On Sat, Feb 16, 2019 at 9:54 PM Ryan Williams 
>> wrote:
>>
>>> I'm seeing the same thing as Thomas above: `./gradlew clean` fails on
>>> two py3 setupVirtualEnv tasks,
>>> :beam-sdks-python-test-suites-direct-py3:setupVirtualenv and
>>> :beam-sdks-python-test-suites-dataflow-py3:setupVirtualenv. Here's full
>>> output
>>> <https://gist.github.com/ryan-williams/402e20131d23905163de3a4e2b178f39>
>>> .
>>>
>>> Any tips how to get out of this state, or what is causing it?
>>>
>>> On Fri, Feb 8, 2019 at 7:17 PM Kenneth Knowles  wrote:
>>>
>>>> Maybe add that case to https://issues.apache.org/jira/browse/BEAM-6459.
>>>>
>>>> Kenn
>>>>
>>>> On Fri, Feb 8, 2019 at 9:09 AM Thomas Weise  wrote:
>>>>
>>>>> Probably related, a top level ./gradlew clean fails with the following:
>>>>>
>>>>> > Task :beam-sdks-python-precommit-direct-py3:setupVirtualenv FAILED
>>>>> The path python3.5 (from --python=python3.5) does not exist
>>>>>
>>>>> Can we just limit clean to do cleanup?!
>>>>>
>>>>>
>>>>> On Mon, Jan 21, 2019 at 7:58 AM Robert Bradshaw 
>>>>> wrote:
>>>>>
>>>>>> Just some background, grpcio-tools is what's used to do the proto
>>>>>> generation. Unfortunately it's expensive to compile and doesn't
>>>>>> provide very many wheels, so we want to install it once, not every
>>>>>> time. (It's also used in more than just tests; one needs it every time
>>>>>> the .proto files change.)
>>>>>>
>>>>>> That being said, we could probably do a much cheaper clean.
>>>>>>
>>>>>> On Fri, Jan 18, 2019 at 8:56 PM Udi Meiri  wrote:
>>>>>> >
>>>>>> > grpcio-tools could probably be moved under the "test" tag in
>>>>>> setup.py. Not sure why it has to be specified in gradle configs.
>>>>>> >
>>>>>> > On Fri, Jan 18, 2019 at 11:43 AM Kenneth Knowles 
>>>>>> wrote:
>>>>>> >>
>>>>>> >> Can you `setupVirtualEnv` just enough to run `setup.py clean`
>>>>>> without installing gcpio-tools, etc?
>>>>>> >>
>>>>>> >> Kenn
>>>>>> >>
>>>>>> >> On Fri, Jan 18, 2019 at 11:20 AM Udi Meiri 
>>>>>> wrote:
>>>>>> >>>
>>>>>> >>> setup.py has requirements like setuptools, which are installed in
>>>>>> the virtual environment.
>>>>>> >>> So even running the clean command requires the virtualenv to be
>>>>>> set up.
>>>>>> >>>
>>>>>> >>> A possible fix could be to skip :beam-sdks-python:cleanPython if
>>>>>> setupVirtualenv has not been run. (perhaps by checking for the existence 
>>>>>> of
>>>>>> its output directory)
>>>>>> >>>
>>>

Re: Findbugs -> Spotbugs ?

2019-01-31 Thread Udi Meiri
+1 for spotbugs

On Thu, Jan 31, 2019 at 5:09 AM Gleb Kanterov  wrote:

> Agree, spotbugs brings static checks that aren't covered in error-prone,
> it's a good addition. There are few conflicts between error-prone and
> spotbugs, for instance, the approach to enum switch exhaustiveness, but it
> can be configured.
>
> On Thu, Jan 31, 2019 at 10:53 AM Ismaël Mejía  wrote:
>
>> Not a blocker but there is not a spotbugs plugin for IntelliJ.
>>
>> On Thu, Jan 31, 2019 at 10:45 AM Ismaël Mejía  wrote:
>> >
>> > YES PLEASE let's move to spotbugs !
>> > Findbugs has not had a new release in ages, and does not support Java
>> > 11 either, so this will address another possible issue.
>> >
>> > On Thu, Jan 31, 2019 at 8:28 AM Kenneth Knowles 
>> wrote:
>> > >
>> > > Over the last few hours I activated findbugs on the Dataflow Java
>> worker and fixed or suppressed the errors. They started around 60 but
>> fixing some uncovered others, etc. You can see the result at
>> https://github.com/apache/beam/pull/7684.
>> > >
>> > > It has convinced me that findbugs still adds value, beyond errorprone
>> and nullaway/checker/infer. Quite a few of the issues were not nullability
>> related, though nullability remains the most obvious low-hanging fruit
>> where a different tool would do even better than findbugs. I have not yet
>> enable "non null by default" which exposes 100+ new bugs in the worker, at
>> minimum.
>> > >
>> > > Are there known blockers for upgrading to spotbugs so we are
>> depending on an active project?
>> > >
>> > > Kenn
>>
>
>
> --
> Cheers,
> Gleb
>




Re: [PROPOSAL] Custom JVM initialization for Beam workers

2019-04-15 Thread Udi Meiri
Is this like the way the Python SDK allows for a custom setup.py?
Example:
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/complete/juliaset/setup.py
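
For anyone who hasn't looked at it, that example boils down to roughly this
pattern (a from-memory sketch - see the URL above for the real file): a custom
build command that runs extra commands while the worker stages the package.

```
# Sketch of a juliaset-style setup.py; the echo command is a placeholder for
# whatever worker setup (e.g. apt-get installs) a pipeline needs.
import subprocess
import setuptools
from distutils.command.build import build as _build


class build(_build):
  """Chains an extra command into the normal build."""
  sub_commands = _build.sub_commands + [('CustomCommands', None)]


class CustomCommands(setuptools.Command):
  """Runs arbitrary commands when the package is built on the worker."""
  user_options = []

  def initialize_options(self):
    pass

  def finalize_options(self):
    pass

  def run(self):
    subprocess.check_call(['echo', 'custom worker initialization goes here'])


setuptools.setup(
    name='example-workflow',
    version='0.0.1',
    packages=setuptools.find_packages(),
    cmdclass={'build': build, 'CustomCommands': CustomCommands})
```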

On Fri, Apr 12, 2019 at 10:51 AM Lukasz Cwik  wrote:

> +1 on the use cases that Ahmet pointed out and the solution that Brian put
> forth. I like how the change is being applied to the Beam Java SDK harness
> and not just Dataflow so all portable runner users get this as well.
>
> On Wed, Apr 10, 2019 at 9:03 PM Kenneth Knowles  wrote:
>
>>
>>
>> On Wed, Apr 10, 2019 at 8:18 PM Ahmet Altay  wrote:
>>
>>>
>>>
>>> On Wed, Apr 10, 2019 at 7:59 PM Kenneth Knowles  wrote:
>>>
 TL;DR I like the simple approach better than the ServiceLoader solution
 when a particular DoFn depends on the result. The ServiceLoader solution
 fits when it is somewhat independent of a particular DoFn (I'm not sure the
 use case(s)).

 On Wed, Apr 10, 2019 at 4:10 PM Brian Hulette 
 wrote:

> - Each DoFn that depends on that initialization needs to include the
> same initialization
>

 What if a DoFn that depends on the initialization is used in a new
 context? Then it is relying on initialization done elsewhere, and it will
 break or, worse, give wrong results. So I think this bullet point is a
 feature, not a bug. And if the initialization is built as a static method
 of some third class, referenced by all the DoFns that need it, it is a
 one-liner to declare the dependency explicitly.


> - There is no way for users to know which workers executed a
> particular DoFn - users could have workers with different configurations
>

 What is a worker? j/k. Each runner has different notions of what a
 worker is, including the Java SDK Harness. But they all do require one or
 more JVMs. It is true that you can't easily predict which DoFn classes are
 loaded on a particular JVM. This bullet is a strong case against
 initialization at a distance. I think your proposed solution and also the
 simple static block approach avoid this pitfall, so all is good.

 You could perhaps argue that these are actually good things - we only
> run the initialization when it's needed - but it could also lead to
> confusing behavior.
>

 FWIW my argument above is not about only running when needed. The
 opposite - it is about being certain it is run when needed.


> So I'd like to a propose an addition to the Java SDK that provides
> hooks for JVM initialization that is guaranteed to execute once across all
> worker workers. I've written up a PR [1] that implements this. It adds a
> service interface, BeamWorkerInitializer, that users can implement to
> define some initialization, and modifies workers (currently just the
> portable worker and the dataflow worker) to find and execute these
> implementations using ServiceLoader. BeamWorkerInitializer has two methods
> that can be overriden: onStartup, which workers run immediately after
> starting, and beforeProcessing, which workers run after initializing 
> things
> like logging, but before beginning to process data.
>
> Since this is a pretty fundamental change I wanted to have a quick
> discussion here before merging, in case there are any comments or 
> concerns.
>

 FWIW (again) I have no objection to the general idea and don't have any
 problem with making such a fundamental change. I actually think your change
 is probably useful. But if a particular DoFn depends on the JVM being
 configured a certain way, a static block in that DoFn class seems more
 readable and reliable.

 Are there use cases for more generic JVM initialization that,
 presumably, a user would want to affect all their DoFns?

>>>
>>> A few things I can recall from recent user interactions are a need for
>>> setting a custom ssl providers, time zone rules providers. Users would want
>>> such settings to apply for all their dofns in a pipeline.
>>>
>>
>> This makes sense. Another perspective is whether the
>> initialization/configuration might be orthogonal to the DoFns in the
>> pipeline. These seem to fit that description.
>>
>> Kenn
>>
>>
>>>
>>>

 Kenn


> Thanks!
> Brian
>
> [1] https://github.com/apache/beam/pull/8104
>





pickler.py issue with nested classes

2019-04-16 Thread Udi Meiri
I was looking at migrating unit tests to pytest and found this test which
doesn't pass:
https://gist.github.com/udim/a71fcb278b56a9a5b7962f4588e14efb (stack
overflow)
(requires installing python3.7 and "python3.7 -m pip install pytest".)
The same command passes with python2.7 and python3.5.

I tried isolating the issue and created a test case which fails similarly
on all Python versions I tried (2.7, 3.5, 3.7):

def test_local_nested_class(self):
  class LocalNestedClass(object):
def __init__(self, data):
  # TODO: commenting out the call to __init__ makes the test pass
  super(LocalNestedClass, self).__init__()
  self.data = data

  self.assertEqual('abc', loads(dumps(LocalNestedClass('abc'))).data)

(added to PicklerTest)

Any ideas why this fails, and why removing the call to
super(...).__init__() makes a difference?
Is DataflowRunnerTest::test_remote_runner_display_data trying to do
something that's not supposed to work?
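
If anyone wants to poke at this outside the Beam test harness, here's a
minimal standalone sketch using dill directly (assumption: the behavior is
comparable, since our pickler is a wrapper around dill):

```
# Repro attempt that bypasses apache_beam.internal.pickler, to see whether the
# recursion comes from dill itself or from our save overrides.
import dill


def make_instance():
  class LocalNestedClass(object):
    def __init__(self, data):
      # The same super().__init__() call that seems to matter in the test.
      super(LocalNestedClass, self).__init__()
      self.data = data
  return LocalNestedClass('abc')


print(dill.loads(dill.dumps(make_instance())).data)  # expect: abc
```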




Re: pickler.py issue with nested classes

2019-04-16 Thread Udi Meiri
Not sure: my case is using a nested class and the error is a stack overflow
(or infinite recursion detection is triggered).

It is odd though that they have the same workaround.




pytest migration progress

2019-04-12 Thread Udi Meiri
Hi,
I'm making progress on the pytest migration here:
https://github.com/apache/beam/pull/7949

The PR does not replace nose (yet) - that would require more work and a
verification effort to make sure no test gets left behind.

- Udi




Re: python integration tests flake detection

2019-06-25 Thread Udi Meiri
Yes. It only outputs to one filename though, so it'd need some working
around (our ITs might have more than one nose run).
Some tests run in Docker, so that might need work to get the XML out.
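
One way around the single-filename limitation (just a sketch of the idea,
nothing we have wired up today): give each nose invocation its own
--xunit-file and merge the results into one report before Jenkins reads it:

```
# Merges several nose xunit XML files under a single <testsuites> root so the
# Jenkins JUnit plugin can consume them as one report.
import glob
import xml.etree.ElementTree as ET


def merge_xunit(pattern, out_path):
  merged = ET.Element('testsuites')
  for path in sorted(glob.glob(pattern)):
    # nose writes a single <testsuite> element per output file.
    merged.append(ET.parse(path).getroot())
  ET.ElementTree(merged).write(out_path, encoding='utf-8',
                               xml_declaration=True)


if __name__ == '__main__':
  merge_xunit('nosetests-*.xml', 'merged-xunit.xml')
```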

On Tue, Jun 25, 2019 at 10:11 AM Ahmet Altay  wrote:

> There is a nose plugin [1] for outputting test results in xunit format.
> Would that work?
>
> [1] https://nose.readthedocs.io/en/latest/plugins/xunit.html
>
> On Tue, Jun 25, 2019 at 10:04 AM Udi Meiri  wrote:
>
>> The current state of Python post-commit tests is pretty flaky.
>> I was wondering if we had any stats for integration tests, to help
>> identify which tests are causing the most failures. Jenkins keeps some
>> history for tests (example
>> <https://builds.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/testReport/apache_beam.coders.avro_coder_test/CodersTest/test_avro_record_coder/history/>),
>> but it requires junit-style .xml output.
>>
>> Would it be possible to get our integration test results into Jenkins?
>>
>


smime.p7s
Description: S/MIME Cryptographic Signature


python integration tests flake detection

2019-06-25 Thread Udi Meiri
The current state of Python post-commit tests is pretty flaky.
I was wondering if we had any stats for integration tests, to help identify
which tests are causing the most failures. Jenkins keeps some history for
tests (example
<https://builds.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/testReport/apache_beam.coders.avro_coder_test/CodersTest/test_avro_record_coder/history/>),
but it requires junit-style .xml output.

Would it be possible to get our integration test results into Jenkins?


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [ANNOUNCE] New committer: Mikhail Gryzykhin

2019-06-25 Thread Udi Meiri
Congrats Mikhail!

On Tue, Jun 25, 2019 at 2:32 AM Gleb Kanterov  wrote:

> Congratulations!
>
> On Tue, Jun 25, 2019 at 2:03 AM Connell O'Callaghan 
> wrote:
>
>> Thomas thank you for sharing this
>>
>> Congratulations on this Mikhail!!!
>>
>> On Mon, Jun 24, 2019 at 3:19 PM Kai Jiang  wrote:
>>
>>> Congrats!
>>>
>>> On Mon, Jun 24, 2019 at 1:46 PM Chamikara Jayalath 
>>> wrote:
>>>
 Congrats!!

 On Mon, Jun 24, 2019 at 11:12 AM Mikhail Gryzykhin 
 wrote:

> Thank you everyone.
>
> On Mon, Jun 24, 2019 at 2:28 AM Aizhamal Nurmamat kyzy <
> aizha...@google.com> wrote:
>
>> Congrats Misha!
>>
>> On Mon, Jun 24, 2019 at 11:23 Łukasz Gajowy 
>> wrote:
>>
>>> Congratulations Mikhail!
>>>
>>> pt., 21 cze 2019 o 22:09 Ruoyun Huang 
>>> napisał(a):
>>>
 Congratulations! Mikhail!


 On Fri, Jun 21, 2019 at 1:00 PM Yichi Zhang 
 wrote:

> Congrats!
>
> On Fri, Jun 21, 2019 at 11:55 AM Tanay Tummalapalli <
> ttanay...@gmail.com> wrote:
>
>> Congratulations!
>>
>> On Fri, Jun 21, 2019 at 10:35 PM Rui Wang 
>> wrote:
>>
>>> Congrats!
>>>
>>>
>>> -Rui
>>>
>>> On Fri, Jun 21, 2019 at 9:58 AM Robin Qiu 
>>> wrote:
>>>
 Congrats, Mikhail!

 On Fri, Jun 21, 2019 at 9:12 AM Alexey Romanenko <
 aromanenko@gmail.com> wrote:

> Congrats, Mikhail!
>
> On 21 Jun 2019, at 18:01, Anton Kedin 
> wrote:
>
> Congrats!
>
> On Fri, Jun 21, 2019 at 3:55 AM Reza Rokni 
> wrote:
>
>> Congratulations!
>>
>> On Fri, 21 Jun 2019, 12:37 Robert Burke, 
>> wrote:
>>
>>> Congrats
>>>
>>> On Fri, Jun 21, 2019, 12:29 PM Thomas Weise 
>>> wrote:
>>>
 Hi,

 Please join me and the rest of the Beam PMC in welcoming a
 new committer: Mikhail Gryzykhin.

 Mikhail has been contributing to Beam and actively involved
 in the community for over a year. He developed the community 
 build
 dashboard [1] and added substantial improvements to our build
 infrastructure. Mikhail's work also covers metrics, contributor
 documentation, development process improvements and other 
 areas.

 In consideration of Mikhail's contributions, the Beam PMC
 trusts him with the responsibilities of a Beam committer [2].

 Thank you, Mikhail, for your contributions and looking
 forward to many more!

 Thomas, on behalf of the Apache Beam PMC

 [1] https://s.apache.org/beam-community-metrics
 
 [2]
 https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
 


>

 --
 
 Ruoyun  Huang


>
> --
> Cheers,
> Gleb
>


smime.p7s
Description: S/MIME Cryptographic Signature


Re: GitHub checks not running

2019-06-17 Thread Udi Meiri
I think we reached an upper limit on the Jenkins queue length (the grey
flat line):
[image: graph.png]
(https://builds.apache.org/label/beam/load-statistics?type=sec10)


On Mon, Jun 17, 2019 at 9:27 AM Anton Kedin  wrote:

> They are getting triggered now.
>
> On Mon, Jun 17, 2019 at 9:10 AM Anton Kedin  wrote:
>
>> Hi dev@,
>>
>> Does anyone has context on why the checks might not get triggered on pull
>> requests today? E.g. https://github.com/apache/beam/pull/8822
>>
>> Regards,
>> Anton
>>
>


smime.p7s
Description: S/MIME Cryptographic Signature


pickling typing types in Python 3.5+

2019-05-13 Thread Udi Meiri
It seems like pickling of typing types is broken in 3.5 and 3.6, fixed in
3.7:
https://github.com/python/typing/issues/511

Here are my attempts:
https://gist.github.com/udim/ec213305ca865390c391001e8778e91d
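A stand-alone probe of the same problem looks roughly like this (a sketch; the
exact failure mode depends on the interpreter patch version, per the issue
above):

  # Check whether typing constructs survive a pickle round-trip on this
  # interpreter. Some of these raise on 3.5/3.6 and pass on 3.7.
  import pickle
  import typing

  for hint in (typing.Any, typing.List[int], typing.Union[int, str],
               typing.Dict[str, typing.Tuple[int, ...]]):
    try:
      assert pickle.loads(pickle.dumps(hint)) == hint
      print('OK  ', hint)
    except Exception as exc:  # broad catch is intentional for a probe
      print('FAIL', hint, type(exc).__name__, exc)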


My ideas:
1. I know that we override type object handling in pickler.py
(_nested_type_wrapper), and perhaps this mechanism can be used to pickle
typing classes correctly. The question is how.

2. Exclude/stub out these classes when pickling a pipeline - they are only
used for verification during pipeline construction anyway. This could be a
temporary solution for versions 3.5 and 3.6.

Any ideas / opinions?


smime.p7s
Description: S/MIME Cryptographic Signature


Re: Quota: In use IP-adresses

2019-05-24 Thread Udi Meiri
We're running up against this limit: "Quota 'IN_USE_ADDRESSES' exceeded.
Limit: 750.0 in region us-central1."

On Fri, May 24, 2019 at 8:36 AM Valentyn Tymofieiev 
wrote:

> I did this for a few other resources recently (CPU, Disk). If this keeps
> being a problem we can lower test parallelism.
>
> On Thu, May 23, 2019, 3:48 PM Mikhail Gryzykhin  wrote:
>
>> Hello everybody,
>>
>> Some of our jobs fail with 1/0 in use IP-addresses quota exception.
>>
>> Seems that we spin-up too many VMs and run out of IP-addresses. Should we
>> bump the quota to mitigate the issue?
>>
>> Regards,
>> Mikhail.
>>
>> ---
>> https://issues.apache.org/jira/browse/BEAM-7410
>>
>


smime.p7s
Description: S/MIME Cryptographic Signature


Re: Quota: In use IP-adresses

2019-05-24 Thread Udi Meiri
I opened a support request to increase the quota.

On Fri, May 24, 2019 at 9:59 AM Udi Meiri  wrote:

> We're running up against this limit: "Quota 'IN_USE_ADDRESSES' exceeded.
> Limit: 750.0 in region us-central1."
>
> On Fri, May 24, 2019 at 8:36 AM Valentyn Tymofieiev 
> wrote:
>
>> I did this for a few other resources recently (CPU, Disk). If this keeps
>> being a problem we can lower test parallelism.
>>
>> On Thu, May 23, 2019, 3:48 PM Mikhail Gryzykhin 
>> wrote:
>>
>>> Hello everybody,
>>>
>>> Some of our jobs fail with 1/0 in use IP-addresses quota exception.
>>>
>>> Seems that we spin-up too many VMs and run out of IP-addresses. Should
>>> we bump the quota to mitigate the issue?
>>>
>>> Regards,
>>> Mikhail.
>>>
>>> ---
>>> https://issues.apache.org/jira/browse/BEAM-7410
>>>
>>


smime.p7s
Description: S/MIME Cryptographic Signature


Plans for Python type hints

2019-05-08 Thread Udi Meiri
Hi,
I've written a document, with input from robertwb@, detailing the direction
forward I want to take type hints in Python 3. The document contains
background, a survey of existing type tools, and example usage.
The summary of proposed changes is:


   1. Update Beam's type hinting support to work with Python 3, with minimal
      changes and keeping backwards compatibility.
      1. Support Py3 type hints.
      2. Fix trivial_inference module to work with Py3 bytecode.
   2. Migrate to standard typing module types, to make it easier to migrate to
      using external packages later on.
   3. Start using external typing packages to simplify maintenance and add
      features (such as better inference).


Any comments would be welcome here or on the doc.

doc:
https://docs.google.com/document/d/15bsOL3YcUWuIjnxqhi9nanhj2eh9S6-QlLYuL7ufcXY/edit?usp=sharing
JIRA: https://issues.apache.org/jira/browse/BEAM-7060
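To make item 1 concrete, usage would look roughly like this (a sketch: the
decorator form works today, while the annotation-based form is only the
proposed direction, not something the SDK understands yet):

  import typing

  import apache_beam as beam
  from apache_beam import typehints

  # Today: decorator-based hints, which the proposal keeps working.
  @typehints.with_input_types(int)
  @typehints.with_output_types(str)
  class FormatDoFn(beam.DoFn):
    def process(self, element):
      yield str(element)

  # Proposed: native Python 3 annotations understood by the SDK.
  class FormatDoFn3(beam.DoFn):
    def process(self, element: int) -> typing.Iterable[str]:
      yield str(element)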


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [DISCUSS] Portability representation of schemas

2019-05-08 Thread Udi Meiri
From a Python type hints perspective, how do schemas fit? Type hints are
currently used to determine which coder to use.
It seems that given a schema field, it would be useful to be able to
convert it to a coder (using URNs?), and to convert the coder into a typing
type.
This would allow for pipeline-construction-time type compatibility checks.
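As a rough sketch of that direction (all names here are hypothetical, not an
existing Beam API; the atomic type names just mirror the proto discussion
below):

  import typing

  # Hypothetical mapping from schema atomic type names to typing types.
  _ATOMIC_TO_TYPING = {
      'BYTE': int, 'INT16': int, 'INT32': int, 'INT64': int,
      'FLOAT': float, 'DOUBLE': float,
      'STRING': str, 'BOOLEAN': bool, 'BYTES': bytes,
  }

  def schema_field_to_typehint(field_type_name, element_type_name=None):
    # Sketch: convert a schema field type into a typing hint for
    # construction-time compatibility checks.
    if field_type_name == 'ARRAY':
      return typing.List[_ATOMIC_TO_TYPING[element_type_name]]
    return _ATOMIC_TO_TYPING[field_type_name]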

Some questions:
1. Why are there 4 types of int (byte, int16, int32, int64)? Is it to
maintain type fidelity when writing back? If so, what happens in languages
that only have "int"?
2. What is encoding_position? How does it differ from id (which is also a
position)?
3. When are schema protos constructed? Are they available during pipeline
construction or afterwards?
4. Once data is read into a Beam pipeline and a schema inferred, do we
maintain the schema types throughout the pipeline or use language-local
types?


On Wed, May 8, 2019 at 6:39 PM Robert Bradshaw  wrote:

> From: Reuven Lax 
> Date: Wed, May 8, 2019 at 10:36 PM
> To: dev
>
> > On Wed, May 8, 2019 at 1:23 PM Robert Bradshaw 
> wrote:
> >>
> >> Very excited to see this. In particular, I think this will be very
> >> useful for cross-language pipelines (not just SQL, but also for
> >> describing non-trivial data (e.g. for source and sink reuse).
> >>
> >> The proto specification makes sense to me. The only thing that looks
> >> like it's missing (other than possibly iterable, for arbitrarily-large
> >> support) is multimap. Another basic type, should we want to support
> >> it, is union (though this of course can get messy).
> >
> > multimap is an interesting suggestion. Do you have a use case in mind?
> >
> > union (or oneof) is also a good suggestion. There are good use cases for
> this, but this is a more fundamental change.
>
> No specific usecase, they just seemed to round out the options.
>
> >> I'm curious what the rational was for going with a oneof for type_info
> >> rather than an repeated components like we do with coders.
> >
> > No strong reason. Do you think repeated components is better than oneof?
>
> It's more consistent with how we currently do coders (which has pros and
> cons).
>
> >> Removing DATETIME as a logical coder on top of INT64 may cause issues
> >> of insufficient resolution and/or timespan. Similarly with DECIMAL (or
> >> would it be backed by string?)
> >
> > There could be multiple TIMESTAMP types for different resolutions, and
> they don't all need the same backing field type. E.g. the backing type for
> nanoseconds could by Row(INT64, INT64), or it could just be a byte array.
>
> Hmm What would the value be in supporting different types of
> timestamps? Would all SDKs have to support all of them? Can one
> compare, take differences, etc. across timestamp types? (As Luke
> points out, the other conversation on timestamps is likely relevant
> here as well.)
>
> >> The biggest question, as far as portability is concerned at least, is
> >> the notion of logical types. serialized_class is clearly not portable,
> >> and I also think we'll want a way to share semantic meaning across
> >> SDKs (especially if things like dates become logical types). Perhaps
> >> URNs (+payloads) would be a better fit here?
> >
> > Yes, URN + payload is probably the better fit for portability.
> >
> >> Taking a step back, I think it's worth asking why we have different
> >> types, rather than simply making everything a LogicalType of bytes
> >> (aka coder). Other than encoding format, the answer I can come up with
> >> is that the type decides the kinds of operations that can be done on
> >> it, e.g. does it support comparison? Arithmetic? Containment?
> >> Higher-level date operations? Perhaps this should be used to guide the
> >> set of types we provide.
> >
> > Also even though we could make everything a LogicalType (though at least
> byte array would have to stay primitive), I think  it's useful to have a
> slightly larger set of primitive types.  It makes things easier to
> understand and debug, and it makes it simpler for the various SDKs to map
> them to their types (e.g. mapping to POJOs).
>
>  This would be the case if one didn't have LogicalType at all, but
> once one introduces that one now has this more complicated two-level
> hierarchy of types which doesn't seem simpler to me.
>
> I'm trying to understand what information Schema encodes that a
> NamedTupleCoder (or RowCoder) would/could not. (Coders have the
> disadvantage that there are multiple encodings of a single value, e.g.
> BigEndian vs. VarInt, but if we have multiple resolutions of timestamp
> that would still seem to be an issue. Possibly another advantage is
> encoding into non-record-oriented formats, e.g. Parquet or Arrow, that
> have a set of primitives.)
>


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [ANNOUNCE] New PMC Member: Pablo Estrada

2019-05-16 Thread Udi Meiri
Congrats Pablo!

On Thu, May 16, 2019 at 9:27 AM Thomas Weise  wrote:

> Congratulations, Pablo!
>
>
> On Thu, May 16, 2019 at 5:03 AM Katarzyna Kucharczyk <
> ka.kucharc...@gmail.com> wrote:
>
>> Wow, great news!  Congratulations, Pablo!
>>
>> On Thu, May 16, 2019 at 1:28 PM Michał Walenia <
>> michal.wale...@polidea.com> wrote:
>>
>>> Congratulations, Pablo!
>>>
>>> On Thu, May 16, 2019 at 1:55 AM Rose Nguyen  wrote:
>>>
 Congrats, Pablo!!

 On Wed, May 15, 2019 at 4:43 PM Heejong Lee  wrote:

> Congratulations!
>
> On Wed, May 15, 2019 at 12:24 PM Niklas Hansson <
> niklas.sven.hans...@gmail.com> wrote:
>
>> Congratulations Pablo :)
>>
>> Den ons 15 maj 2019 kl 21:21 skrev Ruoyun Huang :
>>
>>> Congratulations, Pablo!
>>>
>>> *From: *Charles Chen 
>>> *Date: *Wed, May 15, 2019 at 11:04 AM
>>> *To: *dev
>>>
>>> Congrats Pablo and thank you for your contributions!

 On Wed, May 15, 2019, 10:53 AM Valentyn Tymofieiev <
 valen...@google.com> wrote:

> Congrats, Pablo!
>
> On Wed, May 15, 2019 at 10:41 AM Yifan Zou 
> wrote:
>
>> Congratulations, Pablo!
>>
>> *From: *Maximilian Michels 
>> *Date: *Wed, May 15, 2019 at 2:06 AM
>> *To: * 
>>
>> Congrats Pablo! Thank you for your help to grow the Beam
>>> community!
>>>
>>> On 15.05.19 10:33, Tim Robertson wrote:
>>> > Congratulations Pablo
>>> >
>>> > On Wed, May 15, 2019 at 10:22 AM Ismaël Mejía <
>>> ieme...@gmail.com
>>> > > wrote:
>>> >
>>> > Congrats Pablo, well deserved, nice to see your work
>>> recognized!
>>> >
>>> > On Wed, May 15, 2019 at 9:59 AM Pei HE >> > > wrote:
>>> >  >
>>> >  > Congrats, Pablo!
>>> >  >
>>> >  > On Tue, May 14, 2019 at 11:41 PM Tanay Tummalapalli
>>> >  > >> ttanay.apa...@gmail.com>> wrote:
>>> >  > >
>>> >  > > Congratulations Pablo!
>>> >  > >
>>> >  > > On Wed, May 15, 2019, 12:08 Michael Luckey <
>>> adude3...@gmail.com
>>> > > wrote:
>>> >  > >>
>>> >  > >> Congrats, Pablo!
>>> >  > >>
>>> >  > >> On Wed, May 15, 2019 at 8:21 AM Connell O'Callaghan
>>> > mailto:conne...@google.com>> wrote:
>>> >  > >>>
>>> >  > >>> Awesome well done Pablo!!!
>>> >  > >>>
>>> >  > >>> Kenn thank you for sharing this great news with
>>> us!!!
>>> >  > >>>
>>> >  > >>> On Tue, May 14, 2019 at 11:01 PM Ahmet Altay
>>> > mailto:al...@google.com>> wrote:
>>> >  > 
>>> >  >  Congratulations!
>>> >  > 
>>> >  >  On Tue, May 14, 2019 at 9:11 PM Robert Burke
>>> > mailto:rob...@frantil.com>> wrote:
>>> >  > >
>>> >  > > Woohoo! Well deserved.
>>> >  > >
>>> >  > > On Tue, May 14, 2019, 8:34 PM Reuven Lax <
>>> re...@google.com
>>> > > wrote:
>>> >  > >>
>>> >  > >> Congratulations!
>>> >  > >>
>>> >  > >> From: Mikhail Gryzykhin <
>>> gryzykhin.mikh...@gmail.com
>>> > >
>>> >  > >> Date: Tue, May 14, 2019 at 8:32 PM
>>> >  > >> To: >> dev@beam.apache.org>>
>>> >  > >>
>>> >  > >>> Congratulations Pablo!
>>> >  > >>>
>>> >  > >>> On Tue, May 14, 2019, 20:25 Kenneth Knowles
>>> > mailto:k...@apache.org>> wrote:
>>> >  > 
>>> >  >  Hi all,
>>> >  > 
>>> >  >  Please join me and the rest of the Beam PMC in
>>> welcoming
>>> > Pablo Estrada to join the PMC.
>>> >  > 
>>> >  >  Pablo first picked up BEAM-722 in October of
>>> 2016 and
>>> > has been a steady part of the Beam community since then.
>>> In addition
>>> > to technical work on Beam Python & Java & runners, I would
>>> highlight
>>> > how Pablo grows Beam's community by helping users, working
>>> on GSoC,
>>> > giving talks at Beam Summits and other OSS conferences
>>> including
>>> > Flink Forward, and holding training workshops. I cannot do
>>> justice
>>> > to Pablo's contributions in a single paragraph.
>>> >  > 

beam_PreCommit_Python_PVR_Flink_Commit most perma-red

2019-05-20 Thread Udi Meiri
FYI, I opened an issue here: https://issues.apache.org/jira/browse/BEAM-7378

Please triage if you know how these tests work.
Thanks!


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [BEAM-7164] Python precommit failing on Java PRs. dataflow:setupVirtualenv

2019-04-26 Thread Udi Meiri
Alex, I changed my mind: I'm okay retrying single tests, just not entire
suites of tests (e.g. if precommits take an hour, retrying the run takes up
an additional hour on the Jenkins machine).
This is more of an issue in Python, where gradle does not (currently) have
insight into which tests failed and how to retry just them.
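For what it's worth, once a suite is pytest-based, per-test retries can be
expressed without rerunning everything, e.g. with the pytest-rerunfailures
plugin (a sketch of an assumed setup, not something Beam uses today):

  # Retry an individual flaky test up to 2 extra times, 5 seconds apart.
  # CLI equivalent: pytest --reruns 2 --reruns-delay 5 <test file>
  import pytest

  @pytest.mark.flaky(reruns=2, reruns_delay=5)
  def test_step_prone_to_network_flakes():
    ...  # test body elided; the marker is the point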



On Fri, Apr 26, 2019 at 2:17 PM Alex Amato  wrote:

> @Udi Meiri , Is this true if the specific tests are
> rerun? I don't think we should rerun all tests.
>
> On Fri, Apr 26, 2019 at 12:11 PM Valentyn Tymofieiev 
> wrote:
>
>> Preinstalling dependencies may affect the dependency resolution, and we
>> may end up testing a different configuration than a user would have after
>> installing beam into a clean environment.
>>
>> I do think pip uses cache, unless one specifies "--no-cache-dir". By
>> default the cache is ~/.cache/pip. Looking up the log message in OP, we can
>> see several "Using cached..." log entries. Not sure why futures was not
>> fetched from cache or PyPi. Perhaps it is also a pip flake.
>>
>> I would be against wiping flakes under the rug by rerunning the whole
>> suite after an error, but re-rerunning parts of the test environment set
>> up, that are prone to environmental flakes, such as setupVirtualEnv seems
>> reasonable. I agree with Udi that care should be taken to not overload
>> Jenkins (e.g. retries should be limited)
>>
>


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [BEAM-7164] Python precommit failing on Java PRs. dataflow:setupVirtualenv

2019-04-29 Thread Udi Meiri
Pip has a --cache-dir which should be safe with concurrent writes.

On Fri, Apr 26, 2019 at 3:59 PM Ahmet Altay  wrote:

> It is possible to download dependencies with pip to a local directory and
> install from there [1]. As a side benefit this is supposed to speed up the
> installation process. Since we setup virtualenv multiple times, this could
> actually help us in a single run. And if we can keep this cache across test
> runs we can reduce flakiness.
>
> [1]
> https://pip.pypa.io/en/latest/user_guide/#installing-from-local-packages
>
> On Fri, Apr 26, 2019 at 3:42 PM Valentyn Tymofieiev 
> wrote:
>
>> We do retry certain inherently flaky tests, for example, see[1]. This
>> practice should be used with caution, see discussion [2].
>>
>> However retrying an individual test would not avoid the flake that Alex
>> brought up in this thread, we'd have to retry setupVirtualEnv task that is
>> executed once per suite of tests. Retrying just that task is different from
>> retrying the whole suite.
>>
>> [1]
>> https://github.com/apache/beam/blob/516cdb6401d9fb7adb004de472771fb1fb3a92af/sdks/python/apache_beam/runners/worker/statesampler_test.py#L41,
>> this was discussed
>> [2]
>> https://lists.apache.org/thread.html/16060fd7f4d408857a5e4a2598cc96ebac0f744b65bf4699001350af@%3Cdev.beam.apache.org%3E
>>
>>
>> On Fri, Apr 26, 2019 at 3:30 PM Udi Meiri  wrote:
>>
>>> Alex, I changed my mind: I'm okay retrying single tests, just not entire
>>> suites of tests (e.g. if precommits take an hour, retrying the run takes up
>>> an additional hour on the Jenkins machine).
>>> This is more of an issue in Python, where gradle does not (currently)
>>> have insight into which tests failed and how to retry just them.
>>>
>>>
>>>
>>> On Fri, Apr 26, 2019 at 2:17 PM Alex Amato  wrote:
>>>
>>>> @Udi Meiri , Is this true if the specific tests are
>>>> rerun? I don't think we should rerun all tests.
>>>>
>>>> On Fri, Apr 26, 2019 at 12:11 PM Valentyn Tymofieiev <
>>>> valen...@google.com> wrote:
>>>>
>>>>> Preinstalling dependencies may affect the dependency resolution, and
>>>>> we may end up testing a different configuration than a user would have
>>>>> after installing beam into a clean environment.
>>>>>
>>>>> I do think pip uses cache, unless one specifies "--no-cache-dir". By
>>>>> default the cache is ~/.cache/pip. Looking up the log message in OP, we 
>>>>> can
>>>>> see several "Using cached..." log entries. Not sure why futures was not
>>>>> fetched from cache or PyPi. Perhaps it is also a pip flake.
>>>>>
>>>>> I would be against wiping flakes under the rug by rerunning the whole
>>>>> suite after an error, but re-rerunning parts of the test environment set
>>>>> up, that are prone to environmental flakes, such as setupVirtualEnv seems
>>>>> reasonable. I agree with Udi that care should be taken to not overload
>>>>> Jenkins (e.g. retries should be limited)
>>>>>
>>>>


smime.p7s
Description: S/MIME Cryptographic Signature


Re: investigating python precommit wordcount_it failure

2019-04-18 Thread Udi Meiri
Correction: it's a postcommit failure

On Thu, Apr 18, 2019 at 5:43 PM Udi Meiri  wrote:

> in https://issues.apache.org/jira/browse/BEAM-7111
>
> If anyone has state please lmk
>


smime.p7s
Description: S/MIME Cryptographic Signature


investigating python precommit wordcount_it failure

2019-04-18 Thread Udi Meiri
in https://issues.apache.org/jira/browse/BEAM-7111

If anyone has state please lmk


smime.p7s
Description: S/MIME Cryptographic Signature


Re: investigating python precommit wordcount_it failure

2019-04-19 Thread Udi Meiri
I believe these are separate issues. BEAM-7111 is about wordcount_it_test
failing on direct runner in streaming mode

On Thu, Apr 18, 2019 at 8:09 PM Valentyn Tymofieiev 
wrote:

> I am working on a postcommit worcount it failure in BEAM-7063.
>
> On Thu, Apr 18, 2019 at 6:05 PM Udi Meiri  wrote:
>
>> Correction: it's a postcommit failure
>>
>> On Thu, Apr 18, 2019 at 5:43 PM Udi Meiri  wrote:
>>
>>> in https://issues.apache.org/jira/browse/BEAM-7111
>>>
>>> If anyone has state please lmk
>>>
>>


smime.p7s
Description: S/MIME Cryptographic Signature


Re: Congrats to Beam's first 6 Google Open Source Peer Bonus recipients!

2019-05-02 Thread Udi Meiri
Congrats everyone!

On Thu, May 2, 2019 at 9:55 AM Ahmet Altay  wrote:

> Congratulations!
>
> On Thu, May 2, 2019 at 9:54 AM Yifan Zou  wrote:
>
>> Congratulations! Well deserved!
>>
>> On Thu, May 2, 2019 at 9:37 AM Rui Wang  wrote:
>>
>>> Congratulations!
>>>
>>>
>>> -Rui
>>>
>>> On Thu, May 2, 2019 at 8:23 AM Michael Luckey 
>>> wrote:
>>>
 Congrats! Well deserved!

 On Thu, May 2, 2019 at 3:29 PM Alexey Romanenko <
 aromanenko@gmail.com> wrote:

> Congrats!
>
> On 2 May 2019, at 10:06, Gleb Kanterov  wrote:
>
> Congratulations! Well deserved!
>
> On Thu, May 2, 2019 at 10:00 AM Ismaël Mejía 
> wrote:
>
>> Congrats everyone !
>>
>> On Thu, May 2, 2019 at 9:14 AM Robert Bradshaw 
>> wrote:
>>
>>> Congratulation, and thanks for all the great contributions each one
>>> of you has made to Beam!
>>>
>>> On Thu, May 2, 2019 at 5:51 AM Ruoyun Huang 
>>> wrote:
>>>
 Congratulations everyone!  Well deserved!

 On Wed, May 1, 2019 at 8:38 PM Kenneth Knowles 
 wrote:

> Congrats! All well deserved!
>
> Kenn
>
> On Wed, May 1, 2019 at 8:09 PM Reza Rokni  wrote:
>
>> Congratulations!
>>
>> On Thu, 2 May 2019 at 10:53, Connell O'Callaghan <
>> conne...@google.com> wrote:
>>
>>> Well done - congratulations to you all!!! Rose thank you for
>>> sharing this news!!!
>>>
>>> On Wed, May 1, 2019 at 19:45 Rose Nguyen 
>>> wrote:
>>>
 Matthias Baetens, Lukazs Gajowy, Suneel Marthi, Maximilian
 Michels, Alex Van Boxel, and Thomas Weise:

 Thank you for your exceptional contributions to Apache
 Beam. I'm looking forward to seeing this project grow and for 
 more folks
 to contribute and be recognized! Everyone can read more about this 
 award on
 the Google Open Source blog:
 https://opensource.googleblog.com/2019/04/google-open-source-peer-bonus-winners.html

 Cheers,
 --
 Rose Thị Nguyễn

>>>
>>
>> --
>> This email may be confidential and privileged. If you received
>> this communication by mistake, please don't forward it to anyone 
>> else,
>> please erase all copies and attachments, and please let me know that 
>> it has
>> gone to the wrong person.
>> The above terms reflect a potential business arrangement, are
>> provided solely as a basis for further discussion, and are not 
>> intended to
>> be and do not constitute a legally binding obligation. No legally 
>> binding
>> obligations will be created, implied, or inferred until an agreement 
>> in
>> final form is executed in writing by all parties involved.
>>
>

 --
 
 Ruoyun  Huang


>
> --
> Cheers,
> Gleb
>
>
>


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [BEAM-7164] Python precommit failing on Java PRs. dataflow:setupVirtualenv

2019-04-26 Thread Udi Meiri
That runs the risk of overloading our test machines when the build goes red.

On Fri, Apr 26, 2019 at 11:29 AM Alex Amato  wrote:

> It would be ideal to not need manual steps. If known flakey tests can be
> auto retried that would be a great improvement.
>
> On Fri, Apr 26, 2019 at 11:24 AM Valentyn Tymofieiev 
> wrote:
>
>> We could do something along the lines of retry with a back-off. Note that
>> Java tests also have this problem as we sometimes fail to fetch packages
>> from Maven Central.
>>
>> On Fri, Apr 26, 2019 at 11:19 AM Pablo Estrada 
>> wrote:
>>
>>> hm no, these are somewhat common. Yes, I think we could have retries to
>>> try to fix this sort of problem.
>>>
>>> Perhaps a mixture of reusing a virtualenv, and having retries when
>>> creating it?
>>>
>>> On Fri, Apr 26, 2019 at 11:15 AM Alex Amato  wrote:
>>>
 Okay but this occurred on jenkins. So does the machine need an update?

 On Fri, Apr 26, 2019 at 10:43 AM Valentyn Tymofieiev <
 valen...@google.com> wrote:

> I think you hit a pypi flake.
>
> pip install futures>=2.2.0 works fine for me.
>
> On Fri, Apr 26, 2019 at 9:41 AM Alex Amato  wrote:
>
>> Would be nice to fix this as it can slow down PRs. I am not sure if this 
>> one is fixed on retry yet or not.
>>
>>
>>
>> https://issues.apache.org/jira/browse/BEAM-7164?filter=-2
>>
>> https://builds.apache.org/job/beam_PreCommit_Python_Commit/6035/consoleFull
>>
>> 18:05:44 > Task :beam-sdks-python-test-suites-dataflow:setupVirtualenv
>> 18:05:44 New python executable in /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-410805238/bin/python2.7
>> 18:05:44 Also creating executable in /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-410805238/bin/python
>> 18:05:44 Installing setuptools, pkg_resources, pip, wheel...done.
>> 18:05:44 Running virtualenv with interpreter /usr/bin/python2.7
>> 18:05:44 DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.
>> 18:05:44 Collecting tox==3.0.0
>> 18:05:44   Using cached https://files.pythonhosted.org/packages/e6/41/4dcfd713282bf3213b0384320fa8841e4db032ddcb80bc08a540159d42a8/tox-3.0.0-py2.py3-none-any.whl
>> 18:05:44 Collecting grpcio-tools==1.3.5
>> 18:05:44   Using cached https://files.pythonhosted.org/packages/05/f6/0296e29b1bac6f85d2a8556d48adf825307f73109a3c2c17fb734292db0a/grpcio_tools-1.3.5-cp27-cp27mu-manylinux1_x86_64.whl
>> 18:05:44 Collecting pluggy<1.0,>=0.3.0 (from tox==3.0.0)
>> 18:05:44   Using cached https://files.pythonhosted.org/packages/84/e8/4ddac125b5a0e84ea6ffc93cfccf1e7ee1924e88f53c64e98227f0af2a5f/pluggy-0.9.0-py2.py3-none-any.whl
>> 18:05:44 Collecting six (from tox==3.0.0)
>> 18:05:44   Using cached https://files.pythonhosted.org/packages/73/fb/00a976f728d0d1fecfe898238ce23f502a721c0ac0ecfedb80e0d88c64e9/six-1.12.0-py2.py3-none-any.whl
>> 18:05:44 Collecting virtualenv>=1.11.2 (from tox==3.0.0)
>> 18:05:44   Using cached https://files.pythonhosted.org/packages/4f/ba/6f9315180501d5ac3e707f19fcb1764c26cc6a9a31af05778f7c2383eadb/virtualenv-16.5.0-py2.py3-none-any.whl
>> 18:05:44 Collecting py>=1.4.17 (from tox==3.0.0)
>> 18:05:44   Using cached https://files.pythonhosted.org/packages/76/bc/394ad449851729244a97857ee14d7cba61ddb268dce3db538ba2f2ba1f0f/py-1.8.0-py2.py3-none-any.whl
>> 18:05:44 Collecting grpcio>=1.3.5 (from grpcio-tools==1.3.5)
>> 18:05:44   Using cached https://files.pythonhosted.org/packages/7c/59/4da8df60a74f4af73ede9d92a75ca85c94bc2a109d5f67061496e8d496b2/grpcio-1.20.0-cp27-cp27mu-manylinux1_x86_64.whl
>> 18:05:44 Collecting protobuf>=3.2.0 (from grpcio-tools==1.3.5)
>> 18:05:44   Using cached https://files.pythonhosted.org/packages/ea/72/5eadea03b06ca1320be2433ef2236155da17806b700efc92677ee99ae119/protobuf-3.7.1-cp27-cp27mu-manylinux1_x86_64.whl
>> 18:05:44 Collecting futures>=2.2.0; python_version < "3.2" (from grpcio>=1.3.5->grpcio-tools==1.3.5)
>> 18:05:44   ERROR: Could not find a version that satisfies the requirement futures>=2.2.0; python_version < "3.2" (from grpcio>=1.3.5->grpcio-tools==1.3.5) (from versions: none)
>> 18:05:44 ERROR: No matching distribution found for futures>=2.2.0; python_version < "3.2" (from grpcio>=1.3.5->grpcio-tools==1.3.5)
>> 18:05:46
>> 18:05:46 > Task :beam-sdks-python-test-suites-dataflow:setupVirtualenv FAILED
>> 18:05:46

apache-beam-jenkins-15 out of disk

2019-06-27 Thread Udi Meiri
Opened a bug here: https://issues.apache.org/jira/browse/BEAM-7648

Can someone investigate what's going on?


smime.p7s
Description: S/MIME Cryptographic Signature


Re: python integration tests flake detection

2019-06-26 Thread Udi Meiri
In lieu of doing a migration to pytest, which is a large effort, I'm trying
to do the same using nose.
Opened https://issues.apache.org/jira/browse/BEAM-7641

On Tue, Jun 25, 2019 at 4:01 PM Udi Meiri  wrote:

> I was thinking that our test infrastructure could use an upgrade to pytest.
>
> Some advantages:
> - It'd allow setting the test suite name. For example, if you look at this
> page
> <https://builds.apache.org/job/beam_PreCommit_Python_Commit/7043/testReport/apache_beam.io.fileio_test/MatchTest/>
>  you'll
> find 3 sets of 4 identically named tests with no way to tell which tox
> environment they were run on (all marked as "nosetests").
> - It will hopefully allow a degree of parallelism (if we can solve some
> pickling errors). This will make running unit tests locally much faster.
> - pytest has cleaner progress reporting
> - no more BeamTestPlugin
> - easier inclusion/exclusion of tests (using markers such as: precommit,
> postcommit, no_direct, no_dataflow, etc.)
>
>
> On Tue, Jun 25, 2019 at 10:50 AM Udi Meiri  wrote:
>
>> Yes. It only outputs to one filename though, so it'd need some working
>> around (our ITs might have more than one nose run).
>> Some tests run in docker, so that might need work to get the xml out.
>>
>> On Tue, Jun 25, 2019 at 10:11 AM Ahmet Altay  wrote:
>>
>>> There is a nose plugin [1] for outputting test results in xunit format.
>>> Would that work?
>>>
>>> [1] https://nose.readthedocs.io/en/latest/plugins/xunit.html
>>>
>>> On Tue, Jun 25, 2019 at 10:04 AM Udi Meiri  wrote:
>>>
>>>> The current state of Python post-commit tests is pretty flaky.
>>>> I was wondering if we had any stats for integration tests, to help
>>>> identify which tests are causing the most failures. Jenkins keeps some
>>>> history for tests (example
>>>> <https://builds.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/testReport/apache_beam.coders.avro_coder_test/CodersTest/test_avro_record_coder/history/>),
>>>> but it requires junit-style .xml output.
>>>>
>>>> Would it be possible to get our integration test results into Jenkins?
>>>>
>>>


smime.p7s
Description: S/MIME Cryptographic Signature


Re: apache-beam-jenkins-15 out of disk

2019-06-28 Thread Udi Meiri
>>>> ...                                                          2 days ago  1.305GB  965.7MB  339.5MB  0
>>>> hdfs_it-jenkins-beam_postcommit_python_verify_pr-801_test   latest  d1cc503ebe8e  2 days ago  1.305GB  965.7MB  339.2MB  0
>>>> hdfs_it-jenkins-beam_postcommit_python_verify-8577_test     latest  8582c6ca6e15  3 days ago  1.305GB  965.7MB  339.2MB  0
>>>> hdfs_it-jenkins-beam_postcommit_python_verify-8576_test     latest  4591e0948170  3 days ago  1.305GB  965.7MB  339.2MB  0
>>>> hdfs_it-jenkins-beam_postcommit_python_verify-8575_test     latest  ab181c49d56e  4 days ago  1.305GB  965.7MB  339.2MB  0
>>>> hdfs_it-jenkins-beam_postcommit_python_verify-8573_test     latest  2104ba0a6db7  4 days ago  1.305GB  965.7MB  339.2MB  0
>>>> ...
>>>> <1000+ images>
>>>>
>>>> I removed unused the images and the beam15 is back now.
>>>>
>>>> Opened https://issues.apache.org/jira/browse/BEAM-7650.
>>>> Ankur, I assigned the issue to you. Feel free to reassign it if needed.
>>>>
>>>> Thank you.
>>>> Yifan
>>>>
>>>> On Thu, Jun 27, 2019 at 11:29 AM Yifan Zou  wrote:
>>>>
>>>>> Something were eating the disk. Disconnected the worker so jobs could
>>>>> be allocated to other nodes. Will look deeper.
>>>>> Filesystem  Size  Used  Avail Use% Mounted on
>>>>> /dev/sda1   485G  485G 96K 100%  /
>>>>>
>>>>>
>>>>> On Thu, Jun 27, 2019 at 10:54 AM Yifan Zou 
>>>>> wrote:
>>>>>
>>>>>> I'm on it.
>>>>>>
>>>>>> On Thu, Jun 27, 2019 at 10:17 AM Udi Meiri  wrote:
>>>>>>
>>>>>>> Opened a bug here: https://issues.apache.org/jira/browse/BEAM-7648
>>>>>>>
>>>>>>> Can someone investigate what's going on?
>>>>>>>
>>>>>>


smime.p7s
Description: S/MIME Cryptographic Signature


Re: Phrase triggering jobs problem

2019-07-10 Thread Udi Meiri
Still happening for me too.

On Wed, Jul 10, 2019 at 10:40 AM Lukasz Cwik  wrote:

> This has happened in the past. Usually there is some issue where Jenkins
> isn't notified of new PRs by Github or doesn't see the PR phrases and hence
> Jenkins sits around idle. This is usually fixed after a few hours without
> any action on our part.
>
> On Wed, Jul 10, 2019 at 10:28 AM Katarzyna Kucharczyk <
> ka.kucharc...@gmail.com> wrote:
>
>> Hi all,
>>
>> Hope it's not duplicate but I can't find if any issue with phrase
>> triggering in Jenkins was already here.
>> Currently, I started third PR and no test were triggered there. I tried
>> to trigger some tests manually, but with no effect.
>>
>> Am I missing something?
>>
>> Here are links to my problematic PRs:
>> https://github.com/apache/beam/pull/9033
>> https://github.com/apache/beam/pull/9034
>> https://github.com/apache/beam/pull/9035
>>
>> Thanks,
>> Kasia
>>
>


smime.p7s
Description: S/MIME Cryptographic Signature


Re: Phrase triggering jobs problem

2019-07-11 Thread Udi Meiri
I've opened a bug: https://issues.apache.org/jira/browse/BEAM-7723
If anyone is working on this please assign yourself

On Wed, Jul 10, 2019 at 5:57 PM Udi Meiri  wrote:

> Thanks Kenn.
>
> On Wed, Jul 10, 2019 at 3:31 PM Kenneth Knowles  wrote:
>
>> Just noticed this thread. Infra turned off one of the GitHub plugins -
>> the one we use. I forwarded the announcement. I'll see if we can get it
>> back on for a bit so we can migrate off. I'm not sure if they have
>> identical job DSL or not.
>>
>> On Wed, Jul 10, 2019 at 12:32 PM Udi Meiri  wrote:
>>
>>> Still happening for me too.
>>>
>>> On Wed, Jul 10, 2019 at 10:40 AM Lukasz Cwik  wrote:
>>>
>>>> This has happened in the past. Usually there is some issue where
>>>> Jenkins isn't notified of new PRs by Github or doesn't see the PR phrases
>>>> and hence Jenkins sits around idle. This is usually fixed after a few hours
>>>> without any action on our part.
>>>>
>>>> On Wed, Jul 10, 2019 at 10:28 AM Katarzyna Kucharczyk <
>>>> ka.kucharc...@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> Hope it's not duplicate but I can't find if any issue with phrase
>>>>> triggering in Jenkins was already here.
>>>>> Currently, I started third PR and no test were triggered there. I
>>>>> tried to trigger some tests manually, but with no effect.
>>>>>
>>>>> Am I missing something?
>>>>>
>>>>> Here are links to my problematic PRs:
>>>>> https://github.com/apache/beam/pull/9033
>>>>> https://github.com/apache/beam/pull/9034
>>>>> https://github.com/apache/beam/pull/9035
>>>>>
>>>>> Thanks,
>>>>> Kasia
>>>>>
>>>>


smime.p7s
Description: S/MIME Cryptographic Signature


Re: Phrase triggering jobs problem

2019-07-11 Thread Udi Meiri
Opened https://issues.apache.org/jira/browse/BEAM-7725 for migration off
the old plugin onto the new (already deprecated I might add) plugin.
Any takers?

On Thu, Jul 11, 2019 at 10:53 AM Udi Meiri  wrote:

> Okay, phrase triggering is working again (they re-enabled the plugin). See
> notes in bug for details.
>
> On Thu, Jul 11, 2019 at 10:04 AM Udi Meiri  wrote:
>
>> I've opened a bug: https://issues.apache.org/jira/browse/BEAM-7723
>> If anyone is working on this please assign yourself
>>
>> On Wed, Jul 10, 2019 at 5:57 PM Udi Meiri  wrote:
>>
>>> Thanks Kenn.
>>>
>>> On Wed, Jul 10, 2019 at 3:31 PM Kenneth Knowles  wrote:
>>>
>>>> Just noticed this thread. Infra turned off one of the GitHub plugins -
>>>> the one we use. I forwarded the announcement. I'll see if we can get it
>>>> back on for a bit so we can migrate off. I'm not sure if they have
>>>> identical job DSL or not.
>>>>
>>>> On Wed, Jul 10, 2019 at 12:32 PM Udi Meiri  wrote:
>>>>
>>>>> Still happening for me too.
>>>>>
>>>>> On Wed, Jul 10, 2019 at 10:40 AM Lukasz Cwik  wrote:
>>>>>
>>>>>> This has happened in the past. Usually there is some issue where
>>>>>> Jenkins isn't notified of new PRs by Github or doesn't see the PR phrases
>>>>>> and hence Jenkins sits around idle. This is usually fixed after a few 
>>>>>> hours
>>>>>> without any action on our part.
>>>>>>
>>>>>> On Wed, Jul 10, 2019 at 10:28 AM Katarzyna Kucharczyk <
>>>>>> ka.kucharc...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Hope it's not duplicate but I can't find if any issue with phrase
>>>>>>> triggering in Jenkins was already here.
>>>>>>> Currently, I started third PR and no test were triggered there. I
>>>>>>> tried to trigger some tests manually, but with no effect.
>>>>>>>
>>>>>>> Am I missing something?
>>>>>>>
>>>>>>> Here are links to my problematic PRs:
>>>>>>> https://github.com/apache/beam/pull/9033
>>>>>>> https://github.com/apache/beam/pull/9034
>>>>>>> https://github.com/apache/beam/pull/9035
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Kasia
>>>>>>>
>>>>>>


smime.p7s
Description: S/MIME Cryptographic Signature


python precommits failing at head

2019-07-11 Thread Udi Meiri
This is due to
https://github.com/apache/beam/pull/8969
and
https://github.com/apache/beam/pull/8934
being merged today.

Fix is here: https://github.com/apache/beam/pull/9044


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [VOTE] Vendored dependencies release process

2019-07-08 Thread Udi Meiri
I left some comments. Being new to the Beam releasing process, my question
might be trivial to someone actually performing the release.

On Tue, Jul 2, 2019 at 4:49 PM Lukasz Cwik  wrote:

> Please vote based on the vendored dependencies release process as
> discussed[1] and documented[2].
>
> Please vote as follows:
> +1: Adopt the vendored dependency release process
> -1: The vendored release process needs to change because ...
>
> Since many people in the US may be out due to the holiday schedule, I'll
> try to close the vote and tally the results on July 9th so please vote
> before then.
>
> 1:
> https://lists.apache.org/thread.html/e2c49a5efaee2ad416b083fbf3b9b6db60fdb04750208bfc34cecaf0@%3Cdev.beam.apache.org%3E
> 2: https://s.apache.org/beam-release-vendored-artifacts
>


smime.p7s
Description: S/MIME Cryptographic Signature


Re: Dataflow IT failures on Python being investigated internally

2019-07-03 Thread Udi Meiri
It seems that at least some of the failures to start pipelines on DF were
due to a CMEK misconfiguration.

On Tue, Jul 2, 2019 at 6:45 PM Udi Meiri  wrote:

> The failures are of the type where the pipeline fails very quickly (10
> seconds) and there's a "Pipeline execution failed" or "Workflow failed"
> error.
>


smime.p7s
Description: S/MIME Cryptographic Signature


Re: Dataflow IT failures on Python being investigated internally

2019-07-03 Thread Udi Meiri
https://issues.apache.org/jira/browse/BEAM-7687

On Wed, Jul 3, 2019 at 11:57 AM Udi Meiri  wrote:

> It seems that at least some of the failures to start pipelines on DF were
> due to a CMEK misconfiguration.
>
> On Tue, Jul 2, 2019 at 6:45 PM Udi Meiri  wrote:
>
>> The failures are of the type where the pipeline fails very quickly (10
>> seconds) and there's a "Pipeline execution failed" or "Workflow failed"
>> error.
>>
>


smime.p7s
Description: S/MIME Cryptographic Signature


Dataflow IT failures on Python being investigated internally

2019-07-02 Thread Udi Meiri
The failures are of the type where the pipeline fails very quickly (10
seconds) and there's a "Pipeline execution failed" or "Workflow failed"
error.


smime.p7s
Description: S/MIME Cryptographic Signature


Re: Stop using Perfkit Benchmarker tool in all tests?

2019-07-08 Thread Udi Meiri
The Python 3 incompatibility is reason enough to move off of Perfkit. (+1)

On Mon, Jul 8, 2019 at 9:49 AM Mark Liu  wrote:

> Thanks for summarizing this discussion and post in dev list. I was closely
> working on Python performance tests and those Perfkit problems are really
> painful. So +1 to remove Perfkit and also remove those tests that are no
> longer maintained.
>
> For #2 (Python performance tests), there are no special setup for them.
> The only missing part I can see is metrics collection and data upload to a
> shared storage (e.g. BigQuery), which is provided free in Perfkit
> framework. This seems common to all language, so wondering if a shared
> infra is possible.
>
> Mark
>
> On Wed, Jul 3, 2019 at 9:36 AM Lukasz Cwik  wrote:
>
>> Makes sense to me to move forward with your suggestion.
>>
>> On Wed, Jul 3, 2019 at 3:57 AM Łukasz Gajowy 
>> wrote:
>>
>>> Are there features in Perfkit that we would like to be using that we
 aren't?

>>>
>>> Besides the Kubernetes related code I mentioned above (that, I believe,
>>> can be easily replaced) I don't see any added value in having Perfkit. The
>>> Kubernetes parts could be replaced with a set of fine-grained Gradle tasks
>>> invoked by other high-level tasks and Jenkins job's steps. There also seem
>>> to be some Gradle + Kubernetes plugins out there that might prove useful
>>> here (no solid research in that area).
>>>
>>>
 Can we make the integration with Perfkit less brittle?

>>>
>>> There was an idea to move all beam benchmark's code from Perfkit (
>>> beam_benchmark_helper.py
>>> 
>>> , beam_integration_benchmark.py
>>> )
>>> to beam repository and inject it to Perfkit every time we use it. However,
>>> that would require investing time and effort in doing that and it will
>>> still not solve the problems I listed above. It will also still require
>>> knowledge of how Perfkit works from Beam developers while we can avoid that
>>> and use the existing tools (gradle, jenkins).
>>>
>>> Thanks!
>>>
>>> pt., 28 cze 2019 o 17:31 Lukasz Cwik  napisał(a):
>>>
 +1 for removing tests that are not maintained.

 Are there features in Perfkit that we would like to be using that we
 aren't?
 Can we make the integration with Perfkit less brittle?

 If we aren't getting much and don't plan to get much value in the short
 term, removal makes sense to me.

 On Thu, Jun 27, 2019 at 3:16 AM Łukasz Gajowy 
 wrote:

> Hi all,
>
> moving the discussion to the dev list:
> https://github.com/apache/beam/pull/8919. I think that Perfkit
> Benchmarker should be removed from all our tests.
>
> Problems that we face currently:
>
>1. Changes to Gradle tasks/build configuration in the Beam
>codebase have to be reflected in Perfkit code. This required PRs to 
> Perfkit
>which can last and the tests break due to this sometimes (no change in
>Perfkit + change already there in beam = incompatibility). This is what
>happened in PR 8919 (above),
>2. Can't run in Python3 (depends on python 2 only library like
>functools32),
>3. Black box testing which hard to collect pipeline related
>metrics,
>4. Measurement of run time is inaccurate,
>5. It offers relatively small elasticity in comparison with eg.
>Jenkins tasks in terms of setting up the testing infrastructure 
> (runners,
>databases). For example, if we'd like to setup Flink runner, and reuse 
> it
>in consequent tests in one go, that would be impossible. We can easily 
> do
>this in Jenkins.
>
> Tests that use Perfkit:
>
>1.  IO integration tests,
>2.  Python performance tests,
>3.  beam_PerformanceTests_Dataflow (disabled),
>4.  beam_PerformanceTests_Spark (failing constantly - looks not
>maintained).
>
> From the IOIT perspective (1), only the code that setups/tears down
> Kubernetes resources is useful right now but these parts can be easily
> implemented in Jenkins/Gradle code. That would make Perfkit obsolete in
> IOIT because we already collect metrics using Metrics API and store them 
> in
> BigQuery directly.
>
> As for point 2: I have no knowledge of how complex the task would be
> (help needed).
>
> Regarding 3, 4: Those tests seem to be not maintained - should we
> remove them?
>
> Opinions?
>
> Thank you,
> Łukasz
>
>
>
>
>


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [DISCUSS] Contributor guidelines for iterating on PRs: when to squash commits.

2019-07-08 Thread Udi Meiri
I think there are already some guidelines here:
https://beam.apache.org/contribute/committer-guide/#pull-request-review-objectives
(maybe
we could point to them from the PR template?)
Yes, it is acceptable to ask for squash or if it's ok to squash to a single
commit.

On Mon, Jul 8, 2019 at 11:14 AM Valentyn Tymofieiev 
wrote:

> I have observed a pattern where authors force-push their changes during
> every review iteration, so that a pull request always contains one commit.
> This creates the following problems:
>
> 1. It is hard to see what has changed between review iterations.
> 2. Sometimes authors  make changes in parts of pull requests that the
> reviewer did not comment on, and such changes may be unnoticed by the
> reviewer.
> 3. After a force-push, comments made by reviewers on earlier commit are
> hard to find.
>
> A better workflow may be to:
> 1. Between review iterations authors push changes in new commit(s), but
> also keep the original commit.
> 2. If a follow-up commit does not constitute a meaningful change of its
> own, it should be prefixed with "fixup: ".
> 3. Once review has finished either:
> - Authors squash fixup commits after all reviewers have approved the PR
> per request of a reviewer.
> - Committers squash fixup commits during merge.
>
> I am curious what thoughts or suggestions others have. In particular:
> 1. Should we document guidelines for iterating on PRs in our contributor
> guide?
> 2. Is it acceptable for a reviewer to ask the author to rebase squashed
> changes that were force-pushed to address review feedback onto their
> original commits to simplify the rest of the review?
>
> Thanks.
>
> Related discussion:
> [1] Committer Guidelines / Hygene before merging PRs
> https://lists.apache.org/thread.html/6d922820d6fc352479f88e5c8737f2c8893ddb706a1e578b50d28948@%3Cdev.beam.apache.org%3E
>


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [ANNOUNCE] New committer announcement: Yifan Zou

2019-04-22 Thread Udi Meiri
Congrats Yifan!

On Mon, Apr 22, 2019 at 11:04 AM Valentyn Tymofieiev 
wrote:

> Congratulations, Yifan! Thanks a lot for your continued contributions to
> Beam.
>
> On Mon, Apr 22, 2019 at 10:24 AM Robin Qiu  wrote:
>
>> Congratulations Yifan!
>>
>> On Mon, Apr 22, 2019 at 10:17 AM Chamikara Jayalath 
>> wrote:
>>
>>> Congrats Yifan!
>>>
>>> On Mon, Apr 22, 2019 at 10:02 AM Maximilian Michels 
>>> wrote:
>>>
 Congrats! Great work.

 -Max

 On 22.04.19 19:00, Rui Wang wrote:
 > Congratulations! Thanks for your contribution!!
 >
 > -Rui
 >
 > On Mon, Apr 22, 2019 at 9:57 AM Ruoyun Huang >>> > > wrote:
 >
 > Congratulations, Yifan!
 >
 > On Mon, Apr 22, 2019 at 9:48 AM Boyuan Zhang >>> > > wrote:
 >
 > Congratulations, Yifan~
 >
 > On Mon, Apr 22, 2019 at 9:29 AM Connell O'Callaghan
 > mailto:conne...@google.com>> wrote:
 >
 > Well done Yifan!!!
 >
 > Thank you for sharing Kenn!!!
 >
 > On Mon, Apr 22, 2019 at 9:00 AM Ahmet Altay
 > mailto:al...@google.com>> wrote:
 >
 > Congratulations, Yifan!
 >
 > On Mon, Apr 22, 2019 at 8:46 AM Tim Robertson
 > >>> > > wrote:
 >
 > Congratulations Yifan!
 >
 > On Mon, Apr 22, 2019 at 5:39 PM Cyrus Maden
 > mailto:cma...@google.com>>
 wrote:
 >
 > Congratulations Yifan!!
 >
 > On Mon, Apr 22, 2019 at 11:26 AM Kenneth
 Knowles
 > mailto:k...@apache.org>>
 wrote:
 >
 > Hi all,
 >
 > Please join me and the rest of the
 Beam PMC
 > in welcoming a new committer: Yifan Zou.
 >
 > Yifan has been contributing to Beam since
 > early 2018. He has proposed 70+ pull
 > requests, adding dependency checking and
 > improving test infrastructure. But
 something
 > the numbers cannot show adequately is the
 > huge effort Yifan has put into working
 with
 > infra and keeping our Jenkins executors
 healthy.
 >
 > In consideration of Yifan's contributions,
 > the Beam PMC trusts Yifan with the
 > responsibilities of a Beam committer [1].
 >
 > Thank you, Yifan, for your contributions.
 >
 > Kenn
 >
 > [1]
 >
 https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
 > <
 https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
 >
 >
 >
 >
 > --
 > 
 > Ruoyun  Huang
 >

>>>


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Udi Meiri
Congrats Robert B.!

On Tue, Jul 16, 2019 at 10:23 AM Ahmet Altay  wrote:

> Hi,
>
> Please join me and the rest of the Beam PMC in welcoming a new committer: 
> Robert
> Burke.
>
> Robert has been contributing to Beam and actively involved in the
> community for over a year. He has been actively working on Go SDK, helping
> users, and making it easier for others to contribute [1].
>
> In consideration of Robert's contributions, the Beam PMC trusts him with
> the responsibilities of a Beam committer [2].
>
> Thank you, Robert, for your contributions and looking forward to many more!
>
> Ahmet, on behalf of the Apache Beam PMC
>
> [1]
> https://lists.apache.org/thread.html/8f729da2d3009059d7a8b2d8624446be161700dcfa953939dd3530c6@%3Cdev.beam.apache.org%3E
> [2] https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-
> committer
>


smime.p7s
Description: S/MIME Cryptographic Signature


[PSA] Python: don't forget to close() your gRPC channels

2019-08-15 Thread Udi Meiri
https://github.com/grpc/grpc/issues/12531
I imagine this mostly affects GCP connectors, but perhaps grpcio is used
elsewhere? (I haven't checked)

Background is that I'm updating the minimum grpcio version to 1.12.1 (
https://github.com/grpc/grpc/releases/tag/v1.12.0).
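A minimal sketch of the pattern this PSA is asking for (the address and stub
usage are illustrative; close() is available in the grpcio versions above):

  import grpc

  channel = grpc.insecure_channel('localhost:50051')  # illustrative address
  try:
    pass  # ... create stubs and make RPCs on the channel here ...
  finally:
    channel.close()  # release sockets/threads instead of waiting on GC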


smime.p7s
Description: S/MIME Cryptographic Signature


Python question about save_main_session

2019-08-23 Thread Udi Meiri
Hi,
I'm trying to get pytest with the xdist plugin to run Beam tests. The issue
is with save_main_session and a dependency of pytest-xdist called execnet,
which triggers this error:

apache_beam/examples/complete/tfidf.py:212: in run
    output | 'write' >> WriteToText(known_args.output)
apache_beam/pipeline.py:426: in __exit__
    self.run().wait_until_finish()
apache_beam/pipeline.py:406: in run
    self._options).run(False)
apache_beam/pipeline.py:416: in run
    pickler.dump_session(os.path.join(tmpdir, 'main_session.pickle'))
apache_beam/internal/pickler.py:282: in dump_session
    dill.load_session(file_path)
../../../../virtualenvs/beam-py35/lib/python3.5/site-packages/dill/_dill.py:410: in load_session
    module = unpickler.load()
../../../../virtualenvs/beam-py35/lib/python3.5/site-packages/execnet/gateway_base.py:130: in __getattr__
    locs = self._importdef.get(name)
../../../../virtualenvs/beam-py35/lib/python3.5/site-packages/execnet/gateway_base.py:130: in __getattr__
    locs = self._importdef.get(name)
../../../../virtualenvs/beam-py35/lib/python3.5/site-packages/execnet/gateway_base.py:130: in __getattr__
    locs = self._importdef.get(name)
E   RecursionError: maximum recursion depth exceeded
!!! Recursion detected (same locals & position)


Does anyone on this list have experience with these kinds of errors? Any
workarounds I can use? (can we handle this module specially / can we
exclude it from main session?)
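For context, the dump_session call in the trace only runs because
save_main_session is enabled; that flag is an ordinary pipeline option, so the
knob involved is roughly (a sketch):

  # save_main_session lives on SetupOptions; disabling it skips pickling
  # __main__ (and therefore dump_session) entirely.
  from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

  options = PipelineOptions()
  options.view_as(SetupOptions).save_main_session = False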


smime.p7s
Description: S/MIME Cryptographic Signature


Re: Python question about save_main_session

2019-08-23 Thread Udi Meiri
Yeah that sounds like the sanest way forward.
Lots of these tests run standalone examples, so they might need
save_main_session if you run them directly.
(example streaming_wordcount_it_test.py running streaming_wordcount.py)

On Fri, Aug 23, 2019 at 3:17 PM Robert Bradshaw  wrote:

> I suggest re-writing the test to avoid save_main_session.
>
> On Fri, Aug 23, 2019 at 11:57 AM Udi Meiri  wrote:
>
>> Hi,
>> I'm trying to get pytest with the xdist plugin to run Beam tests. The
>> issue is with save_main_session and a dependency of pytest-xdist called
>> execnet, which triggers this error:
>>
>> apache_beam/examples/complete/tfidf.py:212: in run
>>     output | 'write' >> WriteToText(known_args.output)
>> apache_beam/pipeline.py:426: in __exit__
>>     self.run().wait_until_finish()
>> apache_beam/pipeline.py:406: in run
>>     self._options).run(False)
>> apache_beam/pipeline.py:416: in run
>>     pickler.dump_session(os.path.join(tmpdir, 'main_session.pickle'))
>> apache_beam/internal/pickler.py:282: in dump_session
>>     dill.load_session(file_path)
>> ../../../../virtualenvs/beam-py35/lib/python3.5/site-packages/dill/_dill.py:410: in load_session
>>     module = unpickler.load()
>> ../../../../virtualenvs/beam-py35/lib/python3.5/site-packages/execnet/gateway_base.py:130: in __getattr__
>>     locs = self._importdef.get(name)
>> ../../../../virtualenvs/beam-py35/lib/python3.5/site-packages/execnet/gateway_base.py:130: in __getattr__
>>     locs = self._importdef.get(name)
>> ../../../../virtualenvs/beam-py35/lib/python3.5/site-packages/execnet/gateway_base.py:130: in __getattr__
>>     locs = self._importdef.get(name)
>> E   RecursionError: maximum recursion depth exceeded
>> !!! Recursion detected (same locals & position)
>>
>>
>> Does anyone on this list have experience with these kinds of errors? Any
>> workarounds I can use? (can we handle this module specially / can we
>> exclude it from main session?)
>>
>


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [ANNOUNCE] New committer: Valentyn Tymofieiev

2019-08-27 Thread Udi Meiri
Congrats!

On Tue, Aug 27, 2019 at 9:50 AM Yichi Zhang  wrote:

> Congrats Valentyn!
>
> On Tue, Aug 27, 2019 at 7:55 AM Valentyn Tymofieiev 
> wrote:
>
>> Thank you everyone!
>>
>> On Tue, Aug 27, 2019 at 2:57 AM Alexey Romanenko <
>> aromanenko@gmail.com> wrote:
>>
>>> Congrats, well deserved!
>>>
>>> On 27 Aug 2019, at 11:25, Jan Lukavský  wrote:
>>>
>>> Congrats Valentyn!
>>> On 8/26/19 11:43 PM, Rui Wang wrote:
>>>
>>> Congratulations!
>>>
>>>
>>> -Rui
>>>
>>> On Mon, Aug 26, 2019 at 2:36 PM Hannah Jiang 
>>> wrote:
>>>
 Congratulations Valentyn, well deserved!

 On Mon, Aug 26, 2019 at 2:34 PM Chamikara Jayalath <
 chamik...@google.com> wrote:

> Congrats Valentyn!
>
> On Mon, Aug 26, 2019 at 2:32 PM Pablo Estrada 
> wrote:
>
>> Thanks Valentyn!
>>
>> On Mon, Aug 26, 2019 at 2:29 PM Robin Qiu  wrote:
>>
>>> Thank you Valentyn! Congratulations!
>>>
>>> On Mon, Aug 26, 2019 at 2:28 PM Robert Bradshaw 
>>> wrote:
>>>
 Hi,

 Please join me and the rest of the Beam PMC in welcoming a new
 committer: Valentyn Tymofieiev

 Valentyn has made numerous contributions to Beam over the last
 several
 years (including 100+ pull requests), most recently pushing through
 the effort to make Beam compatible with Python 3. He is also an
 active
 participant in design discussions on the list, participates in
 release
 candidate validation, and proactively helps keep our tests green.

 In consideration of Valentyn's contributions, the Beam PMC trusts
 him
 with the responsibilities of a Beam committer [1].

 Thank you, Valentyn, for your contributions and looking forward to
 many more!

 Robert, on behalf of the Apache Beam PMC

 [1]
 https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer

>>>
>>>


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [discuss] How we support our users on Slack / Mailing list / StackOverflow

2019-09-06 Thread Udi Meiri
I don't go on Slack, but I will be notified of mentions. It has the
advantage of being an informal space.
SO can feel just as intimidating as the mailing list IMO. Unlike the
others, it doesn't lend itself very well to discussions (you can only post
comments or answers).



On Fri, Sep 6, 2019 at 10:55 AM Pablo Estrada  wrote:

> Hello all,
>
> THE SITUATION:
> It was brought to my attention recently that Python users in Slack are not
> getting much support, because most of the Beam Python-knowledgeable people
> are not on Slack. Unfortunately, in the Beam site, we do refer people to
> Slack for assistance[1].
>
> Java users do receive reasonable support, because there are enough Beam
> Java-knowledgeable people online, and willing to answer.
>
> On the other hand, at Google we do have a number of people who are
> responsible to answer questions on StackOverflow[2], and we do our best to
> answer promptly. I think we do a reasonable job overall.
>
> SO LET'S DISCUSS:
> How should we advise the community to ask questions about Beam?
> - Perhaps we should encourage people to try the mailing list first
> - Perhaps we should encourage people to try StackOverflow first
> - Perhaps we should write a bot that encourages Python users to go to
> StackOverflow
> - something else?
>
> My personal opinion is that a mailing list is not great: It's
> intimidating, it does not provide great indexing or searchability.
>
> WHAT I PROPOSE:
>
> I think explicitly encouraging everyone to go to StackOverflow first will
> be the best alternative: It's indexed, searchable, less intimidating than
> the mailing list. We can add that they can try Slack as well - without any
> guarantees.
>
> What do others think?
> -P.
>
> [1] https://beam.apache.org/community/contact-us/
> [2] https://stackoverflow.com/questions/tagged/apache-beam?tab=Newest
>


smime.p7s
Description: S/MIME Cryptographic Signature


Python postcommits broken: crossLanguagePythonJavaFlink

2019-08-07 Thread Udi Meiri
I opened a bug here: https://issues.apache.org/jira/browse/BEAM-7924
but I don't know who's the best person to take a look.
Could someone assign this please?


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [Update] Beam 2.15 Release Progress

2019-08-07 Thread Udi Meiri
https://github.com/apache/beam/pull/9240 has been merged

On Wed, Aug 7, 2019 at 12:33 PM Anton Kedin  wrote:

> Perf regression is seemingly gone now. If this is caused by a PR we might
> want to find out which one and cherry-pick it into the release.
>
> Regards,
> Anton
>
> On Tue, Aug 6, 2019 at 4:52 PM Yifan Zou  wrote:
>
>> Hi,
>>
>> There is a perf regression on SQL Query3 on dataflow runner. This was
>> treated as a release blocker. We would appreciate if someone could look
>> into this issue?
>>
>> For more details, please see Anton's email [1] and JIRA [2].
>> [1]
>> https://lists.apache.org/thread.html/5441431cb2cf8fb445a2e30e6b2a8feb199d189755cf12b0c86fb1c8@%3Cdev.beam.apache.org%3E
>> [2] https://issues.apache.org/jira/browse/BEAM-7906
>>
>> Regards.
>> Yifan
>>
>>
>> On Mon, Aug 5, 2019 at 10:35 AM Yifan Zou  wrote:
>>
>>> Hi,
>>>
>>> I've verified release branch, and all Pre/Post-commits passed. The next
>>> step would be verifying the javadoc.
>>> We still have a few blocking issues,
>>> https://issues.apache.org/jira/browse/BEAM-7880?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20%22Triage%20Needed%22)%20AND%20fixVersion%20%3D%202.15.0
>>> .
>>> Please ping me once the ticket got fixed, or update them to the next
>>> version to unblock the release. Thanks.
>>>
>>> Yifan
>>>
>>> On Wed, Jul 31, 2019 at 4:33 PM Yifan Zou  wrote:
>>>
 Snapshots are published
 http://repository.apache.org/content/groups/snapshots/org/apache/beam/
 .

 On Wed, Jul 31, 2019 at 1:28 PM Yifan Zou  wrote:

> Hi,
>
> The release branch is cut
> https://github.com/apache/beam/tree/release-2.15.0.
> The next step would be building snapshots and verify release branch.
>
> Regards.
> Yifan
>



smime.p7s
Description: S/MIME Cryptographic Signature


Re: [ANNOUNCE] New committer: Kyle Weaver

2019-08-06 Thread Udi Meiri
Congrats Kyle!

On Tue, Aug 6, 2019 at 2:00 PM Melissa Pashniak 
wrote:

> Congratulations Kyle!
>
> On Tue, Aug 6, 2019 at 1:36 PM Yichi Zhang  wrote:
>
>> Congrats Kyle!
>>
>> On Tue, Aug 6, 2019 at 1:29 PM Aizhamal Nurmamat kyzy <
>> aizha...@google.com> wrote:
>>
>>> Thank you, Kyle! And congratulations :)
>>>
>>> On Tue, Aug 6, 2019 at 10:09 AM Hannah Jiang 
>>> wrote:
>>>
 Congrats Kyle!

 On Tue, Aug 6, 2019 at 9:52 AM David Morávek 
 wrote:

> Congratulations Kyle!!
>
> Sent from my iPhone
>
> On 6 Aug 2019, at 18:47, Anton Kedin  wrote:
>
> Congrats!
>
> On Tue, Aug 6, 2019, 9:37 AM Ankur Goenka  wrote:
>
>> Congratulations Kyle!
>>
>> On Tue, Aug 6, 2019 at 9:35 AM Ahmet Altay  wrote:
>>
>>> Hi,
>>>
>>> Please join me and the rest of the Beam PMC in welcoming a new
>>> committer: Kyle Weaver.
>>>
>>> Kyle has been contributing to Beam for a while now. And in that time
>>> period Kyle got the portable spark runner feature complete for batch
>>> processing. [1]
>>>
>>> In consideration of Kyle's contributions, the Beam PMC trusts him
>>> with the responsibilities of a Beam committer [2].
>>>
>>> Thank you, Kyle, for your contributions and looking forward to many
>>> more!
>>>
>>> Ahmet, on behalf of the Apache Beam PMC
>>>
>>> [1]
>>> https://lists.apache.org/thread.html/c43678fc24c9a1dc9f48c51c51950aedcb9bc0fd3b633df16c3d595a@%3Cuser.beam.apache.org%3E
>>> [2] https://beam.apache.org/contribute/become-a-committer
>>> /#an-apache-beam-committer
>>>
>>


smime.p7s
Description: S/MIME Cryptographic Signature


Re: Jira email notifications

2019-08-08 Thread Udi Meiri
Is your email set correctly?
You can see it if you hit the edit button for profile details.
[image: UaSgEcLBGeM.png]

On Wed, Aug 7, 2019 at 5:16 PM sridhar inuog  wrote:

> Yes, I am already on the "Watchers" list
>
> On Wed, Aug 7, 2019 at 7:13 PM Pablo Estrada  wrote:
>
>> Have you tried "watching" the particular JIRA issue? There's a "Watch"
>> thing on the right-hand side of an issue page.
>>
>> Happy to help more if that's not helpful : )
>> Best
>> -P.
>>
>> On Wed, Aug 7, 2019 at 5:09 PM sridhar inuog 
>> wrote:
>>
>>> Hi,
>>>Is there a way to get notifications whenever a jira issue is
>>> updated?  The only place I can see this can be enabled is
>>>
>>>  profile -> Preferences -> My Changes (Notify me)
>>>
>>> Even though the description seems a little bit misleading I don't see
>>> any other place to make any changes.
>>>
>>> ---
>>> Whether to email notifications of any changes you make.
>>> -
>>>
>>> Any other places I need to change to get notifications?
>>>
>>> Thanks,
>>> Sridhar
>>>
>>


smime.p7s
Description: S/MIME Cryptographic Signature


Re: Allowing firewalled/offline builds of Beam

2019-08-08 Thread Udi Meiri
You can download it here: https://gradle.org/releases/
and run it instead of using the wrapper.

Example:
$ cd
$ unzip Downloads/gradle-5.5.1-bin.zip
$ cd ~/src/beam
$ ~/gradle-5.5.1/bin/gradle lint
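
For the mirror overrides mentioned below, a rough sketch (mirror.example.com
is a placeholder for an internal mirror, not a real endpoint):

# Point pip at an internal PyPI mirror; PIP_INDEX_URL is pip's standard
# environment variable for this.
$ export PIP_INDEX_URL=https://mirror.example.com/pypi/simple

# Point the Gradle wrapper at an internally hosted distribution by editing
# gradle/wrapper/gradle-wrapper.properties:
#   distributionUrl=https://mirror.example.com/gradle/gradle-5.5.1-bin.zip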


On Thu, Aug 8, 2019 at 10:52 AM Chad Dombrova  wrote:

> This topic came up in another thread, so I wanted to highlight a few
> things that we've discovered in our endeavors to build Beam behind a
> firewall.
>
> Conceptually, in order to allow this, a user needs to provide alternate
> mirrors for each "artifact" service required during build, and luckily I
> think most of the toolchains used by Beam support this. For example, the
> default PyPI mirror used by pip can be overridden via env var to an
> internal mirror, and likewise for docker and its registry service.  I'm
> currently looking into gogradle to see if we can provide an alternate
> vendor directory as a shared resource behind our firewall. (I have a bigger
> question here, which is: why was it necessary to add a third language into
> the Python Beam ecosystem just for the bootstrap process? Couldn't the
> boot code use Python, or Java?)
>
> But I'm getting ahead of myself.  We're actually stuck at the very
> beginning, with gradlew.  The gradlew wrapper seems to unconditionally
> download gradle, so you can't get past the first few hundred lines of code
> in the build process without requiring internet access.  I made a ticket
> here: https://issues.apache.org/jira/browse/BEAM-7931.  I'd love some
> pointers on how to fix this, because the offending code lives inside
> gradle-wrapper.jar, so I can't change it without access to the source.
>
> thanks,
> -chad
>
>


smime.p7s
Description: S/MIME Cryptographic Signature


precommits failing on git clean:

2019-07-19 Thread Udi Meiri
https://issues.apache.org/jira/browse/BEAM-7788


smime.p7s
Description: S/MIME Cryptographic Signature


Re: precommits failing on git clean:

2019-07-19 Thread Udi Meiri
Is this a regression? Is it due to an in-progress PR? I can't figure out
where the module go.opencensus.io@v0.22.0 is included.
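
If there's a go.mod to query (gogradle manages the Go dependencies here, so
treat this as a sketch rather than a definitive recipe), the go tool can
usually trace where a module comes from:

$ go mod why -m go.opencensus.io
$ go mod graph | grep go.opencensus.io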

On Fri, Jul 19, 2019 at 11:59 AM Robert Burke  wrote:

> First time contributor Zach might have a solution in this PR, but it seems
> like it would need care since it's pretty broad.
>
> https://github.com/apache/beam/pull/9096
>
> On Fri, Jul 19, 2019, 11:53 AM Udi Meiri  wrote:
>
>> https://issues.apache.org/jira/browse/BEAM-7788
>>
>


smime.p7s
Description: S/MIME Cryptographic Signature


Re: On Auto-creating GCS buckets on behalf of users

2019-07-23 Thread Udi Meiri
Another idea would be to put default bucket preferences in a .beamrc file
so you don't have to remember to pass it every time (this could also
contain other default flag values).
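
Purely to illustrate the idea (Beam doesn't read a .beamrc today; the file
name and format below are made up):

# ~/.beamrc
[defaults]
runner = DataflowRunner
project = my-gcp-project
temp_location = gs://my-bucket/tmp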



On Tue, Jul 23, 2019 at 1:43 PM Robert Bradshaw  wrote:

> On Tue, Jul 23, 2019 at 10:26 PM Chamikara Jayalath
>  wrote:
> >
> > On Tue, Jul 23, 2019 at 1:10 PM Kyle Weaver  wrote:
> >>
> >> I agree with David that at least clearer log statements should be added.
> >>
> >> Udi, that's an interesting idea, but I imagine the sheer number of
> existing flags (including many SDK-specific flags) would make it difficult
> to implement. In addition, uniform argument names wouldn't necessarily
> ensure uniform implementation.
> >>
> >> Kyle Weaver | Software Engineer | github.com/ibzib |
> kcwea...@google.com
> >>
> >>
> >> On Tue, Jul 23, 2019 at 11:56 AM Udi Meiri  wrote:
> >>>
> >>> Java SDK creates one regional bucket per project and region
> combination.
> >>> So it's not a lot of buckets - no need to auto-clean.
> >
> >
> > Agreed that cleanup is not a big issue if we are only creating a single
> bucket per project and region. I assume we are creating temporary folders
> for each pipeline with the same region and project so that they don't
> conflict (which we clean up).
> > As others mentioned, we should clearly document this (including the
> naming of the bucket) and produce a log during pipeline creation.
> >
> >>>
> >>>
> >>> I agree with Robert that having less flags is better.
> >>> Perhaps what we need a unifying interface for SDKs that simplifies
> launching?
> >>>
> >>> So instead of:
> >>> mvn compile exec:java -Dexec.mainClass=
> -Dexec.args="--runner=DataflowRunner --project=
> --gcpTempLocation=gs:///tmp " -Pdataflow-runner
> >>> or
> >>> python -m  --runner DataflowRunner --project 
> --temp_location gs:///tmp/ 
> >
> > Interesting, probably this should be extended to a generalized CLI for
> Beam that can be easily installed to execute Beam pipelines?
>
> This is starting to get somewhat off-topic from the original question,
> but I'm not sure the benefits of providing a wrapper to the end user
> would outweigh the costs of having to learn the wrapper. For Python
> developers, python -m module, or even python -m path/to/script.py is
> pretty standard. Java is a bit harder, because one needs to coordinate
> a build as well, but I don't know how a "./beam java ..." script would
> gloss over whether one is using maven, gradle, ant, or just has a pile
> of pre-compiled jars (and would probably have to know a bit about the
> project layout as well to invoke the right commands).
>


smime.p7s
Description: S/MIME Cryptographic Signature


Re: On Auto-creating GCS buckets on behalf of users

2019-07-23 Thread Udi Meiri
Java SDK creates one regional bucket per project and region combination.
So it's not a lot of buckets - no need to auto-clean.

I agree with Robert that having less flags is better.
Perhaps what we need a unifying interface for SDKs that simplifies
launching?

So instead of:
mvn compile exec:java -Dexec.mainClass=
-Dexec.args="--runner=DataflowRunner --project=
--gcpTempLocation=gs:///tmp " -Pdataflow-runner
or
python -m  --runner DataflowRunner --project
 --temp_location gs:///tmp/ 

We could have:
./beam java run  --runner=DataflowRunner 
./beam python run  --runner=DataflowRunner 

where GCP project and temp_location are optional.

On Tue, Jul 23, 2019 at 10:31 AM David Cavazos  wrote:

> I would go for #1 since it's a better user experience. Especially for new
> users who don't understand every step involved in staging/deploying. It's
> just another (unnecessary) mental concept they don't have to be aware of.
> Anything that makes it closer to only providing the `--runner` flag without
> any additional flags (by default, but configurable if necessary) is a good
> thing in my opinion.
>
> AutoML already auto-creates a GCS bucket (not configurable, with a global
> name which has its own downfalls). Other products are already doing this to
> simplify user experience. I think as long as there's an explicit logging
> statement it should be fine.
>
> If the bucket was not specified and was created: "No --temp_location
> specified, created gs://..."
>
> If the bucket was not specified and was found: "No --temp_location
> specified, found gs://..."
>
> If the bucket was specified, the logging could be omitted since it's
> already explicit from the command line arguments.
>
> On Tue, Jul 23, 2019 at 10:25 AM Chamikara Jayalath 
> wrote:
>
>> Do we clean up auto created GCS buckets ?
>>
>> If there's no good way to cleanup, I think it might be better to make
>> this opt-in.
>>
>> Thanks,
>> Cham
>>
>> On Tue, Jul 23, 2019 at 3:25 AM Robert Bradshaw 
>> wrote:
>>
>>> I think having a single, default, auto-created temporary bucket per
>>> project for use in GCP (when running on Dataflow, or running elsewhere
>>> but using GCS such as for this BQ load files example), though not
>>> ideal, is the best user experience. If we don't want to be
>>> automatically creating such things for users by default, another
>>> option would be a single flag that opts-in to such auto-creation
>>> (which could include other resources in the future).
>>>
>>> On Tue, Jul 23, 2019 at 1:08 AM Pablo Estrada 
>>> wrote:
>>> >
>>> > Hello all,
>>> > I recently worked on a transform to load data into BigQuery by writing
>>> files to GCS, and issuing Load File jobs to BQ. I did this for the Python
>>> SDK[1].
>>> >
>>> > This option requires the user to provide a GCS bucket to write the
>>> files:
>>> >
>>> > If the user provides a bucket to the transform, the SDK will use that
>>> bucket.
>>> > If the user does not provide a bucket:
>>> >
>>> > When running in Dataflow, the SDK will borrow the temp_location of the
>>> pipeline.
>>> > When running in other runners, the pipeline will fail.
>>> >
>>> > The Java SDK has had functionality for File Loads into BQ for a long
>>> time; and particularly, when users do not provide a bucket, it attempts to
>>> create a default bucket[2]; and this bucket is used as temp_location (which
>>> then is used by the BQ File Loads transform).
>>> >
>>> > I do not really like creating GCS buckets on behalf of users. In Java,
>>> the outcome is that users will not have to pass a --tempLocation parameter
>>> when submitting jobs to Dataflow - which is a nice convenience, but I'm not
>>> sure that this is in-line with users' expectations.
>>> >
>>> > Currently, the options are:
>>> >
>>> > Adding support for bucket autocreation for Python SDK
>>> > Deprecating support for bucket autocreation in Java SDK, and printing
>>> a warning.
>>> >
>>> > I am personally inclined for #1. But what do others think?
>>> >
>>> > Best
>>> > -P.
>>> >
>>> > [1] https://github.com/apache/beam/pull/7892
>>> > [2]
>>> https://github.com/apache/beam/blob/5b3807be717277e3e6880a760b036fecec3bc95d/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java#L294-L343
>>>
>>


smime.p7s
Description: S/MIME Cryptographic Signature


[REQUEST] Python Tests (pre/post-commits) Status

2019-07-15 Thread Udi Meiri
Hi,
I've been trying to merge several Python PRs in the past weeks, but Jenkins
pre- and post-commit jobs have been red all this time for various
reasons. I have a proposal in mind to help deal with this, but it can't
happen without cooperation from a majority of committers.

The request from Beam committers is to:
- Avoid merging when tests are red (the images in the PR template);
- Use common sense for exceptions to the above (such as if the PR has
nothing to do with a failing test);
- Act to make tests green (open a JIRA issue, find the PR with the
regression, rollback as necessary).

This stuff has already been discussed on this list and documented here:
https://beam.apache.org/contribute/postcommits-policies/
The idea is that if everyone pitches in it will spread the load.

Thanks for reading.


smime.p7s
Description: S/MIME Cryptographic Signature

