Re: Community Examples Repository

2018-08-01 Thread Charles Chen
I would also prefer that examples be linked to releases so that we can
build and test them during development; i.e. if your commit breaks
wordcount, we want to know right away so we can revert.  Perhaps we can
keep these in the repo but more clearly modularize the artifacts we release?

For the Python SDK, if we separate this out in any way, there is the
separate issue of dealing with namespace packages (which are kind of broken
and poorly supported:
https://github.com/pypa/python-packaging-user-guide/issues/265), if we want
to keep the examples under the apache_beam.examples module path.  See also
https://packaging.python.org/guides/packaging-namespace-packages/.
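
To make the namespace-package concern concrete, here is a minimal sketch of the pkgutil-style layout described in the packaging guide linked above. The package name demo_ns and the two distribution directories are hypothetical stand-ins (for apache_beam and a separately shipped examples distribution); the script builds both in a temporary directory so it can be run as-is.

```python
import importlib
import os
import sys
import tempfile

# pkgutil-style namespace declaration. Every distribution that contributes
# to the shared package has to ship this same __init__.py.
NAMESPACE_INIT = "__path__ = __import__('pkgutil').extend_path(__path__, __name__)\n"


def make_distribution(root, dist_name, subpackage, value):
    """Create <root>/<dist_name>/demo_ns/<subpackage>/__init__.py on disk."""
    ns_dir = os.path.join(root, dist_name, "demo_ns")
    sub_dir = os.path.join(ns_dir, subpackage)
    os.makedirs(sub_dir)
    with open(os.path.join(ns_dir, "__init__.py"), "w") as f:
        f.write(NAMESPACE_INIT)
    with open(os.path.join(sub_dir, "__init__.py"), "w") as f:
        f.write("NAME = %r\n" % value)
    return os.path.join(root, dist_name)


def load_both():
    """Import demo_ns.core and demo_ns.examples from two separate directories."""
    root = tempfile.mkdtemp()
    # Hypothetical stand-ins for the core SDK and a separate examples package.
    sys.path.insert(0, make_distribution(root, "dist_core", "core", "core"))
    sys.path.insert(0, make_distribution(root, "dist_examples", "examples", "examples"))
    importlib.invalidate_caches()
    core = importlib.import_module("demo_ns.core")
    examples = importlib.import_module("demo_ns.examples")
    return core.NAME, examples.NAME


if __name__ == "__main__":
    print(load_both())
```

The catch, as far as the packaging guide describes it, is that the shared __init__.py may contain only the namespace declaration and must be identical in every distribution, which would constrain what the top-level apache_beam package itself can define.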

On Wed, Aug 1, 2018 at 9:29 PM j...@nanthrax.net  wrote:

> Hi,
>
> I have no problem with moving the examples to a dedicated repository.
> However, IMHO, we have to:
>
> 1. Keep a build of examples linked to latest core release/SNAPSHOT
> 2. Include the examples in the distribution (convenient for the users)
>
> On another topic, I think it would be better to avoid using Google Docs
> for this kind of discussion and to share directly on the mailing list (at
> least a summary with light details).
>
> Regards
> JB
>
> On Thursday, August 02, 2018 00:12 CEST, David Cavazos <
> dcava...@google.com> wrote:
>
>
> Hi everyone!
>
> We wanted to migrate the examples from the core repository to a new Beam
> community examples repository. As the number of examples grows, it makes
> sense to modularize and decouple the core functionality from the examples.
>
> We will also create some guidelines with the best practices for new
> examples to be submitted.
>
> For more details, feel free to take a look and comment on the proposal.
>
> Cheers,
> David
>
>
>
>
>


Re: Community Examples Repository

2018-08-01 Thread jb

Hi,

I have no problem with moving the examples to a dedicated repository. However,
IMHO, we have to:

1. Keep a build of examples linked to latest core release/SNAPSHOT
2. Include the examples in the distribution (convenient for the users)

On another topic, I think it would be better to avoid using Google Docs for
this kind of discussion and to share directly on the mailing list (at least a
summary with light details).

Regards
JB

On Thursday, August 02, 2018 00:12 CEST, David Cavazos wrote:

> Hi everyone!
>
> We wanted to migrate the examples from the core repository to a new Beam
> community examples repository. As the number of examples grows, it makes
> sense to modularize and decouple the core functionality from the examples.
>
> We will also create some guidelines with the best practices for new
> examples to be submitted.
>
> For more details, feel free to take a look and comment on the proposal.
>
> Cheers,
> David


 


Re: Cleanup resources on pipeline cancelation

2018-08-01 Thread Reuven Lax
Hi Romain,

Andrew's example actually wouldn't work for that. With Google Cloud Pub/Sub
(the example source he referenced), if there is no subscription to a topic,
all publishes to that topic are dropped on the floor; if you don't want to
lose data, you are expected to keep the subscription around continuously.
In this example, leaking a subscription is probably preferable to losing
data (especially since Pub/Sub itself garbage collects subscriptions that
have been inactive for a long time).
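
To illustrate the trade-off, here is a toy in-memory model of the behaviour described above. It is only a sketch: ToyTopic and simulate are hypothetical names, this is not the Cloud Pub/Sub client API, and the only rule modelled is that a publish with no subscription attached is dropped.

```python
class ToyTopic:
    """Toy in-memory model of a topic; not the Cloud Pub/Sub client API."""

    def __init__(self):
        self._subscriptions = {}  # subscription name -> pending messages
        self.dropped = []  # messages published while no subscription existed

    def subscribe(self, name):
        # Re-subscribing under an existing name keeps its pending messages.
        self._subscriptions.setdefault(name, [])

    def unsubscribe(self, name):
        self._subscriptions.pop(name, None)

    def publish(self, message):
        if not self._subscriptions:
            self.dropped.append(message)
            return
        for pending in self._subscriptions.values():
            pending.append(message)

    def pull(self, name):
        pending = self._subscriptions[name]
        messages = list(pending)
        del pending[:]
        return messages


def simulate(delete_on_shutdown):
    """Run a pipeline, cancel it, publish during the downtime, restart it."""
    topic = ToyTopic()
    topic.subscribe("pipeline-sub")
    topic.publish("a")
    received = topic.pull("pipeline-sub")
    # The pipeline is cancelled here.
    if delete_on_shutdown:
        topic.unsubscribe("pipeline-sub")
    topic.publish("b")  # published while the pipeline is down
    # The pipeline restarts.
    topic.subscribe("pipeline-sub")
    received += topic.pull("pipeline-sub")
    return received, topic.dropped
```

In this model, simulate(True) loses the message published during the downtime, while simulate(False) delivers it after the restart at the cost of leaving the subscription behind.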

The answer might be that Beam does not have a good lifecycle story here,
and something needs to be built.

Reuven

On Tue, Jul 31, 2018 at 10:04 PM Romain Manni-Bucau 
wrote:

> Hi Andrew,
>
> IIRC sources should clean up their resources per method since they don't
> have a better lifecycle. Readers can create anything longer and release it
> at close time.
>
>
> On Wed, Aug 1, 2018 at 00:31, Andrew Pilloud wrote:
>
>> Some of our IOs create external resources that need to be cleaned up when
>> a pipeline is terminated. It looks like the
>> org.apache.beam.sdk.io.UnboundedSource interface is called on creation, but
>> there is no call for cleanup. For example, PubsubIO creates a Pubsub
>> subscription in createReader()/split() and it should be deleted at shutdown.
>> Does anyone have ideas on how I might make this happen?
>>
>> (I filed https://issues.apache.org/jira/browse/BEAM-5051 tracking the
>> PubSub specific issue.)
>>
>> Andrew
>>
>


Re: Community Examples Repository

2018-08-01 Thread Davor Bonaci
>
> it makes sense to modularize


It certainly does, but somebody just had another proposal to move the
website into the main repository ;-). That proposal was also good for
~everyone. Fun times...

(I have my opinions, of course, but I'm fine with any approach.)

On Wed, Aug 1, 2018 at 4:37 PM, Ahmet Altay  wrote:

> Thank you for this initiative.
>
> How about keeping a set of core examples in the main repository as a way
> of 1) convenient testing at a PR level, 2) testing with end-to-end tests
> against Beam head rather than a released Beam version, and 3) keeping
> wordcount as a simple example living along with the code, which I think
> has some educational value.
>
> For anything else, an examples repository would be a great idea.
>
> For testing, I would also like to understand how we could test examples
> against both released versions of Beam and the code currently being
> developed in master.
>
> Ahmet
>
> On Wed, Aug 1, 2018 at 3:36 PM, Jesse Anderson 
> wrote:
>
>> The examples have to be separate from the main beam repository. This way,
>> they serve as an example of how to use them in your code instead of how to
>> do it as part of Beam. It would also allow you to show the dependencies in sbt or
>> Maven.
>>
>>
>> On Wed, Aug 1, 2018, 3:16 PM Charles Chen  wrote:
>>
>>> The examples we have right now serve both as examples to users and along
>>> with their unit tests, as tests of functionality.  If we move the examples
>>> out, what is a good way to make sure that we continue to have visibility
>>> and test coverage?  Can we address this in a section of the doc?
>>>
>>> On Wed, Aug 1, 2018 at 3:12 PM David Cavazos 
>>> wrote:
>>>
>>>> Hi everyone!
>>>>
>>>> We wanted to migrate the examples from the core repository to a new
>>>> Beam community examples repository. As the number of examples grows, it
>>>> makes sense to modularize and decouple the core functionality from the
>>>> examples.
>>>>
>>>> We will also create some guidelines with the best practices for new
>>>> examples to be submitted.
>>>>
>>>> For more details, feel free to take a look and comment on the proposal.
>>>>
>>>> Cheers,
>>>> David
>>>>
>>>
>


Re: CODEOWNERS for apache/beam repo

2018-08-01 Thread Udi Meiri
Hi, so I saw mention bot working this week.
How was the quality of suggestions?

Holden, I would like to start testing Prow starting next week if
that's possible.
I'll be opening a ticket to INFRA to give my Github bot account read access
(for requesting reviews).

On Fri, Jul 27, 2018, 09:37 Udi Meiri  wrote:

> Summary doc for CODEOWNERS, Mention-bot, Prow:
> https://docs.google.com/document/d/1S8spggJsxDNYZ7aNwZN6VhLhNW372SVRezjblt-7lNQ/edit?usp=sharing
> This doc will get updated as we gain experience with Mention-bot and Prow.
>
> On Wed, Jul 25, 2018 at 5:15 PM Udi Meiri  wrote:
>
>> So I configured Prow using their getting started guide (and found a bug
>> in it) on a test repo.
>>
>> TLDR: Prow can work for us as a review assignment tool if all potential
>> reviewers are also added to the https://github.com/apache org.
>>
>> Some findings:
>> 1. Github doesn't allow non-collaborators to be listed as reviewers. :(
>> But! anyone added to the Apache org on Github may be added as a reviewer.
>> (no write access needed)
>> Is this something the ASF is willing to consider?
>>
>> 2. Prow works pretty well. I've configured it to just assign code
>> reviewers.
>> Here's an example of it in action:
>> https://github.com/udim-org/prow-test/pull/6
>> Essentially, the commands we would use are:
>> "/cc @user" - to explicitly add a reviewer (/uncc to remove)
>>
>> Other commands in the example above are not necessary.
>> We can still use our current PR approval and merge process.
>>
>> 3. Prow currently tries to assign 2 code reviewers, and hopefully that's
>> configurable.
>>
>> Still unsure:
>> 1. How does Prow select reviewers? Does it load balance?
>>
>> On Mon, Jul 23, 2018 at 9:51 PM Jean-Baptiste Onofré 
>> wrote:
>>
>>> It looks interesting but I would like to see the complete video and
>>> explanation about prow. Especially what we concretely need.
>>>
>>> Regards
>>> JB
>>> On Jul 24, 2018, at 04:17, Udi Meiri wrote:
>>>>
>>>> I was recently told about Prow, which automates testing and merging
>>>> for Kubernetes and other projects.
>>>> It also automates assigning reviewers and suggesting approvers (example
>>>> PR, video explanation).
>>>> I propose trying out Prow, since it's maintained and it uses
>>>> OWNERS files to explicitly define both who should be reviewing and who
>>>> should approve a PR.
>>>>
>>>> I'm not suggesting we use it to replace Jenkins or do our merges for us.


 On Tue, Jul 17, 2018 at 11:04 AM Udi Meiri  wrote:

> +1 to generating the file.
> I'll go ahead and file a PR to remove CODEOWNERS
>
> On Tue, Jul 17, 2018 at 9:28 AM Holden Karau 
> wrote:
>
>> So it doesn’t support doing that right now, although if we find it’s
>> a problem we can specify an exclude file with folks who haven’t
>> contributed in the past year. Would people want me to generate that first?
>>
>> On Tue, Jul 17, 2018 at 10:22 AM Ismaël Mejía 
>> wrote:
>>
>>> Is there a way to exclude inactive people as reviewers for the blame
>>> case? I think it can be useful considering that a good number of our
>>> committers are not active at the moment and auto-assigning reviews to
>>> them seems like a waste of energy/time.
>>> On Tue, Jul 17, 2018 at 1:58 AM Eugene Kirpichov <
>>> kirpic...@google.com> wrote:
>>> >
>>> > We did not, but I think we should. So far, in 100% of the PRs I've
>>> > authored, the default functionality of CODEOWNERS did the wrong thing
>>> > and I had to fix something up manually.
>>> >
>>> > On Mon, Jul 16, 2018 at 3:42 PM Andrew Pilloud <
>>> apill...@google.com> wrote:
>>> >>
>>> >> This sounds like a good plan. Did we want to rename the
>>> >> CODEOWNERS file to disable github's mass adding of reviewers while we
>>> >> figure this out?
>>> >>
>>> >> Andrew
>>> >>
>>> >> On Mon, Jul 16, 2018 at 10:20 AM Jean-Baptiste Onofré <
>>> j...@nanthrax.net> wrote:
>>> >>>
>>> >>> +1
>>> >>>
>>> >>> On Jul 16, 2018, at 19:17, Holden Karau wrote:
>>> 
>>>  Ok if no one objects I'll create the INFRA ticket after OSCON
>>> and we can test it for a week and decide if it helps or hinders.
>>> 
>>>  On Mon, Jul 16, 2018, 7:12 PM Jean-Baptiste Onofré <
>>> j...@nanthrax.net> wrote:
>>> >
>>> > Agree to test it for a week.
>>> >
>>> > Regards
>>> > JB
>>> > On Jul 16, 2018, at 18:59, Holden Karau <holden.ka...@gmail.com> wrote:
>>> >>
>>> >> Would folks be OK with me asking infra to turn on blame based
>>> suggestions for Beam and trying it out for a week?
>>> >>

Re: Community Examples Repository

2018-08-01 Thread Ahmet Altay
Thank you for this initiative.

How about keeping a set of core examples in the main repository as a way of
1) convenient testing at a PR level, 2) testing with end-to-end tests
against Beam head rather than a released Beam version, and 3) keeping
wordcount as a simple example living along with the code, which I think has
some educational value.

For anything else, an examples repository would be a great idea.

For testing, I would also like to understand how we could test examples
against both released versions of Beam and the code currently being
developed in master.

Ahmet

On Wed, Aug 1, 2018 at 3:36 PM, Jesse Anderson 
wrote:

> The examples have to be separate from the main beam repository. This way,
> they serve as an example of how to use them in your code instead of how to
> do it as part of Beam. It would also allow you to show the dependencies in sbt or
> Maven.
>
>
> On Wed, Aug 1, 2018, 3:16 PM Charles Chen  wrote:
>
>> The examples we have right now serve both as examples to users and along
>> with their unit tests, as tests of functionality.  If we move the examples
>> out, what is a good way to make sure that we continue to have visibility
>> and test coverage?  Can we address this in a section of the doc?
>>
>> On Wed, Aug 1, 2018 at 3:12 PM David Cavazos  wrote:
>>
>>> Hi everyone!
>>>
>>> We wanted to migrate the examples from the core repository to a new Beam
>>> community examples repository. As the number of examples grows, it makes
>>> sense to modularize and decouple the core functionality from the examples.
>>>
>>> We will also create some guidelines with the best practices for new
>>> examples to be submitted.
>>>
>>> For more details, feel free to take a look and comment on the proposal.
>>>
>>> Cheers,
>>> David
>>>
>>


Re: [VOTE] Apache Beam, version 2.6.0, release candidate #1

2018-08-01 Thread Boyuan Zhang
+1
Tested Dataflow related items in:
https://s.apache.org/beam-release-validation

On Wed, Aug 1, 2018 at 11:40 AM Yifan Zou  wrote:

> +1
> Tested Python quickstarts and mobile gaming examples against tar and wheel
> versions.
> https://builds.apache.org/job/beam_PostRelease_Python_Candidate/123/
>
> On Wed, Aug 1, 2018 at 8:27 AM Andrew Pilloud  wrote:
>
>> +1 tested the Beam SQL jar from the Maven Central repo, it worked.
>>
>> On Wed, Aug 1, 2018 at 7:37 AM Romain Manni-Bucau 
>> wrote:
>>
>>> Hi Pablo,
>>>
>>> +1, tested on my apps and libs, and it works after some fixes due to some
>>> breaking changes in ArgProvider - but I guess it is not "public" enough to
>>> need to be reported.
>>>
>>> Romain Manni-Bucau
>>> @rmannibucau | Blog | Old Blog | Github | LinkedIn | Book
>>>
>>>
>>> On Wed, Aug 1, 2018 at 01:50, Pablo Estrada wrote:
>>>
>>>> Hello everyone!
>>>>
>>>> I have been able to prepare a release candidate for Beam 2.6.0. : D
>>>>
>>>> Please review and vote on the release candidate #1 for the version
>>>> 2.6.0, as follows:
>>>>
>>>> [ ] +1, Approve the release
>>>> [ ] -1, Do not approve the release (please provide specific comments)
>>>>
>>>> The complete staged set of artifacts is available for your review,
>>>> which includes:
>>>> * JIRA release notes [1],
>>>> * the official Apache source release to be deployed to dist.apache.org
>>>> [2], which is signed with the key with fingerprint
>>>> 2F1FEDCDF6DD7990422F482F65224E0292DD8A51 [3],
>>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>>> * source code tag "v2.6.0-RC1" [5],
>>>> * website pull request listing the release and publishing the API
>>>> reference manual [6].
>>>> * Python artifacts are deployed along with the source release to the
>>>> dist.apache.org [2].
>>>>
>>>> The vote will be open for at least 72 hours. It is adopted by majority
>>>> approval, with at least 3 PMC affirmative votes.
>>>>
>>>> Regards
>>>> -Pablo.
>>>>
>>>> [1]
>>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12343392
>>>> [2] https://dist.apache.org/repos/dist/dev/beam/2.6.0/
>>>> [3] https://dist.apache.org/repos/dist/dev/beam/KEYS
>>>> [4]
>>>> https://repository.apache.org/content/repositories/orgapachebeam-1044/
>>>> [5] https://github.com/apache/beam/tree/v2.6.0-RC1
>>>> [6] https://github.com/apache/beam-site/pull/518
>>>>
>>>> --
>>>> Got feedback? go/pabloem-feedback
>>>>
>>>


Re: Community Examples Repository

2018-08-01 Thread Jesse Anderson
The examples have to be separate from the main beam repository. This way,
they serve as an example of how to use them in your code instead of how to
do it as part of Beam. It would also allow you to show the dependencies in sbt or
Maven.

On Wed, Aug 1, 2018, 3:16 PM Charles Chen  wrote:

> The examples we have right now serve both as examples to users and along
> with their unit tests, as tests of functionality.  If we move the examples
> out, what is a good way to make sure that we continue to have visibility
> and test coverage?  Can we address this in a section of the doc?
>
> On Wed, Aug 1, 2018 at 3:12 PM David Cavazos  wrote:
>
>> Hi everyone!
>>
>> We wanted to migrate the examples from the core repository to a new Beam
>> community examples repository. As the number of examples grows, it makes
>> sense to modularize and decouple the core functionality from the examples.
>>
>> We will also create some guidelines with the best practices for new
>> examples to be submitted.
>>
>> For more details, feel free to take a look and comment on the proposal.
>>
>> Cheers,
>> David
>>
>


Re: Community Examples Repository

2018-08-01 Thread David Cavazos
For visibility, we can have a link on both the beam.apache.org website and
in the core repository's README file.

Testing *could* be a little trickier. Any unit test should
continue to live in the core repository, and the examples from the examples
repository could serve as end-to-end tests. This means the examples
repository also needs a testing infrastructure, which should be triggerable
from the root directory. Having it this way would allow us to have a test
in the core repository that clones the examples repository in a temporary
directory and runs the tests from there.

This way, users don't *need* to have the examples as a strict dependency,
but developers modifying the core repository will have both (just like it
is today).
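
As a rough sketch of the clone-and-run idea above: the repository URL and the test entry point below are hypothetical placeholders, since the thread does not name either, and the runner is injectable so the flow can be exercised without network access.

```python
import os
import subprocess
import tempfile

# Hypothetical placeholders: the thread does not name the repository URL
# or the command that would run its tests from the root directory.
EXAMPLES_REPO_URL = "https://example.invalid/beam-examples.git"
TEST_COMMAND = ["./run_tests.sh"]


def plan_commands(repo_url, checkout_dir, test_command):
    """Return (argv, cwd) pairs: clone the examples, then run their tests."""
    return [
        (["git", "clone", "--depth", "1", repo_url, checkout_dir], None),
        (list(test_command), checkout_dir),
    ]


def run_examples_tests(repo_url=EXAMPLES_REPO_URL, test_command=TEST_COMMAND,
                       runner=subprocess.check_call):
    """Clone the examples into a temporary directory and run their tests."""
    with tempfile.TemporaryDirectory() as tmp:
        checkout_dir = os.path.join(tmp, "examples")
        for argv, cwd in plan_commands(repo_url, checkout_dir, test_command):
            # With the default runner, a non-zero exit status raises
            # subprocess.CalledProcessError and fails the calling test.
            runner(argv, cwd=cwd)
```

A test in the core repository could call run_examples_tests() once the SDK under development is installed in the environment; the temporary checkout is removed when the function returns.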

On Wed, Aug 1, 2018 at 3:16 PM Charles Chen  wrote:

> The examples we have right now serve both as examples to users and along
> with their unit tests, as tests of functionality.  If we move the examples
> out, what is a good way to make sure that we continue to have visibility
> and test coverage?  Can we address this in a section of the doc?
>
> On Wed, Aug 1, 2018 at 3:12 PM David Cavazos  wrote:
>
>> Hi everyone!
>>
>> We wanted to migrate the examples from the core repository to a new Beam
>> community examples repository. As the number of examples grows, it makes
>> sense to modularize and decouple the core functionality from the examples.
>>
>> We will also create some guidelines with the best practices for new
>> examples to be submitted.
>>
>> For more details, feel free to take a look and comment on the proposal.
>>
>> Cheers,
>> David
>>
>


Re: Community Examples Repository

2018-08-01 Thread Charles Chen
The examples we have right now serve both as examples to users and, along
with their unit tests, as tests of functionality.  If we move the examples
out, what is a good way to make sure that we continue to have visibility
and test coverage?  Can we address this in a section of the doc?

On Wed, Aug 1, 2018 at 3:12 PM David Cavazos  wrote:

> Hi everyone!
>
> We wanted to migrate the examples from the core repository to a new Beam
> community examples repository. As the number of examples grows, it makes
> sense to modularize and decouple the core functionality from the examples.
>
> We will also create some guidelines with the best practices for new
> examples to be submitted.
>
> For more details, feel free to take a look and comment on the proposal.
>
> Cheers,
> David
>


Community Examples Repository

2018-08-01 Thread David Cavazos
Hi everyone!

We wanted to migrate the examples from the core repository to a new Beam
community examples repository. As the number of examples grows, it makes
sense to modularize and decouple the core functionality from the examples.

We will also create some guidelines with the best practices for new
examples to be submitted.

For more details, feel free to take a look and comment on the proposal.

Cheers,
David


Re: [VOTE] Apache Beam, version 2.6.0, release candidate #1

2018-08-01 Thread Yifan Zou
+1
Tested Python quickstarts and mobile gaming examples against tar and wheel
versions.
https://builds.apache.org/job/beam_PostRelease_Python_Candidate/123/

On Wed, Aug 1, 2018 at 8:27 AM Andrew Pilloud  wrote:

> +1 tested the Beam SQL jar from the Maven Central repo, it worked.
>
> On Wed, Aug 1, 2018 at 7:37 AM Romain Manni-Bucau 
> wrote:
>
>> Hi Pablo,
>>
>> +1, tested on my apps and libs, and it works after some fixes due to some
>> breaking changes in ArgProvider - but I guess it is not "public" enough to
>> need to be reported.
>>
>> Romain Manni-Bucau
>> @rmannibucau | Blog | Old Blog | Github | LinkedIn | Book
>>
>>
>> On Wed, Aug 1, 2018 at 01:50, Pablo Estrada wrote:
>>
>>> Hello everyone!
>>>
>>> I have been able to prepare a release candidate for Beam 2.6.0. : D
>>>
>>> Please review and vote on the release candidate #1 for the version
>>> 2.6.0, as follows:
>>>
>>> [ ] +1, Approve the release
>>> [ ] -1, Do not approve the release (please provide specific comments)
>>>
>>> The complete staged set of artifacts is available for your review, which
>>> includes:
>>> * JIRA release notes [1],
>>> * the official Apache source release to be deployed to dist.apache.org
>>> [2], which is signed with the key with fingerprint
>>> 2F1FEDCDF6DD7990422F482F65224E0292DD8A51 [3],
>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>> * source code tag "v2.6.0-RC1" [5],
>>> * website pull request listing the release and publishing the API
>>> reference manual [6].
>>> * Python artifacts are deployed along with the source release to the
>>> dist.apache.org [2].
>>>
>>> The vote will be open for at least 72 hours. It is adopted by majority
>>> approval, with at least 3 PMC affirmative votes.
>>>
>>> Regards
>>> -Pablo.
>>>
>>> [1]
>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12343392
>>> [2] https://dist.apache.org/repos/dist/dev/beam/2.6.0/
>>> [3] https://dist.apache.org/repos/dist/dev/beam/KEYS
>>> [4]
>>> https://repository.apache.org/content/repositories/orgapachebeam-1044/
>>> [5] https://github.com/apache/beam/tree/v2.6.0-RC1
>>> [6] https://github.com/apache/beam-site/pull/518
>>>
>>> --
>>> Got feedback? go/pabloem-feedback
>>> 
>>>
>>


Re: Build failed in Jenkins: beam_Release_Gradle_NightlySnapshot #127

2018-08-01 Thread Chamikara Jayalath
Created https://issues.apache.org/jira/browse/BEAM-5057

On Wed, Aug 1, 2018 at 1:20 AM Apache Jenkins Server <
jenk...@builds.apache.org> wrote:

> See <
> https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/127/display/redirect?page=changes
> >
>
> Changes:
>
> [github] [BEAM-4852] Only read symbol table when required.
>
> [github] Update symbols.go
>
> [github] Don't rely on order of elements in a PCollection after GBK in
>
> [devinduan] Spelling mistakes
>
> [rober] Avoid overwritting user changes to Resolver
>
> [rober] Clean up deferedResolver
>
> [pablo] Fix scheduling for jobs
>
> [relax] Add convenience methods for pojo and javabean schema registration.
>
> [relax] Address code-review comments.
>
> --
> [...truncated 18.85 MB...]
>
> > Task :beam-sdks-java-maven-archetypes-starter:processTestResources
> UP-TO-DATE
> Build cache key for task
> ':beam-sdks-java-maven-archetypes-starter:processTestResources' is
> f74f3200edf284b276c50da93794d928
> Caching disabled for task
> ':beam-sdks-java-maven-archetypes-starter:processTestResources': Caching
> has not been enabled for the task
> Skipping task
> ':beam-sdks-java-maven-archetypes-starter:processTestResources' as it is
> up-to-date.
> :beam-sdks-java-maven-archetypes-starter:processTestResources
> (Thread[Daemon worker,5,main]) completed. Took 0.002 secs.
> :beam-sdks-java-maven-archetypes-starter:testClasses (Thread[Daemon
> worker,5,main]) started.
>
> > Task :beam-sdks-java-maven-archetypes-starter:testClasses UP-TO-DATE
> Skipping task ':beam-sdks-java-maven-archetypes-starter:testClasses' as it
> has no actions.
> :beam-sdks-java-maven-archetypes-starter:testClasses (Thread[Daemon
> worker,5,main]) completed. Took 0.0 secs.
> :beam-sdks-java-maven-archetypes-starter:shadowTestJar (Thread[Task worker
> for ':',5,main]) started.
>
> > Task :beam-sdks-java-maven-archetypes-starter:shadowTestJar
> Build cache key for task
> ':beam-sdks-java-maven-archetypes-starter:shadowTestJar' is
> df2c278f7c412c8cac98a11ccfcec622
> Caching disabled for task
> ':beam-sdks-java-maven-archetypes-starter:shadowTestJar': Caching has not
> been enabled for the task
> Task ':beam-sdks-java-maven-archetypes-starter:shadowTestJar' is not
> up-to-date because:
>   No history is available.
> ***
> GRADLE SHADOW STATS
>
> Total Jars: 1 (includes project)
> Total Time: 0.0s [0ms]
> Average Time/Jar: 0.0s [0.0ms]
> ***
> :beam-sdks-java-maven-archetypes-starter:shadowTestJar (Thread[Task worker
> for ':',5,main]) completed. Took 0.007 secs.
> :beam-sdks-java-maven-archetypes-starter:sourcesJar (Thread[Task worker
> for ':',5,main]) started.
>
> > Task :beam-sdks-java-maven-archetypes-starter:sourcesJar
> file or directory '<
> https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/ws/src/sdks/java/maven-archetypes/starter/src/main/java',>
> not found
> Build cache key for task
> ':beam-sdks-java-maven-archetypes-starter:sourcesJar' is
> a106f15937cacfee668e25636b705e03
> Caching disabled for task
> ':beam-sdks-java-maven-archetypes-starter:sourcesJar': Caching has not been
> enabled for the task
> Task ':beam-sdks-java-maven-archetypes-starter:sourcesJar' is not
> up-to-date because:
>   No history is available.
> file or directory '<
> https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/ws/src/sdks/java/maven-archetypes/starter/src/main/java',>
> not found
> :beam-sdks-java-maven-archetypes-starter:sourcesJar (Thread[Task worker
> for ':',5,main]) completed. Took 0.003 secs.
> :beam-sdks-java-maven-archetypes-starter:testSourcesJar (Thread[Task
> worker for ':',5,main]) started.
>
> > Task :beam-sdks-java-maven-archetypes-starter:testSourcesJar
> file or directory '<
> https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/ws/src/sdks/java/maven-archetypes/starter/src/test/java',>
> not found
> Build cache key for task
> ':beam-sdks-java-maven-archetypes-starter:testSourcesJar' is
> 58715d6b8e221cace68f230ccfd69fd4
> Caching disabled for task
> ':beam-sdks-java-maven-archetypes-starter:testSourcesJar': Caching has not
> been enabled for the task
> Task ':beam-sdks-java-maven-archetypes-starter:testSourcesJar' is not
> up-to-date because:
>   No history is available.
> file or directory '<
> https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/ws/src/sdks/java/maven-archetypes/starter/src/test/java',>
> not found
> :beam-sdks-java-maven-archetypes-starter:testSourcesJar (Thread[Task
> worker for ':',5,main]) completed. Took 0.003 secs.
> :beam-sdks-java-nexmark:generatePomFileForMavenJavaPublication
> (Thread[Task worker for ':',5,main]) started.
>
> > Task :beam-sdks-java-nexmark:generatePomFileForMavenJavaPublication
> Build cache key for task
> ':beam-sdks-java-nexmark:generatePomFileForMavenJavaPublication' is
> e88836a5bca732f78522d2de5a70d4e6
> Caching disabled for task
> 

Jenkins build is back to normal : beam_SeedJob #2346

2018-08-01 Thread Apache Jenkins Server
See 



Build failed in Jenkins: beam_SeedJob #2345

2018-08-01 Thread Apache Jenkins Server
See 

--
GitHub pull request #4943 of commit 73965f43d84ed30cba50c6802783d10df6fef9d4, 
no merge conflicts.
Setting status of 73965f43d84ed30cba50c6802783d10df6fef9d4 to PENDING with url 
https://builds.apache.org/job/beam_SeedJob/2345/ and message: 'Build started 
for merge commit.'
Using context: Jenkins: Seed Job
[EnvInject] - Loading node environment variables.
Building remotely on beam12 (beam) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/beam.git # timeout=10
Fetching upstream changes from https://github.com/apache/beam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/beam.git 
 > +refs/heads/*:refs/remotes/origin/* 
 > +refs/pull/4943/*:refs/remotes/origin/pr/4943/*
 > git rev-parse refs/remotes/origin/pr/4943/merge^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/pr/4943/merge^{commit} # timeout=10
Checking out Revision 988910196d44938b517629fa71e93f0cd78b9a5a 
(refs/remotes/origin/pr/4943/merge)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 988910196d44938b517629fa71e93f0cd78b9a5a
Commit message: "Merge 73965f43d84ed30cba50c6802783d10df6fef9d4 into 
4310cbf78b62b2d4948978df95197f5347f08c02"
First time build. Skipping changelog.
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
Processing DSL script job_00_seed.groovy
Processing DSL script job_Dependency_Check.groovy
Processing DSL script job_Inventory.groovy
Processing DSL script job_PerformanceTests_Dataflow.groovy
Processing DSL script job_PerformanceTests_FileBasedIO_IT.groovy
Processing DSL script job_PerformanceTests_FileBasedIO_IT_HDFS.groovy
Processing DSL script job_PerformanceTests_HadoopInputFormat.groovy
Processing DSL script job_PerformanceTests_JDBC.groovy
Processing DSL script job_PerformanceTests_MongoDBIO_IT.groovy
Processing DSL script job_PerformanceTests_Python.groovy
Processing DSL script job_PerformanceTests_Spark.groovy
Processing DSL script job_PostCommit_Go_GradleBuild.groovy
Processing DSL script job_PostCommit_Java_GradleBuild.groovy
Processing DSL script job_PostCommit_Java_Nexmark_Dataflow.groovy
Processing DSL script job_PostCommit_Java_Nexmark_Direct.groovy
Processing DSL script job_PostCommit_Java_Nexmark_Flink.groovy
Processing DSL script job_PostCommit_Java_Nexmark_Spark.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Apex.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Dataflow.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Flink.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Gearpump.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Samza.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Spark.groovy
Processing DSL script job_PostCommit_Python_ValidatesContainer_Dataflow.groovy
Processing DSL script job_PostCommit_Python_ValidatesRunner_Dataflow.groovy
Processing DSL script job_PostCommit_Python_Verify.groovy
Processing DSL script job_PostRelease_NightlySnapshot.groovy
Processing DSL script job_PreCommit_Go.groovy
Processing DSL script job_PreCommit_Java.groovy
Processing DSL script job_PreCommit_Python.groovy
Processing DSL script job_PreCommit_Website_Merge.groovy
Processing DSL script job_PreCommit_Website_Stage.groovy
Processing DSL script job_PreCommit_Website_Test.groovy
Processing DSL script job_ReleaseCandidate_Python.groovy
ERROR: (job_ReleaseCandidate_Python.groovy, line 38) No such property: 
common_job_properties for class: javaposse.jobdsl.dsl.helpers.step.StepContext


Build failed in Jenkins: beam_SeedJob #2344

2018-08-01 Thread Apache Jenkins Server
See 

--
GitHub pull request #4943 of commit 901cdcf0d8d4264035c5da668cec9a39743317cf, 
no merge conflicts.
Setting status of 901cdcf0d8d4264035c5da668cec9a39743317cf to PENDING with url 
https://builds.apache.org/job/beam_SeedJob/2344/ and message: 'Build started 
for merge commit.'
Using context: Jenkins: Seed Job
[EnvInject] - Loading node environment variables.
Building remotely on beam10 (beam) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/beam.git # timeout=10
Fetching upstream changes from https://github.com/apache/beam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/beam.git 
 > +refs/heads/*:refs/remotes/origin/* 
 > +refs/pull/4943/*:refs/remotes/origin/pr/4943/*
 > git rev-parse refs/remotes/origin/pr/4943/merge^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/pr/4943/merge^{commit} # timeout=10
Checking out Revision 313bb76b02828f781b97047cf02ca76edb136217 
(refs/remotes/origin/pr/4943/merge)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 313bb76b02828f781b97047cf02ca76edb136217
Commit message: "Merge 901cdcf0d8d4264035c5da668cec9a39743317cf into 
4310cbf78b62b2d4948978df95197f5347f08c02"
First time build. Skipping changelog.
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
Processing DSL script job_00_seed.groovy
Processing DSL script job_Dependency_Check.groovy
Processing DSL script job_Inventory.groovy
Processing DSL script job_PerformanceTests_Dataflow.groovy
Processing DSL script job_PerformanceTests_FileBasedIO_IT.groovy
Processing DSL script job_PerformanceTests_FileBasedIO_IT_HDFS.groovy
Processing DSL script job_PerformanceTests_HadoopInputFormat.groovy
Processing DSL script job_PerformanceTests_JDBC.groovy
Processing DSL script job_PerformanceTests_MongoDBIO_IT.groovy
Processing DSL script job_PerformanceTests_Python.groovy
Processing DSL script job_PerformanceTests_Spark.groovy
Processing DSL script job_PostCommit_Go_GradleBuild.groovy
Processing DSL script job_PostCommit_Java_GradleBuild.groovy
Processing DSL script job_PostCommit_Java_Nexmark_Dataflow.groovy
Processing DSL script job_PostCommit_Java_Nexmark_Direct.groovy
Processing DSL script job_PostCommit_Java_Nexmark_Flink.groovy
Processing DSL script job_PostCommit_Java_Nexmark_Spark.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Apex.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Dataflow.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Flink.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Gearpump.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Samza.groovy
Processing DSL script job_PostCommit_Java_ValidatesRunner_Spark.groovy
Processing DSL script job_PostCommit_Python_ValidatesContainer_Dataflow.groovy
Processing DSL script job_PostCommit_Python_ValidatesRunner_Dataflow.groovy
Processing DSL script job_PostCommit_Python_Verify.groovy
Processing DSL script job_PostRelease_NightlySnapshot.groovy
Processing DSL script job_PreCommit_Go.groovy
Processing DSL script job_PreCommit_Java.groovy
Processing DSL script job_PreCommit_Python.groovy
Processing DSL script job_PreCommit_Website_Merge.groovy
Processing DSL script job_PreCommit_Website_Stage.groovy
Processing DSL script job_PreCommit_Website_Test.groovy
Processing DSL script job_ReleaseCandidate_Python.groovy
ERROR: (job_ReleaseCandidate_Python.groovy, line 38) No such property: 
common_job_properties for class: javaposse.jobdsl.dsl.helpers.step.StepContext


Re: Parallelizing test runs

2018-08-01 Thread Pablo Estrada
It feels to me like a peak of 60 jobs per minute is pretty high. If I
understand correctly, we run up to 20 dataflow jobs in parallel per test
suite? Or what's the number here?

It is also true that most of our tests are simple NeedsRunner tests that test
a couple of elements, so the whole pipeline overhead is on startup. This may
be improved by lumping tests together (though might we lose
debuggability?).  Our average number of jobs is, I hope, muuuch smaller
than 60 per minute...

With all these considerations, I would lean more towards having a retry
policy as the immediate solution.
-P.
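
A retry policy of the kind mentioned above could be sketched roughly as
follows. This is only an illustration, not code from Beam or Jenkins: the
names QuotaExceededError and submit_with_retry, and the delay values, are
hypothetical. It retries a job submission with capped exponential backoff
and random jitter, so that parallel suites do not all retry at once.

```python
import random
import time


class QuotaExceededError(Exception):
    """Hypothetical error raised when the job-creation quota is exhausted."""


def submit_with_retry(submit, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call submit() until it succeeds or max_attempts is reached.

    submit is any zero-argument callable that starts a job and raises
    QuotaExceededError when the quota is hit (an assumption made for this
    sketch; a real runner reports quota errors in its own way).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return submit()
        except QuotaExceededError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the quota error.
            # Exponential backoff capped at max_delay, with full jitter so
            # that parallel test suites do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
```

Retries only smooth out bursts. If the sustained rate stays above the
quota, jobs still queue up behind each other, which is the blocking
concern raised in option 1 of the quoted summary.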

On Wed, Aug 1, 2018 at 9:07 AM Andrew Pilloud  wrote:

> I like 1 and 2. How do credentials get into Jenkins? Could we create a
> user per Jenkins host?
>
> On Tue, Jul 31, 2018 at 4:33 PM Reuven Lax  wrote:
>
>> There was also a proposal to lump multiple tests into a single Dataflow
>> job instead of spinning up a separate Dataflow job for each test.
>>
>> On Tue, Jul 31, 2018 at 4:26 PM Mikhail Gryzykhin 
>> wrote:
>>
>>> I synced with Rafael. Below is a summary of the discussion.
>>>
>>> This quota is CreateRequestsPerMinutePerUser, and it defaults to 60
>>> requests per user.
>>>
>>> I've created Jira [BEAM-5053](
>>> https://issues.apache.org/jira/browse/BEAM-5053) for this.
>>>
>>> I see the following options we can use:
>>> 1. Add retry logic. This still limits us to 1 Dataflow job start per
>>> second for the whole of Jenkins. In the long run it can also block one
>>> test job if other jobs take all the slots.
>>> 2. Use different users to spin up Dataflow jobs.
>>> 3. Find a way to raise the quota limit on Dataflow. By default the field
>>> limits the value to 60 requests per minute.
>>> 4. Long-run generic suggestion: limit the number of Dataflow jobs we spin
>>> up and move tests to the form of unit or component tests.
>>>
>>> Please share any insights or ideas you have on this.
>>>
>>> Regards,
>>> --Mikhail
>>>
>>> Have feedback ?
>>>
>>>
>>> On Tue, Jul 31, 2018 at 3:55 PM Mikhail Gryzykhin 
>>> wrote:
>>>
>>>> Hi Everyone,
>>>>
>>>> Seems that we hit the quota issue again:
>>>> https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/553/consoleFull
>>>>
>>>> Can someone share information on how this was triaged last time or
>>>> guide me on possible follow-up actions?
>>>>
>>>> Regards,
>>>> --Mikhail
>>>>
>>>> Have feedback ?
>>>>
>>>>
>>>> On Tue, Jul 3, 2018 at 9:12 PM Rafael Fernandez
>>>> wrote:
>>>>
>>>>> Summary for all folks following this story -- and many thanks for
>>>>> explaining configs to me and pointing me to files and such.
>>>>>
>>>>> - Scott made changes to the config and we can now run 3
>>>>> ValidatesRunner.Dataflow in parallel (each run is about 2 hours)
>>>>> - With the latest quota changes, we peaked at ~70% capacity in
>>>>> concurrent Dataflow jobs when running those
>>>>> - I've been keeping an eye on quota peaks for all resources today and
>>>>> have not seen any worrisome limits overall.
>>>>> - Also note there are improvements planned to the
>>>>> ValidatesRunner.Dataflow test so various items get batched and the test
>>>>> itself runs faster -- I believe it's on Alan's radar
>>>>>
>>>>> Cheers,
>>>>> r
>>>>>
>>>>> On Mon, Jul 2, 2018 at 4:23 PM Rafael Fernandez
>>>>> wrote:
>>>>>
>>>>>> Done!
>>>>>>
>>>>>> On Mon, Jul 2, 2018 at 4:10 PM Scott Wegner wrote:
>>>>>>
>>>>>>> Hey Rafael, looks like we need more 'INSTANCE_TEMPLATES' quota [1].
>>>>>>> Can you take a look? I've filed [BEAM-4722]:
>>>>>>> https://issues.apache.org/jira/browse/BEAM-4722
>>>>>>>
>>>>>>> [1] https://github.com/apache/beam/pull/5861#issuecomment-401963630
>>>>>>>
>>>>>>> On Mon, Jul 2, 2018 at 11:33 AM Rafael Fernandez <
>>>>>>> rfern...@google.com> wrote:
>>>>>>>
>>>>>>>> OK, Scott just sent https://github.com/apache/beam/pull/5860 .
>>>>>>>> Quotas should not be a problem, if they are, please file a JIRA under
>>>>>>>> gcp-quota.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> r
>>>>>>>>
>>>>>>>> On Mon, Jul 2, 2018 at 10:06 AM Kenneth Knowles
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> One thing that is nice when you do this is to be able to share
>>>>>>>>> your results. Though if all you are sharing is "they passed" then I guess
>>>>>>>>> we don't have to insist on evidence.
>>>>>>>>>
>>>>>>>>> Kenn
>>>>>>>>>
>>>>>>>>> On Mon, Jul 2, 2018 at 9:25 AM Scott Wegner
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> A few thoughts:
>>>>>>>>>>
>>>>>>>>>> * The Jenkins job getting backed up
>>>>>>>>>> is beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle_PR [1]. Since
>>>>>>>>>> Mikhail refactored Jenkins jobs, this only runs when explicitly requested
>>>>>>>>>> via "Run Dataflow ValidatesRunner", and only has 8 total runs. So this job
>>>>>>>>>> is idle more often than backlogged.
>>>>>>>>>>
>>>>>>>>>> * It's difficult to reason about our exact quota needs because
>>>>>>>>>> Dataflow jobs get launched from various Jenkins jobs that have different

Re: Parallelizing test runs

2018-08-01 Thread Andrew Pilloud
I like 1 and 2. How do credentials get into Jenkins? Could we create a user
per Jenkins host?

On Tue, Jul 31, 2018 at 4:33 PM Reuven Lax  wrote:

> There was also a proposal to lump multiple tests into a single Dataflow
> job instead of spinning up a separate Dataflow job for each test.
>
> On Tue, Jul 31, 2018 at 4:26 PM Mikhail Gryzykhin 
> wrote:
>
>> I synced with Rafael. Below is a summary of the discussion.
>>
>> This quota is CreateRequestsPerMinutePerUser, and it defaults to 60
>> requests per user.
>>
>> I've created Jira [BEAM-5053](
>> https://issues.apache.org/jira/browse/BEAM-5053) for this.
>>
>> I see the following options we can use:
>> 1. Add retry logic. This still limits us to 1 Dataflow job start per
>> second for the whole of Jenkins. In the long run it can also block one
>> test job if other jobs take all the slots.
>> 2. Use different users to spin up Dataflow jobs.
>> 3. Find a way to raise the quota limit on Dataflow. By default the field
>> limits the value to 60 requests per minute.
>> 4. Long-run generic suggestion: limit the number of Dataflow jobs we spin
>> up and move tests to the form of unit or component tests.
>>
>> Please share any insights or ideas you have on this.
>>
>> Regards,
>> --Mikhail
>>
>> Have feedback ?
>>
>>
>> On Tue, Jul 31, 2018 at 3:55 PM Mikhail Gryzykhin 
>> wrote:
>>
>>> Hi Everyone,
>>>
>>> Seems that we hit quota issue again:
>>> https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/553/consoleFull
>>>
>>> Can someone share information on how was this triaged last time or guide
>>> me on possible follow-up actions?
>>>
>>> Regards,
>>> --Mikhail
>>>
>>> Have feedback ?
>>>
>>>
>>> On Tue, Jul 3, 2018 at 9:12 PM Rafael Fernandez 
>>> wrote:
>>>
>>>> Summary for all folks following this story -- and many thanks for
>>>> explaining configs to me and pointing me to files and such.
>>>>
>>>> - Scott made changes to the config and we can now run 3
>>>> ValidatesRunner.Dataflow in parallel (each run is about 2 hours)
>>>> - With the latest quota changes, we peaked at ~70% capacity in
>>>> concurrent Dataflow jobs when running those
>>>> - I've been keeping an eye on quota peaks for all resources today and
>>>> have not seen any worrisome limits overall.
>>>> - Also note there are improvements planned to the
>>>> ValidatesRunner.Dataflow test so various items get batched and the test
>>>> itself runs faster -- I believe it's on Alan's radar
>>>>
>>>> Cheers,
>>>> r
>>>>
>>>> On Mon, Jul 2, 2018 at 4:23 PM Rafael Fernandez
>>>> wrote:
>>>>
>>>>> Done!
>>>>>
>>>>> On Mon, Jul 2, 2018 at 4:10 PM Scott Wegner wrote:
>>>>>
>>>>>> Hey Rafael, looks like we need more 'INSTANCE_TEMPLATES' quota [1].
>>>>>> Can you take a look? I've filed [BEAM-4722]:
>>>>>> https://issues.apache.org/jira/browse/BEAM-4722
>>>>>>
>>>>>> [1] https://github.com/apache/beam/pull/5861#issuecomment-401963630
>>>>>>
>>>>>> On Mon, Jul 2, 2018 at 11:33 AM Rafael Fernandez
>>>>>> wrote:
>>>>>>
>>>>>>> OK, Scott just sent https://github.com/apache/beam/pull/5860 .
>>>>>>> Quotas should not be a problem, if they are, please file a JIRA under
>>>>>>> gcp-quota.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> r
>>>>>>>
>>>>>>> On Mon, Jul 2, 2018 at 10:06 AM Kenneth Knowles
>>>>>>> wrote:
>>>>>>>
>>>>>>>> One thing that is nice when you do this is to be able to share your
>>>>>>>> results. Though if all you are sharing is "they passed" then I guess we
>>>>>>>> don't have to insist on evidence.
>>>>>>>>
>>>>>>>> Kenn
>>>>>>>>
>>>>>>>> On Mon, Jul 2, 2018 at 9:25 AM Scott Wegner
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> A few thoughts:
>>>>>>>>>
>>>>>>>>> * The Jenkins job getting backed up
>>>>>>>>> is beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle_PR [1]. Since
>>>>>>>>> Mikhail refactored Jenkins jobs, this only runs when explicitly requested
>>>>>>>>> via "Run Dataflow ValidatesRunner", and only has 8 total runs. So this job
>>>>>>>>> is idle more often than backlogged.
>>>>>>>>>
>>>>>>>>> * It's difficult to reason about our exact quota needs because
>>>>>>>>> Dataflow jobs get launched from various Jenkins jobs that have different
>>>>>>>>> parallelism configurations. If we have budget, we could enable concurrent
>>>>>>>>> execution of this job and increase our quota enough to give some breathing
>>>>>>>>> room. If we do this, I recommend limiting the max concurrency via
>>>>>>>>> throttleConcurrentBuilds [2] to some reasonable limit.
>>>>>>>>>
>>>>>>>>> * This test suite is meant to be an exhaustive post-commit
>>>>>>>>> validation of Dataflow runner, and tests a lot of different aspects of a
>>>>>>>>> runner. It would be more efficient to run locally only the tests affected
>>>>>>>>> by your change. Note that this requires having access to a GCP project with
>>>>>>>>> billing, but most Dataflow developers probably have access to this already.
>>>>>>>>> The

Re: [VOTE] Apache Beam, version 2.6.0, release candidate #1

2018-08-01 Thread Andrew Pilloud
+1 tested the Beam SQL jar from the Maven Central repo, it worked.

On Wed, Aug 1, 2018 at 7:37 AM Romain Manni-Bucau 
wrote:

> Hi Pablo,
>
> +1, tested on my apps and libs and it works after some fixes due to some
> breaking changes in ArgProvider - but I guess it is not "public" enough to
> need to be reported.
>
> Romain Manni-Bucau
> @rmannibucau | Blog | Old Blog | Github | LinkedIn | Book
> 
>
>
> On Wed, Aug 1, 2018 at 01:50, Pablo Estrada wrote:
>
>> Hello everyone!
>>
>> I have been able to prepare a release candidate for Beam 2.6.0. : D
>>
>> Please review and vote on the release candidate #1 for the version 2.6.0,
>> as follows:
>>
>> [ ] +1, Approve the release
>> [ ] -1, Do not approve the release (please provide specific comments)
>>
>> The complete staged set of artifacts is available for your review, which
>> includes:
>> * JIRA release notes [1],
>> * the official Apache source release to be deployed to dist.apache.org
>> [2], which is signed with the key with fingerprint
>> 2F1FEDCDF6DD7990422F482F65224E0292DD8A51 [3],
>> * all artifacts to be deployed to the Maven Central Repository [4],
>> * source code tag "v2.6.0-RC1" [5],
>> * website pull request listing the release and publishing the API
>> reference manual [6].
>> * Python artifacts are deployed along with the source release to the
>> dist.apache.org [2].
>>
>> The vote will be open for at least 72 hours. It is adopted by majority
>> approval, with at least 3 PMC affirmative votes.
>>
>> Regards
>> -Pablo.
>>
>> [1]
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12343392
>> [2] https://dist.apache.org/repos/dist/dev/beam/2.6.0/
>> [3] https://dist.apache.org/repos/dist/dev/beam/KEYS
>> [4]
>> https://repository.apache.org/content/repositories/orgapachebeam-1044/
>> [5] https://github.com/apache/beam/tree/v2.6.0-RC1
>> [6] https://github.com/apache/beam-site/pull/518
>>
>> --
>> Got feedback? go/pabloem-feedback
>> 
>>
>


Re: [VOTE] Apache Beam, version 2.6.0, release candidate #1

2018-08-01 Thread Romain Manni-Bucau
Hi Pablo,

+1, tested on my apps and libs and it works after some fixes due to some
breaking changes in ArgProvider - but I guess it is not "public" enough to
need to be reported.

Romain Manni-Bucau
@rmannibucau | Blog | Old Blog | Github | LinkedIn | Book



On Wed, Aug 1, 2018 at 01:50, Pablo Estrada wrote:

> Hello everyone!
>
> I have been able to prepare a release candidate for Beam 2.6.0. : D
>
> Please review and vote on the release candidate #1 for the version 2.6.0,
> as follows:
>
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> The complete staged set of artifacts is available for your review, which
> includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
> [2], which is signed with the key with fingerprint
> 2F1FEDCDF6DD7990422F482F65224E0292DD8A51 [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v2.6.0-RC1" [5],
> * website pull request listing the release and publishing the API
> reference manual [6].
> * Python artifacts are deployed along with the source release to the
> dist.apache.org [2].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Regards
> -Pablo.
>
> [1]
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12343392
> [2] https://dist.apache.org/repos/dist/dev/beam/2.6.0/
> [3] https://dist.apache.org/repos/dist/dev/beam/KEYS
> [4] https://repository.apache.org/content/repositories/orgapachebeam-1044/
> [5] https://github.com/apache/beam/tree/v2.6.0-RC1
> [6] https://github.com/apache/beam-site/pull/518
>
> --
> Got feedback? go/pabloem-feedback
>


Re: Cleanup resources on pipeline cancelation

2018-08-01 Thread Chamikara Jayalath
Hi Andrew,

Beam currently does not have a generalized cleanup story, so the answer has
usually been ad hoc. For bounded sources we can (1) clean up any resources
created for splitting once splitting is done and (2) clean up resources
created for a given reader when the reader exits (last advance() call).

I'm not sure what the proper solution for UnboundedSources is, and it might
not even make sense to add cleanup logic to an unbounded source that is
never expected to end. We might need something more generic (for example, a
mechanism to collect temporary resources and delete such resources at
pipeline termination).

Thanks,
Cham
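
To make the "collect temporary resources and delete them at pipeline
termination" idea concrete, here is a rough sketch. It is not an existing
Beam API: TempResourceRegistry and its methods are hypothetical names, and
how a runner would actually call the cleanup step at termination is left
open. Sources register a cleanup callback when they create an external
resource, and the registry runs all callbacks once, in reverse creation
order, collecting failures instead of stopping at the first one.

```python
class TempResourceRegistry:
    """Hypothetical registry of cleanup callbacks for temporary resources."""

    def __init__(self):
        self._cleanups = []

    def register(self, name, cleanup):
        """Remember a zero-argument callable that deletes the named resource."""
        self._cleanups.append((name, cleanup))

    def cleanup_all(self):
        """Run every registered cleanup once, newest first.

        Returns a list of (name, exception) pairs for cleanups that failed,
        so that one failure does not prevent the remaining deletions.
        """
        failures = []
        while self._cleanups:
            name, cleanup = self._cleanups.pop()
            try:
                cleanup()
            except Exception as exc:  # Keep going; report failures at the end.
                failures.append((name, exc))
        return failures
```

For the PubsubIO case in BEAM-5051, the callback would be whatever call
deletes the subscription that createReader()/split() created. The open
question from the thread remains: something has to invoke cleanup_all()
when the pipeline is cancelled or terminates.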


On Tue, Jul 31, 2018 at 10:04 PM Romain Manni-Bucau 
wrote:

> Hi Andrew,
>
> IIRC sources should clean up their resources per method since they don't
> have a better lifecycle. Readers can create anything longer-lived and
> release it at close time.
>
>
> On Wed, Aug 1, 2018 at 00:31, Andrew Pilloud wrote:
>
>> Some of our IOs create external resources that need to be cleaned up when
>> a pipeline is terminated. It looks like the
>> org.apache.beam.sdk.io.UnboundedSource interface is called on creation, but
>> there is no call for cleanup. For example, PubsubIO creates a Pubsub
>> subscription in createReader()/split() and it should be deleted at shutdown.
>> Does anyone have ideas on how I might make this happen?
>>
>> (I filed https://issues.apache.org/jira/browse/BEAM-5051 tracking the
>> PubSub specific issue.)
>>
>> Andrew
>>
>


Re: pipeline with parquet and sql

2018-08-01 Thread Chamikara Jayalath
On Wed, Aug 1, 2018 at 1:12 AM Akanksha Sharma B <
akanksha.b.sha...@ericsson.com> wrote:

> Hi,
>
>
> Thanks. I understood the Parquet point. I will wait for a couple of days on
> this topic. Even if this scenario cannot be achieved now, any design
> document or future plans in this direction will also be helpful to me.
>
>
> To summarize, I do not understand Beam well enough. Can someone please
> help me and comment on whether the following fits with Beam's model and
> future direction:
>
> "Read Parquet (along with the inferred schema) into something like a
> dataframe or Beam Rows. And vice versa for write, i.e. get Rows and write
> Parquet based on the Row's schema."
>

Beam currently does not have a standard message format. A Beam pipeline
consists of PCollections and transforms (that convert PCollections to
other PCollections). You can transform the PCollection read from Parquet
using a ParDo and write the resulting PCollection back to Parquet format. I
think Schema-aware PCollections [1] might be close to what you need, but I
am not sure whether they fulfill your exact requirement.

Thanks,
Cham

[1]
https://lists.apache.org/thread.html/fe327866c6c81b7e55af28f81cedd9b2e588279def330940e8b8ebd7@%3Cdev.beam.apache.org%3E


>
>
>
> Regards,
>
> Akanksha
>
>
> --
> *From:* Łukasz Gajowy 
> *Sent:* Tuesday, July 31, 2018 12:43:32 PM
> *To:* u...@beam.apache.org
> *Cc:* dev@beam.apache.org
> *Subject:* Re: pipeline with parquet and sql
>
> In terms of schema and ParquetIO source/sink, there was an answer in some
> previous thread:
>
> Currently (without introducing any change in ParquetIO) there is no way to
> not pass the avro schema. It will probably be replaced with Beam's schema
> in the future ()
>
> [1]
> https://lists.apache.org/thread.html/a466ddeb55e47fd780be3bcd8eec9d6b6eaf1dfd566ae5278b5fb9e8@%3Cuser.beam.apache.org%3E
>
>
> On Tue, Jul 31, 2018 at 10:19 Akanksha Sharma B
> wrote:
>
> Hi,
>
>
> I am hoping to get some hints/pointers from the experts here.
>
> I hope the scenario described below was understandable. I hope it is a
> valid use-case. Please let me know if I need to explain the scenario
> better.
>
>
> Regards,
>
> Akanksha
>
> --
> *From:* Akanksha Sharma B
> *Sent:* Friday, July 27, 2018 9:44 AM
> *To:* dev@beam.apache.org
> *Subject:* Re: pipeline with parquet and sql
>
>
> Hi,
>
>
> Please consider the following pipeline:
>
>
> The source is a Parquet file with hundreds of columns.
>
> The sink is Parquet. Multiple output Parquet files are generated after
> applying some SQL joins. The SQL joins to be applied differ for each output
> Parquet file. Let's assume we have a SQL query generator or some
> configuration file with the needed info.
>
>
> Can this be implemented generically, such that there is no need for the
> schema of the Parquet files involved or any intermediate POJO or Beam
> schema?
>
> That is, the way Spark can handle it: read Parquet into a dataframe, create
> a temp view, apply SQL queries to it, and write it back to Parquet.
>
> As I understand it, Beam SQL needs Beam Schemas or POJOs, and ParquetIO
> needs Avro schemas. Ideally we don't want to see POJOs or schemas.
> If there is a way we can achieve this with Beam, please do help.
>
> Regards,
> Akanksha
>
> --
> *From:* Akanksha Sharma B
> *Sent:* Tuesday, July 24, 2018 4:47:25 PM
> *To:* u...@beam.apache.org
> *Subject:* pipeline with parquet and sql
>
>
> Hi,
>
>
> Please consider the following pipeline:
>
>
> The source is a Parquet file with hundreds of columns.
>
> The sink is Parquet. Multiple output Parquet files are generated after
> applying some SQL joins. The SQL joins to be applied differ for each output
> Parquet file. Let's assume we have a SQL query generator or some
> configuration file with the needed info.
>
>
> Can this be implemented generically, such that there is no need for the
> schema of the Parquet files involved or any intermediate POJO or Beam
> schema?
>
> That is, the way Spark can handle it: read Parquet into a dataframe, create
> a temp view, apply SQL queries to it, and write it back to Parquet.
>
> As I understand it, Beam SQL needs Beam Schemas or POJOs, and ParquetIO
> needs Avro schemas. Ideally we don't want to see POJOs or schemas.
> If there is a way we can achieve this with Beam, please do help.
>
> Regards,
> Akanksha
>
>
>
>


Build failed in Jenkins: beam_Release_Gradle_NightlySnapshot #127

2018-08-01 Thread Apache Jenkins Server
See 


Changes:

[github] [BEAM-4852] Only read symbol table when required.

[github] Update symbols.go

[github] Don't rely on order of elements in a PCollection after GBK in

[devinduan] Spelling mistakes

[rober] Avoid overwritting user changes to Resolver

[rober] Clean up deferedResolver

[pablo] Fix scheduling for jobs

[relax] Add convenience methods for pojo and javabean schema registration.

[relax] Address code-review comments.

--
[...truncated 18.85 MB...]

> Task :beam-sdks-java-maven-archetypes-starter:processTestResources UP-TO-DATE
Build cache key for task 
':beam-sdks-java-maven-archetypes-starter:processTestResources' is 
f74f3200edf284b276c50da93794d928
Caching disabled for task 
':beam-sdks-java-maven-archetypes-starter:processTestResources': Caching has 
not been enabled for the task
Skipping task ':beam-sdks-java-maven-archetypes-starter:processTestResources' 
as it is up-to-date.
:beam-sdks-java-maven-archetypes-starter:processTestResources (Thread[Daemon 
worker,5,main]) completed. Took 0.002 secs.
:beam-sdks-java-maven-archetypes-starter:testClasses (Thread[Daemon 
worker,5,main]) started.

> Task :beam-sdks-java-maven-archetypes-starter:testClasses UP-TO-DATE
Skipping task ':beam-sdks-java-maven-archetypes-starter:testClasses' as it has 
no actions.
:beam-sdks-java-maven-archetypes-starter:testClasses (Thread[Daemon 
worker,5,main]) completed. Took 0.0 secs.
:beam-sdks-java-maven-archetypes-starter:shadowTestJar (Thread[Task worker for 
':',5,main]) started.

> Task :beam-sdks-java-maven-archetypes-starter:shadowTestJar
Build cache key for task 
':beam-sdks-java-maven-archetypes-starter:shadowTestJar' is 
df2c278f7c412c8cac98a11ccfcec622
Caching disabled for task 
':beam-sdks-java-maven-archetypes-starter:shadowTestJar': Caching has not been 
enabled for the task
Task ':beam-sdks-java-maven-archetypes-starter:shadowTestJar' is not up-to-date 
because:
  No history is available.
***
GRADLE SHADOW STATS

Total Jars: 1 (includes project)
Total Time: 0.0s [0ms]
Average Time/Jar: 0.0s [0.0ms]
***
:beam-sdks-java-maven-archetypes-starter:shadowTestJar (Thread[Task worker for 
':',5,main]) completed. Took 0.007 secs.
:beam-sdks-java-maven-archetypes-starter:sourcesJar (Thread[Task worker for 
':',5,main]) started.

> Task :beam-sdks-java-maven-archetypes-starter:sourcesJar
file or directory 
'
 not found
Build cache key for task ':beam-sdks-java-maven-archetypes-starter:sourcesJar' 
is a106f15937cacfee668e25636b705e03
Caching disabled for task 
':beam-sdks-java-maven-archetypes-starter:sourcesJar': Caching has not been 
enabled for the task
Task ':beam-sdks-java-maven-archetypes-starter:sourcesJar' is not up-to-date 
because:
  No history is available.
file or directory 
'
 not found
:beam-sdks-java-maven-archetypes-starter:sourcesJar (Thread[Task worker for 
':',5,main]) completed. Took 0.003 secs.
:beam-sdks-java-maven-archetypes-starter:testSourcesJar (Thread[Task worker for 
':',5,main]) started.

> Task :beam-sdks-java-maven-archetypes-starter:testSourcesJar
file or directory 
'
 not found
Build cache key for task 
':beam-sdks-java-maven-archetypes-starter:testSourcesJar' is 
58715d6b8e221cace68f230ccfd69fd4
Caching disabled for task 
':beam-sdks-java-maven-archetypes-starter:testSourcesJar': Caching has not been 
enabled for the task
Task ':beam-sdks-java-maven-archetypes-starter:testSourcesJar' is not 
up-to-date because:
  No history is available.
file or directory 
'
 not found
:beam-sdks-java-maven-archetypes-starter:testSourcesJar (Thread[Task worker for 
':',5,main]) completed. Took 0.003 secs.
:beam-sdks-java-nexmark:generatePomFileForMavenJavaPublication (Thread[Task 
worker for ':',5,main]) started.

> Task :beam-sdks-java-nexmark:generatePomFileForMavenJavaPublication
Build cache key for task 
':beam-sdks-java-nexmark:generatePomFileForMavenJavaPublication' is 
e88836a5bca732f78522d2de5a70d4e6
Caching disabled for task 
':beam-sdks-java-nexmark:generatePomFileForMavenJavaPublication': Caching has 
not been enabled for the task
Task ':beam-sdks-java-nexmark:generatePomFileForMavenJavaPublication' is not 
up-to-date because:
  Task.upToDateWhen is false.
:beam-sdks-java-nexmark:generatePomFileForMavenJavaPublication (Thread[Task 
worker for ':',5,main]) completed. Took 0.008 secs.