Re: compileJava broken on master see: BEAM-6495

2019-02-07 Thread Ryan Williams
After your last message, Alex, I saw the issue on a branch with minimal
unrelated changes on top of b4b5495307 (January 28).

I ran `./gradlew clean` for the first time in a while at that point, and
things worked happily for 11 days, but I just saw the issue again on a branch
rebased on top of 381ab55b59 (today):

> Task :beam-sdks-java-extensions-sql:compileJava FAILED
…/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/JdbcFactory.java:29:
error: cannot find symbol
import org.apache.beam.sdk.extensions.sql.impl.parser.impl.BeamSqlParserImpl;
  ^
  symbol:   class BeamSqlParserImpl
  location: package org.apache.beam.sdk.extensions.sql.impl.parser.impl
1 error

Every time I've seen it, I immediately re-run the compile, and it succeeds.

Perhaps something is still wrong with my environment, but otherwise it
would seem that something is still flaky here.

I'm compiling on an 8-core macOS machine, fwiw, and usually running
`./gradlew compileTestJava` which compiles many projects concurrently.

On Mon, Jan 28, 2019 at 4:12 PM Alex Amato  wrote:

> If it continues to occur, maybe it is an environmental issue, be sure to
> try to clean as well.
> ./gradlew clean
>
> On Mon, Jan 28, 2019 at 11:12 AM Ryan Williams 
> wrote:
>
>> Yea I was rebased on top of a more recent master than your previous
>> message, when I saw it again. Perhaps I was mistaken. I'll ping here if I
>> see it again, thanks.
>>
>> On Mon, Jan 28, 2019 at 1:52 PM Alex Amato  wrote:
>>
>>> After I did a rebase, it went away for me. So I think that this should
>>> work. Are you saying that you did rebase on top of master and it still
>>> occurred? Strange.
>>>
>>> On Sat, Jan 26, 2019 at 3:48 PM Ryan Williams 
>>> wrote:
>>>
 Hm, I just encountered this again on a branch based on 5b46b02b49
 (top of trunk from this afternoon). Is it definitely supposed to be fixed?


 On Thu, Jan 24, 2019 at 9:19 PM Alex Amato  wrote:

> Please try rebasing from master, I believe this issue has been
> resolved.
>
> On Thu, Jan 24, 2019 at 3:29 PM Ryan Williams 
> wrote:
>
>> I'm seeing roughly every third `./gradlew compileJava` fail locally due to
>> this; re-running it has always succeeded so far.
>>
>> Sounds like there is not an immediate fix in the works / no one
>> assigned on the JIRA?
>>
>> On Wed, Jan 23, 2019 at 3:17 PM Kenneth Knowles 
>> wrote:
>>
>>> This might connect to vendoring Calcite. It will be easiest, and
>>> have the best incremental build, if we separate the generated code into
>>> its own module that has relocation to match the vendored Calcite.
>>>
>>> Kenn
>>>
>>> On Wed, Jan 23, 2019 at 11:29 AM Anton Kedin 
>>> wrote:
>>>
 We don't pre-generate the code as a separate step. Code gen from
 the SQL parser syntax spec and its compilation both happen during the
 Beam SQL build task. Splitting the code generation and compilation might
 not be trivial. We definitely should look into fixing this, though.

 Regards,
 Anton

 On Wed, Jan 23, 2019 at 11:13 AM Alex Amato 
 wrote:

> Okay, makes sense. Perhaps we can somehow make it fail when it fails
> to generate the dep, rather than when compiling the Java code later on.
>
> On Wed, Jan 23, 2019 at 11:12 AM Anton Kedin 
> wrote:
>
>> ParserImpl is autogenerated by Calcite at build time. It seems
>> that there's a race condition there and it sometimes fails.
>> Rerunning the build works for me.
>>
>> Regards,
>> Anton
>>
>> On Wed, Jan 23, 2019, 11:06 AM Alex Amato 
>> wrote:
>>
>>> https://jira.apache.org/jira/browse/BEAM-6495?filter=-2
>>>
>>> Any ideas how this got through the precommit?
>>>
>>> > Task :beam-sdks-java-extensions-sql:compileJava FAILED
>>> /usr/local/google/home/ajamato/go/src/github.com/apache/beam/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/JdbcFactory.java:29:
>>> error: cannot find symbol
>>> import org.apache.beam.sdk.extensions.sql.impl.parser.impl.BeamSqlParserImpl;
>>>   ^
>>>   symbol:   class BeamSqlParserImpl
>>>   location: package org.apache.beam.sdk.extensions.sql.impl.parser.impl
>>> 1 error
>>>
>>>
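
Alex's suggestion in this thread (make the build fail when the parser is not generated, rather than later with "cannot find symbol" in `compileJava`) can be illustrated with a small fail-fast check. This is a minimal sketch in plain Java, not Beam's build logic: the `GeneratedSourceCheck` class and the idea of probing a generated-sources directory are hypothetical, and in the real build such a check would presumably sit in a Gradle task that runs after parser generation and before compilation.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical fail-fast check; not part of Beam's Gradle build.
class GeneratedSourceCheck {

  // Resolves the expected .java file for a fully qualified class name under
  // the generated-sources root. If the file is missing, it fails immediately
  // with a descriptive message instead of letting javac report
  // "cannot find symbol" later in compileJava.
  static Path requireGenerated(Path generatedRoot, String className) {
    Path source = generatedRoot.resolve(className.replace('.', '/') + ".java");
    if (!Files.isRegularFile(source)) {
      throw new IllegalStateException(
          "Code generation did not produce " + source
              + "; failing before compileJava runs");
    }
    return source;
  }
}
```

Called with the real generated-sources root and `org.apache.beam.sdk.extensions.sql.impl.parser.impl.BeamSqlParserImpl`, a missing file would surface as a generation problem rather than as a compile error in `JdbcFactory.java`. It would not by itself fix the race Anton describes; it only moves the failure to a clearer place.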


Re: [PROPOSAL] Prepare Beam 2.11.0 release

2019-02-07 Thread Andrew Pilloud
+1 let's keep things going. Thanks for volunteering!

Andrew

On Thu, Feb 7, 2019, 4:49 PM Kenneth Knowles  wrote:

> I think this is a good idea. Even though 2.10.0 is still making it's way
> out, there's still 6 weeks of changes between the cut dates, lots of value.
> It is good to keep the rhythm and follow the calendar.
>
> Kenn
>
> On Thu, Feb 7, 2019, 15:52 Ahmet Altay wrote:
>> Hi all,
>>
>> Beam 2.11 release branch cut date is 2/13 according to the release
>> calendar [1]. I would like to volunteer myself to do this release. I intend
>> to cut the branch on the planned 2/13 date.
>>
>> If you have releasing blocking issues for 2.11 please mark their "Fix
>> Version" as 2.11.0. (I also created a 2.12.0 release in JIRA in case you
>> would like to move any non-blocking issues to that version.)
>>
>> What do you think?
>>
>> Ahmet
>>
>> [1]
>> https://calendar.google.com/calendar/embed?src=0p73sl034k80oob7seouanigd0%40group.calendar.google.com=America%2FLos_Angeles
>>
>


Re: [PROPOSAL] Prepare Beam 2.11.0 release

2019-02-07 Thread Kenneth Knowles
I think this is a good idea. Even though 2.10.0 is still making its way
out, there are still 6 weeks of changes between the cut dates, lots of value.
It is good to keep the rhythm and follow the calendar.

Kenn

On Thu, Feb 7, 2019, 15:52 Ahmet Altay wrote:

> Hi all,
>
> Beam 2.11 release branch cut date is 2/13 according to the release
> calendar [1]. I would like to volunteer myself to do this release. I intend
> to cut the branch on the planned 2/13 date.
>
> If you have releasing blocking issues for 2.11 please mark their "Fix
> Version" as 2.11.0. (I also created a 2.12.0 release in JIRA in case you
> would like to move any non-blocking issues to that version.)
>
> What do you think?
>
> Ahmet
>
> [1]
> https://calendar.google.com/calendar/embed?src=0p73sl034k80oob7seouanigd0%40group.calendar.google.com=America%2FLos_Angeles
>


Re: Another another new contributor! :)

2019-02-07 Thread Joana Filipa Bernardo Carrasqueira
Welcome to the community Kyle!


On Thu, Feb 7, 2019 at 1:22 AM Alexey Romanenko 
wrote:

> Welcome, Kyle!
> Great to have more people on SparkRunner side!
>
> Btw, feel free to join “beam-spark” channel on Slack.
>
> Alexey
>
> On 7 Feb 2019, at 08:44, Reza Ardeshir Rokni  wrote:
>
> Welcome!
>
> On Tue, 5 Feb 2019 at 23:34, Kenneth Knowles  wrote:
>
>> Welcome Kyle!
>>
>> On Tue, Feb 5, 2019 at 4:34 AM Maximilian Michels  wrote:
>>
>>> Welcome Kyle! Excited to see the Spark Runner moving towards portability!
>>>
>>> On 05.02.19 01:14, Connell O'Callaghan wrote:
>>> > Welcome Kyle!
>>> >
>>> > On Mon, Feb 4, 2019 at 3:18 PM Ahmet Altay wrote:
>>> >
>>> > Welcome!
>>> >
>>> > On Mon, Feb 4, 2019 at 3:13 PM Rui Wang wrote:
>>> >
>>> > Welcome!
>>> >
>>> > -Rui
>>> >
>>> > On Mon, Feb 4, 2019 at 2:50 PM Kyle Weaver <kcwea...@google.com> wrote:
>>> >
>>> > Hello Beam developers,
>>> >
>>> > My name is Kyle Weaver (alias "ibzib" on Github/Slack). Like
>>> > Brian, I recently switched roles at Google (I previously
>>> > worked on Prow, Kubernetes' CI system). My goal in the
>>> > coming weeks is to help begin implementing portability
>>> > support for the Spark runner. I look forward to
>>> > collaborating with all of you!
>>> >
>>> > Kyle
>>> >
>>> > Kyle Weaver | Software Engineer | kcwea...@google.com | +1650203
>>> >
>>> >
>>>
>>
>

-- 

*Joana Carrasqueira*

Cloud Developer Relations Events Manager

415-602-2507 Mobile

1160 N Mathilda Ave, Sunnyvale, CA 94089


JIRA priorities explanation

2019-02-07 Thread Alex Amato
Hello Beam community, I was thinking about this and found some information
to share/discuss. Would it be possible to confirm my thinking on this:

   - There are 5 priorities in the JIRA system today (tooltip link):
      - *Blocker* Blocks development and/or testing work, production could
      not run
      - *Critical* Crashes, loss of data, severe memory leak.
      - *Major* Major loss of function.
      - *Minor* Minor loss of function, or other problem where easy
      workaround is present.
      - *Trivial* Cosmetic problem like misspelt words or misaligned text.
   - How should JIRA issues be prioritized for pre/post commit test
   failures?
      - I think *Blocker*
   - What about the flaky failures?
      - *Blocker* as well?
   - How should non-test issues be prioritized? (E.g. features to implement
   or bugs not regularly breaking tests.)
      - I suggest *Minor*, but it's not clear how to distinguish between
      these.

Below is my thinking, but I wanted to know what the Apache Beam community
generally thinks about these priorities.

   - *Blocker*: Expect to be paged. Production systems are down.
   - *Critical*: Expect to be contacted by email or a bot to fix this.
   - *Major*: Some loss of function in the repository; issues that need
   to be addressed soon go here.
   - *Minor*: Most issues will be here; important issues within this will
   get picked up and completed. FRs, bugs.
   - *Trivial*: Unlikely to be implemented; far too many issues in this
   category. FRs, bugs.

Thanks for helping to clear this up
Alex
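
To make the proposal concrete, the rule suggested in the questions above (pre/post commit test failures, including flaky ones, as *Blocker*; features and bugs that don't regularly break tests as *Minor*) can be written as a small lookup. This only encodes Alex's proposal for discussion, not an adopted Beam policy, and the `IssueKind` categories and `ProposedTriage` class are hypothetical names.

```java
// The five JIRA priorities listed above.
enum Priority { BLOCKER, CRITICAL, MAJOR, MINOR, TRIVIAL }

// Hypothetical issue categories taken from the questions in this message.
enum IssueKind {
  PRECOMMIT_FAILURE,
  POSTCOMMIT_FAILURE,
  FLAKY_TEST,
  FEATURE_REQUEST,
  BUG_NOT_BREAKING_TESTS
}

// Encodes the *proposed* triage rule; not an adopted Beam policy.
class ProposedTriage {
  static Priority suggest(IssueKind kind) {
    switch (kind) {
      case PRECOMMIT_FAILURE:
      case POSTCOMMIT_FAILURE:
      case FLAKY_TEST:
        // "I think Blocker" / "Blocker as well?"
        return Priority.BLOCKER;
      default:
        // "I suggest Minor" for features and bugs not breaking tests
        return Priority.MINOR;
    }
  }
}
```

Writing it down this way makes the open question visible: nothing in the proposal yet maps to *Critical*, *Major* or *Trivial*.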


[PROPOSAL] Prepare Beam 2.11.0 release

2019-02-07 Thread Ahmet Altay
Hi all,

Beam 2.11 release branch cut date is 2/13 according to the release calendar
[1]. I would like to volunteer myself to do this release. I intend to cut
the branch on the planned 2/13 date.

If you have release-blocking issues for 2.11, please mark their "Fix
Version" as 2.11.0. (I also created a 2.12.0 release in JIRA in case you
would like to move any non-blocking issues to that version.)

What do you think?

Ahmet

[1]
https://calendar.google.com/calendar/embed?src=0p73sl034k80oob7seouanigd0%40group.calendar.google.com=America%2FLos_Angeles


Re: Jenkins slowness

2019-02-07 Thread Michael Luckey
Migration to Gradle 5 should not be an issue, unless something unforeseen
happens, as we have not run all possible tasks yet.

It should only require merging a pending PR plus one additional PR upgrading the versions.

> On 7. Feb 2019, at 21:23, Udi Meiri  wrote:
> 
> I suggest disabling Jacoco and re-enabling the build cache until we can 
> migrate to Gradle 5. I imagine the migration to v5 is not a simple change.
> Meanwhile, I can't run postcommits on PRs on Jenkins (run seed job + run 
> postcommit).
> 
> On Thu, Feb 7, 2019, 12:00 Chamikara Jayalath   wrote:
> Seems like there was a spike for all build times yesterday probably added up 
> to give slow Jenkins scheduling times for triggers. Also, seems like we had 
> three spikes that are about a week apart recently.
> 
> 
> On Thu, Feb 7, 2019 at 11:46 AM Michael Luckey wrote:
> What might have some influence is the implicit disabling of the build cache 
> by activating Jacoco report. There seems to be a increase of 
> beam_PreCommit_Java_Cron with 
> https://builds.apache.org/job/beam_PreCommit_Java_Cron/914/ and looking
> into cacheable task there seems to be lots of work done now which previously 
> was cacheable.
> 
> Not sure, whether this is the culprit- or part of it -, but I d suggest to 
> upgrade to gradle 5 pretty fast.
> 
> On Thu, Feb 7, 2019 at 8:18 PM Udi Meiri wrote:
> If anyone has done any investigation/is working on this please share.
> 
> I'm investigating Jenkins slowness. I've noticed it happening since 
> yesterday: precommits taking 3 hours to start, phrase commands similarly 
> taking as much time to register.
> 
> My current theory is that we have a job that's are taking much longer than 
> usual to run.



Re: Jenkins slowness

2019-02-07 Thread Yifan Zou
I saw many non-Beam jobs running on our nodes. That is probably one of
the reasons for the long waiting times.
e.g. https://builds.apache.org/computer/beam13/builds

On Thu, Feb 7, 2019 at 12:24 PM Udi Meiri  wrote:

> There is also excessive python test logging tracked here:
> https://issues.apache.org/jira/browse/BEAM-6603
>
> On Thu, Feb 7, 2019, 12:23 Udi Meiri wrote:
>> I suggest disabling Jacoco and re-enabling the build cache until we can
>> migrate to Gradle 5. I imagine the migration to v5 is not a simple change.
>> Meanwhile, I can't run postcommits on PRs on Jenkins (run seed job + run
>> postcommit).
>>
>> On Thu, Feb 7, 2019, 12:00 Chamikara Jayalath wrote:
>>
>>> Seems like there was a spike for all build times yesterday probably
>>> added up to give slow Jenkins scheduling times for triggers. Also, seems
>>> like we had three spikes that are about a week apart recently.
>>>
>>>
>>> On Thu, Feb 7, 2019 at 11:46 AM Michael Luckey 
>>> wrote:
>>>
 What might have some influence is the implicit disabling of the build
 cache by activating Jacoco report. There seems to be a increase of
 beam_PreCommit_Java_Cron with
 https://builds.apache.org/job/beam_PreCommit_Java_Cron/914/ and
 looking into cacheable task there seems to be lots of work done now which
 previously was cacheable.

 Not sure, whether this is the culprit- or part of it -, but I d suggest
 to upgrade to gradle 5 pretty fast.

 On Thu, Feb 7, 2019 at 8:18 PM Udi Meiri  wrote:

> If anyone has done any investigation/is working on this please share.
>
> I'm investigating Jenkins slowness. I've noticed it happening since
> yesterday: precommits taking 3 hours to start, phrase commands similarly
> taking as much time to register.
>
> My current theory is that we have a job that's are taking much longer
> than usual to run.
>



Re: Jenkins slowness

2019-02-07 Thread Udi Meiri
I suggest disabling Jacoco and re-enabling the build cache until we can
migrate to Gradle 5. I imagine the migration to v5 is not a simple change.
Meanwhile, I can't run postcommits on PRs on Jenkins (run seed job + run
postcommit).

On Thu, Feb 7, 2019, 12:00 Chamikara Jayalath wrote:

> Seems like there was a spike for all build times yesterday probably added
> up to give slow Jenkins scheduling times for triggers. Also, seems like we
> had three spikes that are about a week apart recently.
>
>
> On Thu, Feb 7, 2019 at 11:46 AM Michael Luckey 
> wrote:
>
>> What might have some influence is the implicit disabling of the build
>> cache by activating Jacoco report. There seems to be a increase of
>> beam_PreCommit_Java_Cron with
>> https://builds.apache.org/job/beam_PreCommit_Java_Cron/914/ and looking
>> into cacheable task there seems to be lots of work done now which
>> previously was cacheable.
>>
>> Not sure, whether this is the culprit- or part of it -, but I d suggest
>> to upgrade to gradle 5 pretty fast.
>>
>> On Thu, Feb 7, 2019 at 8:18 PM Udi Meiri  wrote:
>>
>>> If anyone has done any investigation/is working on this please share.
>>>
>>> I'm investigating Jenkins slowness. I've noticed it happening since
>>> yesterday: precommits taking 3 hours to start, phrase commands similarly
>>> taking as much time to register.
>>>
>>> My current theory is that we have a job that's are taking much longer
>>> than usual to run.
>>>
>>




Re: Jenkins slowness

2019-02-07 Thread Udi Meiri
There is also excessive python test logging tracked here:
https://issues.apache.org/jira/browse/BEAM-6603

On Thu, Feb 7, 2019, 12:23 Udi Meiri wrote:

> I suggest disabling Jacoco and re-enabling the build cache until we can
> migrate to Gradle 5. I imagine the migration to v5 is not a simple change.
> Meanwhile, I can't run postcommits on PRs on Jenkins (run seed job + run
> postcommit).
>
> On Thu, Feb 7, 2019, 12:00 Chamikara Jayalath wrote:
>> Seems like there was a spike for all build times yesterday probably added
>> up to give slow Jenkins scheduling times for triggers. Also, seems like we
>> had three spikes that are about a week apart recently.
>>
>>
>> On Thu, Feb 7, 2019 at 11:46 AM Michael Luckey 
>> wrote:
>>
>>> What might have some influence is the implicit disabling of the build
>>> cache by activating Jacoco report. There seems to be a increase of
>>> beam_PreCommit_Java_Cron with
>>> https://builds.apache.org/job/beam_PreCommit_Java_Cron/914/ and looking
>>> into cacheable task there seems to be lots of work done now which
>>> previously was cacheable.
>>>
>>> Not sure, whether this is the culprit- or part of it -, but I d suggest
>>> to upgrade to gradle 5 pretty fast.
>>>
>>> On Thu, Feb 7, 2019 at 8:18 PM Udi Meiri  wrote:
>>>
 If anyone has done any investigation/is working on this please share.

 I'm investigating Jenkins slowness. I've noticed it happening since
 yesterday: precommits taking 3 hours to start, phrase commands similarly
 taking as much time to register.

 My current theory is that we have a job that's are taking much longer
 than usual to run.

>>>




Re: Jenkins slowness

2019-02-07 Thread Chamikara Jayalath
It seems there was a spike in all build times yesterday, which probably added
up to slow Jenkins scheduling times for triggers. Also, it seems we have
had three spikes, about a week apart, recently.


On Thu, Feb 7, 2019 at 11:46 AM Michael Luckey  wrote:

> What might have some influence is the implicit disabling of the build
> cache by activating Jacoco report. There seems to be a increase of
> beam_PreCommit_Java_Cron with
> https://builds.apache.org/job/beam_PreCommit_Java_Cron/914/ and looking
> into cacheable task there seems to be lots of work done now which
> previously was cacheable.
>
> Not sure, whether this is the culprit- or part of it -, but I d suggest to
> upgrade to gradle 5 pretty fast.
>
> On Thu, Feb 7, 2019 at 8:18 PM Udi Meiri  wrote:
>
>> If anyone has done any investigation/is working on this please share.
>>
>> I'm investigating Jenkins slowness. I've noticed it happening since
>> yesterday: precommits taking 3 hours to start, phrase commands similarly
>> taking as much time to register.
>>
>> My current theory is that we have a job that's are taking much longer
>> than usual to run.
>>
>


Re: [BEAM-6594] Flakey GrpcDataServiceTest.testMessageReceivedBySingleClientWhenThereAreMultipleClients - failing in precommit

2019-02-07 Thread Brian Hulette
This was already reported in BEAM-6512 [1], which Scott gave me as a
starter bug. I haven't been able to reproduce locally, so I'm trying to see
if I can get it to fail on Jenkins again with some additional logging [2].

Definitely interested in others' thoughts on this; I only vaguely
understand what's going on. So far the only headway I've made is noticing
that the "CANCELLED: Multiplexer hanging up" error seems to always occur
exactly three times in failing tests. Successful runs may have one or two
of these messages but never three.

[1] https://issues.apache.org/jira/browse/BEAM-6512
[2] https://github.com/apache/beam/pull/7767
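
The observation above (exactly three "CANCELLED: Multiplexer hanging up" messages in failing runs, one or two in passing runs) is easy to check mechanically when scanning many Jenkins logs. A minimal sketch, assuming the test output has already been fetched as a string; `MultiplexerHangupCounter` is a hypothetical helper, and the count is only a correlation noted in this thread, not a diagnosis.

```java
// Hypothetical helper for scanning GrpcDataServiceTest output; not part of Beam.
class MultiplexerHangupCounter {
  static final String MARKER = "CANCELLED: Multiplexer hanging up";

  // Counts non-overlapping occurrences of MARKER in the log text.
  static int count(String log) {
    int n = 0;
    for (int i = log.indexOf(MARKER); i >= 0; i = log.indexOf(MARKER, i + MARKER.length())) {
      n++;
    }
    return n;
  }

  // True when the log shows exactly three occurrences, the pattern seen so far
  // in failing runs (passing runs showed one or two).
  static boolean matchesFailurePattern(String log) {
    return count(log) == 3;
  }
}
```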

On Tue, Feb 5, 2019 at 9:50 AM Alex Amato  wrote:

>
> org.apache.beam.runners.fnexecution.data.GrpcDataServiceTest.testMessageReceivedBySingleClientWhenThereAreMultipleClients
>
> I keep seeing this test failing in my PRs
>
> https://builds.apache.org/job/beam_PreCommit_Java_Commit/4018/
>
>
> https://builds.apache.org/job/beam_PreCommit_Java_Commit/4018/testReport/junit/org.apache.beam.runners.fnexecution.data/GrpcDataServiceTest/testMessageReceivedBySingleClientWhenThereAreMultipleClients/
>
>
> I've seen this one come and go for a few weeks or so. I am unsure exactly
> when it first occurred.
>


Re: Jenkins slowness

2019-02-07 Thread Michael Luckey
What might have some influence is the implicit disabling of the build cache
by activating the Jacoco report. There seems to be an increase in
beam_PreCommit_Java_Cron duration starting with
https://builds.apache.org/job/beam_PreCommit_Java_Cron/914/ and, looking
into cacheable tasks, a lot of work that was previously cacheable now
seems to be done.

I'm not sure whether this is the culprit (or part of it), but I'd suggest
upgrading to Gradle 5 pretty soon.

On Thu, Feb 7, 2019 at 8:18 PM Udi Meiri  wrote:

> If anyone has done any investigation/is working on this please share.
>
> I'm investigating Jenkins slowness. I've noticed it happening since
> yesterday: precommits taking 3 hours to start, phrase commands similarly
> taking as much time to register.
>
> My current theory is that we have a job that's are taking much longer than
> usual to run.
>


Jenkins slowness

2019-02-07 Thread Udi Meiri
If anyone has done any investigation/is working on this please share.

I'm investigating Jenkins slowness. I've noticed it happening since
yesterday: precommits taking 3 hours to start, phrase commands similarly
taking as much time to register.

My current theory is that we have a job that's taking much longer than
usual to run.




Re: Jenkins slowness

2019-02-07 Thread Udi Meiri
Precommit times for Python and Java have been slowly climbing:
http://104.154.241.245/d/_TNndF2iz/pre-commit-test-latency?orgId=1=1546974237153=1549566237153=light

On Thu, Feb 7, 2019 at 10:54 AM Udi Meiri  wrote:

> If anyone has done any investigation/is working on this please share.
>
> I'm investigating Jenkins slowness. I've noticed it happening since
> yesterday: precommits taking 3 hours to start, phrase commands similarly
> taking as much time to register.
>
> My current theory is that we have a job that's are taking much longer than
> usual to run.
>




Re: [VOTE] Release 2.10.0, release candidate #3

2019-02-07 Thread Scott Wegner
+1

I validated running:
* Java Quickstart (Direct)
* Java Quickstart (Apex local)
* Java Quickstart (Flink local)
* Java Quickstart (Spark local)
* Java Quickstart (Dataflow)
* Java Mobile Game (Dataflow)

On Wed, Feb 6, 2019 at 2:28 PM Kenneth Knowles  wrote:

> Hi everyone,
>
> Please review and vote on the release candidate #3 for the version 2.10.0,
> as follows:
>
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org [2],
> which is signed with the key with fingerprint 6ED551A8AE02461C [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v2.10.0-RC3" [5],
> * website pull request listing the release [6] and publishing the API
> reference manual [7].
> * Python artifacts are deployed along with the source release to the
> dist.apache.org [2].
> * Validation sheet with a tab for 2.10.0 release to help with validation
> [8].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Kenn
>
> [1]
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12344540
> [2] https://dist.apache.org/repos/dist/dev/beam/2.10.0/
> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> [4] https://repository.apache.org/content/repositories/orgapachebeam-1058/
> [5] https://github.com/apache/beam/tree/v2.10.0-RC3
> [6] https://github.com/apache/beam/pull/7651/files
> [7] https://github.com/apache/beam-site/pull/586
> [8]
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=2053422529
>
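
The adoption rule stated in the vote email (open for at least 72 hours; majority approval with at least 3 affirmative PMC votes) can be sketched as a check over the cast votes. This is a simplified model for illustration only: `ReleaseVote` and `Vote` are hypothetical types, the sketch counts only binding (PMC) votes on both sides, and it does not replace the release manager's tally.

```java
import java.time.Duration;
import java.util.List;

// Simplified, hypothetical model of the adoption rule stated in the vote email.
class ReleaseVote {

  // A single vote: +1, 0 or -1, and whether it is binding (cast by a PMC member).
  static final class Vote {
    final int value;
    final boolean binding;

    Vote(int value, boolean binding) {
      this.value = value;
      this.binding = binding;
    }
  }

  // Adopted if the vote stayed open at least 72 hours, has at least three
  // binding +1 votes, and has more binding +1 than binding -1 votes.
  static boolean adopted(List<Vote> votes, Duration openFor) {
    if (openFor.compareTo(Duration.ofHours(72)) < 0) {
      return false;
    }
    long plus = votes.stream().filter(v -> v.binding && v.value > 0).count();
    long minus = votes.stream().filter(v -> v.binding && v.value < 0).count();
    return plus >= 3 && plus > minus;
  }
}
```

Non-binding +1 votes (for example, from contributors validating quickstarts) are valuable feedback but, in this model, do not count toward the three required PMC votes.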


-- 




Got feedback? tinyurl.com/swegner-feedback


Re: [VOTE] Release 2.10.0, release candidate #3

2019-02-07 Thread Alexey Romanenko
+1 

I briefly tested the new HadoopFormatIO (read/write), the deprecated
HadoopInputFormatIO (read), and the new KafkaIO feature (writing to multiple topics).

> On 7 Feb 2019, at 12:01, Maximilian Michels  wrote:
> 
> +1 (binding)
> 
> On 06.02.19 23:47, Reuven Lax wrote:
>> +1 (binding)
>> On Wed, Feb 6, 2019 at 2:28 PM Kenneth Knowles wrote:
>>Hi everyone,
>>Please review and vote on the release candidate #3 for the version
>>2.10.0, as follows:
>>[ ] +1, Approve the release
>>[ ] -1, Do not approve the release (please provide specific comments)
>>The complete staging area is available for your review, which includes:
>>* JIRA release notes [1],
>>* the official Apache source release to be deployed to
>>dist.apache.org  [2], which is signed with
>>the key with fingerprint 6ED551A8AE02461C [3],
>>* all artifacts to be deployed to the Maven Central Repository [4],
>>* source code tag "v2.10.0-RC3" [5],
>>* website pull request listing the release [6] and publishing the
>>API reference manual [7].
>>* Python artifacts are deployed along with the source release to the
>>dist.apache.org [2].
>>* Validation sheet with a tab for 2.10.0 release to help with
>>validation [8].
>>The vote will be open for at least 72 hours. It is adopted by
>>majority approval, with at least 3 PMC affirmative votes.
>>Thanks,
>>Kenn
>>[1]
>>
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12344540
>>[2] https://dist.apache.org/repos/dist/dev/beam/2.10.0/
>>
>>[3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>
>>[4]
>>https://repository.apache.org/content/repositories/orgapachebeam-1058/
>>[5] https://github.com/apache/beam/tree/v2.10.0-RC3
>>[6] https://github.com/apache/beam/pull/7651/files
>>[7] https://github.com/apache/beam-site/pull/586
>>[8]
>>
>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=2053422529



Re: pipeline steps

2019-02-07 Thread Yi Pan
Shouldn't this apply to a more generic scenario, for any Beam IO? For example,
I am using KafkaIO and wanted to get the topic and partition from which each
message was received. Maybe some IOContext associated with each data unit from
a Beam IO would be useful here?

-Yi

On Thu, Feb 7, 2019 at 6:29 AM Kenneth Knowles  wrote:

> This comes up a lot, wanting file names alongside the data that came from
> the file. It is a historical quirk that none of our connectors used to have
> the file names. What is the change needed for FileIO + parse Avro to be
> really easy to use?
>
> Kenn
>
> On Thu, Feb 7, 2019 at 6:18 AM Jeff Klukas  wrote:
>
>> I haven't needed to do this with Beam before, but I've definitely had
>> similar needs in the past. Spark, for example, provides an input_file_name
>> function that can be applied to a dataframe to add the input file as an
>> additional column. It's not clear to me how that's implemented, though.
>>
>> Perhaps others have suggestions, but I'm not aware of a way to do this
>> conveniently in Beam today. To my knowledge, today you would have to use
>> FileIO.match() and FileIO.readMatches() to get a collection of
>> ReadableFile. You'd then have to FlatMapElements to pull out the metadata
>> and the bytes of the file, and you'd be responsible for parsing those bytes
>> into avro records. You'd  be able to output something like a KV
>> that groups the file name together with the parsed avro record.
>>
>> Seems like something worth providing better support for in Beam itself if
>> this indeed doesn't already exist.
>>
>> On Thu, Feb 7, 2019 at 7:29 AM Chaim Turkel  wrote:
>>
>>> Hi,
>>>   I am working on a pipeline that listens to a topic on pubsub to get
>>> files that have changes in the storage. Then i read avro files, and
>>> would like to write them to bigquery based on the file name (to
>>> different tables).
>>>   My problem is that the transformer that reads the avro does not give
>>> me back the files name (like a tuple or something like that). I seem
>>> to have this pattern come back a lot.
>>> Can you think of any solutions?
>>>
>>> Chaim
>>>
>>> --
>>>
>>>
>>> Loans are funded by
>>> FinWise Bank, a Utah-chartered bank located in Sandy,
>>> Utah, member FDIC, Equal
>>> Opportunity Lender. Merchant Cash Advances are
>>> made by Behalf. For more
>>> information on ECOA, click here
>>> . For important information about
>>> opening a new
>>> account, review Patriot Act procedures here
>>> .
>>> Visit Legal
>>>  to
>>> review our comprehensive program terms,
>>> conditions, and disclosures.
>>>
>>


Re: pipeline steps

2019-02-07 Thread Kenneth Knowles
This comes up a lot, wanting file names alongside the data that came from
the file. It is a historical quirk that none of our connectors used to have
the file names. What is the change needed for FileIO + parse Avro to be
really easy to use?

Kenn

On Thu, Feb 7, 2019 at 6:18 AM Jeff Klukas  wrote:

> I haven't needed to do this with Beam before, but I've definitely had
> similar needs in the past. Spark, for example, provides an input_file_name
> function that can be applied to a dataframe to add the input file as an
> additional column. It's not clear to me how that's implemented, though.
>
> Perhaps others have suggestions, but I'm not aware of a way to do this
> conveniently in Beam today. To my knowledge, today you would have to use
> FileIO.match() and FileIO.readMatches() to get a collection of
> ReadableFile. You'd then have to FlatMapElements to pull out the metadata
> and the bytes of the file, and you'd be responsible for parsing those bytes
> into avro records. You'd  be able to output something like a KV
> that groups the file name together with the parsed avro record.
>
> Seems like something worth providing better support for in Beam itself if
> this indeed doesn't already exist.
>
> On Thu, Feb 7, 2019 at 7:29 AM Chaim Turkel  wrote:
>
>> Hi,
>>   I am working on a pipeline that listens to a topic on pubsub to get
>> files that have changes in the storage. Then i read avro files, and
>> would like to write them to bigquery based on the file name (to
>> different tables).
>>   My problem is that the transformer that reads the avro does not give
>> me back the files name (like a tuple or something like that). I seem
>> to have this pattern come back a lot.
>> Can you think of any solutions?
>>
>> Chaim
>>
>>
>


Re: Installing Apache Beam locally

2019-02-07 Thread Kenneth Knowles
It is a bit of a funky invocation, and our config makes it a little
stranger yet. I think you want

./gradlew -Ppublishing -PnoSigning publishMavenJavaPublicationToMavenLocal

-Ppublishing is in our config to enable/disable the publishing plugin.
-PnoSigning turns off signing, which is on by default but which you probably don't want.
The rest is how the publishing plugin does its thing, a concatenation of:
publish + <publication name> + Publication To + <repository name>
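
The naming rule above can be spelled out as a tiny helper. It only reproduces the concatenation described here, and `PublishTaskName` is a hypothetical name, not a Gradle API. For a named remote repository, Gradle's maven-publish plugin appends `Repository` to the task name, so `./gradlew tasks --all` is the reliable way to confirm the exact spelling.

```java
// Hypothetical helper illustrating the publishing task-name concatenation.
class PublishTaskName {

  // "publish" + publication name + "PublicationTo" + "MavenLocal"
  static String forMavenLocal(String publication) {
    return "publish" + publication + "PublicationTo" + "MavenLocal";
  }

  // For a named remote repository, Gradle adds a trailing "Repository".
  static String forRepository(String publication, String repository) {
    return "publish" + publication + "PublicationTo" + repository + "Repository";
  }
}
```

For the `mavenJava` publication this yields `publishMavenJavaPublicationToMavenLocal`, the task in the command above.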

I didn't actually realize that `gradle install` did anything. But I think it
is worth making it so `gradle install` does exactly the above.

Kenn

On Thu, Feb 7, 2019 at 6:06 AM Mike Pedersen  wrote:

> Hi all. I have made some changes to Beam's Java SDK locally and would like
> to install that version into my local repository. Some of the modules'
> install tasks fail, so I disabled those, eventually leading me to an install
> command like this:
>
> ./gradlew install \
> -x :beam-sdks-java-maven-archetypes-examples:install \
> -x :beam-runners-reference-job-server:install \
> -x :beam-runners-flink-1.6-job-server:install \
> -x :beam-runners-flink-1.7-job-server:install \
> -x :beam-runners-flink_2.11-job-server:install
>
> This command completes without error, but seems to only install unshaded
> dependencies unlike the shaded jars in the maven repository:
>
> $ ls ~/.m2/repository/org/apache/beam/beam-sdks-java-core/2.9.0-SNAPSHOT/
> beam-sdks-java-core-2.9.0-SNAPSHOT-tests-unshaded.jar
> beam-sdks-java-core-2.9.0-SNAPSHOT-unshaded.jar
> beam-sdks-java-core-2.9.0-SNAPSHOT.pom
> maven-metadata-local.xml
>
> Whereas the ones from the Maven repository are like this instead (2.8.0,
> but I assume it should be the same for 2.9.0):
>
> $ ls ~/.m2/repository/org/apache/beam/beam-sdks-java-core/2.8.0/
> _remote.repositories
> beam-sdks-java-core-2.8.0.jar
> beam-sdks-java-core-2.8.0.jar.sha1
> beam-sdks-java-core-2.8.0.pom
> beam-sdks-java-core-2.8.0.pom.sha1
>
> How do I get Gradle to install the shaded jars like the ones found in
> Beam's Maven repository?
>
> Thanks in advance,
> Mike
>


Re: pipeline steps

2019-02-07 Thread Jeff Klukas
I haven't needed to do this with Beam before, but I've definitely had
similar needs in the past. Spark, for example, provides an input_file_name
function that can be applied to a dataframe to add the input file as an
additional column. It's not clear to me how that's implemented, though.

Perhaps others have suggestions, but I'm not aware of a convenient way to do
this in Beam today. To my knowledge, you would have to use
FileIO.match() and FileIO.readMatches() to get a collection of
ReadableFile. You'd then have to use FlatMapElements to pull out the metadata
and the bytes of each file, and you'd be responsible for parsing those bytes
into Avro records. You'd be able to output something like a KV
that groups the file name together with the parsed Avro record.

Seems like something worth providing better support for in Beam itself if
this indeed doesn't already exist.
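A minimal sketch of the approach Jeff describes, written against the Beam Java SDK of that era (2.10). Assumptions not taken from the thread: the Pub/Sub messages have already been turned into a `PCollection<String>` of full file paths (e.g. `gs://...` URLs), all files share one Avro `Schema` known up front, and the names `AvroWithFileName`, `parseWithFileName` and `ReadAvroWithFileNameFn` are hypothetical. It swaps in a `ParDo` with a `DoFn` instead of `FlatMapElements`, so records are emitted one at a time rather than collected per file, and uses `FileIO.matchAll()` rather than `FileIO.match()` because the file names arrive as data. In newer Beam releases the Avro classes live in a separate extension module, so the `AvroCoder` import may differ.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.Channels;
import java.util.function.Consumer;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.coders.AvroCoder;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

// Hypothetical helpers; this is not an existing Beam transform.
public class AvroWithFileName {

    // Decodes every Avro record in one file's bytes and hands each one,
    // paired with the file name, to the given consumer.
    static void parseWithFileName(
            String fileName, InputStream in, Consumer<KV<String, GenericRecord>> sink)
            throws IOException {
        // No reader schema is given, so records are decoded with the writer
        // schema stored in the Avro file header.
        try (DataFileStream<GenericRecord> records =
                new DataFileStream<>(in, new GenericDatumReader<GenericRecord>())) {
            for (GenericRecord record : records) {
                sink.accept(KV.of(fileName, record));
            }
        }
    }

    // Opens each matched file and emits KV(fullPath, record) for every record.
    static class ReadAvroWithFileNameFn
            extends DoFn<FileIO.ReadableFile, KV<String, GenericRecord>> {
        @ProcessElement
        public void processElement(ProcessContext c) throws IOException {
            FileIO.ReadableFile file = c.element();
            String fileName = file.getMetadata().resourceId().toString();
            try (InputStream in = Channels.newInputStream(file.open())) {
                parseWithFileName(fileName, in, c::output);
            }
        }
    }

    // filepatterns: one file path (or glob) per element, e.g. derived from Pub/Sub.
    static PCollection<KV<String, GenericRecord>> read(
            PCollection<String> filepatterns, Schema schema) {
        return filepatterns
                .apply(FileIO.matchAll())
                .apply(FileIO.readMatches())
                .apply(ParDo.of(new ReadAvroWithFileNameFn()))
                // GenericRecord has no default coder, so supply one from the schema.
                .setCoder(KvCoder.of(StringUtf8Coder.of(), AvroCoder.of(schema)));
    }
}
```

Downstream, the key of each KV carries the source file, so a later step can choose a BigQuery table per file name.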

On Thu, Feb 7, 2019 at 7:29 AM Chaim Turkel  wrote:

> Hi,
>   I am working on a pipeline that listens to a Pub/Sub topic to get the
> files that have changed in storage. Then I read the Avro files, and I
> would like to write them to BigQuery based on the file name (to
> different tables).
>   My problem is that the transform that reads the Avro files does not give
> me back the file name (as a tuple or something like that). This
> pattern seems to come up a lot for me.
> Can you think of any solutions?
>
> Chaim
>


pipeline steps

2019-02-07 Thread Chaim Turkel
Hi,
  I am working on a pipeline that listens to a Pub/Sub topic to get the
files that have changed in storage. Then I read the Avro files, and I
would like to write them to BigQuery based on the file name (to
different tables).
  My problem is that the transform that reads the Avro files does not give
me back the file name (as a tuple or something like that). This
pattern seems to come up a lot for me.
Can you think of any solutions?

Chaim



Re: [VOTE] Release 2.10.0, release candidate #3

2019-02-07 Thread Maximilian Michels

+1 (binding)

On 06.02.19 23:47, Reuven Lax wrote:

+1 (binding)

On Wed, Feb 6, 2019 at 2:28 PM Kenneth Knowles wrote:

Hi everyone,

Please review and vote on the release candidate #3 for the version
2.10.0, as follows:

[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release to be deployed to
dist.apache.org [2], which is signed with
the key with fingerprint 6ED551A8AE02461C [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "v2.10.0-RC3" [5],
* website pull request listing the release [6] and publishing the
API reference manual [7].
* Python artifacts are deployed along with the source release to
dist.apache.org [2].
* Validation sheet with a tab for the 2.10.0 release to help with
validation [8].

The vote will be open for at least 72 hours. It is adopted by
majority approval, with at least 3 PMC affirmative votes.

Thanks,
Kenn

[1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12344540
[2] https://dist.apache.org/repos/dist/dev/beam/2.10.0/
[3] https://dist.apache.org/repos/dist/release/beam/KEYS
[4] https://repository.apache.org/content/repositories/orgapachebeam-1058/
[5] https://github.com/apache/beam/tree/v2.10.0-RC3
[6] https://github.com/apache/beam/pull/7651/files
[7] https://github.com/apache/beam-site/pull/586
[8] https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=2053422529



Re: Another another new contributor! :)

2019-02-07 Thread Alexey Romanenko
Welcome, Kyle! 
Great to have more people on SparkRunner side!

Btw, feel free to join the “beam-spark” channel on Slack.

Alexey

> On 7 Feb 2019, at 08:44, Reza Ardeshir Rokni  wrote:
> 
> Welcome!
> 
> On Tue, 5 Feb 2019 at 23:34, Kenneth Knowles wrote:
> Welcome Kyle!
> 
> On Tue, Feb 5, 2019 at 4:34 AM Maximilian Michels wrote:
> Welcome Kyle! Excited to see the Spark Runner moving towards portability!
> 
> On 05.02.19 01:14, Connell O'Callaghan wrote:
> > Welcome Kyle!
> > 
> > On Mon, Feb 4, 2019 at 3:18 PM Ahmet Altay wrote:
> > 
> > Welcome!
> > 
> > On Mon, Feb 4, 2019 at 3:13 PM Rui Wang wrote:
> > 
> > Welcome!
> > 
> > -Rui
> > 
> > On Mon, Feb 4, 2019 at 2:50 PM Kyle Weaver wrote:
> > 
> > Hello Beam developers,
> > 
> > My name is Kyle Weaver (alias "ibzib" on Github/Slack). Like
> > Brian, I recently switched roles at Google (I previously
> > worked on Prow, Kubernetes' CI system). My goal in the
> > coming weeks is to help begin implementing portability
> > support for the Spark runner. I look forward to
> > collaborating with all of you!
> > 
> > Kyle
> > 
> > Kyle Weaver | Software Engineer | kcwea...@google.com | +1650203
> > 
> > 



Re: Another another new contributor! :)

2019-02-07 Thread Etienne Chauchot
Hi,
Help is much appreciated! And welcome!
Etienne
On Thursday, 7 February 2019 at 15:44 +0800, Reza Ardeshir Rokni wrote:
> Welcome!
> On Tue, 5 Feb 2019 at 23:34, Kenneth Knowles wrote:
> > Welcome Kyle!
> > On Tue, Feb 5, 2019 at 4:34 AM Maximilian Michels wrote:
> > > Welcome Kyle! Excited to see the Spark Runner moving towards portability!
> > >
> > > On 05.02.19 01:14, Connell O'Callaghan wrote:
> > > > Welcome Kyle!
> > > >
> > > > On Mon, Feb 4, 2019 at 3:18 PM Ahmet Altay wrote:
> > > > Welcome!
> > > >
> > > > On Mon, Feb 4, 2019 at 3:13 PM Rui Wang wrote:
> > > > Welcome!
> > > >
> > > > -Rui
> > > >
> > > > On Mon, Feb 4, 2019 at 2:50 PM Kyle Weaver wrote:
> > > > Hello Beam developers,
> > > >
> > > > My name is Kyle Weaver (alias "ibzib" on Github/Slack). Like
> > > > Brian, I recently switched roles at Google (I previously
> > > > worked on Prow, Kubernetes' CI system). My goal in the
> > > > coming weeks is to help begin implementing portability
> > > > support for the Spark runner. I look forward to
> > > > collaborating with all of you!
> > > >
> > > > Kyle
> > > >
> > > > Kyle Weaver | Software Engineer | kcwea...@google.com | +1650203