Re: [VOTE] Release 2.9.0, release candidate #1

2018-12-12 Thread Andrew Pilloud
+1

Turns out we broke DOUBLE on purpose. Updated the demo to use DECIMAL and
it doesn't hard fail. This is a docs bug.
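
For readers wondering why a DOUBLE key can trip the "keyCoder of a GroupByKey must be deterministic" check: a grouping step that compares encoded bytes needs equal keys to encode identically, and IEEE 754 doubles do not guarantee that. The sketch below is plain Python, not Beam code. `encode_key` is a hypothetical stand-in for a key coder, and whether this is the exact motivation for the SQL change is an assumption.

```python
import struct


def encode_key(value: float) -> bytes:
    """Hypothetical stand-in for a key coder: big-endian IEEE 754 double."""
    return struct.pack(">d", value)


# 0.0 and -0.0 compare equal as values...
values_equal = (0.0 == -0.0)

# ...but their encodings differ in the sign bit, so a byte-wise grouping
# would place them in two different groups.
encodings_equal = (encode_key(0.0) == encode_key(-0.0))

print(values_equal, encodings_equal)
```

A fixed-scale decimal representation avoids this particular mismatch, which is consistent with the demo working after the switch to DECIMAL.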

On Wed, Dec 12, 2018 at 3:55 PM Scott Wegner  wrote:

> +1
>
> I verified the Java examples succeed on DirectRunner.
>
> On Wed, Dec 12, 2018 at 3:30 PM Chamikara Jayalath 
> wrote:
>
>> Thanks Andrew. Please make this a blocker and -1 the thread if you think
>> we need a new RC.
>>
>> - Cham
>>
>> On Wed, Dec 12, 2018 at 3:27 PM Andrew Pilloud 
>> wrote:
>>
>>> I was just running the Beam SQL demo. I found one query fails with "the
>>> keyCoder of a GroupByKey must be deterministic" and another just hangs. I
>>> opened an issue: https://issues.apache.org/jira/browse/BEAM-6224 Not
>>> sure if this calls for canceling the release or just a release note (SQL is
>>> still experimental). I'm continuing to track down the root cause, but am
>>> tied up with the Beam Meetup in SFO today.
>>>
>>> Andrew
>>>
>>> On Tue, Dec 11, 2018 at 3:32 PM Ruoyun Huang  wrote:
>>>
 +1,  Looking forward to the release!

 On Tue, Dec 11, 2018 at 11:09 AM Chamikara Jayalath <
 chamik...@google.com> wrote:

> Hi All,
>
> I ran Beam RC verification script [1] and updated the validation
> spreadsheet [2]. I think the current release candidate looks good.
>
> So +1 for the release.
>
> Thanks,
> Cham
>
> [1]
> https://github.com/apache/beam/blob/master/release/src/main/scripts/run_rc_validation.sh
> [2]
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=2053422529
>
> On Fri, Dec 7, 2018 at 7:19 AM Ismaël Mejía  wrote:
>
>> Looking at the dates on the Spark runner git log there was a PR
>> merged to change Spark translation from classes to URNs. I cannot see how
>> this can impact performance. Looking at the other queries in the
>> dashboards, there seems to be a great variability in the executions of
>> the Spark runner, to the point of feeling we don't have guarantees
>> anymore. I wonder if this was because of other loads shared on the
>> server(s), or because our sample is too small for the standard deviation.
>>
>> I would proceed with the release; the real question is whether we can
>> somehow constrain the execution of these tests to have a more consistent
>> output.
>>
>>
>> On Fri, Dec 7, 2018 at 4:10 PM Etienne Chauchot 
>> wrote:
>>
>>> Hi all,
>>> Regarding query7 in spark:
>>> - there doesn't seem to be a functional regression: query passes and
>>> output size is still the same
>>>
>>> - Also the performance degradation seems to be only on spark, the
>>> other runners do not seem to suffer from it.
>>>
>>> - performance degradation seems to be constant from 11/12 so we can
>>> eliminate temporary load on the jenkins server that would generate 
>>> delays
>>> in Max transform.
>>>
>>> => query7 uses Max transform, fanout and side inputs, has one of
>>> these parts recently (11/12/18) changed in spark?
>>>
>>> Etienne
>>>
>>> On Thursday, 6 December 2018 at 21:32 -0800, Chamikara Jayalath wrote:
>>>
>>> Udi or anybody else who is familiar with Nexmark, please -1 the
>>> vote thread if you think this particular performance regression for
>>> Spark/Direct runners is a blocker. Otherwise I think we can continue the
>>> vote.
>>>
>>> Thanks,
>>> Cham
>>>
>>> On Thu, Dec 6, 2018 at 6:19 PM Chamikara Jayalath <
>>> chamik...@google.com> wrote:
>>>
>>> Are either of these regressions due to known issues? If not, should
>>> they be considered release blockers?
>>>
>>> Thanks,
>>> Cham
>>>
>>> On Thu, Dec 6, 2018 at 6:11 PM Udi Meiri  wrote:
>>>
>>> For DirectRunner there are regressions in query 7 sql direct runner
>>> batch mode (2x) and streaming mode (5x).
>>>
>>>
>>> On Thu, Dec 6, 2018 at 5:59 PM Udi Meiri  wrote:
>>>
>>> I see a regression for query 7 spark runner batch mode on about
>>> 2018-11-13.
>>> [image: image.png]
>>>
>>> On Thu, Dec 6, 2018 at 2:46 AM Chamikara Jayalath <
>>> chamik...@google.com> wrote:
>>>
>>> Hi everyone,
>>>
>>> Please review and vote on the release candidate #1 for the version
>>> 2.9.0, as follows:
>>> [ ] +1, Approve the release
>>> [ ] -1, Do not approve the release (please provide specific comments)
>>>
>>>
>>> The complete staging area is available for your review, which
>>> includes:
>>> * JIRA release notes [1],
>>> * the official Apache source release to be deployed to
>>> 

Re: [VOTE] Release 2.9.0, release candidate #1

2018-12-12 Thread Scott Wegner
+1

I verified the Java examples succeed on DirectRunner.

On Wed, Dec 12, 2018 at 3:30 PM Chamikara Jayalath 
wrote:

> Thanks Andrew. Please make this a blocker and -1 the thread if you think
> we need a new RC.
>
> - Cham
>
> On Wed, Dec 12, 2018 at 3:27 PM Andrew Pilloud 
> wrote:
>
>> I was just running the Beam SQL demo. I found one query fails with "the
>> keyCoder of a GroupByKey must be deterministic" and another just hangs. I
>> opened an issue: https://issues.apache.org/jira/browse/BEAM-6224 Not
>> sure if this calls for canceling the release or just a release note (SQL is
>> still experimental). I'm continuing to track down the root cause, but am
>> tied up with the Beam Meetup in SFO today.
>>
>> Andrew
>>
>> On Tue, Dec 11, 2018 at 3:32 PM Ruoyun Huang  wrote:
>>
>>> +1,  Looking forward to the release!
>>>
>>> On Tue, Dec 11, 2018 at 11:09 AM Chamikara Jayalath <
>>> chamik...@google.com> wrote:
>>>
 Hi All,

 I ran Beam RC verification script [1] and updated the validation
 spreadsheet [2]. I think the current release candidate looks good.

 So +1 for the release.

 Thanks,
 Cham

 [1]
 https://github.com/apache/beam/blob/master/release/src/main/scripts/run_rc_validation.sh
 [2]
 https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=2053422529

 On Fri, Dec 7, 2018 at 7:19 AM Ismaël Mejía  wrote:

> Looking at the dates on the Spark runner git log there was a PR merged
> to change Spark translation from classes to URNs. I cannot see how this
> can impact performance. Looking at the other queries in the dashboards,
> there seems to be a great variability in the executions of the Spark
> runner, to the point of feeling we don't have guarantees anymore. I wonder
> if this was because of other loads shared on the server(s), or because our
> sample is too small for the standard deviation.
>
> I would proceed with the release; the real question is whether we can
> somehow constrain the execution of these tests to have a more consistent
> output.
>
>
> On Fri, Dec 7, 2018 at 4:10 PM Etienne Chauchot 
> wrote:
>
>> Hi all,
>> Regarding query7 in spark:
>> - there doesn't seem to be a functional regression: query passes and
>> output size is still the same
>>
>> - Also the performance degradation seems to be only on spark, the
>> other runners do not seem to suffer from it.
>>
>> - performance degradation seems to be constant from 11/12 so we can
>> eliminate temporary load on the jenkins server that would generate delays
>> in Max transform.
>>
>> => query7 uses Max transform, fanout and side inputs, has one of
>> these parts recently (11/12/18) changed in spark?
>>
>> Etienne
>>
>> On Thursday, 6 December 2018 at 21:32 -0800, Chamikara Jayalath wrote:
>>
>> Udi or anybody else who is familiar with Nexmark, please -1 the
>> vote thread if you think this particular performance regression for
>> Spark/Direct runners is a blocker. Otherwise I think we can continue the
>> vote.
>>
>> Thanks,
>> Cham
>>
>> On Thu, Dec 6, 2018 at 6:19 PM Chamikara Jayalath <
>> chamik...@google.com> wrote:
>>
>> Are either of these regressions due to known issues? If not, should
>> they be considered release blockers?
>>
>> Thanks,
>> Cham
>>
>> On Thu, Dec 6, 2018 at 6:11 PM Udi Meiri  wrote:
>>
>> For DirectRunner there are regressions in query 7 sql direct runner
>> batch mode (2x) and streaming mode (5x).
>>
>>
>> On Thu, Dec 6, 2018 at 5:59 PM Udi Meiri  wrote:
>>
>> I see a regression for query 7 spark runner batch mode on about
>> 2018-11-13.
>> [image: image.png]
>>
>> On Thu, Dec 6, 2018 at 2:46 AM Chamikara Jayalath <
>> chamik...@google.com> wrote:
>>
>> Hi everyone,
>>
>> Please review and vote on the release candidate #1 for the version
>> 2.9.0, as follows:
>> [ ] +1, Approve the release
>> [ ] -1, Do not approve the release (please provide specific comments)
>>
>>
>> The complete staging area is available for your review, which
>> includes:
>> * JIRA release notes [1],
>> * the official Apache source release to be deployed to
>> dist.apache.org [2], which is signed with the key with fingerprint
>> EEAC70DF3D0BC23B [3],
>> * all artifacts to be deployed to the Maven Central Repository [4],
>> * source code tag "v2.9.0-RC1" [5],
>> * website pull request listing the release [6] and publishing the API
>> reference manual [7].
>> 

Re: [VOTE] Release 2.9.0, release candidate #1

2018-12-12 Thread Chamikara Jayalath
Thanks Andrew. Please make this a blocker and -1 the thread if you think we
need a new RC.

- Cham

On Wed, Dec 12, 2018 at 3:27 PM Andrew Pilloud  wrote:

> I was just running the Beam SQL demo. I found one query fails with "the
> keyCoder of a GroupByKey must be deterministic" and another just hangs. I
> opened an issue: https://issues.apache.org/jira/browse/BEAM-6224 Not sure
> if this calls for canceling the release or just a release note (SQL is
> still experimental). I'm continuing to track down the root cause, but am
> tied up with the Beam Meetup in SFO today.
>
> Andrew
>
> On Tue, Dec 11, 2018 at 3:32 PM Ruoyun Huang  wrote:
>
>> +1,  Looking forward to the release!
>>
>> On Tue, Dec 11, 2018 at 11:09 AM Chamikara Jayalath 
>> wrote:
>>
>>> Hi All,
>>>
>>> I ran Beam RC verification script [1] and updated the validation
>>> spreadsheet [2]. I think the current release candidate looks good.
>>>
>>> So +1 for the release.
>>>
>>> Thanks,
>>> Cham
>>>
>>> [1]
>>> https://github.com/apache/beam/blob/master/release/src/main/scripts/run_rc_validation.sh
>>> [2]
>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=2053422529
>>>
>>> On Fri, Dec 7, 2018 at 7:19 AM Ismaël Mejía  wrote:
>>>
 Looking at the dates on the Spark runner git log there was a PR merged
 to change Spark translation from classes to URNs. I cannot see how this can
 impact performance. Looking at the other queries in the dashboards, there
 seems to be a great variability in the executions of the Spark runner to
 the point of feeling we don't have guarantees anymore. I wonder if this was
 because of other loads shared in the server(s), or because our sample is
 too small for the standard deviation.

 I would proceed with the release; the real question is whether we can
 somehow constrain the execution of these tests to have a more consistent
 output.


 On Fri, Dec 7, 2018 at 4:10 PM Etienne Chauchot 
 wrote:

> Hi all,
> Regarding query7 in spark:
> - there doesn't seem to be a functional regression: query passes and
> output size is still the same
>
> - Also the performance degradation seems to be only on spark, the
> other runners do not seem to suffer from it.
>
> - performance degradation seems to be constant from 11/12 so we can
> eliminate temporary load on the jenkins server that would generate delays
> in Max transform.
>
> => query7 uses Max transform, fanout and side inputs, has one of these
> parts recently (11/12/18) changed in spark?
>
> Etienne
>
> On Thursday, 6 December 2018 at 21:32 -0800, Chamikara Jayalath wrote:
>
> Udi or anybody else who is familiar with Nexmark, please -1 the vote
> thread if you think this particular performance regression for
> Spark/Direct runners is a blocker. Otherwise I think we can continue the
> vote.
>
> Thanks,
> Cham
>
> On Thu, Dec 6, 2018 at 6:19 PM Chamikara Jayalath <
> chamik...@google.com> wrote:
>
> Are either of these regressions due to known issues? If not, should
> they be considered release blockers?
>
> Thanks,
> Cham
>
> On Thu, Dec 6, 2018 at 6:11 PM Udi Meiri  wrote:
>
> For DirectRunner there are regressions in query 7 sql direct runner
> batch mode (2x) and streaming mode (5x).
>
>
> On Thu, Dec 6, 2018 at 5:59 PM Udi Meiri  wrote:
>
> I see a regression for query 7 spark runner batch mode on about
> 2018-11-13.
> [image: image.png]
>
> On Thu, Dec 6, 2018 at 2:46 AM Chamikara Jayalath <
> chamik...@google.com> wrote:
>
> Hi everyone,
>
> Please review and vote on the release candidate #1 for the version
> 2.9.0, as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
> [2], which is signed with the key with fingerprint EEAC70DF3D0BC23B
>  [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v2.9.0-RC1" [5],
> * website pull request listing the release [6] and publishing the API
> reference manual [7].
> * Python artifacts are deployed along with the source release to the
> dist.apache.org [2].
> * Validation sheet with a tab for 2.9.0 release to help with
> validation [7].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at 

Re: [VOTE] Release 2.9.0, release candidate #1

2018-12-12 Thread Andrew Pilloud
I was just running the Beam SQL demo. I found one query fails with "the
keyCoder of a GroupByKey must be deterministic" and another just hangs. I
opened an issue: https://issues.apache.org/jira/browse/BEAM-6224 Not sure
if this calls for canceling the release or just a release note (SQL is
still experimental). I'm continuing to track down the root cause, but am
tied up with the Beam Meetup in SFO today.

Andrew

On Tue, Dec 11, 2018 at 3:32 PM Ruoyun Huang  wrote:

> +1,  Looking forward to the release!
>
> On Tue, Dec 11, 2018 at 11:09 AM Chamikara Jayalath 
> wrote:
>
>> Hi All,
>>
>> I ran Beam RC verification script [1] and updated the validation
>> spreadsheet [2]. I think the current release candidate looks good.
>>
>> So +1 for the release.
>>
>> Thanks,
>> Cham
>>
>> [1]
>> https://github.com/apache/beam/blob/master/release/src/main/scripts/run_rc_validation.sh
>> [2]
>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=2053422529
>>
>> On Fri, Dec 7, 2018 at 7:19 AM Ismaël Mejía  wrote:
>>
>>> Looking at the dates on the Spark runner git log there was a PR merged
>>> to change Spark translation from classes to URNs. I cannot see how this can
>>> impact performance. Looking at the other queries in the dashboards, there
>>> seems to be a great variability in the executions of the Spark runner to
>>> the point of feeling we don't have guarantees anymore. I wonder if this was
>>> because of other loads shared in the server(s), or because our sample is
>>> too small for the standard deviation.
>>>
>>> I would proceed with the release; the real question is whether we can
>>> somehow constrain the execution of these tests to have a more consistent
>>> output.
>>>
>>>
>>> On Fri, Dec 7, 2018 at 4:10 PM Etienne Chauchot 
>>> wrote:
>>>
 Hi all,
 Regarding query7 in spark:
 - there doesn't seem to be a functional regression: query passes and
 output size is still the same

 - Also the performance degradation seems to be only on spark, the other
 runners do not seem to suffer from it.

 - performance degradation seems to be constant from 11/12 so we can
 eliminate temporary load on the jenkins server that would generate delays
 in Max transform.

 => query7 uses Max transform, fanout and side inputs, has one of these
 parts recently (11/12/18) changed in spark?

 Etienne

 On Thursday, 6 December 2018 at 21:32 -0800, Chamikara Jayalath wrote:

 Udi or anybody else who is familiar with Nexmark, please -1 the vote
 thread if you think this particular performance regression for Spark/Direct
 runners is a blocker. Otherwise I think we can continue the vote.

 Thanks,
 Cham

 On Thu, Dec 6, 2018 at 6:19 PM Chamikara Jayalath 
 wrote:

 Are either of these regressions due to known issues? If not, should
 they be considered release blockers?

 Thanks,
 Cham

 On Thu, Dec 6, 2018 at 6:11 PM Udi Meiri  wrote:

 For DirectRunner there are regressions in query 7 sql direct runner
 batch mode (2x) and streaming mode (5x).


 On Thu, Dec 6, 2018 at 5:59 PM Udi Meiri  wrote:

 I see a regression for query 7 spark runner batch mode on about
 2018-11-13.
 [image: image.png]

 On Thu, Dec 6, 2018 at 2:46 AM Chamikara Jayalath 
 wrote:

 Hi everyone,

 Please review and vote on the release candidate #1 for the version
 2.9.0, as follows:
 [ ] +1, Approve the release
 [ ] -1, Do not approve the release (please provide specific comments)


 The complete staging area is available for your review, which includes:
 * JIRA release notes [1],
 * the official Apache source release to be deployed to dist.apache.org
 [2], which is signed with the key with fingerprint EEAC70DF3D0BC23B
  [3],
 * all artifacts to be deployed to the Maven Central Repository [4],
 * source code tag "v2.9.0-RC1" [5],
 * website pull request listing the release [6] and publishing the API
 reference manual [7].
 * Python artifacts are deployed along with the source release to the
 dist.apache.org [2].
 * Validation sheet with a tab for 2.9.0 release to help with
 validation [7].

 The vote will be open for at least 72 hours. It is adopted by majority
 approval, with at least 3 PMC affirmative votes.

 Thanks,
 Cham

 [1]
 https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12344258
 [2] https://dist.apache.org/repos/dist/dev/beam/2.9.0/
 [3] https://dist.apache.org/repos/dist/release/beam/KEYS
 [4]
 

Re: Issue with publishing maven artefacts locally

2018-12-12 Thread Chamikara Jayalath
Not exactly sure if this is the reason, but I noticed that Ismaël's command
above results in a beam-sdks-java-core-2.10.0-20181212.232426-1.jar instead
of a beam-sdks-java-core-2.10.0-SNAPSHOT.jar.

- Cham
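
The two file names above differ only in the snapshot qualifier: Maven's timestamped ("unique") snapshot form replaces SNAPSHOT with a yyyyMMdd.HHmmss timestamp and a build number. A minimal sketch for telling the two forms apart follows. The helper name `classify_snapshot_jar` and the regular expression are illustrative assumptions, not part of Maven or Gradle.

```python
import re

# Timestamped ("unique") snapshot qualifier: yyyyMMdd.HHmmss-buildNumber
_TIMESTAMPED = re.compile(r"-(\d{8}\.\d{6})-(\d+)\.jar$")


def classify_snapshot_jar(filename: str) -> str:
    """Classify a jar file name by the kind of snapshot qualifier it carries."""
    if filename.endswith("-SNAPSHOT.jar"):
        return "non-unique"
    if _TIMESTAMPED.search(filename):
        return "timestamped"
    return "release-or-unknown"


for name in (
    "beam-sdks-java-core-2.10.0-20181212.232426-1.jar",
    "beam-sdks-java-core-2.10.0-SNAPSHOT.jar",
):
    print(name, "->", classify_snapshot_jar(name))
```

Running this over the file names in a local repository directory is one quick way to see which form a given publish command produced.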

On Wed, Dec 12, 2018 at 1:16 PM Ismaël Mejía  wrote:

> Thanks Garrett for the quick fix. I just tested and it is working now.
>
> I found a second issue (not related to Garrett's PR); it was the reason
> why I detected that local artifacts were not updated in our Jenkins
> (in the other thread).
>
> To validate that our daily snapshots don't break existing code we have
> a maven project that takes these snapshots from the apache repository.
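> In maven speak:
>
> <repositories>
>   <repository>
>     <id>snapshots</id>
>     <name>Apache Development Snapshot Repository</name>
>     <url>https://repository.apache.org/content/repositories/snapshots/</url>
>     <releases>
>       <enabled>false</enabled>
>     </releases>
>     <snapshots>
>       <enabled>true</enabled>
>     </snapshots>
>   </repository>
> </repositories>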
>
> If we do 'mvn clean verify' in our project, it brings the SNAPSHOTS from
> Apache.
>
> Now if locally I fix something in Beam and deploy locally via:
>
> ./gradlew -Ppublishing --no-parallel
> -PdistMgmtSnapshotsUrl=file:///home/ismael/.m2/repository -p
> sdks/java/core publish -x spotlessCheck -x test -x rat
>
> It puts the newly generated jars in the .m2 directory.
> However, if you re-execute the Maven project, it still detects and
> imports the old jars.
>
> I think that something is missing in the way we are generating the
> files for the .m2 directory via publishing.
> But I don't clearly understand how SNAPSHOT resolution works.
> Does anyone have an idea, or can anyone contribute a fix for this one?
>
> Thanks.
>
> ps. if someone wants to check this out of the box you can reproduce
> the case by building this project (same case):
> https://github.com/jbonofre/beam-samples/
>
> On Wed, Dec 12, 2018 at 8:55 PM Garrett Jones 
> wrote:
> >
> > Nevermind, I found a much easier fix (delete two characters):
> https://github.com/apache/beam/pull/7265
> >
> >
> > On Wed, Dec 12, 2018 at 11:03 AM Garrett Jones 
> wrote:
> >>
> >> I'm inclined to undo a particular modification I made in my PR and
> re-duplicate the repositories declaration between the Gradle plugin and the
> new BOM module. Scott, what do you think?
> >>
> >>
> >> On Wed, Dec 12, 2018 at 11:00 AM Scott Wegner  wrote:
> >>>
> >>> Thanks for pointing this out Alexey. This seems like we unintentionally
> broke something in PR#7197 [1]
> >>>
> >>> +Garrett Jones, who authored the change. Garrett can you help
> investigate?
> >>>
> >>> I went to check to see if we have any existing Jenkins jobs that
> would've caught this break. It seems the
> beam_Release_Gradle_NightlySnapshot job [2] has been failing for the last
> 10 days. Has anybody looked into this?
> >>>
> >>> [1] https://github.com/apache/beam/pull/7197
> >>> [2] https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/
> >>>
> >>> On Wed, Dec 12, 2018 at 5:57 AM Alexey Romanenko <
> aromanenko@gmail.com> wrote:
> 
>  Hi all,
> 
>  I used to publish maven artefacts into local repository using this
> kind of command for example:
> 
>  ./gradlew -Ppublishing --no-parallel
> -PdistMgmtSnapshotsUrl=file:///path/to/.m2/repository/ -p
> sdks/java/io/kafka/ publish
> 
>  It worked fine till today. Seems like (according to "git bisect")
> this recent commit [1] introduced new functionality and now it fails with
> an error:
> 
>  * What went wrong:
>  A problem occurred configuring project ':beam-sdks-java-io-kafka'.
>  > Exception thrown while executing model rule:
> PublishingPluginRules#publishing(ExtensionContainer)
> > Cannot set the value of read-only property 'repositories' for
> object of type
> org.gradle.api.publish.internal.DeferredConfigurablePublishingExtension.
> 
>  Does anyone know if this is a bug or I should use another command for
> the same purposes?
> 
> 
>  [1]
> https://github.com/apache/beam/commit/bfd1be9ae22d1ae7e732f590c448e9e5ed2894b9
> >>>
> >>>
> >>>
> >>> --
> >>>
> >>>
> >>>
> >>>
> >>> Got feedback? tinyurl.com/swegner-feedback
>


Re: [DISCUSS] Structuring Java based DSLs

2018-12-12 Thread Reuven Lax
I'll send an update on schemas soon. But the tl;dr is that by the end of
this month, I expect it to be generally usable across a variety of input
formats.

Reuven

On Wed, Dec 12, 2018 at 9:38 AM Xinyu Liu  wrote:

> Agree with Kenn on this. From our SamzaRunner point of view, we would like
> Beam SQL to be self-contained and flexible enough for our users to use it
> in different scenarios, e.g. pure SQL and embedded in different SDKs. We are
> also extremely interested in the DataFrame-like API mentioned above. To
> digress a little from this topic, this is actually the current hurdle
> to letting our users try it out in Hadoop, since they expect this kind of
> API with columnar data set IO support, e.g. ORC. If there are any more
> details about the status of the DF API and columnar support, I will be very
> happy to learn more about it.
>
> Thanks,
> Xinyu
>
> On Wed, Dec 12, 2018 at 8:55 AM Jan Lukavský  wrote:
>
>> Hi all,
>>
>> after letting this sink for a while, I'd like to summarize the feedback
>> and emphasize some questions that appeared:
>>
>>  a) there were several 'it makes sense' opinions
>>
>>  b) there was one 'not right now' - which makes sense, but the purpose of
>> this discussion was to try to first answer the what and then the when :-)
>>
>>  c) there were several 'maybe, but':
>>
>>   i) it would be more complicated to code SQL against user-facing API,
>> because that way, each change needed by SQL would have to be first
>> implemented in this user-friendly API layer
>>
>>  I can absolutely agree with this, it would be definitely more
>> complicated and more work. I see basically two ways out. The first one
>> would suggest to move all the code from Euphoria into something similar to
>> Join library, and let Euphoria be just the user-friendly layer on top of
>> this library (basically just the builders). That way, we could reuse the
>> code and be pretty much sure that the implementations of the SQL transforms are
>> identical to what Euphoria would offer, which is one of the goals of this
>> discussion. The drawback would be that there would be no guarantees that
>> what this underlying library offers would also be accessible from
>> Euphoria - that is because the complexity would not disappear, it would be
>> just moved onto different component - new added feature to the shared
>> library would have to be made accessible in Euphoria. The other way around
>> would be to accept this added complexity in favor of making sure, that
>> every feature that is needed by SQL is also available in Euphoria, because
>> the user-facing API would be used by SQL itself. I'd really like to further
>> hear community opinions on pros and cons of these two (or maybe I'm
>> overlooking something and there is a third way).
>>
>>  ii) in some cases, we might want to support relational operators in SDK
>> harness for performance, and we don't want to close doors for this
>>
>>  Again, the motivation of this seems to be clear and valid, but the
>> question that arises is - under the conditions (something like we have
>> schema aware PCollection), would we want to enable code reuse between logic
>> written in SQL and Euphoria to ensure consistent behavior? That would
>> probably mean that Euphoria would have to make use of the provided schema
>> of PCollection and switch to a different behavior on API level (more
>> DataFrame-like) and/or different operators created and passed to the SDK
>> harness. This feature is currently absolutely missing, but seems to be
>> plausible and maybe there could be benefits for both sides.
>>
>> Many thanks for any more opinions on this.
>>
>>  Jan
>>
>>
>> On 12/4/18 11:32 PM, Rui Wang wrote:
>>
>> For pure SQL users, there shouldn't be any SDK concepts. The SQL shell and
>> JDBC driver should be the way to interact with Beam through SQL.
>>
>>
>> For the embedded SQL use case in all SDKs (Python, Go, etc.), even assuming
>> there are relational algebra operators defined in the SDKs, each SDK still has
>> to implement its own way to parse SQL into operators (SQL is just a string).
>> To avoid that overhead, I would imagine that SDKs should keep SQL queries
>> and wait for a later but shared processing (I don't know if Portability
>> should handle SQL or if it could).
>>
>>
>> -Rui
>>
>> On Tue, Dec 4, 2018 at 2:04 AM Jan Lukavský  wrote:
>>
>>> Hi Kenn,
>>>
>>> my intent really was not to propose any changes right now. I'm trying to
>>> create a clear understanding about what the relation between Euphoria and
>>> SQL should be in the long run. From my point of view, Euphoria should always
>>> be a superset of SQL, because it should support complete relational algebra (and
>>> I'm not saying it does so right now, it should just be our goal) plus more
>>> flexible UDFs (not limited to SQL standard) and stateful processing (which
>>> will probably not be part of SQL any time soon). There should be some sort
>>> of guarantees that the semantics of SQL and Euphoria are the same, because
>>> that is what 

Re: Issue with publishing maven artefacts locally

2018-12-12 Thread Ismaël Mejía
Thanks Garrett for the quick fix. I just tested and it is working now.

I found a second issue (not related to Garrett's PR); it was the reason
why I detected that local artifacts were not updated in our Jenkins
(in the other thread).

To validate that our daily snapshots don't break existing code we have
a maven project that takes these snapshots from the apache repository.
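In maven speak:

<repositories>
  <repository>
    <id>snapshots</id>
    <name>Apache Development Snapshot Repository</name>
    <url>https://repository.apache.org/content/repositories/snapshots/</url>
    <releases>
      <enabled>false</enabled>
    </releases>
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
  </repository>
</repositories>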

If we do 'mvn clean verify' in our project, it brings the SNAPSHOTS from Apache.

Now if locally I fix something in Beam and deploy locally via:

./gradlew -Ppublishing --no-parallel
-PdistMgmtSnapshotsUrl=file:///home/ismael/.m2/repository -p
sdks/java/core publish -x spotlessCheck -x test -x rat

It puts the newly generated jars in the .m2 directory.
However, if you re-execute the Maven project, it still detects and
imports the old jars.

I think that something is missing in the way we are generating the
files for the .m2 directory via publishing.
But I don't clearly understand how SNAPSHOT resolution works.
Does anyone have an idea, or can anyone contribute a fix for this one?

Thanks.

ps. if someone wants to check this out of the box you can reproduce
the case by building this project (same case):
https://github.com/jbonofre/beam-samples/
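
One way to check what the publish step actually wrote is to list the jars in the artifact's version directory, newest first. This is only a diagnostic sketch and says nothing about how Maven resolves SNAPSHOTs. The helper name `jars_newest_first` is hypothetical, and the directory layout mentioned afterwards is an assumption based on the standard local repository structure.

```python
from pathlib import Path


def jars_newest_first(version_dir):
    """Return the jar file names in version_dir, most recently modified first."""
    jars = [p for p in Path(version_dir).iterdir() if p.suffix == ".jar"]
    jars.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return [p.name for p in jars]
```

For the case above, the directory would presumably be something like ~/.m2/repository/org/apache/beam/beam-sdks-java-core/2.10.0-SNAPSHOT. If the newest file is not the one the Maven build picks up, the file naming or the accompanying metadata is the next thing to inspect.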

On Wed, Dec 12, 2018 at 8:55 PM Garrett Jones  wrote:
>
> Nevermind, I found a much easier fix (delete two characters): 
> https://github.com/apache/beam/pull/7265
>
>
> On Wed, Dec 12, 2018 at 11:03 AM Garrett Jones  
> wrote:
>>
>> I'm inclined to undo a particular modification I made in my PR and 
>> re-duplicate the repositories declaration between the Gradle plugin and the 
>> new BOM module. Scott, what do you think?
>>
>>
>> On Wed, Dec 12, 2018 at 11:00 AM Scott Wegner  wrote:
>>>
>>> Thanks for pointing this out Alexey. This seems like we unintentionally
>>> broke something in PR#7197 [1]
>>>
>>> +Garrett Jones, who authored the change. Garrett can you help investigate?
>>>
>>> I went to check to see if we have any existing Jenkins jobs that would've 
>>> caught this break. It seems the beam_Release_Gradle_NightlySnapshot job [2] 
>>> has been failing for the last 10 days. Has anybody looked into this?
>>>
>>> [1] https://github.com/apache/beam/pull/7197
>>> [2] https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/
>>>
>>> On Wed, Dec 12, 2018 at 5:57 AM Alexey Romanenko  
>>> wrote:

 Hi all,

 I used to publish maven artefacts into local repository using this kind of 
 command for example:

 ./gradlew -Ppublishing --no-parallel 
 -PdistMgmtSnapshotsUrl=file:///path/to/.m2/repository/ -p 
 sdks/java/io/kafka/ publish

 It worked fine till today. Seems like (according to "git bisect") this
 recent commit [1] introduced new functionality and now it fails with an 
 error:

 * What went wrong:
 A problem occurred configuring project ':beam-sdks-java-io-kafka'.
 > Exception thrown while executing model rule: 
 > PublishingPluginRules#publishing(ExtensionContainer)
> Cannot set the value of read-only property 'repositories' for object 
 of type 
 org.gradle.api.publish.internal.DeferredConfigurablePublishingExtension.

 Does anyone know if this is a bug or I should use another command for the 
 same purposes?


 [1] 
 https://github.com/apache/beam/commit/bfd1be9ae22d1ae7e732f590c448e9e5ed2894b9
>>>
>>>
>>>
>>> --
>>>
>>>
>>>
>>>
>>> Got feedback? tinyurl.com/swegner-feedback


Re: ULR Tests on commit?

2018-12-12 Thread Robert Burke
Woohoo! Thanks for the info.

On Wed, Dec 12, 2018, 2:26 PM Daniel Oliveira 
wrote:

> Yeah, this is in-progress. The tests should get in for various languages
> throughout the next two weeks and I'll add them to the PR template as I add
> them.
>
> I do have a JIRA for tracking (BEAM-5449), but I haven't been updating it
> regularly. I'll try to keep it updated.
>
> On Wed, Dec 12, 2018 at 11:03 AM Scott Wegner  wrote:
>
>> +Daniel Oliveira  who has been working on the
>> ULR.
>>
>> I believe this is in-progress. Dan, do you have a JIRA for tracking?
>>
>> On Wed, Dec 12, 2018 at 10:08 AM Robert Burke  wrote:
>>
>>> In our auto populated github PR template, we have a variety of SDK
>>> languages to runner combos, but the Universal Local Runner (ULR) is absent.
>>>
>>> Do we currently run tests on the ULR as pre-commit or post commit? If
>>> not, why not?
>>>
>>> If so, can we add a ULR column to the PR template?
>>>
>>> Mostly curious. Thanks!
>>> Robert Burke
>>> @lostluck, distributed gopher wrangler
>>>
>>
>>
>> --
>>
>>
>>
>>
>> Got feedback? tinyurl.com/swegner-feedback
>>
>


Re: Issue with publishing maven artefacts locally

2018-12-12 Thread Garrett Jones
I'm inclined to undo a particular modification I made in my PR and
re-duplicate the repositories declaration between the Gradle plugin and the
new BOM module. Scott, what do you think?


On Wed, Dec 12, 2018 at 11:00 AM Scott Wegner  wrote:

> Thanks for pointing this out, Alexey. It seems we unintentionally
> broke something in PR#7197 [1].
>
> +Garrett Jones, who authored the change.
> Garrett, can you help investigate?
>
> I went to check to see if we have any existing Jenkins jobs that would've
> caught this break. It seems the beam_Release_Gradle_NightlySnapshot job [2]
> has been failing for the last 10 days. Has anybody looked into this?
>
> [1] https://github.com/apache/beam/pull/7197
> [2] https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/
>
> On Wed, Dec 12, 2018 at 5:57 AM Alexey Romanenko 
> wrote:
>
>> Hi all,
>>
>> I used to publish maven artefacts into local repository using this kind
>> of command for example:
>>
>> ./gradlew -Ppublishing --no-parallel
>> -PdistMgmtSnapshotsUrl=file:///path/to/.m2/repository/
>> -p sdks/java/io/kafka/ publish
>>
>> It worked fine till today. Seems like (according to "git bisect") this
>> recent commit [1] introduced new functionality and now it fails with an
>> error:
>>
>> * What went wrong:
>> A problem occurred configuring project ':beam-sdks-java-io-kafka'.
>> > Exception thrown while executing model rule:
>>   PublishingPluginRules#publishing(ExtensionContainer)
>>    > Cannot set the value of read-only property 'repositories' for object
>>      of type
>>      org.gradle.api.publish.internal.DeferredConfigurablePublishingExtension.
>>
>> Does anyone know if this is a bug or I should use another command for the
>> same purposes?
>>
>>
>> [1]
>> https://github.com/apache/beam/commit/bfd1be9ae22d1ae7e732f590c448e9e5ed2894b9
>>
>>
>
>
> --
>
>
>
>
> Got feedback? tinyurl.com/swegner-feedback
>


Re: Issue with publishing maven artefacts locally

2018-12-12 Thread Garrett Jones
Never mind, I found a much easier fix (delete two characters):
https://github.com/apache/beam/pull/7265


On Wed, Dec 12, 2018 at 11:03 AM Garrett Jones 
wrote:

> I'm inclined to undo a particular modification I made in my PR and
> re-duplicate the repositories declaration between the Gradle plugin and the
> new BOM module. Scott, what do you think?
>
>
> On Wed, Dec 12, 2018 at 11:00 AM Scott Wegner  wrote:
>
>> Thanks for pointing this out, Alexey. It seems we unintentionally
>> broke something in PR#7197 [1].
>>
>> +Garrett Jones, who authored the change.
>> Garrett, can you help investigate?
>>
>> I went to check to see if we have any existing Jenkins jobs that would've
>> caught this break. It seems the beam_Release_Gradle_NightlySnapshot job [2]
>> has been failing for the last 10 days. Has anybody looked into this?
>>
>> [1] https://github.com/apache/beam/pull/7197
>> [2] https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/
>>
>> On Wed, Dec 12, 2018 at 5:57 AM Alexey Romanenko <
>> aromanenko@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I used to publish maven artefacts into local repository using this kind
>>> of command for example:
>>>
>>> ./gradlew -Ppublishing --no-parallel
>>> -PdistMgmtSnapshotsUrl=file:///path/to/.m2/repository/
>>> -p sdks/java/io/kafka/ publish
>>>
>>> It worked fine till today. Seems like (according to "git bisect") this
>>> recent commit [1] introduced new functionality and now it fails with an
>>> error:
>>>
>>> * What went wrong:
>>> A problem occurred configuring project ':beam-sdks-java-io-kafka'.
>>> > Exception thrown while executing model rule:
>>>   PublishingPluginRules#publishing(ExtensionContainer)
>>>    > Cannot set the value of read-only property 'repositories' for object
>>>      of type
>>>      org.gradle.api.publish.internal.DeferredConfigurablePublishingExtension.
>>>
>>> Does anyone know if this is a bug or I should use another command for
>>> the same purposes?
>>>
>>>
>>> [1]
>>> https://github.com/apache/beam/commit/bfd1be9ae22d1ae7e732f590c448e9e5ed2894b9
>>>
>>>
>>
>>
>> --
>>
>>
>>
>>
>> Got feedback? tinyurl.com/swegner-feedback
>>
>


Re: Issue with publishing maven artefacts locally

2018-12-12 Thread Udi Meiri
On Wed, Dec 12, 2018 at 11:00 AM Scott Wegner  wrote:

> Thanks for pointing this out, Alexey. It seems we unintentionally
> broke something in PR#7197 [1].
>
> +Garrett Jones, who authored the change.
> Garrett, can you help investigate?
>
> I went to check to see if we have any existing Jenkins jobs that would've
> caught this break. It seems the beam_Release_Gradle_NightlySnapshot job [2]
> has been failing for the last 10 days. Has anybody looked into this?
>
See "Beam snapshots broken" thread.


>
> [1] https://github.com/apache/beam/pull/7197
> [2] https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/
>
> On Wed, Dec 12, 2018 at 5:57 AM Alexey Romanenko 
> wrote:
>
>> Hi all,
>>
>> I used to publish maven artefacts into local repository using this kind
>> of command for example:
>>
>> ./gradlew -Ppublishing --no-parallel
>> -PdistMgmtSnapshotsUrl=file:///path/to/.m2/repository/
>> -p sdks/java/io/kafka/ publish
>>
>> It worked fine till today. Seems like (according to "git bisect") this
>> recent commit [1] introduced new functionality and now it fails with an
>> error:
>>
>> * What went wrong:
>> A problem occurred configuring project ':beam-sdks-java-io-kafka'.
>> > Exception thrown while executing model rule:
>>   PublishingPluginRules#publishing(ExtensionContainer)
>>    > Cannot set the value of read-only property 'repositories' for object
>>      of type
>>      org.gradle.api.publish.internal.DeferredConfigurablePublishingExtension.
>>
>> Does anyone know if this is a bug or I should use another command for the
>> same purposes?
>>
>>
>> [1]
>> https://github.com/apache/beam/commit/bfd1be9ae22d1ae7e732f590c448e9e5ed2894b9
>>
>>
>
>
> --
>
>
>
>
> Got feedback? tinyurl.com/swegner-feedback
>




Re: OOO

2018-12-12 Thread Daniel Oliveira
Thanks for all the work you've been doing on Beam, Luke! Hope you have some
good bonding time and that it's not too hectic.

On Wed, Dec 12, 2018 at 10:10 AM Kenneth Knowles  wrote:

> Congrats & have a super time!
>
> Kenn
>
> On Wed, Dec 12, 2018 at 10:09 AM Robert Burke  wrote:
>
>> Have a great bonding time! I'd say "break" but I expect you'll be quite
>> busy.
>>
>> On Wed, Dec 12, 2018, 9:57 AM Etienne Chauchot 
>> wrote:
>>
>>> Enjoy your family time and take care of the little one
>>>
>>> Etienne
>>>
>>> On Tuesday, 11 December 2018 at 12:26 +0100, Maximilian Michels wrote:
>>>
>>> Thank you for your amazing work on Beam.
>>> Enjoy the time with your kid!
>>>
>>> -Max
>>>
>>> On 11.12.18 00:55, Pablo Estrada wrote:
>>>
>>> See ya in three months! Take it easy!
>>>
>>> On Mon, Dec 10, 2018 at 3:27 PM Thomas Weise wrote:
>>>
>>> Cute :)
>>> Enjoy the time with the family.
>>>
>>> Thomas
>>>
>>> On Mon, Dec 10, 2018 at 8:53 AM Ismaël Mejía wrote:
>>>
>>> Thanks for the community awareness, enjoy the time with the baby and
>>> see you soon.
>>>
>>> On Fri, Dec 7, 2018 at 9:20 PM Lukasz Cwik wrote:
>>>
>>> > I'll be away for the next three months taking care of my little
>>> > one[1] and am excited to see what happens within Apache Beam when I
>>> > return.
>>> >
>>> > I have been mainly focusing on the portability and SplittableDoFn
>>> > efforts. If there are questions while I'm out, feel free to reach
>>> > out to this dev@ list as there are several community members that
>>> > have been involved.
>>> >
>>> > For portability related stuff:
>>> > Thomas Weise
>>> > Robert Bradshaw
>>> > Maximilian Michels
>>> > Ankur Goenka
>>> >
>>> > For SplittableDoFn stuff:
>>> > Robert Bradshaw
>>> > Ismael Mejia
>>> > JB Onofre
>>> >
>>> > 1: https://photos.app.goo.gl/sqdcgC5rxDbURPE7A
>>>


Re: ULR Tests on commit?

2018-12-12 Thread Daniel Oliveira
Yeah, this is in-progress. The tests should get in for various languages
throughout the next two weeks and I'll add them to the PR template as I add
them.

I do have a JIRA for tracking (BEAM-5449) but I haven't been updating it
regularly. I'll try to keep it updated.

On Wed, Dec 12, 2018 at 11:03 AM Scott Wegner  wrote:

> +Daniel Oliveira  who has been working on the ULR.
>
> I believe this is in-progress. Dan, do you have a JIRA for tracking?
>
> On Wed, Dec 12, 2018 at 10:08 AM Robert Burke  wrote:
>
>> In our auto populated github PR template, we have a variety of SDK
>> languages to runner combos, but the Universal Local Runner (ULR) is absent.
>>
>> Do we currently run tests on the ULR as pre-commit or post commit? If
>> not, why not?
>>
>> If so, can we add a ULR column to the PR template?
>>
>> Mostly curious. Thanks!
>> Robert Burke
>> @lostluck, distributed gopher wrangler
>>
>
>
> --
>
>
>
>
> Got feedback? tinyurl.com/swegner-feedback
>


Re: ULR Tests on commit?

2018-12-12 Thread Scott Wegner
+Daniel Oliveira  who has been working on the ULR.

I believe this is in-progress. Dan, do you have a JIRA for tracking?

On Wed, Dec 12, 2018 at 10:08 AM Robert Burke  wrote:

> In our auto populated github PR template, we have a variety of SDK
> languages to runner combos, but the Universal Local Runner (ULR) is absent.
>
> Do we currently run tests on the ULR as pre-commit or post commit? If not,
> why not?
>
> If so, can we add a ULR column to the PR template?
>
> Mostly curious. Thanks!
> Robert Burke
> @lostluck, distributed gopher wrangler
>


-- 




Got feedback? tinyurl.com/swegner-feedback


Re: Issue with publishing maven artefacts locally

2018-12-12 Thread Scott Wegner
Thanks for pointing this out, Alexey. It seems we unintentionally
broke something in PR#7197 [1].

+Garrett Jones, who authored the change. Garrett,
can you help investigate?

I went to check to see if we have any existing Jenkins jobs that would've
caught this break. It seems the beam_Release_Gradle_NightlySnapshot job [2]
has been failing for the last 10 days. Has anybody looked into this?

[1] https://github.com/apache/beam/pull/7197
[2] https://builds.apache.org/job/beam_Release_Gradle_NightlySnapshot/

On Wed, Dec 12, 2018 at 5:57 AM Alexey Romanenko 
wrote:

> Hi all,
>
> I used to publish maven artefacts into local repository using this kind of
> command for example:
>
> ./gradlew -Ppublishing --no-parallel
> -PdistMgmtSnapshotsUrl=file:///path/to/.m2/repository/
> -p sdks/java/io/kafka/ publish
>
> It worked fine till today. Seems like (according to "git bisect") this
> recent commit [1] introduced new functionality and now it fails with an
> error:
>
> * What went wrong:
> A problem occurred configuring project ':beam-sdks-java-io-kafka'.
> > Exception thrown while executing model rule:
>   PublishingPluginRules#publishing(ExtensionContainer)
>    > Cannot set the value of read-only property 'repositories' for object
>      of type
>      org.gradle.api.publish.internal.DeferredConfigurablePublishingExtension.
>
> Does anyone know if this is a bug or I should use another command for the
> same purposes?
>
>
> [1]
> https://github.com/apache/beam/commit/bfd1be9ae22d1ae7e732f590c448e9e5ed2894b9
>
>


-- 




Got feedback? tinyurl.com/swegner-feedback


Java performance tests dashboard

2018-12-12 Thread Udi Meiri
Hi Lukasz,
I was looking for statistics on I/O performance for writes of many files
(~10k) on GCS.

I found this dashboard and I have some questions.
1. The tests that are "local filesystem" seem to be running on Dataflow and
writing to GCS - is it okay to rename them to be officially GCS tests?
2. Is it okay if I add additional GCS tests to this dashboard?




Re: OOO

2018-12-12 Thread Kenneth Knowles
Congrats & have a super time!

Kenn

On Wed, Dec 12, 2018 at 10:09 AM Robert Burke  wrote:

> Have a great bonding time! I'd say "break" but I expect you'll be quite
> busy.
>
> On Wed, Dec 12, 2018, 9:57 AM Etienne Chauchot 
> wrote:
>
>> Enjoy your family time and take care of the little one
>>
>> Etienne
>>
>> On Tuesday, 11 December 2018 at 12:26 +0100, Maximilian Michels wrote:
>>
>> Thank you for your amazing work on Beam.
>> Enjoy the time with your kid!
>>
>> -Max
>>
>> On 11.12.18 00:55, Pablo Estrada wrote:
>>
>> See ya in three months! Take it easy!
>>
>> On Mon, Dec 10, 2018 at 3:27 PM Thomas Weise wrote:
>>
>> Cute :)
>> Enjoy the time with the family.
>>
>> Thomas
>>
>> On Mon, Dec 10, 2018 at 8:53 AM Ismaël Mejía wrote:
>>
>> Thanks for the community awareness, enjoy the time with the baby and
>> see you soon.
>>
>> On Fri, Dec 7, 2018 at 9:20 PM Lukasz Cwik wrote:
>>
>> > I'll be away for the next three months taking care of my little
>> > one[1] and am excited to see what happens within Apache Beam when I
>> > return.
>> >
>> > I have been mainly focusing on the portability and SplittableDoFn
>> > efforts. If there are questions while I'm out, feel free to reach
>> > out to this dev@ list as there are several community members that
>> > have been involved.
>> >
>> > For portability related stuff:
>> > Thomas Weise
>> > Robert Bradshaw
>> > Maximilian Michels
>> > Ankur Goenka
>> >
>> > For SplittableDoFn stuff:
>> > Robert Bradshaw
>> > Ismael Mejia
>> > JB Onofre
>> >
>> > 1: https://photos.app.goo.gl/sqdcgC5rxDbURPE7A
>>


Re: [Call for items] ❄️ December Beam Newsletter

2018-12-12 Thread Maximilian Michels

Thanks Rose! Updated the list.

On 12.12.18 18:41, Alexey Romanenko wrote:

I've just added several things that were finished.

Thanks for driving this!

On 7 Dec 2018, at 23:39, Rose Nguyen wrote:


Hi folks:

Time for the last newsletter of the year!

Add to [1] the highlights from November to now (or planned events and talks) 
that you want to share by 12/12 11:59 p.m. PDT.


We will collect the notes via Google docs but send out the final version 
directly to the user mailing list. If you do not know how to format something, 
it is OK to just put down the info and I will edit. I'll ship out the 
newsletter on 12/13.


[1] 
https://docs.google.com/document/d/1q4KBkcLR7orr6n_QUHMpAVBKYzKYahlRZw_K_PaIuDo


Cheers,
--
Rose Thị Nguyễn




Re: OOO

2018-12-12 Thread Robert Burke
Have a great bonding time! I'd say "break" but I expect you'll be quite
busy.

On Wed, Dec 12, 2018, 9:57 AM Etienne Chauchot  wrote:

> Enjoy your family time and take care of the little one
>
> Etienne
>
> On Tuesday, 11 December 2018 at 12:26 +0100, Maximilian Michels wrote:
>
> Thank you for your amazing work on Beam.
> Enjoy the time with your kid!
>
> -Max
>
> On 11.12.18 00:55, Pablo Estrada wrote:
>
> See ya in three months! Take it easy!
>
> On Mon, Dec 10, 2018 at 3:27 PM Thomas Weise wrote:
>
> Cute :)
> Enjoy the time with the family.
>
> Thomas
>
> On Mon, Dec 10, 2018 at 8:53 AM Ismaël Mejía wrote:
>
> Thanks for the community awareness, enjoy the time with the baby and
> see you soon.
>
> On Fri, Dec 7, 2018 at 9:20 PM Lukasz Cwik wrote:
>
> > I'll be away for the next three months taking care of my little
> > one[1] and am excited to see what happens within Apache Beam when I
> > return.
> >
> > I have been mainly focusing on the portability and SplittableDoFn
> > efforts. If there are questions while I'm out, feel free to reach
> > out to this dev@ list as there are several community members that
> > have been involved.
> >
> > For portability related stuff:
> > Thomas Weise
> > Robert Bradshaw
> > Maximilian Michels
> > Ankur Goenka
> >
> > For SplittableDoFn stuff:
> > Robert Bradshaw
> > Ismael Mejia
> > JB Onofre
> >
> > 1: https://photos.app.goo.gl/sqdcgC5rxDbURPE7A
>


ULR Tests on commit?

2018-12-12 Thread Robert Burke
In our auto populated github PR template, we have a variety of SDK
languages to runner combos, but the Universal Local Runner (ULR) is absent.

Do we currently run tests on the ULR as pre-commit or post commit? If not,
why not?

If so, can we add a ULR column to the PR template?

Mostly curious. Thanks!
Robert Burke
@lostluck, distributed gopher wrangler


Re: [Call for items] ❄️ December Beam Newsletter

2018-12-12 Thread Alexey Romanenko
I've just added several things that were finished.

Thanks for driving this!

> On 7 Dec 2018, at 23:39, Rose Nguyen  wrote:
> 
> Hi folks:
> 
> Time for the last newsletter of the year!
> 
> Add to [1] the highlights from November to now (or planned events and talks) 
> that you want to share by 12/12 11:59 p.m. PDT.
> 
> We will collect the notes via Google docs but send out the final version 
> directly to the user mailing list. If you do not know how to format 
> something, it is OK to just put down the info and I will edit. I'll ship out 
> the newsletter on 12/13. 
> 
> [1] 
> https://docs.google.com/document/d/1q4KBkcLR7orr6n_QUHMpAVBKYzKYahlRZw_K_PaIuDo
>  
> 
> 
> Cheers,
> -- 
> Rose Thị Nguyễn



Re: [DISCUSS] Structuring Java based DSLs

2018-12-12 Thread Xinyu Liu
Agree with Kenn on this. From our SamzaRunner point of view, we would like
Beam SQL to be self-contained and flexible enough for our users to use it
in different scenarios, e.g. pure SQL and embedded in different SDKs. We are
also extremely interested in the DataFrame-like API mentioned above. To
digress a little from this topic, this is actually the current hurdle
to letting our users try it out in Hadoop, since they expect this kind of
API with columnar data set IO support, e.g. ORC. If there are any more
details about the status of the DF API and columnar support, I will be very
happy to learn more about it.

Thanks,
Xinyu

On Wed, Dec 12, 2018 at 8:55 AM Jan Lukavský  wrote:

> Hi all,
>
> after letting this sink in for a while, I'd like to summarize the feedback
> and emphasize some questions that appeared:
>
>  a) there were several 'it makes sense' opinions
>
>  b) there was one 'not right now' - which makes sense, but the purpose of
> this discussion was to try to first answer the what and then the when :-)
>
>  c) there were several 'maybe, but':
>
>   i) it would be more complicated to code SQL against user-facing API,
> because that way, each change needed by SQL would have to be first
> implemented in this user-friendly API layer
>
>  I can absolutely agree with this; it would definitely be more
> complicated and more work. I see basically two ways out. The first
> would be to move all the code from Euphoria into something similar to the
> Join library, and let Euphoria be just the user-friendly layer on top of
> this library (basically just the builders). That way, we could reuse the
> code and be pretty much sure that the implementations of SQL transforms are
> identical to what Euphoria would offer, which is one of the goals of this
> discussion. The drawback would be that there would be no guarantee that
> what this underlying library offers would also be accessible from
> Euphoria. The complexity would not disappear; it would just move to a
> different component, because each feature newly added to the shared
> library would have to be made accessible in Euphoria. The other way
> would be to accept this added complexity in favor of making sure that
> every feature needed by SQL is also available in Euphoria, because
> the user-facing API would be used by SQL itself. I'd really like to hear
> further community opinions on the pros and cons of these two (or maybe I'm
> overlooking something and there is a third way).
>
>  ii) in some cases, we might want to support relational operators in SDK
> harness for performance, and we don't want to close doors for this
>
>  Again, the motivation for this seems clear and valid, but the
> question that arises is: under these conditions (something like having a
> schema-aware PCollection), would we want to enable code reuse between logic
> written in SQL and Euphoria to ensure consistent behavior? That would
> probably mean that Euphoria would have to make use of the provided schema
> of the PCollection and switch to a different behavior at the API level (more
> DataFrame-like) and/or have different operators created and passed to the
> SDK harness. This feature is currently completely missing, but it seems
> plausible and maybe there could be benefits for both sides.
>
> Many thanks for any more opinions on this.
>
>  Jan
>
>
> On 12/4/18 11:32 PM, Rui Wang wrote:
>
> For pure SQL users, there shouldn't be any SDK concepts. The SQL shell and
> JDBC driver should be the way to interact with Beam via SQL.
>
>
> For the embedded SQL use case in all SDKs (Python, Go, etc.), even assuming
> relational algebra operators are defined in the SDKs, each SDK would still
> have to implement its own way to parse SQL into operators (SQL is just a
> string). To avoid that overhead, I would imagine that SDKs should keep SQL
> queries as-is and defer to a later, shared processing step (I don't know if
> Portability should handle SQL or if it could).
>
>
> -Rui
>
> On Tue, Dec 4, 2018 at 2:04 AM Jan Lukavský  wrote:
>
>> Hi Kenn,
>>
>> my intent really was not to propose any changes right now. I'm trying to
>> create a clear understanding of what the relation between Euphoria and
>> SQL should be in the long run. In my view, Euphoria should always be a
>> superset of SQL, because it should support complete relational algebra
>> (I'm not saying it does so right now, it should just be our goal) plus more
>> flexible UDFs (not limited to the SQL standard) and stateful processing
>> (which will probably not be part of SQL any time soon). There should be
>> some sort of guarantee that the semantics of SQL and Euphoria are the same,
>> because that is what users would expect. This can certainly be ensured by
>> introducing another layer between Euphoria and the core SDK (e.g. the join
>> library), but the question is: what makes this solution different from
>> creating this shared library from Euphoria itself (when looking at the big
>> picture)? And it is not only about 

Re: [DISCUSS] Structuring Java based DSLs

2018-12-12 Thread Jan Lukavský

Hi all,

after letting this sink in for a while, I'd like to summarize the feedback 
and emphasize some questions that appeared:


 a) there were several 'it makes sense' opinions

 b) there was one 'not right now' - which makes sense, but the purpose 
of this discussion was to try to first answer the what and then the when :-)


 c) there were several 'maybe, but':

  i) it would be more complicated to code SQL against user-facing API, 
because that way, each change needed by SQL would have to be first 
implemented in this user-friendly API layer


 I can absolutely agree with this; it would definitely be more 
complicated and more work. I see basically two ways out. The first 
would be to move all the code from Euphoria into something similar to 
the Join library, and let Euphoria be just the user-friendly layer on 
top of this library (basically just the builders). That way, we could 
reuse the code and be pretty much sure that the implementations of SQL 
transforms are identical to what Euphoria would offer, which is one of 
the goals of this discussion. The drawback would be that there would be 
no guarantee that what this underlying library offers would also be 
accessible from Euphoria. The complexity would not disappear; it would 
just move to a different component, because each feature newly added to 
the shared library would have to be made accessible in Euphoria. The 
other way would be to accept this added complexity in favor of making 
sure that every feature needed by SQL is also available in Euphoria, 
because the user-facing API would be used by SQL itself. I'd really 
like to hear further community opinions on the pros and cons of these 
two (or maybe I'm overlooking something and there is a third way).


 ii) in some cases, we might want to support relational operators in 
SDK harness for performance, and we don't want to close doors for this


 Again, the motivation for this seems clear and valid, but the 
question that arises is: under these conditions (something like having a 
schema-aware PCollection), would we want to enable code reuse between 
logic written in SQL and Euphoria to ensure consistent behavior? That 
would probably mean that Euphoria would have to make use of the provided 
schema of the PCollection and switch to a different behavior at the API 
level (more DataFrame-like) and/or have different operators created and 
passed to the SDK harness. This feature is currently completely missing, 
but it seems plausible and maybe there could be benefits for both sides.


Many thanks for any more opinions on this.

 Jan


On 12/4/18 11:32 PM, Rui Wang wrote:
For pure SQL users, there shouldn't be any SDK concepts. The SQL shell and 
JDBC driver should be the way to interact with Beam via SQL.



For the embedded SQL use case in all SDKs (Python, Go, etc.), even assuming 
relational algebra operators are defined in the SDKs, each SDK would still 
have to implement its own way to parse SQL into operators (SQL is just 
a string). To avoid that overhead, I would imagine that SDKs should 
keep SQL queries as-is and defer to a later, shared processing step (I 
don't know if Portability should handle SQL or if it could).



-Rui

On Tue, Dec 4, 2018 at 2:04 AM Jan Lukavský > wrote:


Hi Kenn,

my intent really was not to propose any changes right now. I'm
trying to create a clear understanding of what the relation
between Euphoria and SQL should be in the long run. In my view,
Euphoria should always be a superset of SQL, because it should
support complete relational algebra (I'm not saying it does so
right now, it should just be our goal) plus more flexible UDFs
(not limited to the SQL standard) and stateful processing (which will
probably not be part of SQL any time soon). There should be some
sort of guarantee that the semantics of SQL and Euphoria are the
same, because that is what users would expect. This can
certainly be ensured by introducing another layer between Euphoria
and the core SDK (e.g. the join library), but the question is: what
makes this solution different from creating this shared library
from Euphoria itself (when looking at the big picture)? And it is
not only about implementations of joins or any other operators;
there are other techniques that could be beneficial for SQL,
e.g. pipeline sampling, automatic pipeline optimizations based on
statistics from previous runs of batch queries, etc.

The other way, in which relational algebra nodes become an
essential part of (some) SDK, is equivalent to actually
creating a SQL SDK, am I right? I understand that this approach can
bring performance benefits, but besides that, is the language
which implements SQL really important for users? Do we need SQL
implementing Go UDFs, Java UDFs, Python UDFs? What would the
resulting SQL query look like? If it is about allowing using SQL

Re: Beam snapshots broken

2018-12-12 Thread Yifan Zou
Beam9 is offline right now. But the job also failed on beam4 and beam13
with "Could not determine the dependencies of task
':beam-sdks-python:test'."
It seems the task dependency was not set up properly.



On Wed, Dec 12, 2018 at 2:03 AM Ismaël Mejía  wrote:

> You are right it seems that it was related to beam9 (wondering if it
> was bad luck that it was always assigned to beam9 or we can improve
> that poor balancing error).
> However it failed again today against beam13 maybe this time is just a
> build issue but seems related to python too.
>
> On Tue, Dec 11, 2018 at 7:33 PM Boyuan Zhang  wrote:
> >
> > It seems the failed jobs are not all due to a single task failure. The
> > failed tasks were executed on beam9, which was rebooted yesterday
> > because Python tests failed continuously. +Yifan Zou may have more
> > useful content here.
> >
> > On Tue, Dec 11, 2018 at 9:10 AM Ismaël Mejía  wrote:
> >>
> >> It seems that Beam snapshots have been broken since Dec. 2:
> >>
> >> https://builds.apache.org/view/A-D/view/Beam/job/beam_Release_Gradle_NightlySnapshot/
> >>
> >> It seems "The :beam-website:startDockerContainer task failed."
> >> Can somebody please take a look.
>


Re: OOO

2018-12-12 Thread Etienne Chauchot
Enjoy your family time and take care of the little one
Etienne
On Tuesday, 11 December 2018 at 12:26 +0100, Maximilian Michels wrote:
> Thank you for your amazing work on Beam.
> Enjoy the time with your kid!
> -Max
>
> On 11.12.18 00:55, Pablo Estrada wrote:
> See ya in three months! Take it easy!
>
> On Mon, Dec 10, 2018 at 3:27 PM Thomas Weise wrote:
> Cute :)
> Enjoy the time with the family.
> Thomas
>
> On Mon, Dec 10, 2018 at 8:53 AM Ismaël Mejía wrote:
> Thanks for the community awareness, enjoy the time with the baby and
> see you soon.
>
> On Fri, Dec 7, 2018 at 9:20 PM Lukasz Cwik wrote:
> > I'll be away for the next three months taking care of my little
> > one[1] and am excited to see what happens within Apache Beam when I
> > return.
> >
> > I have been mainly focusing on the portability and SplittableDoFn
> > efforts. If there are questions while I'm out, feel free to reach
> > out to this dev@ list as there are several community members that
> > have been involved.
> >
> > For portability related stuff:
> > Thomas Weise
> > Robert Bradshaw
> > Maximilian Michels
> > Ankur Goenka
> >
> > For SplittableDoFn stuff:
> > Robert Bradshaw
> > Ismael Mejia
> > JB Onofre
> >
> > 1: https://photos.app.goo.gl/sqdcgC5rxDbURPE7A


Re: Report to the Board, December 2018

2018-12-12 Thread Etienne Chauchot
Hi Kenn,
LGTM also
Etienne

On Wednesday, 12 December 2018 at 10:13 +0100, Robert Bradshaw wrote:
> Mostly looks good to me. I would probably omit the
> {issues,builds}@beam.apache.org stats as "nothing significant in the
> figures" but note that the dev list excludes the previous automated
> emails (or at least a reference to where you say this above). Is it
> worth noting StackOverflow activity here in this section?
>
> On Wed, Dec 12, 2018 at 2:56 AM Kenneth Knowles wrote:
> 
> Reminder that this is just about due. I'll likely submit today as I have a 
> busy day tomorrow.
> Kenn
> On Thu, Nov 29, 2018 at 10:14 PM Kenneth Knowles  wrote:
> 
> Hi all,
> My next (first!) project report to the ASF Board of Directors is due on 
> 12/12. I've seeded a rough draft here: 
> https://docs.google.com/document/d/1HenFg37xyNuFC7A4zkqmBPY9_Gqdi6LgPtF6wfoEix8/edit?usp=sharing
> If you have edits/content to propose, please email to this thread.
> Kenn


Re: Report to the Board, December 2018

2018-12-12 Thread Etienne Chauchot
Hi Kenn,
LGTM also

On Wednesday, 12 December 2018 at 10:13 +0100, Robert Bradshaw wrote:
> Mostly looks good to me. I would probably omit the
> {issues,builds}@beam.apache.org stats as "nothing significant in the
> figures" but note that the dev list excludes the previous automated
> emails (or at least a reference to where you say this above). Is it
> worth noting StackOverflow activity here in this section?
>
> On Wed, Dec 12, 2018 at 2:56 AM Kenneth Knowles wrote:
> 
> Reminder that this is just about due. I'll likely submit today as I have a 
> busy day tomorrow.
> Kenn
> On Thu, Nov 29, 2018 at 10:14 PM Kenneth Knowles  wrote:
> 
> Hi all,
> My next (first!) project report to the ASF Board of Directors is due on 
> 12/12. I've seeded a rough draft here: 
> https://docs.google.com/document/d/1HenFg37xyNuFC7A4zkqmBPY9_Gqdi6LgPtF6wfoEix8/edit?usp=sharing
> If you have edits/content to propose, please email to this thread.
> Kenn


Issue with publishing maven artefacts locally

2018-12-12 Thread Alexey Romanenko
Hi all,

I used to publish maven artefacts into local repository using this kind of 
command for example:

./gradlew -Ppublishing --no-parallel 
-PdistMgmtSnapshotsUrl=file:///path/to/.m2/repository/ -p sdks/java/io/kafka/ 
publish

It worked fine till today. Seems like (according to "git bisect") this recent 
commit [1] introduced new functionality and now it fails with an error:

* What went wrong:
A problem occurred configuring project ':beam-sdks-java-io-kafka'.
> Exception thrown while executing model rule: PublishingPluginRules#publishing(ExtensionContainer)
   > Cannot set the value of read-only property 'repositories' for object of type org.gradle.api.publish.internal.DeferredConfigurablePublishingExtension.

Does anyone know whether this is a bug, or should I use a different command
for the same purpose?


[1] https://github.com/apache/beam/commit/bfd1be9ae22d1ae7e732f590c448e9e5ed2894b9

 

Re: contributor in the Beam

2018-12-12 Thread Chaim Turkel
Hi,
  I have another pull request on MongoIO:
https://github.com/apache/beam/pull/7256
thanks
chaim
On Mon, Dec 3, 2018 at 11:36 AM Jean-Baptiste Onofré  wrote:
>
> Can you please fix the conflict in the PR ?
>
> Thanks
> Regards
> JB
>
> On 03/12/2018 08:52, Chaim Turkel wrote:
> > it looks like there was a failure that is not due to the code, how can
> > i continue the process?
> > https://github.com/apache/beam/pull/7162
> >
> > On Thu, Nov 29, 2018 at 9:15 PM Chaim Turkel  wrote:
> >>
> >> hi,
> >>   i added another pr for the case of a self signed certificate ssl on
> >> the mongodb server
> >>
> >> https://github.com/apache/beam/pull/7162
> >> On Wed, Nov 28, 2018 at 5:16 PM Jean-Baptiste Onofré  
> >> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I already upgraded locally. Let me push the PR.
> >>>
> >>> Regards
> >>> JB
> >>>
> >>> On 28/11/2018 16:02, Chaim Turkel wrote:
> >>>> is there any reason that the mongo client version is still on 3.2.2?
> >>>> can you upgrade it to 3.9.0?
> >>>> chaim
> >>>> On Tue, Nov 27, 2018 at 4:48 PM Jean-Baptiste Onofré wrote:
> >>>>>
> >>>>> Hi Chaim,
> >>>>>
> >>>>> The best is to create a Jira describing the new features you want to
> >>>>> add. Then, you can create a PR related to this Jira.
> >>>>>
> >>>>> As I'm the original MongoDbIO author, I would be more than happy to help
> >>>>> you and review the PR.
> >>>>>
> >>>>> Thanks !
> >>>>> Regards
> >>>>> JB
> >>>>>
> >>>>> On 27/11/2018 15:37, Chaim Turkel wrote:
> >>>>>> Hi,
> >>>>>>   I have added a few features to the MongoDbIO and would like to add
> >>>>>> them to the project.
> >>>>>> I have read https://beam.apache.org/contribute/
> >>>>>> I have added a jira user, what do i need to do next?
> >>>>>>
> >>>>>> chaim
> >>>>>>
> >>>>>
> >>>>> --
> >>>>> Jean-Baptiste Onofré
> >>>>> jbono...@apache.org
> >>>>> http://blog.nanthrax.net
> >>>>> Talend - http://www.talend.com
> >>>>
> >>>
> >>> --
> >>> Jean-Baptiste Onofré
> >>> jbono...@apache.org
> >>> http://blog.nanthrax.net
> >>> Talend - http://www.talend.com
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com


Re: Beam snapshots broken

2018-12-12 Thread Ismaël Mejía
You are right, it seems it was related to beam9 (I wonder whether it was
just bad luck that the job was always assigned to beam9, or whether we can
improve the poor load balancing).
However, it failed again today on beam13. Maybe this time it is just a
build issue, but it seems related to Python too.

On Tue, Dec 11, 2018 at 7:33 PM Boyuan Zhang  wrote:
>
> It seems that not all of the failed jobs are due to that single task
> failure. The failed tasks were executed on beam9, which was rebooted
> yesterday because Python tests failed continuously. +Yifan Zou may have
> more useful information here.
>
> On Tue, Dec 11, 2018 at 9:10 AM Ismaël Mejía  wrote:
>>
>> It seems that Beam snapshots have been broken since Dec. 2
>> https://builds.apache.org/view/A-D/view/Beam/job/beam_Release_Gradle_NightlySnapshot/
>>
>> It seems "The :beam-website:startDockerContainer task failed."
>> Can somebody please take a look?


Re: Report to the Board, December 2018

2018-12-12 Thread Robert Bradshaw
Mostly looks good to me. I would probably omit the
{issues,builds}@beam.apache.org stats as "nothing significant in the
figures" but note that the dev list excludes the previous automated
emails (or at least a reference to where you say this above). Is it
worth noting StackOverflow activity here in this section?
On Wed, Dec 12, 2018 at 2:56 AM Kenneth Knowles  wrote:
>
> Reminder that this is just about due. I'll likely submit today as I have a 
> busy day tomorrow.
>
> Kenn
>
> On Thu, Nov 29, 2018 at 10:14 PM Kenneth Knowles  wrote:
>>
>> Hi all,
>>
>> My next (first!) project report to the ASF Board of Directors is due on 
>> 12/12. I've seeded a rough draft here: 
>> https://docs.google.com/document/d/1HenFg37xyNuFC7A4zkqmBPY9_Gqdi6LgPtF6wfoEix8/edit?usp=sharing
>>
>> If you have edits/content to propose, please email to this thread.
>>
>> Kenn


Re: GSOC - Summer of Code, on Beam?

2018-12-12 Thread Ismaël Mejía
Oh, I had not seen that the announcement was official, so it is time to get ready.
https://opensource.googleblog.com/2018/11/google-summer-of-code-15-years-strong.html

Mentors should have proposals ready around January 15, 2019. Remember, the
timeline matters.
https://developers.google.com/open-source/gsoc/timeline

On Tue, Dec 11, 2018 at 6:14 PM Ismaël Mejía  wrote:
>
> You have to register the concrete proposal, no need to register the
> project since the organization (ASF) is already part of GSoC.
> On Tue, Dec 11, 2018 at 12:42 PM Maximilian Michels  wrote:
> >
> > I think that's a great idea if we can find good candidates. Do we have to
> > register the project to be able to receive applications from students?
> >
> > On 07.12.18 16:44, Ismaël Mejía wrote:
> > > Last year we had two proposals. Kenneth's proposal around SQL was
> > > accepted and, if I remember correctly, was a success.
> > >
> > > Interested students should go through the complete process via the
> > > GSoC website and look for the 'gsoc' label.
> > >
> > > Committers who want to mentor students have to register on the GSoC
> > > website and also subscribe to the mentors@ mailing list, where you
> > > should fill in the proposal through a document shared there (and yes,
> > > this is not documented at all; we need to fix that). This document is
> > > prepared by the ASF because it is the whole organization (not the
> > > project) that requests the participation. Sadly, I was not aware of
> > > this process and I missed the Apache deadline (some days before the
> > > GSoC deadline) last year, and for that reason my proposal did not get
> > > accepted. So pay attention to that, Pablo (and anyone else who wants
> > > to mentor students).
> > >
> > > On Thu, Dec 6, 2018 at 8:21 PM jhzgg2...@gmail.com  
> > > wrote:
> > >>
> > >>
> > >>
> > >> On 2018/12/05 00:29:48, Pablo Estrada  wrote:
> > >>> Hi Austin!
> > >>> Thanks a lot for surfacing this. I participated in GSOC as a student a
> > >>> couple times, and loved it. This being my first time around as a 
> > >>> committer,
> > >>> I'm excited to try and help.
> > >>>
> > >>> I think, for starters, it may be good to find issues in JIRA to label 
> > >>> with
> > >>> "gsoc", so please everyone who knows of good candidate project issues,
> > >>> label them with "gsoc".
> > >>>
> > >>> And then we can find mentors for these issues, and start helping 
> > >>> students
> > >>> in the application process.
> > >>>
> > >>> Best
> > >>> -P.
> > >>>
> > >>> On Tue, Dec 4, 2018 at 3:46 PM Austin Bennett wrote:
> > >>>
> > >>>> Would it make sense to have any GSOC students for next summer work on
> > >>>> Beam?  Do we have some candidate things that would be suitable and
> > >>>> sufficiently discrete projects?
> > >>>>
> > >>>> Initial applications for organizations not even open for about a month,
> > >>>> though thought worth getting a sense from the group.
> > >>>>
> > >>>> A bit of info:
> > >>>> https://summerofcode.withgoogle.com/archive/
> > >>>>
> > >>>> https://opensource.googleblog.com/2018/11/google-summer-of-code-15-years-strong.html
> > >>>>
> > >>> Hi Pablo!
> > >>> I am a junior majoring in CS and interested in Apache Beam and data
> > >>> processing. I hope to participate in GSOC and work on Beam next summer.
> > >>> Could you give me some advice on how to prepare for it? Thanks a lot.
> > >>