Re: Parallelizing test runs

2018-07-03 Thread Rafael Fernandez
Summary for all folks following this story -- and many thanks for
explaining configs to me and pointing me to files and such.

- Scott made changes to the config and we can now run 3
ValidatesRunner.Dataflow suites in parallel (each run takes about 2 hours)
- With the latest quota changes, we peaked at ~70% capacity in concurrent
Dataflow jobs when running those
- I've been keeping an eye on quota peaks for all resources today and have
not seen any worrisome limits overall.
- Also note there are improvements planned to the ValidatesRunner.Dataflow
test so various items get batched and the test itself runs faster -- I
believe it's on Alan's radar

Cheers,
r

On Mon, Jul 2, 2018 at 4:23 PM Rafael Fernandez  wrote:

> Done!
>
> On Mon, Jul 2, 2018 at 4:10 PM Scott Wegner  wrote:
>
>> Hey Rafael, looks like we need more 'INSTANCE_TEMPLATES' quota [1]. Can
>> you take a look? I've filed [BEAM-4722]:
>> https://issues.apache.org/jira/browse/BEAM-4722
>>
>> [1] https://github.com/apache/beam/pull/5861#issuecomment-401963630
>>
>> On Mon, Jul 2, 2018 at 11:33 AM Rafael Fernandez 
>> wrote:
>>
>>> OK, Scott just sent https://github.com/apache/beam/pull/5860 . Quotas
>>> should not be a problem; if they are, please file a JIRA under gcp-quota.
>>>
>>> Cheers,
>>> r
>>>
>>> On Mon, Jul 2, 2018 at 10:06 AM Kenneth Knowles  wrote:
>>>
 One thing that is nice when you do this is to be able to share your
 results. Though if all you are sharing is "they passed" then I guess we
 don't have to insist on evidence.

 Kenn

 On Mon, Jul 2, 2018 at 9:25 AM Scott Wegner  wrote:

> A few thoughts:
>
> * The Jenkins job getting backed up
> is beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle_PR [1]. Since
> Mikhail refactored Jenkins jobs, this only runs when explicitly requested
> via "Run Dataflow ValidatesRunner", and only has 8 total runs. So this job
> is idle more often than backlogged.
>
> * It's difficult to reason about our exact quota needs because
> Dataflow jobs get launched from various Jenkins jobs that have different
> parallelism configurations. If we have budget, we could enable concurrent
> execution of this job and increase our quota enough to give some breathing
> room. If we do this, I recommend limiting the max concurrency via
> throttleConcurrentBuilds [2] to some reasonable limit.
>
> * This test suite is meant to be an exhaustive post-commit validation
> of Dataflow runner, and tests a lot of different aspects of a runner. It
> would be more efficient to run locally only the tests affected by your
> change. Note that this requires having access to a GCP project with
> billing, but most Dataflow developers probably have access to this already.
> The command for this is:
>
> ./gradlew :beam-runners-google-cloud-dataflow-java:validatesRunner
> -PdataflowProject=myGcpProject -PdataflowTempRoot=gs://myGcsTempRoot
> --tests "org.apache.beam.MyTestClass"
>
> [1]
> https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle_PR/buildTimeTrend
> [2]
> https://jenkinsci.github.io/job-dsl-plugin/#method/javaposse.jobdsl.dsl.jobs.FreeStyleJob.throttleConcurrentBuilds
>
>
> On Mon, Jul 2, 2018 at 8:33 AM Lukasz Cwik  wrote:
>
>> The validates runner test parallelism is controlled here and is
>> currently set to be "unlimited":
>>
>> https://github.com/apache/beam/blob/fbfe6ceaea9d99cb1c8964087aafaa2bc2297a03/runners/google-cloud-dataflow-java/build.gradle#L115
>>
>> Each test fork is run on a different gradle worker, so the number of
>> parallel test runs is limited to the max number of workers configured,
>> which is controlled here:
>>
>> https://github.com/apache/beam/blob/fbfe6ceaea9d99cb1c8964087aafaa2bc2297a03/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Dataflow.groovy#L50
>> It is currently configured to 3 * number of CPU cores.
>>
>> We are already running up to 48 Dataflow jobs in parallel.
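The arithmetic behind the figures above can be sketched as follows. Note the 16-core executor count is an assumption inferred from the 48-job figure, not something stated in the config:

```java
public class ForkMath {
    public static void main(String[] args) {
        // The Jenkins job configures Gradle's maxParallelForks as
        // 3 * number of CPU cores (see the linked groovy file).
        int assumedCores = 16; // assumption: 16-core Jenkins executors
        int maxParallelForks = 3 * assumedCores;
        System.out.println(maxParallelForks); // prints "48", matching the figure above
    }
}
```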
>>
>>
>> On Sat, Jun 30, 2018 at 9:51 AM Rafael Fernandez 
>> wrote:
>>
>>> - How many resources do ValidatesRunner tests use?
>>> - Where are those settings?
>>>
>>> On Sat, Jun 30, 2018 at 9:50 AM Reuven Lax  wrote:
>>>
 The specific issue only affects Dataflow ValidatesRunner tests. We
 currently allow only one of these to run at a time, to control usage of
 Dataflow and of GCE quota. Other types of tests do not suffer from this
 issue.

 I would like to see if it's possible to increase Dataflow quota so
 we can run more of these in parallel. It took me 8 hours end to end to run
 these tests (about 6 hours for the run to be scheduled). If there was a
 failure, I would have had to repeat the whole process. In the worst case,
 

Re: Invite to comment on the @RequiresStableInput design doc

2018-07-03 Thread Lukasz Cwik
Does it make sense to only have some inputs be stable for a transform or
for the entire transform to require stable inputs?

On Tue, Jul 3, 2018 at 7:34 AM Kenneth Knowles  wrote:

> Since we always assume ProcessElement could have arbitrary side effects
> (esp. randomization), the state and timers set up by a call to
> ProcessElement cannot be considered stable until they are persisted. It
> seems very similar to the cost of outputting to a downstream
> @RequiresStableInput transform, if not an identical implementation.
>
> The thing timers add is a way to loop which you can't do if it is an
> output.
>
> Adding @Pure annotations might help, if the input elements are stable and
> ProcessElement is pure.
>
> Kenn
>
> On Mon, Jul 2, 2018 at 7:05 PM Reuven Lax  wrote:
>
>> The common use case for a timer is to read in data that was stored using
>> the state API in processElement. There is no guarantee that this is stable, and
>> I believe no runner currently guarantees this. For example:
>>
>> class MyDoFn extends DoFn<ElementT, OutputT> {
>>   @StateId("bag") private final StateSpec<BagState<ElementT>> buffer =
>>       StateSpecs.bag(ElementCoder.of());
>>   @TimerId("timer") private final TimerSpec timerSpec =
>>       TimerSpecs.timer(TimeDomain.PROCESSING_TIME);
>>
>>   @ProcessElement public void processElement(@Element ElementT element,
>>       @StateId("bag") BagState<ElementT> bag, @TimerId("timer") Timer timer) {
>>     bag.add(element);
>>     timer.align(Duration.standardSeconds(30))
>>         .offset(Duration.standardSeconds(3)).setRelative();
>>   }
>>
>>   @OnTimer("timer") public void onTimer(@StateId("bag") BagState<ElementT> bag) {
>>     sendToExternalSystem(bag.read());
>>   }
>> }
>>
>> If you tagged onTimer with @RequiresStableInput, then you could guarantee
>> that if the timer retried then it would read the same elements out of the
>> bag. Today this is not guaranteed - the data written to the bag might not
>> even be persisted yet when the timer fires (for example, both the
>> processElement and the onTimer might be executed by the runner in the same
>> bundle).
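The retry hazard described above can be illustrated with a small self-contained toy (this is NOT the Beam API; the names and the bundle model are illustrative assumptions):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: shows why data read in onTimer is not "stable" if the bundle
// that wrote it has not been committed yet.
public class StableInputToy {
    // State written by processElement but not yet persisted by the runner.
    static List<String> uncommittedBag = new ArrayList<>();

    // What a timer firing observes: a snapshot of the uncommitted bag.
    static List<String> onTimerReads() {
        return new ArrayList<>(uncommittedBag);
    }

    public static void main(String[] args) {
        uncommittedBag.add("element1");            // processElement adds to the bag
        List<String> firstFiring = onTimerReads(); // timer fires in the same bundle

        // The bundle fails and is retried; the uncommitted bag write is lost:
        uncommittedBag.clear();
        List<String> retryFiring = onTimerReads();

        // The two firings saw different input, so any external side effect
        // (e.g. sendToExternalSystem) was not performed with stable input.
        System.out.println(firstFiring.equals(retryFiring)); // prints "false"
    }
}
```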
>>
>> This particular example is a simplistic one of course - you could
>> accomplish the same thing with triggers. When Raghu worked on the
>> exactly-once Kafka sink this was very problematic. The final solution used
>> some specific details of Kafka to work, and is complicated and not portable
>> to other sinks.
>>
>> BTW - you can of course just have OnTimer produce the output to another
>> transform marked with RequiresStableInput. However this solution is very
>> expensive - every element must be persisted to stable storage multiple
>> times - and we tried hard to avoid doing this in the Kafka sink.
>>
>> Reuven
>>
>> On Mon, Jul 2, 2018 at 6:24 PM Robert Bradshaw 
>> wrote:
>>
>>> Could you give an example of such a use case? (I suppose I'm not quite
>>> following what it means for a timer to be unstable...)
>>>
>>> On Mon, Jul 2, 2018 at 6:20 PM Reuven Lax  wrote:
>>>
 One issue: we definitely have some strong use cases where we want this
 on ProcessTimer but not on ProcessElement. Since both are on the same DoFn,
 I'm not sure how you would represent this as a separate transform.

 On Mon, Jul 2, 2018 at 5:05 PM Robert Bradshaw 
 wrote:

> Thanks for the writeup.
>
> I'm wondering whether, rather than phrasing this as an annotation on DoFn
> methods that gets plumbed down through the portability representation, it
> would make more sense to introduce a new, primitive "EnsureStableInput"
> transform. For those runners whose reshuffle provides stable inputs, they
> could use that as an implementation, and other runners could provide other
> suitable implementations.
>
>
>
> On Mon, Jul 2, 2018 at 3:26 PM Robin Qiu  wrote:
>
>> Hi everyone,
>>
>> Thanks for your feedback on the doc. I have revamped it according to
>> all of the comments. The major changes I have made are:
>> * The problem description should be more general and accurate now.
>> * I added more background information, such as details about
>> Reshuffle, so it should be easier to understand now.
>> * I made it clear what is the scope of my current project and what
>> could be left to future work.
>> * It now reflects the current progress of my work, and discusses how
>> it should work with the portable pipeline representation (WIP)
>>
>> Also, I forgot to mention last time that this doc may be interesting
>> to those of you interested in Reshuffle, because Reshuffle is used as a
>> current workaround for the problem described in the doc.
>>
>> More comments are always welcome.
>>
>> Best,
>> Robin
>>
>> On Fri, Jun 15, 2018 at 7:34 AM Kenneth Knowles 
>> wrote:
>>
>>> Thanks for the write up. It is great to see someone pushing this
>>> through.
>>>
>>> I wanted to bring Luke's high-level question back to the list for
>>> visibility: what about portability and 

Re: Invite to comment on the @RequiresStableInput design doc

2018-07-03 Thread Kenneth Knowles
Since we always assume ProcessElement could have arbitrary side effects
(esp. randomization), the state and timers set up by a call to
ProcessElement cannot be considered stable until they are persisted. It
seems very similar to the cost of outputting to a downstream
@RequiresStableInput transform, if not an identical implementation.

The thing timers add is a way to loop which you can't do if it is an output.

Adding @Pure annotations might help, if the input elements are stable and
ProcessElement is pure.

Kenn

On Mon, Jul 2, 2018 at 7:05 PM Reuven Lax  wrote:

> The common use case for a timer is to read in data that was stored using
> the state API in processElement. There is no guarantee that this is stable, and
> I believe no runner currently guarantees this. For example:
>
> class MyDoFn extends DoFn<ElementT, OutputT> {
>   @StateId("bag") private final StateSpec<BagState<ElementT>> buffer =
>       StateSpecs.bag(ElementCoder.of());
>   @TimerId("timer") private final TimerSpec timerSpec =
>       TimerSpecs.timer(TimeDomain.PROCESSING_TIME);
>
>   @ProcessElement public void processElement(@Element ElementT element,
>       @StateId("bag") BagState<ElementT> bag, @TimerId("timer") Timer timer) {
>     bag.add(element);
>     timer.align(Duration.standardSeconds(30))
>         .offset(Duration.standardSeconds(3)).setRelative();
>   }
>
>   @OnTimer("timer") public void onTimer(@StateId("bag") BagState<ElementT> bag) {
>     sendToExternalSystem(bag.read());
>   }
> }
>
> If you tagged onTimer with @RequiresStableInput, then you could guarantee
> that if the timer retried then it would read the same elements out of the
> bag. Today this is not guaranteed - the data written to the bag might not
> even be persisted yet when the timer fires (for example, both the
> processElement and the onTimer might be executed by the runner in the same
> bundle).
>
> This particular example is a simplistic one of course - you could
> accomplish the same thing with triggers. When Raghu worked on the
> exactly-once Kafka sink this was very problematic. The final solution used
> some specific details of Kafka to work, and is complicated and not portable
> to other sinks.
>
> BTW - you can of course just have OnTimer produce the output to another
> transform marked with RequiresStableInput. However this solution is very
> expensive - every element must be persisted to stable storage multiple
> times - and we tried hard to avoid doing this in the Kafka sink.
>
> Reuven
>
> On Mon, Jul 2, 2018 at 6:24 PM Robert Bradshaw 
> wrote:
>
>> Could you give an example of such a use case? (I suppose I'm not quite
>> following what it means for a timer to be unstable...)
>>
>> On Mon, Jul 2, 2018 at 6:20 PM Reuven Lax  wrote:
>>
>>> One issue: we definitely have some strong use cases where we want this
>>> on ProcessTimer but not on ProcessElement. Since both are on the same DoFn,
>>> I'm not sure how you would represent this as a separate transform.
>>>
>>> On Mon, Jul 2, 2018 at 5:05 PM Robert Bradshaw 
>>> wrote:
>>>
 Thanks for the writeup.

 I'm wondering whether, rather than phrasing this as an annotation on DoFn
 methods that gets plumbed down through the portability representation, it
 would make more sense to introduce a new, primitive "EnsureStableInput"
 transform. For those runners whose reshuffle provides stable inputs, they
 could use that as an implementation, and other runners could provide other
 suitable implementations.



 On Mon, Jul 2, 2018 at 3:26 PM Robin Qiu  wrote:

> Hi everyone,
>
> Thanks for your feedback on the doc. I have revamped it according to
> all of the comments. The major changes I have made are:
> * The problem description should be more general and accurate now.
> * I added more background information, such as details about
> Reshuffle, so it should be easier to understand now.
> * I made it clear what is the scope of my current project and what
> could be left to future work.
> * It now reflects the current progress of my work, and discusses how
> it should work with the portable pipeline representation (WIP)
>
> Also, I forgot to mention last time that this doc may be interesting
> to those of you interested in Reshuffle, because Reshuffle is used as a
> current workaround for the problem described in the doc.
>
> More comments are always welcome.
>
> Best,
> Robin
>
> On Fri, Jun 15, 2018 at 7:34 AM Kenneth Knowles 
> wrote:
>
>> Thanks for the write up. It is great to see someone pushing this
>> through.
>>
>> I wanted to bring Luke's high-level question back to the list for
>> visibility: what about portability and other SDKs?
>>
>> Portability is probably trivial, but the "other SDKs" question is
>> probably best answered by folks working on them who can have opinions about
>> how it works in their SDKs' idioms.
>>
>> Kenn
>> ​
>>
>


Re: BiqQueryIO.write and Wait.on

2018-07-03 Thread Eugene Kirpichov
Awesome!! Thanks for the heads up, very exciting, this is going to make a
lot of people happy :)

On Tue, Jul 3, 2018, 3:40 AM Carlos Alonso  wrote:

> + dev@beam.apache.org
>
> Just a quick email to let you know that I'm starting developing this.
>
> On Fri, Apr 20, 2018 at 10:30 PM Eugene Kirpichov 
> wrote:
>
>> Hi Carlos,
>>
>> Thank you for expressing interest in taking this on! Let me give you a
>> few pointers to start, and I'll be happy to help everywhere along the way.
>>
>> Basically we want BigQueryIO.write() to return something (e.g. a
>> PCollection) that can be used as input to Wait.on().
>> Currently it returns a WriteResult, which only contains a
>> PCollection<TableRow> of failed inserts - that one cannot be used
>> directly; instead we should add another component to WriteResult that
>> represents the result of successfully writing some data.
>>
>> Given that BQIO supports dynamic destination writes, I think it makes
>> sense for that to be a PCollection<KV<DestinationT, ???>> so that in theory
>> we could sequence different destinations independently (currently Wait.on()
>> does not provide such a feature, but it could); and it will require
>> changing WriteResult to be WriteResult<DestinationT>. As for what the "???"
>> might be - it is something that represents the result of successfully
>> writing a window of data. I think it can even be Void, or "?" (wildcard
>> type) for now, until we figure out something better.
>>
>> Implementing this would require roughly the following work:
>> - Add this PCollection<KV<DestinationT, ???>> to WriteResult
>> - Modify the BatchLoads transform to provide it on both codepaths:
>> expandTriggered() and expandUntriggered()
>> ...- expandTriggered() itself writes via 2 codepaths: single-partition
>> and multi-partition. Both need to be handled - we need to get a
>> PCollection<KV<DestinationT, ???>> from each of them, and Flatten these two
>> PCollections together to get the final result. The single-partition
>> codepath (writeSinglePartition) under the hood already uses WriteTables
>> that returns a KV<DestinationT, ???> so it's directly usable. The
>> multi-partition codepath ends in WriteRenameTriggered - unfortunately, this
>> codepath drops DestinationT along the way and will need to be refactored a
>> bit to keep it until the end.
>> ...- expandUntriggered() should be treated the same way.
>> - Modify the StreamingWriteTables transform to provide it
>> ...- Here also, the challenge is to propagate the DestinationT type all
>> the way until the end of StreamingWriteTables - it will need to be
>> refactored. After such a refactoring, returning a KV<DestinationT, ???>
>> should be easy.
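As a rough model of the plan above (this is NOT Beam code; the destination strings and the set stand in for a PCollection of KV<DestinationT, ???> pairs, and the names are illustrative assumptions):

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model of the proposed WriteResult signal: each batch codepath emits
// per-destination "done" markers, and the two are flattened into one result.
public class SignalToy {
    public static void main(String[] args) {
        // Hypothetical outputs of the single-partition and multi-partition paths:
        List<String> singlePartitionDone = List.of("dataset.tableA");
        List<String> multiPartitionDone  = List.of("dataset.tableB", "dataset.tableC");

        // Flatten the two, as proposed for expandTriggered():
        Set<String> allDestinationsDone = new TreeSet<>();
        allDestinationsDone.addAll(singlePartitionDone);
        allDestinationsDone.addAll(multiPartitionDone);

        // A downstream consumer could Wait.on() this combined signal, or in
        // theory sequence per destination independently.
        System.out.println(allDestinationsDone);
        // prints "[dataset.tableA, dataset.tableB, dataset.tableC]"
    }
}
```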
>>
>> Another challenge with all of this is backwards compatibility in terms of
>> API and pipeline update.
>> Pipeline update is much less of a concern for the BatchLoads codepath,
>> because it's typically used in batch-mode pipelines that don't get updated.
>> I would recommend to start with this, perhaps even with only the
>> untriggered codepath (it is much more commonly used) - that will pave the
>> way for future work.
>>
>> Hope this helps, please ask more if something is unclear!
>>
>> On Fri, Apr 20, 2018 at 12:48 AM Carlos Alonso 
>> wrote:
>>
>>> Hey Eugene!!
>>>
>>> I’d gladly take a stab on it although I’m not sure how much available
>>> time I might have to put into but... yeah, let’s try it.
>>>
>>> Where should I begin? Is there a Jira issue or shall I file one?
>>>
>>> Thanks!
>>> On Thu, 12 Apr 2018 at 00:41, Eugene Kirpichov 
>>> wrote:
>>>
 Hi,

 Yes, you're both right - BigQueryIO.write() is currently not
 implemented in a way that it can be used with Wait.on(). It would certainly
 be a welcome contribution to change this - many people expressed interest
 in specifically waiting for BigQuery writes. Is any of you interested in
 helping out?

 Thanks.

 On Fri, Apr 6, 2018 at 12:36 AM Carlos Alonso 
 wrote:

> Hi Simon, I think your explanation was very accurate, at least to my
> understanding. I'd also be interested in getting batch load result's
> feedback on the pipeline... hopefully someone may suggest something,
> otherwise we could propose submitting a Jira, or even better, a PR!! :)
>
> Thanks!
>
> On Thu, Apr 5, 2018 at 2:01 PM Simon Kitching <
> simon.kitch...@unbelievable-machine.com> wrote:
>
>> Hi All,
>>
>> I need to write some data to BigQuery (batch-mode) and then send a
>> Pubsub message to trigger further processing.
>>
>> I found this thread titled "Callbacks/other functions run after a
>> PDone/output transform" on the user-list which was very relevant:
>>
>> https://lists.apache.org/thread.html/ddcdf93604396b1cbcacdff49aba60817dc90ee7c8434725ea0d26c0@%3Cuser.beam.apache.org%3E
>>
>> Thanks to the author of the Wait transform (Beam 2.4.0)!
>>
>> Unfortunately, it appears that the Wait.on transform does not work
>> with BigQueryIO in FILE_LOADS mode - or at least I cannot get it to work.
>> Advice 

Re: BiqQueryIO.write and Wait.on

2018-07-03 Thread Carlos Alonso
+ dev@beam.apache.org

Just a quick email to let you know that I'm starting developing this.

On Fri, Apr 20, 2018 at 10:30 PM Eugene Kirpichov 
wrote:

> Hi Carlos,
>
> Thank you for expressing interest in taking this on! Let me give you a few
> pointers to start, and I'll be happy to help everywhere along the way.
>
> Basically we want BigQueryIO.write() to return something (e.g. a
> PCollection) that can be used as input to Wait.on().
> Currently it returns a WriteResult, which only contains a
> PCollection<TableRow> of failed inserts - that one cannot be used
> directly; instead we should add another component to WriteResult that
> represents the result of successfully writing some data.
>
> Given that BQIO supports dynamic destination writes, I think it makes
> sense for that to be a PCollection<KV<DestinationT, ???>> so that in theory
> we could sequence different destinations independently (currently Wait.on()
> does not provide such a feature, but it could); and it will require
> changing WriteResult to be WriteResult<DestinationT>. As for what the "???"
> might be - it is something that represents the result of successfully
> writing a window of data. I think it can even be Void, or "?" (wildcard
> type) for now, until we figure out something better.
>
> Implementing this would require roughly the following work:
> - Add this PCollection<KV<DestinationT, ???>> to WriteResult
> - Modify the BatchLoads transform to provide it on both codepaths:
> expandTriggered() and expandUntriggered()
> ...- expandTriggered() itself writes via 2 codepaths: single-partition and
> multi-partition. Both need to be handled - we need to get a
> PCollection<KV<DestinationT, ???>> from each of them, and Flatten these two
> PCollections together to get the final result. The single-partition
> codepath (writeSinglePartition) under the hood already uses WriteTables
> that returns a KV<DestinationT, ???> so it's directly usable. The
> multi-partition codepath ends in WriteRenameTriggered - unfortunately, this
> codepath drops DestinationT along the way and will need to be refactored a
> bit to keep it until the end.
> ...- expandUntriggered() should be treated the same way.
> - Modify the StreamingWriteTables transform to provide it
> ...- Here also, the challenge is to propagate the DestinationT type all
> the way until the end of StreamingWriteTables - it will need to be
> refactored. After such a refactoring, returning a KV<DestinationT, ???>
> should be easy.
>
> Another challenge with all of this is backwards compatibility in terms of
> API and pipeline update.
> Pipeline update is much less of a concern for the BatchLoads codepath,
> because it's typically used in batch-mode pipelines that don't get updated.
> I would recommend to start with this, perhaps even with only the
> untriggered codepath (it is much more commonly used) - that will pave the
> way for future work.
>
> Hope this helps, please ask more if something is unclear!
>
> On Fri, Apr 20, 2018 at 12:48 AM Carlos Alonso 
> wrote:
>
>> Hey Eugene!!
>>
>> I’d gladly take a stab on it although I’m not sure how much available
>> time I might have to put into but... yeah, let’s try it.
>>
>> Where should I begin? Is there a Jira issue or shall I file one?
>>
>> Thanks!
>> On Thu, 12 Apr 2018 at 00:41, Eugene Kirpichov 
>> wrote:
>>
>>> Hi,
>>>
>>> Yes, you're both right - BigQueryIO.write() is currently not implemented
>>> in a way that it can be used with Wait.on(). It would certainly be a
>>> welcome contribution to change this - many people expressed interest in
>>> specifically waiting for BigQuery writes. Is any of you interested in
>>> helping out?
>>>
>>> Thanks.
>>>
>>> On Fri, Apr 6, 2018 at 12:36 AM Carlos Alonso 
>>> wrote:
>>>
 Hi Simon, I think your explanation was very accurate, at least to my
 understanding. I'd also be interested in getting batch load result's
 feedback on the pipeline... hopefully someone may suggest something,
 otherwise we could propose submitting a Jira, or even better, a PR!! :)

 Thanks!

 On Thu, Apr 5, 2018 at 2:01 PM Simon Kitching <
 simon.kitch...@unbelievable-machine.com> wrote:

> Hi All,
>
> I need to write some data to BigQuery (batch-mode) and then send a
> Pubsub message to trigger further processing.
>
> I found this thread titled "Callbacks/other functions run after a
> PDone/output transform" on the user-list which was very relevant:
>
> https://lists.apache.org/thread.html/ddcdf93604396b1cbcacdff49aba60817dc90ee7c8434725ea0d26c0@%3Cuser.beam.apache.org%3E
>
> Thanks to the author of the Wait transform (Beam 2.4.0)!
>
> Unfortunately, it appears that the Wait.on transform does not work
> with BigQueryIO in FILE_LOADS mode - or at least I cannot get it to work.
> Advice appreciated.
>
> Here's (most of) the relevant test code:
> Pipeline p = Pipeline.create(options);
> PCollection lines = p.apply("Read Input",
> Create.of("line1", "line2", "line3", "line4"));
>
> TableFieldSchema f1 = 

Re: Help! Beam SQL needs more committer support

2018-07-03 Thread James
Count me in, assign the PR to me when it is appropriate.

On Tue, Jul 3, 2018 at 4:49 PM Ismaël Mejía  wrote:

> It has been a long time since I last checked on the SQL work, but in
> case you guys still need more people to review, you can count on me too.
>
> On Mon, Jul 2, 2018 at 7:23 PM Rui Wang  wrote:
>
>> Haha! I don't even know who I should @ now because there are so many
>> helpful hands!
>>
>> -Rui
>>
>> On Mon, Jul 2, 2018 at 9:24 AM Andrew Pilloud 
>> wrote:
>>
>>> Thanks to everyone who's volunteered to help review SQL PRs. Sounds like
>>> we will be in good hands while Kenn is out!
>>>
>>> Andrew
>>>
>>> On Fri, Jun 29, 2018 at 7:45 AM Alexey Romanenko <
>>> aromanenko@gmail.com> wrote:
>>>
 I’d be happy to learn and help with Beam SQL as well!

 Alexey

 On 28 Jun 2018, at 22:12, Ahmet Altay  wrote:

 I am also happy to help and interested in learning more about
 Beam SQL.

 On Thu, Jun 28, 2018 at 11:53 AM, Kai Jiang  wrote:

> I am very interested in supporting SQL PRs to help you reduce your
> bandwidths.
>
> Best,
> Kai
>
>
> On Thu, Jun 28, 2018, 11:52 Austin Bennett <
> whatwouldausti...@gmail.com> wrote:
>
>> Same as Pablo - you can count me in, though I am vastly less experienced
>> in the Beam way than him, etc.
>>
>>
>> On Thu, Jun 28, 2018 at 11:36 AM, Pablo Estrada 
>> wrote:
>>
>>> I am always willing to help : )
>>> I'd say I can look at Python changes, and be a second-place option
>>> for Java changes (some experience but not as much on the Java side).
>>> Best
>>> -P
>>>
>>> On Thu, Jun 28, 2018 at 10:50 AM Kenneth Knowles 
>>> wrote:
>>>
 Hi all,

 I have been reviewing a ton of PRs from Andrew (@apilloud), Anton
 (@akedin), Rui (@amaliujia), and Kai (@vectorijk) and so has Mingmin
 (@XuMingmin). Beam SQL is really moving!*

 But I worry that when I am on leave there will be a shortage of
 committer review availability. I don't think any single person can do it,
 even if Mingmin decided to make it a full time job :-). So I want to put a
 call out to the community.

 Would anyone be interested in being a committer supporter part time
 for SQL PRs?

 Kenn

 *There are about 350 non-merge non-website commits in 2.6.0 so far,
 and here's some flavor:
  - 66 touch sdks/python/
  - 100 touch some runner/
  - 75 touch sdks/java/extensions/sql/

>>> --
>>> Got feedback? go/pabloem-feedback
>>> 
>>>
>>
>>




Re: Help! Beam SQL needs more committer support

2018-07-03 Thread Ismaël Mejía
It has been a long time since I last checked on the SQL work, but in
case you guys still need more people to review, you can count on me too.

On Mon, Jul 2, 2018 at 7:23 PM Rui Wang  wrote:

> Haha! I don't even know who I should @ now because there are so many
> helpful hands!
>
> -Rui
>
> On Mon, Jul 2, 2018 at 9:24 AM Andrew Pilloud  wrote:
>
>> Thanks to everyone who's volunteered to help review SQL PRs. Sounds like
>> we will be in good hands while Kenn is out!
>>
>> Andrew
>>
>> On Fri, Jun 29, 2018 at 7:45 AM Alexey Romanenko <
>> aromanenko@gmail.com> wrote:
>>
>>> I’d be happy to learn and help with Beam SQL as well!
>>>
>>> Alexey
>>>
>>> On 28 Jun 2018, at 22:12, Ahmet Altay  wrote:
>>>
>>> I am also happy to help and interested in learning more about
>>> Beam SQL.
>>>
>>> On Thu, Jun 28, 2018 at 11:53 AM, Kai Jiang  wrote:
>>>
 I am very interested in supporting SQL PRs to help you reduce your
 bandwidths.

 Best,
 Kai


 On Thu, Jun 28, 2018, 11:52 Austin Bennett 
 wrote:

> Same as Pablo - you can count me in, though I am vastly less experienced
> in the Beam way than him, etc.
>
>
> On Thu, Jun 28, 2018 at 11:36 AM, Pablo Estrada 
> wrote:
>
>> I am always willing to help : )
>> I'd say I can look at Python changes, and be a second-place option
>> for Java changes (some experience but not as much on the Java side).
>> Best
>> -P
>>
>> On Thu, Jun 28, 2018 at 10:50 AM Kenneth Knowles 
>> wrote:
>>
>>> Hi all,
>>>
>>> I have been reviewing a ton of PRs from Andrew (@apilloud), Anton
>>> (@akedin), Rui (@amaliujia), and Kai (@vectorijk) and so has Mingmin
>>> (@XuMingmin). Beam SQL is really moving!*
>>>
>>> But I worry that when I am on leave there will be a shortage of
>>> committer review availability. I don't think any single person can do it,
>>> even if Mingmin decided to make it a full time job :-). So I want to put a
>>> call out to the community.
>>>
>>> Would anyone be interested in being a committer supporter part time
>>> for SQL PRs?
>>>
>>> Kenn
>>>
>>> *There are about 350 non-merge non-website commits in 2.6.0 so far,
>>> and here's some flavor:
>>>  - 66 touch sdks/python/
>>>  - 100 touch some runner/
>>>  - 75 touch sdks/java/extensions/sql/
>>>
>> --
>> Got feedback? go/pabloem-feedback
>> 
>>
>
>
>>>
>>>


Re: [ANN] Apache Beam 2.5.0 has been released!

2018-07-03 Thread Jason Kuster
Excellent news, thank you JB and everyone else who helped out with the
release!

On Tue, Jul 3, 2018 at 12:00 AM Etienne Chauchot 
wrote:

> Nice!
>
> Thanks for all the work you did on the release JB !
>
> Etienne
>
> Le dimanche 01 juillet 2018 à 06:27 +0200, Jean-Baptiste Onofré a écrit :
>
> The Apache Beam team is pleased to announce the release of 2.5.0 version!
>
>
> You can download the release here:
>
>
>   https://beam.apache.org/get-started/downloads/
>
>
> This release includes the following major new features & improvements:
>
> - Go SDK support
>
> - new ParquetIO
>
> - Build migrated to Gradle (including for the release)
>
> - Improvements on Nexmark such as Kafka support
>
> - Improvements on Beam SQL DSL
>
> - Improvements on Portability
>
> - New generic metrics push support for all runners
>
>
> You can take a look on the following blog post and Release Notes for
>
> details:
>
>
> https://beam.apache.org/blog/2018/06/26/beam-2.5.0.html
>
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12342847
>
>
> Enjoy !!
>
>
> --
>
> JB on behalf of The Apache Beam team
>
>
>

-- 
---
Jason Kuster
Apache Beam / Google Cloud Dataflow

See something? Say something. go/jasonkuster-feedback