Re: [PROPOSAL] Prepare Beam 2.8.0 release

2018-10-03 Thread Ted Yu
+1

On Wed, Oct 3, 2018 at 9:52 AM Jean-Baptiste Onofré  wrote:

> +1
>
> but we have to be fast in release process. 2.7.0 took more than 1 month
> to be cut !
>
> If no blocker, we have to just move forward.
>
> Regards
> JB
>
> On 03/10/2018 18:25, Ahmet Altay wrote:
> > Hi all,
> >
> > Release cut date for the next release is 10/10 according to Beam release
> > calendar [1]. Since the previous release is already mostly wrapped up
> > (modulo blog post), I would like to propose starting the next release on
> > time (10/10).
> >
> > Additionally I propose designating this release as the first
> > long-term-support (LTS) release [2]. This should have no impact on the
> > release process, however it would mean that we commit to patch this
> > release for the next 12 months for major issues.
> >
> > I volunteer to perform this release.
> >
> > What do you think?
> >
> > Ahmet
> >
> > [1]
> https://calendar.google.com/calendar/embed?src=0p73sl034k80oob7seouanigd0%40group.calendar.google.com&ctz=America%2FLos_Angeles
> > [2] https://beam.apache.org/community/policies/#releases
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: [apachecon 2018] Universal metrics with apache beam

2018-10-03 Thread Ted Yu
Very interesting talk, Etienne.

Looking forward to the audio recording.

Cheers


Re: [DISCUSS] [BEAM-4126] Deleting Maven build files (pom.xml) grace period?

2018-06-06 Thread Ted Yu
+1 on this effort
 Original message 
From: Chamikara Jayalath
Date: 6/6/18 2:09 PM (GMT-08:00)
To: dev@beam.apache.org, u...@beam.apache.org
Subject: Re: [DISCUSS] [BEAM-4126] Deleting Maven build files (pom.xml) grace period?
+1 for the overall effort. As Pablo mentioned, we need some time to migrate 
internal Dataflow build off of Maven build files. I created 
https://issues.apache.org/jira/browse/BEAM-4512 for this.

Thanks,
Cham
On Wed, Jun 6, 2018 at 1:30 PM Eugene Kirpichov  wrote:
Is it possible for Dataflow to just keep a copy of the pom.xmls and delete it 
as soon as Dataflow is migrated?
Overall +1, I've been using Gradle without issues for a while and almost forgot 
pom.xml's still existed.

On Wed, Jun 6, 2018, 1:13 PM Pablo Estrada  wrote:
I agree that we should delete the pom.xml files soon, as they create a burden 
for maintainers. 
I'd like to be able to extend the grace period by a bit, to allow the internal 
build systems at Google to move away from using the Beam poms.
We use these pom files to build Dataflow workers, and thus it's critical for us 
that they are available for a few more weeks while we set up a gradle build. 
Perhaps 4 weeks? (Calling out +Chamikara Jayalath, who has recently worked on
internal Dataflow tooling.)
Best,
-P.
On Wed, Jun 6, 2018 at 1:05 PM Lukasz Cwik  wrote:
Note: Apache Beam will still provide pom.xml for each release it produces. This 
is only about people using Maven to build Apache Beam themselves and not 
relying on the released artifacts in Maven Central.
With the first release using Gradle as the build system is underway, I wanted 
to start this thread to remind people that we are going to delete the Maven 
pom.xml files after the 2.5.0 release is finalized plus a two week grace period.
Are there others who would like a shorter/longer grace period?

The PR to delete the pom.xml is here: https://github.com/apache/beam/pull/5571
-- 
Got feedback? go/pabloem-feedback




Re: "Radically modular data ingestion APIs in Apache Beam" @ Strata - slides available

2018-03-08 Thread Ted Yu
I see.

I have added myself as watcher on BEAM-3788.

Thanks

On Thu, Mar 8, 2018 at 4:51 PM, Eugene Kirpichov <kirpic...@google.com>
wrote:

> Hi Ted - KafkaIO is not yet implemented using Splittable DoFn's (it was
> implemented before SDFs existed and hasn't been rewritten yet), but it will
> be, once more runners catch up with the support: currently we have Dataflow
> and Flink. +Chamikara Jayalath <chamik...@google.com> is currently
> working on implementing it using SDFs in the Python SDK.
>
> On Thu, Mar 8, 2018 at 4:34 PM Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Eugene:
>> Very informative talk.
>>
>> I looked at:
>> sdks/java/core/src/test/java/org/apache/beam/sdk/
>> transforms/splittabledofn/OffsetRangeTrackerTest.java
>>
>> Is there some example showing how OffsetRangeTracker works with Kafka
>> partition(s) ?
>>
>> Thanks
>>
>> On Thu, Mar 8, 2018 at 3:58 PM, Eugene Kirpichov <kirpic...@google.com>
>> wrote:
>>
>>> Hi Thomas!
>>>
>>> In case of tailing a Kafka partition, the restriction would be
>>> [start_offset, infinity), and it would keep being split by checkpointing
>>> into [start_offset, end_offset) and [end_offset, infinity)
>>>
>>> On Thu, Mar 8, 2018 at 3:52 PM Thomas Weise <t...@apache.org> wrote:
>>>
>>>> Eugene,
>>>>
>>>> I actually had one question regarding the application of SDF for the
>>>> Kafka consumer. Reading through a topic partition can be parallel by
>>>> splitting a partition into multiple restrictions (for use cases where order
>>>> does not matter). But how would the tail read be managed? I assume there
>>>> would not be a new restriction whenever new records arrive (added latency)?
>>>> The examples on slide 40 show an end offset for Kafka, but for a continuous
>>>> read there wouldn't be an end offset?
>>>>
>>>> Thanks,
>>>> Thomas
>>>>
>>>>
>>>> On Thu, Mar 8, 2018 at 2:59 PM, Thomas Weise <t...@apache.org> wrote:
>>>>
>>>>> Great, thanks for sharing!
>>>>>
>>>>>
>>>>> On Thu, Mar 8, 2018 at 12:16 PM, Eugene Kirpichov <
>>>>> kirpic...@google.com> wrote:
>>>>>
>>>>>> Oops that's just the template I used. Thanks for noticing, will
>>>>>> regenerate the PDF and reupload when I get to it.
>>>>>>
>>>>>>
>>>>>> On Thu, Mar 8, 2018, 11:59 AM Dan Halperin <dhalp...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Looks like it was a good talk! Why is it Google Confidential &
>>>>>>> Proprietary, though?
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>> On Thu, Mar 8, 2018 at 11:49 AM, Eugene Kirpichov <
>>>>>>> kirpic...@google.com> wrote:
>>>>>>>
>>>>>>>> Hey all,
>>>>>>>>
>>>>>>>> The slides for my yesterday's talk at Strata San Jose
>>>>>>>> https://conferences.oreilly.com/strata/strata-ca/
>>>>>>>> public/schedule/detail/63696 have been posted on the talk page.
>>>>>>>> They may be of interest both to users and IO authors.
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>
>>
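The tail-read answer above — an unbounded restriction [start_offset, infinity) that keeps being split at checkpoints into [start_offset, end_offset) and [end_offset, infinity) — can be sketched in Python. This is an illustrative model only, not Beam's actual OffsetRangeTracker API; the class and method names here are assumptions for the sake of the example.

```python
class UnboundedOffsetRangeTracker:
    """Illustrative tracker for a restriction over [start, infinity).

    Not the real Beam API: a sketch of how a tailing Kafka read can be
    modeled as an infinite offset range that is bounded at each checkpoint.
    """

    def __init__(self, start):
        self.start = start
        self.end = float('inf')   # tailing read: no end offset
        self.last_claimed = None

    def try_claim(self, offset):
        # The reader must claim each offset before processing its record;
        # claims at or past the (possibly checkpointed) end are refused.
        if offset >= self.end:
            return False
        self.last_claimed = offset
        return True

    def checkpoint(self):
        # Split [start, inf) into [start, split_at) — the finite part now
        # covered by this tracker — and a residual [split_at, inf) that the
        # runner resumes later to continue tailing.
        split_at = self.start if self.last_claimed is None else self.last_claimed + 1
        residual = UnboundedOffsetRangeTracker(split_at)
        self.end = split_at
        return residual
```

Each checkpoint bounds the work done so far, while the residual [end_offset, infinity) restriction is handed back to the runner to resume the tail read — so new restrictions arise per checkpoint, not per arriving record, which addresses the added-latency concern in the question.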


Re: "Radically modular data ingestion APIs in Apache Beam" @ Strata - slides available

2018-03-08 Thread Ted Yu
Eugene:
Very informative talk.

I looked at:
sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/splittabledofn/OffsetRangeTrackerTest.java

Is there some example showing how OffsetRangeTracker works with Kafka
partition(s) ?

Thanks

On Thu, Mar 8, 2018 at 3:58 PM, Eugene Kirpichov 
wrote:

> Hi Thomas!
>
> In case of tailing a Kafka partition, the restriction would be
> [start_offset, infinity), and it would keep being split by checkpointing
> into [start_offset, end_offset) and [end_offset, infinity)
>
> On Thu, Mar 8, 2018 at 3:52 PM Thomas Weise  wrote:
>
>> Eugene,
>>
>> I actually had one question regarding the application of SDF for the
>> Kafka consumer. Reading through a topic partition can be parallel by
>> splitting a partition into multiple restrictions (for use cases where order
>> does not matter). But how would the tail read be managed? I assume there
>> would not be a new restriction whenever new records arrive (added latency)?
>> The examples on slide 40 show an end offset for Kafka, but for a continuous
>> read there wouldn't be an end offset?
>>
>> Thanks,
>> Thomas
>>
>>
>> On Thu, Mar 8, 2018 at 2:59 PM, Thomas Weise  wrote:
>>
>>> Great, thanks for sharing!
>>>
>>>
>>> On Thu, Mar 8, 2018 at 12:16 PM, Eugene Kirpichov 
>>> wrote:
>>>
 Oops that's just the template I used. Thanks for noticing, will
 regenerate the PDF and reupload when I get to it.


 On Thu, Mar 8, 2018, 11:59 AM Dan Halperin  wrote:

> Looks like it was a good talk! Why is it Google Confidential &
> Proprietary, though?
>
> Dan
>
> On Thu, Mar 8, 2018 at 11:49 AM, Eugene Kirpichov <
> kirpic...@google.com> wrote:
>
>> Hey all,
>>
>> The slides for my yesterday's talk at Strata San Jose
>> https://conferences.oreilly.com/strata/strata-ca/
>> public/schedule/detail/63696 have been posted on the talk page. They
>> may be of interest both to users and IO authors.
>>
>> Thanks.
>>
>
>
>>>
>>


Re: [VOTE] Release 2.3.0, release candidate #3

2018-02-12 Thread Ted Yu
bq. is OK with staging repo extension ?

+1 on using the above approach.

Cheers


Re: [DISCUSS] State of the project: Culture and governance

2018-02-04 Thread Ted Yu
bq. 1. Propose a pull request to their fix branch. This is my favorite and
I've mentioned it. Everything is straightforward and explicit.

The above makes sense. When the contributor merges the reviewer's pull
request, it signifies their willingness to adopt the suggestion, making the
combined pull request closer to being merged.

Cheers


Re: Eclipse support

2018-01-17 Thread Ted Yu
Have you tried running 'mvn eclipse:eclipse' and importing from the root of
the workspace?

On Wed, Jan 17, 2018 at 10:32 AM, Ron Gonzalez  wrote:

> Hi,
>   I've been trying this for a couple of days now, but I can't seem to get
> a clean Eclipse import.
>   I refreshed to latest master, got a clean mvn -DskipTests clean install,
> ran through the Eclipse setup steps for m2e-apt installation.
>   I'm getting errors like below. Do you have any tips to get this going?
>
> Thanks,
> Ron
>
> Description Resource Path Location Type
> ACCUMULATING cannot be resolved to a variable
> WindowingStrategyTranslation.java /beam-runners-core-
> construction-java/src/main/java/org/apache/beam/runners/core/construction line
> 56 Java Problem
> AFTER_ALL cannot be resolved to a variable TriggerStateMachines.java
> /beam-runners-core-java/src/main/java/org/apache/beam/
> runners/core/triggers line 34 Java Problem
> AFTER_ALL cannot be resolved to a variable TriggerTranslation.java
> /beam-runners-core-construction-java/src/main/
> java/org/apache/beam/runners/core/construction line 241 Java Problem
> AFTER_ANY cannot be resolved to a variable TriggerStateMachines.java
> /beam-runners-core-java/src/main/java/org/apache/beam/
> runners/core/triggers line 37 Java Problem
> AFTER_ANY cannot be resolved to a variable TriggerTranslation.java
> /beam-runners-core-construction-java/src/main/
> java/org/apache/beam/runners/core/construction line 243 Java Problem
> AFTER_EACH cannot be resolved to a variable TriggerStateMachines.java
> /beam-runners-core-java/src/main/java/org/apache/beam/
> runners/core/triggers line 59 Java Problem
> AFTER_EACH cannot be resolved to a variable TriggerTranslation.java
> /beam-runners-core-construction-java/src/main/
> java/org/apache/beam/runners/core/construction line 245 Java Problem
> AFTER_END_OF_WINDOW cannot be resolved to a variable
> TriggerStateMachines.java /beam-runners-core-java/src/
> main/java/org/apache/beam/runners/core/triggers line 40 Java Problem
> AFTER_END_OF_WINDOW cannot be resolved to a variable
> TriggerTranslation.java /beam-runners-core-construction-java/src/main/
> java/org/apache/beam/runners/core/construction line 248 Java Problem
> AFTER_PROCESSING_TIME cannot be resolved to a variable
> TriggerStateMachines.java /beam-runners-core-java/src/
> main/java/org/apache/beam/runners/core/triggers line 62 Java Problem
> AFTER_PROCESSING_TIME cannot be resolved to a variable
> TriggerTranslation.java /beam-runners-core-construction-java/src/main/
> java/org/apache/beam/runners/core/construction line 276 Java Problem
> AFTER_SYNCHRONIZED_PROCESSING_TIME cannot be resolved to a variable
> TriggerStateMachines.java /beam-runners-core-java/src/
> main/java/org/apache/beam/runners/core/triggers line 45 Java Problem
> AFTER_SYNCHRONIZED_PROCESSING_TIME cannot be resolved to a variable
> TriggerTranslation.java /beam-runners-core-construction-java/src/main/
> java/org/apache/beam/runners/core/construction line 302 Java Problem
> ALIGN_TO cannot be resolved to a variable TriggerStateMachines.java
> /beam-runners-core-java/src/main/java/org/apache/beam/
> runners/core/triggers line 94 Java Problem
> ALIGN_TO cannot be resolved to a variable TriggerTranslation.java
> /beam-runners-core-construction-java/src/main/
> java/org/apache/beam/runners/core/construction line 281 Java Problem
> ALWAYS cannot be resolved to a variable TriggerStateMachines.java
> /beam-runners-core-java/src/main/java/org/apache/beam/
> runners/core/triggers line 51 Java Problem
> ALWAYS cannot be resolved to a variable TriggerTranslation.java
> /beam-runners-core-construction-java/src/main/
> java/org/apache/beam/runners/core/construction line 304 Java Problem
> ApiServiceDescriptor cannot be resolved GrpcFnServer.java
> /beam-runners-java-fn-execution/src/main/java/org/apache/beam/runners/
> fnexecution line 37 Java Problem
>
>
>
> Thanks,
> Ron
>


Re: Strata Conference this March 6-8

2018-01-16 Thread Ted Yu
+1 to BoF

On Tue, Jan 16, 2018 at 5:00 PM, Dmitry Demeshchuk 
wrote:

> Probably won't be attending the conference, but totally down for a BoF.
>
> On Tue, Jan 16, 2018 at 4:58 PM, Holden Karau 
> wrote:
>
>> Do interested folks have any timing constraints around a BoF?
>>
>> On Tue, Jan 16, 2018 at 4:30 PM, Jesse Anderson <
>> je...@bigdatainstitute.io> wrote:
>>
>>> +1 to BoF. I don't know if any Beam talks will be on the schedule.
>>>
>>> > We could do an informal BoF at the Philz nearby or similar?
>>>
>>
>>
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>>
>
>
>
> --
> Best regards,
> Dmitry Demeshchuk.
>


Re: [DISCUSS] State of the project

2018-01-15 Thread Ted Yu
bq. are hard to detect in our unit-test framework

Looks like more integration tests would help discover bugs / regressions more
quickly. If the committer reviewing the PR has concerns in this regard, those
concerns should be stated on the PR so that the contributor (and reviewer)
can spend more time solidifying the solution.

bq. I've gone and fixed these issues myself when merging

We could make checkstyle rules stricter so that the code wouldn't pass the
build without addressing commonly known issues.
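As an illustration only (the rule names below are standard Checkstyle modules, not Beam's actual configuration), failing the build on commonly known issues would look like:

```xml
<!-- checkstyle.xml fragment (assumption, not Beam's real config):
     severity=error makes any violation fail the build. -->
<module name="Checker">
  <property name="severity" value="error"/>
  <module name="TreeWalker">
    <module name="UnusedImports"/>
    <module name="NeedBraces"/>
  </module>
</module>
```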

Cheers

On Sun, Jan 14, 2018 at 12:37 PM, Reuven Lax <re...@google.com> wrote:

> I agree with the sentiment, but I don't completely agree with the
> criteria.
>
> I think we need to be much better about reviewing PRs. Some PRs languish
> for too long before the reviewer gets to it (and I've been guilty of this
> too), which does not send a good message. Also new PRs sometimes languish
> because there is no reviewer assigned; maybe we could write a gitbot to
> automatically assign a reviewer to every new PR?
>
> Also, I think that the bar for merging a PR from a contributor should not
> be "the PR is perfect." It's perfectly fine to merge a PR that still has
> some issues (especially if the issues are stylistic). In the past when I've
> done this, I've gone and fixed these issues myself when merging. It was a
> bit more work for me to fix these things myself, but it was a small price
> to pay in order to portray Beam as a welcoming place for contributions.
>
> On the other hand, "the build does not break" is - in my opinion - too
> weak of a criterion for merging. A few reasons for this:
>
>   * Beam is a data-processing framework, and data integrity is paramount.
> If a reviewer sees an issue that could lead to data loss (or duplication,
> or corruption), I don't think that PR should be merged. Historically many
> such issues only actually manifest at scale, and are hard to detect in our
> unit-test framework. (we also need to invest in more at-scale tests to
> catch such issues).
>
>   * Beam guarantees backwards compatibility for users (except across major
> versions). If a bad API gets merged and released (and the chances of
> "forgetting" about it before the release is cut is unfortunately high), we
> are stuck with it. This is less of an issue for many other open-source
> projects that do not make such a compatibility guarantee, as they are able
> to simply remove or fix the API in the next version.
>
> I think we still need honest review of PRs, with the criteria being
> stronger than "the build doesn't break." However reviewers also need to be
> reasonable about what they ask for.
>
> Reuven
>
> On Sun, Jan 14, 2018 at 11:19 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> bq. if a PR is basically right (it does what it should) without breaking
>> the build, then it has to be merged fast
>>
>> +1 on above.
>> This would give contributors positive feedback.
>>
>> On Sun, Jan 14, 2018 at 8:13 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
>> wrote:
>>
>>> Hi Davor,
>>>
>>> Thanks a lot for this e-mail.
>>>
>>> I would like to emphasize two areas where we have to improve:
>>>
>>> 1. Apache way and community. We still have to focus and being dedicated
>>> on our communities (both user & dev). Helping, encouraging, growing our
>>> communities is key for the project. Building bridges between communities is
>>> also very important. We have to be more "accessible": sometime simplifying
>>> our discussions, showing more interest and open minded in the proposals
>>> would help as well. I think we do a good job already: we just have to
>>> improve.
>>>
>>> 2. Execution: a successful project is a project with a regular activity
>>> in term of releases, fixes, improvements.
>>> Regarding the PR, I think today we have a PR opened for long. And I
>>> think for three reasons:
>>> - some are not ready, not good enough, no question on these ones
>>> - some needs reviewer and speed up: we have to be careful on the open
>>> PRs and review asap
>>> - some are under review but we have a lot of "ping pong" and long
>>> discussion, not always justified. I already said that on the mailing list
>>> but, as for other Apache projects, if a PR is basically right (it does what
>>> it should) without breaking the build, then it has to be merged fast. If it
>>> requires additional changes (tests, polishing, improvements, ...), then it
>>> can be addressed in new PRs.
>>> As already mentioned in the Beam 2.3.0 thread, we have to adopt a

Re: Dataflow runner examples build fail

2018-01-08 Thread Ted Yu
+1
 Original message 
From: Jean-Baptiste Onofré
Date: 1/8/18 1:26 AM (GMT-08:00)
To: dev@beam.apache.org
Subject: Dataflow runner examples build fail
Hi guys,

The PRs and nightly builds are failing due to an issue with the dataflow 
platform: it seems we have a disk quota exceeded on the us-central1 region.

I would like to do a clean out and increase the quota a bit.

Thoughts ?

Thanks
Regards
JB
-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: [VOTE] Choose the "new" Spark runner

2017-11-17 Thread Ted Yu
[ ] Use Spark 1 & Spark 2 Support Branch
[X] Use Spark 2 Only Branch

On Thu, Nov 16, 2017 at 5:08 AM, Jean-Baptiste Onofré 
wrote:

> Hi guys,
>
> To illustrate the current discussion about Spark versions support, you can
> take a look on:
>
> --
> Spark 1 & Spark 2 Support Branch
>
> https://github.com/jbonofre/beam/tree/BEAM-1920-SPARK2-MODULES
>
> This branch contains a Spark runner common module compatible with both
> Spark 1.x and 2.x. For convenience, we introduced spark1 & spark2
> modules/artifacts containing just a pom.xml to define the dependencies set.
>
> --
> Spark 2 Only Branch
>
> https://github.com/jbonofre/beam/tree/BEAM-1920-SPARK2-ONLY
>
> This branch is an upgrade to Spark 2.x and "drop" support of Spark 1.x.
>
> As I'm ready to merge one of the other in the PR, I would like to complete
> the vote/discussion pretty soon.
>
> Correct me if I'm wrong, but it seems that the preference is to drop Spark
> 1.x to focus only on Spark 2.x (for the Spark 2 Only Branch).
>
> I would like to call a final vote to act the merge I will do:
>
> [ ] Use Spark 1 & Spark 2 Support Branch
> [ ] Use Spark 2 Only Branch
>
> This informal vote is open for 48 hours.
>
> Please, let me know what your preference is.
>
> Thanks !
> Regards
> JB
>
> On 11/13/2017 09:32 AM, Jean-Baptiste Onofré wrote:
>
>> Hi Beamers,
>>
>> I'm forwarding this discussion & vote from the dev mailing list to the
>> user mailing list.
>> The goal is to have your feedback as user.
>>
>> Basically, we have two options:
>> 1. Right now, in the PR, we support both Spark 1.x and 2.x using three
>> artifacts (common, spark1, spark2). You, as users, pick up spark1 or spark2
>> in your dependencies set depending the Spark target version you want.
>> 2. The other option is to upgrade and focus on Spark 2.x in Beam 2.3.0.
>> If you still want to use Spark 1.x, then, you will be stuck up to Beam
>> 2.2.0.
>>
>> Thoughts ?
>>
>> Thanks !
>> Regards
>> JB
>>
>>
>>  Forwarded Message 
>> Subject: [VOTE] Drop Spark 1.x support to focus on Spark 2.x
>> Date: Wed, 8 Nov 2017 08:27:58 +0100
>> From: Jean-Baptiste Onofré 
>> Reply-To: dev@beam.apache.org
>> To: dev@beam.apache.org
>>
>> Hi all,
>>
>> as you might know, we are working on Spark 2.x support in the Spark
>> runner.
>>
>> I'm working on a PR about that:
>>
>> https://github.com/apache/beam/pull/3808
>>
>> Today, we have something working with both Spark 1.x and 2.x from a code
>> standpoint, but I have to deal with dependencies. It's the first step of
>> the update as I'm still using RDD, the second step would be to support
>> dataframe (but for that, I would need PCollection elements with schemas,
>> that's another topic on which Eugene, Reuven and I are discussing).
>>
>> However, as all major distributions now ship Spark 2.x, I don't think
>> it's required anymore to support Spark 1.x.
>>
>> If we agree, I will update and cleanup the PR to only support and focus
>> on Spark 2.x.
>>
>> So, that's why I'm calling for a vote:
>>
>>[ ] +1 to drop Spark 1.x support and upgrade to Spark 2.x only
>>[ ] 0 (I don't care ;))
>>[ ] -1, I would like to still support Spark 1.x, and so having support
>> of both Spark 1.x and 2.x (please provide specific comment)
>>
>> This vote is open for 48 hours (I have the commits ready, just waiting
>> the end of the vote to push on the PR).
>>
>> Thanks !
>> Regards
>> JB
>>
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
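For illustration only, under option 1 a user would pick the Spark target purely through the dependency set. The artifact ids below are assumptions based on the module names mentioned in the thread (common, spark1, spark2), not final published coordinates:

```xml
<!-- Hypothetical Maven coordinates; the actual artifactIds were still
     under discussion at the time of this vote. -->
<!-- Swap beam-runners-spark2 for beam-runners-spark1 to target Spark 1.x. -->
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-runners-spark2</artifactId>
  <version>2.3.0</version>
</dependency>
```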


Re: [VOTE] Release 2.2.0, release candidate #3

2017-11-10 Thread Ted Yu
Considering that the holiday is around the corner, it would be nice to release 
2.2.0 sooner. 
Cheers 
 Original message 
From: Chamikara Jayalath
Date: 11/10/17 12:22 PM (GMT-08:00)
To: dev@beam.apache.org
Subject: Re: [VOTE] Release 2.2.0, release candidate #3
We found another issue that should probably be fixed in 2.2.0 release:
https://issues.apache.org/jira/browse/BEAM-3172

A fix is out for review and will be merged soon.

Thanks,
Cham

On Fri, Nov 10, 2017 at 10:43 AM Eugene Kirpichov
 wrote:

> Unfortunately I think I found a data loss bug - it was there since 2.0.0
> but I think it's serious enough that delaying a fix until the next release
> would be irresponsible.
> See https://issues.apache.org/jira/browse/BEAM-3169
>
> > On Thu, Nov 9, 2017 at 3:57 PM, Robert Bradshaw wrote:
>
> > Our release notes look like nothing more than a query for the closed
> > jira issues. Do we have a top-level summary to highlight the big
> > ticket items in the release? And in particular somewhere to mention
> > that this is likely the last release to support Java 7 that'll get
> > widely read?
> >
> > On Thu, Nov 9, 2017 at 3:39 PM, Reuven Lax 
> > wrote:
> > > Thanks,
> > >
> > > This RC is currently failing on a number of validation steps, so we
> need
> > to
> > > cut at least one more RC. Fingers crossed that it will be the last one.
> > >
> > > Reuven
> > >
> > > On Thu, Nov 9, 2017 at 3:36 PM, Konstantinos Katsiapis <
> > > katsia...@google.com.invalid> wrote:
> > >
> > >> Just a remark: Release of Tensorflow Transform
> > >>  0.4.0 depends on release of
> > >> Apache Beam 2.2.0 so upvoting for a release (the sooner the better).
> > >>
> > >> On Thu, Nov 9, 2017 at 3:33 PM, Reuven Lax 
> > >> wrote:
> > >>
> > >> > Are we waiting for any more validation of this candidate? If people
> > are
> > >> > still running tests I'll hold off on RC4 (to reduce the chance of an
> > >> RC5),
> > >> > otherwise I'll cut RC4 once Valentyn's PR is merged.
> > >> >
> > >> > Reuven
> > >> >
> > >> > On Thu, Nov 9, 2017 at 2:26 PM, Valentyn Tymofieiev <
> > >> > valen...@google.com.invalid> wrote:
> > >> >
> > >> > > https://github.com/apache/beam/pull/4109 is out to address both
> > >> > findings I
> > >> > > reported earlier.
> > >> > >
> > >> > > On Thu, Nov 9, 2017 at 8:54 AM, Etienne Chauchot <
> > echauc...@gmail.com>
> > >> > > wrote:
> > >> > >
> > >> > > > Just as a remark, I compared (on my laptop though) queries
> > execution
> > >> > > times
> > >> > > > on my previous run of 2.2.0-RC3 with release 2.1.0 and I did not
> > see
> > >> > any
> > >> > > > performance regression.
> > >> > > >
> > >> > > > Best
> > >> > > >
> > >> > > > Etienne
> > >> > > >
> > >> > > >
> > >> > > > Le 09/11/2017 à 03:13, Valentyn Tymofieiev a écrit :
> > >> > > >
> > >> > > >> I looked at Python side of Dataflow & Direct runners on Linux.
> > There
> > >> > are
> > >> > > >> two findings:
> > >> > > >>
> > >> > > >> 1. One of the mobile gaming examples did not pass for Dataflow
> > >> runner,
> > >> > > >> addressed in: https://github.com/apache/beam/pull/4102.
> > >> > > >>
> > >> > > >> 2. Python streaming did not work for Dataflow runner, one PR is
> > out
> > >> > > >> https://github.com/apache/beam/pull/4106, but follow up PRs
> may
> > be
> > >> > > >> required
> > >> > > >> as we continue to investigate. If we had a PostCommit tests
> suite
> > >> > > running
> > >> > > >> against a release branch, this could have been caught earlier.
> > Filed
> > >> > > >> https://issues.apache.org/jira/browse/BEAM-3163.
> > >> > > >>
> > >> > > >> On Wed, Nov 8, 2017 at 2:39 PM, Reuven Lax
> >  > >> >
> > >> > > >> wrote:
> > >> > > >>
> > >> > > >> Hi everyone,
> > >> > > >>>
> > >> > > >>> Please review and vote on the release candidate #3 for the
> > version
> > >> > > 2.2.0,
> > >> > > >>> as follows:
> > >> > > >>>    [ ] +1, Approve the release
> > >> > > >>>    [ ] -1, Do not approve the release (please provide specific
> > >> > > comments)
> > >> > > >>>
> > >> > > >>>
> > >> > > >>> The complete staging area is available for your review, which
> > >> > includes:
> > >> > > >>>    * JIRA release notes [1],
> > >> > > >>>    * the official Apache source release to be deployed to
> > >> > > >>> dist.apache.org
> > >> > > >>> [2],
> > >> > > >>> which is signed with the key with fingerprint B98B7708 [3],
> > >> > > >>>    * all artifacts to be deployed to the Maven Central
> > Repository
> > >> > [4],
> > >> > > >>>    * source code tag "v2.2.0-RC3" [5],
> > >> > > >>>    * website pull request listing the release 

Re: [VOTE] Drop Spark 1.x support to focus on Spark 2.x

2017-11-08 Thread Ted Yu
Having both Spark1 and Spark2 modules would benefit a wider user base.

I would vote for that.

Cheers

On Wed, Nov 8, 2017 at 12:51 AM, Jean-Baptiste Onofré 
wrote:

> Hi Robert,
>
> Thanks for your feedback !
>
> From an user perspective, with the current state of the PR, the same
> pipelines can run on both Spark 1.x and 2.x: the only difference is the
> dependencies set.
>
> I'm calling the vote to get such kind of feedback: if we consider Spark
> 1.x still need to be supported, no problem, I will improve the PR to have
> three modules (common, spark1, spark2) and let users pick the desired
> version.
>
> Let's wait a bit other feedbacks, I will update the PR accordingly.
>
> Regards
> JB
>
>
> On 11/08/2017 09:47 AM, Robert Bradshaw wrote:
>
>> I'm generally a -0.5 on this change, or at least doing so hastily.
>>
>> As with dropping Java 7 support, I think this should at least be
>> announced in release notes that we're considering dropping support in
>> the subsequent release, as this dev list likely does not reach a
>> substantial portion of the userbase.
>>
>> How much work is it to move from a Spark 1.x cluster to a Spark 2.x
>> cluster? I get the feeling it's not nearly as transparent as upgrading
>> Java versions. Can Spark 1.x pipelines be run on Spark 2.x clusters,
>> or is a new cluster (and/or upgrading all pipelines) required (e.g.
>> for those who operate spark clusters shared among their many users)?
>>
>> Looks like the latest release of Spark 1.x was about a year ago,
>> overlapping a bit with the 2.x series which is coming up on 1.5 years
>> old, so I could see a lot of people still using 1.x even if 2.x is
>> clearly the future. But it sure doesn't seem very backwards
>> compatible.
>>
>> Mostly I'm not comfortable with dropping 1.x in the same release as
>> adding support for 2.x, giving no transition period, but could be
>> convinced if this transition is mostly a no-op or no one's still using
>> 1.x. If there's non-trivial code complexity issues, I would perhaps
>> revisit the issue of having a single Spark Runner that does chooses
>> the backend implicitly in favor of simply having two runners which
>> share the code that's easy to share and diverge otherwise (which seems
>> it would be much simpler both to implement and explain to users). I
>> would be OK with even letting the Spark 1.x runner be somewhat
>> stagnant (e.g. few or no new features) until we decide we can kill it
>> off.
>>
>> On Tue, Nov 7, 2017 at 11:27 PM, Jean-Baptiste Onofré 
>> wrote:
>>
>>> Hi all,
>>>
>>> as you might know, we are working on Spark 2.x support in the Spark
>>> runner.
>>>
>>> I'm working on a PR about that:
>>>
>>> https://github.com/apache/beam/pull/3808
>>>
>>> Today, we have something working with both Spark 1.x and 2.x from a code
>>> standpoint, but I have to deal with dependencies. It's the first step of
>>> the
>>> update as I'm still using RDD, the second step would be to support
>>> dataframe
>>> (but for that, I would need PCollection elements with schemas, that's
>>> another topic on which Eugene, Reuven and I are discussing).
>>>
>>> However, as all major distributions now ship Spark 2.x, I don't think
>>> it's
>>> required anymore to support Spark 1.x.
>>>
>>> If we agree, I will update and cleanup the PR to only support and focus
>>> on
>>> Spark 2.x.
>>>
>>> So, that's why I'm calling for a vote:
>>>
>>>[ ] +1 to drop Spark 1.x support and upgrade to Spark 2.x only
>>>[ ] 0 (I don't care ;))
>>>[ ] -1, I would like to still support Spark 1.x, and so having
>>> support of
>>> both Spark 1.x and 2.x (please provide specific comment)
>>>
>>> This vote is open for 48 hours (I have the commits ready, just waiting
>>> the
>>> end of the vote to push on the PR).
>>>
>>> Thanks !
>>> Regards
>>> JB
>>> --
>>> Jean-Baptiste Onofré
>>> jbono...@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: [DISCUSS] Move away from Apache Maven as build tool

2017-11-07 Thread Ted Yu
>> >>> > > >artifacts
> >> >>> > > >> > and
> >> >>> > > >> > >> releases as we need to support our users using Maven.
> >> >>> > > >> > >>
> >> >>> > > >> > >> On Mon, Oct 30, 2017 at 11:26 PM, Jean-Baptiste
> >Onofré <
> >> >>> > > >> j...@nanthrax.net
> >> >>> > > >> > >
> >> >>> > > >> > >> wrote:
> >> >>> > > >> > >>
> >> >>> > > >> > >> > Generally speaking, it's interesting to evaluate
> >> >>> alternatives,
> >> >>> > > >> > especially
> >> >>> > > >> > >> > Gradle. My point is also to keep Maven artifacts
> >and
> >> >>> > > >"releases" as
> >> >>> > > >> > most
> >> >>> > > >> > >> of
> >> >>> > > >> > >> > our users will use Maven.
> >> >>> > > >> > >> > For incremental build, afair, there's some
> >> >enhancements on
> >> >>> > > >Maven
> >> >>> > > >> but I
> >> >>> > > >> > >> > have to take a look.
> >> >>> > > >> > >> >
> >> >>> > > >> > >> > Regards
> >> >>> > > >> > >> > JB
> >> >>> > > >> > >> >
> >> >>> > > >> > >> > On Oct 31, 2017, 07:22, at 07:22, Eugene Kirpichov
> >> >>> > > >> > >> > <kirpic...@google.com.INVALID> wrote:
> >> >>> > > >> > >> > >Hi!
> >> >>> > > >> > >> > >
> >> >>> > > >> > >> > >Many of these points sound valid, but AFAICT Maven
> >> >doesn't
> >> >>> > > >really
> >> >>> > > >> do
> >> >>> > > >> > >> > >incremental builds [1]. The best it can do is, it
> >> >seems,
> >> >>> > > >recompile
> >> >>> > > >> > only
> >> >>> > > >> > >> > >changed files, but Java compilation is a tiny part
> >of
> >> >the
> >> >>> > > >overall
> >> >>> > > >> > >> > >build.
> >> >>> > > >> > >> > >
> >> >>> > > >> > >> > >Almost all time is taken by other plugins, such as
> >> >unit
> >> >>> > > >testing or
> >> >>> > > >> > >> > >findbugs
> >> >>> > > >> > >> > >- and Maven does not seem to currently support
> >> >features such
> >> >>> > > >as "do
> >> >>> > > >> > not
> >> >>> > > >> > >> > >rerun unit tests of a module if the code didn't
> >> >change".
> >> >>> > > >> > >> > >
> >> >>> > > >> > >> > >The fact that the surefire plugin has existed for
> >>11
> >> >years
> >> >>> > > >> (version
> >> >>> > > >> > >> > >2.0
> >> >>> > > >> > >> > >was released in 2006) and still doesn't have this
> >> >feature
> >> >>> > > >makes me
> >> >>> > > >> > >> > >think
> >> >>> > > >> > >> > >that it's unlikely to be supported in the next few
> >> >years
> >> >>> > > >either.
> >> >>> > > >> > >> > >
> >> >>> > > >> > >> > >I suspect most PRs affect a very small number of
> >> >modules, so
> >> >>> > > >I
> >> >>> > > >> think
> >> >>> > > >> > >> > >the
> >> >>> > > >> > >> > >performance advantage of a build system truly
> >> >supporting
> >> >>> > > >> incremental
> >> >>> > > >> > >> > >builds
> >> >>> > > >> > >> > >may be so overwhelming as to trump many other
> >> >factors. Of
> >> >>> > > >course,
> >> >>> > > >> > we'd
> >> >>> > > >> > >> > >need
> >> >>> > > >> > >> > >to prototype and have hard numbers in hand to
> >discuss
> >> >this
> >> >>> > > >with
> >> >>> > > >> more
> >> >>> > > >> > >> > >substance.
> >> >>> > > >> > >> > >
> >> >>> > > >> > >> > >[1]
> >> >>> > > >> > >> >
> >> >>https://stackoverflow.com/questions/8918165/does-maven-
> >> >>> > > >> > >> > support-incremental-builds
> >> >>> > > >> > >> > >
> >> >>> > > >> > >> > >On Mon, Oct 30, 2017 at 10:57 PM Romain
> >Manni-Bucau
> >> >>> > > >> > >> > ><rmannibu...@gmail.com>
> >> >>> > > >> > >> > >wrote:
> >> >>> > > >> > >> > >
> >> >>> > > >> > >> > >> Hi
> >> >>> > > >> > >> > >>
> >> >>> > > >> > >> > >> Even if not a commiter or even PMC, I'd like to
> >> >mention a
> >> >>> > > >few
> >> >>> > > >> > points
> >> >>> > > >> > >> > >from
> >> >>> > > >> > >> > >> an external eye:
> >> >>> > > >> > >> > >>
> >> >>> > > >> > >> > >> - Maven stays the most common build tool and
> >easier
> >> >one
> >> >>> for
> >> >>> > > >any
> >> >>> > > >> > user.
> >> >>> > > >> > >> > >It
> >> >>> > > >> > >> > >> means it is the best one to hope contributions
> >> >IMHO.
> >> >>> > > >> > >> > >> - Maven has incremental support but if there is
> >any
> >> >>> blocker
> >> >>> > > >the
> >> >>> > > >> > >> > >community
> >> >>> > > >> > >> > >> is probably ready to enhance it (has been done
> >for
> >> >>> compiler
> >> >>> > > >> plugin
> >> >>> > > >> > >> > >for
> >> >>> > > >> > >> > >> instance)
> >> >>> > > >> > >> > >> - Gradle hides issues easily with its daemon so
> >a
> >> >build
> >> >>> > > >without
> >> >>> > > >> > >> > >daemon is
> >> >>> > > >> > >> > >> needed
> >> >>> > > >> > >> > >> - Gradle doesn't isolate plugins well enough so
> >> >ensure your
> >> >>> > > >> planned
> >> >>> > > >> > >> > >plugins
> >> >>> > > >> > >> > >> doesn't conflict
> >> >>> > > >> > >> > >> - Only Maven is correctly supported in
> >mainstream
> >> >and
> >> >>> > > >OS/free IDE
> >> >>> > > >> > >> > >>
> >> >>> > > >> > >> > >> This is the reasons why I think Maven is better
> >-
> >> >not even
> >> >>> > > >> entering
> >> >>> > > >> > >> > >into
> >> >>> > > >> > >> > >> the ASF points.
> >> >>> > > >> > >> > >>
> >> >>> > > >> > >> > >> Now Maven is not perfect but some quick
> >> >enhancements can
> >> >>> be
> >> >>> > > >done:
> >> >>> > > >> > >> > >>
> >> >>> > > >> > >> > >> - A fast build profile can be created
> >> >>> > > >> > >> > >> - Takari scheduler can be used to enhance the
> >> >parallel
> >> >>> > > >build
> >> >>> > > >> > >> > >> - Scripts can be provided to build a subpart of
> >the
> >> >>> project
> >> >>> > > >> > >> > >> - A beam extension can surely be done to
> >optimize
> >> >or
> >> >>> > > >compute the
> >> >>> > > >> > >> > >reactors
> >> >>> > > >> > >> > >> more easily based on module names
> >> >>> > > >> > >> > >>
> >> >>> > > >> > >> > >> Romain
> >> >>> > > >> > >> > >>
> >> >>> > > >> > >> > >> Le 31 oct. 2017 06:42, "Jean-Baptiste Onofré"
> >> >>> > > ><j...@nanthrax.net>
> >> >>> > > >> a
> >> >>> > > >> > >> > >écrit :
> >> >>> > > >> > >> > >>
> >> >>> > > >> > >> > >> -0
> >> >>> > > >> > >> > >>
> >> >>> > > >> > >> > >> For the following reasons reasons:
> >> >>> > > >> > >> > >> - maven is a Apache project and we can have
> >> >>> > > >support/improvement
> >> >>> > > >> > >> > >> - I don't see how another build tool would speed
> >up
> >> >the
> >> >>> > > >build by
> >> >>> > > >> > >> > >itself
> >> >>> > > >> > >> > >> - Apache default release process is based on
> >Maven
> >> >>> > > >> > >> > >>
> >> >>> > > >> > >> > >> On the other hand, Gradle could be interesting.
> >> >Anyway
> >> >>> it's
> >> >>> > > >> > something
> >> >>> > > >> > >> > >to
> >> >>> > > >> > >> > >> evaluate.
> >> >>> > > >> > >> > >>
> >> >>> > > >> > >> > >> Regards
> >> >>> > > >> > >> > >> JB
> >> >>> > > >> > >> > >>
> >> >>> > > >> > >> > >>
> >> >>> > > >> > >> > >> On Oct 30, 2017, 18:46, at 18:46, Ted Yu
> >> >>> > > ><yuzhih...@gmail.com>
> >> >>> > > >> > wrote:
> >> >>> > > >> > >> > >> >I agree with Ben's comment.
> >> >>> > > >> > >> > >> >
> >> >>> > > >> > >> > >> >Recently I have been using gradle in another
> >> >Apache
> >> >>> > > >project and
> >> >>> > > >> > >> > >found
> >> >>> > > >> > >> > >> >it
> >> >>> > > >> > >> > >> >interesting.
> >> >>> > > >> > >> > >> >
> >> >>> > > >> > >> > >> >Cheers
> >> >>> > > >> > >> > >>
> >> >>> > > >> > >> >
> >> >>> > > >> > >>
> >> >>> > > >> >
> >> >>> > > >>
> >> >>> > >
> >> >>> >
> >> >>>
> >>
>
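
Several of the speed concerns raised in this thread can be addressed without leaving Maven: the reactor can be restricted to the modules you touched plus their upstream dependencies, which is one way to realize the "build a subpart of the project" script suggested above. A minimal sketch (the module path is illustrative, and the helper only prints the command so the sketch runs outside a Beam checkout):

```shell
# Hypothetical helper: print the Maven invocation that builds only the
# given module and its upstream dependencies (-pl = projects list,
# -am = "also make" the modules it depends on).
build_modules() {
  echo mvn -pl "$1" -am install -DskipTests
}

build_modules sdks/java/core
# prints: mvn -pl sdks/java/core -am install -DskipTests
```

Dropping the `echo` runs the real build; `-pl` and `-am` are standard Maven reactor options.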


Re: [Proposal] Apache Beam Swag Store

2017-11-05 Thread Ted Yu
+1
 Original message From: Jacob Marble  
Date: 11/5/17  1:50 PM  (GMT-08:00) To: dev@beam.apache.org Subject: Re: 
[Proposal] Apache Beam Swag Store 
I think this is a great idea, ready to order mine. :)

Jacob

On Sat, Oct 28, 2017 at 11:19 AM, Jean-Baptiste Onofré 
wrote:

> It sounds good. Please let us know trademark update.
>
> Thanks
> Regards
> JB
>
> On Oct 28, 2017, 20:15, at 20:15, Griselda Cuevas 
> wrote:
> >Thanks for the feedback all, I'll send this idea to the trademark@
> >folks
> >and wait for validation. Once we have it I'll look into building the
> >store
> >possibly embedded in the website.
> >
> >Enjoy the weekend.
> >G
> >
> >
> >
> >
> >On 27 October 2017 at 11:53, Tyler Akidau 
> >wrote:
> >
> >> One additional note: for the logos w/ the name below them, would be
> >nice to
> >> not have quite so much whitespace between the logo and the name.
> >Otherwise,
> >> trademark validation aside, this looks great.
> >>
> >> On Fri, Oct 27, 2017 at 10:15 AM Griselda Cuevas
> >
> >> wrote:
> >>
> >> > Hi Dan - thanks for bringing this up to my attention. I haven't
> >raised
> >> this
> >> > up to the tradema...@apache.org people. Can I just reach out to
> >them to
> >> > get
> >> > the proposal or should one of the PMCs do this?
> >> >
> >> >
> >> >
> >> > Gris Cuevas Zambrano
> >> >
> >> > g...@google.com
> >> >
> >> > Open Source Strategy
> >> >
> >> > 345 Spear Street, San Francisco, 94105
> >> >
> >> >
> >> >
> >> > On 26 October 2017 at 18:30, Daniel Kulp  wrote:
> >> >
> >> > >
> >> > > Have you run this through tradema...@apache.org yet?
> >> > >
> >> > > I bring this up for two reasons:
> >> > >
> >> > > 1) We would need to make sure the appearance and such of the logo
> >is
> >> > > “correct”
> >> > >
> >> > > 2).A few years ago, Apache did have a partnership with a company
> >that
> >> > > would produce various swag things and then donate a portion back
> >to
> >> > > Apache.   I don’t know what the state of that agreement is and
> >whether
> >> > that
> >> > > would restrict going to another vendor or something.
> >> > >
> >> > > Dan
> >> > >
> >> > >
> >> > > > On Oct 25, 2017, at 8:51 AM, Griselda Cuevas
> > >> >
> >> > > wrote:
> >> > > >
> >> > > > Hi Everyone,
> >> > > >
> >> > > > I'd like to propose the creation of an online swag store for
> >Apache
> >> > Beam
> >> > > where anyone could order swag from a wide selection and get it
> >deliver
> >> to
> >> > > their home or office. I got in touch with a provider who could
> >include
> >> > this
> >> > > service as part of an order I'm placing to send swag for the
> >Meetups
> >> > we're
> >> > > organizing this year. What do you think?
> >> > > >
> >> > > > I'd also like to get your feedback on the swag I'm requesting
> >(you
> >> can
> >> > > see it in the pdf I attached), what do you think of the colors,
> >design,
> >> > > etc.?
> >> > > >
> >> > > > Lastly, I'll be ordering swag for Meetup organized this year so
> >if
> >> > > you're hosting one or speaking at one get in touch with me to
> >send
> >> some!
> >> > > >
> >> > > > Cheers,
> >> > > > G
> >> > >
> >> > > --
> >> > > Daniel Kulp
> >> > > dk...@apache.org - http://dankulp.com/blog
> >> > > Talend Community Coder - http://coders.talend.com
> >> > >
> >> > >
> >> >
> >>
>


Re: Policy for stale PRs

2017-08-18 Thread Ted Yu
bq. component leads regularly triage their components, including
unassigning issues.

+1

On Fri, Aug 18, 2017 at 5:11 PM, Ahmet Altay <al...@google.com.invalid>
wrote:

> To summarize the stale PR issue, do we agree on the following statement:
>
> A PR becomes stale after its author fails to respond to actionable comments
> for 60 days. The community will close stale PRs. Author is welcome to
> reopen the same PR again in the future. The associated JIRAs will be
> unassigned from the author but will stay open.
>
> On Wed, Aug 16, 2017 at 3:25 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > bq. JIRAs should still stay open but should become unassigned
> >
> > The above would need admin privilege, right ?
> > Is there an automated way to do it ?
> >
> > bq. Prevent contributors/committers from taking more than 'n' JIRAs at
> the
> > same time
> >
> > It would be hard to determine the N above since the amount of coding /
> > testing varies greatly across JIRAs.
> >
>
> I agree with Ismaël that there is an issue here. We currently have 969 open
> JIRAs, 427 of them are unassigned and the remaining 542 are assigned to 87
> people. The average of 6 issues per assignee is not that high. I think the
> problem is some of us (mainly component leads, including myself) have too
> many issues assigned.  Top 5 of them have 218 issues assigned to them. I
> believe these issues are automatically assigned for triage purposes. We
> probably do not need to codify an exact set of rules; we could ask
> component leads regularly triage their components, including unassigning
> issues.
>
>
> >
> >
> >
> > On Wed, Aug 16, 2017 at 3:20 PM, Ismaël Mejía <ieme...@gmail.com> wrote:
> >
> > > Thanks Ahmet for bringing this subject.
> > >
> > > +1 to close the stale PRs automatically after a fixed time of
> inactivity.
> > > 90
> > > days is ok, but maybe a shorter period is better. If we consider that
> > being
> > > stale is just not having any activity i.e., the author of the PR does
> not
> > > answer
> > > any message. The author can buy extra time just by adding a message to
> > say,
> > > 'wait I am still working on this', and win a complete period of time,
> so
> > > the
> > > longer the staleness period is the longer it can eventually be
> extended.
> > >
> > > I agree with Thomas the JIRAs should still stay open but should become
> > > unassigned because the issue won't be yet fixed but we want to
> encourage
> > > people
> > > to work on it.
> > >
> > > Other additional subject that makes sense to discuss here is if we need
> > > policies
> > > to avoid 'stale' JIRAs (JIRAs that have been taken but that don't have
> > > progress)?, for example:
> > >
> > > - Prevent contributors/committers from taking more than 'n' JIRAs at
> the
> > > same
> > >   time (we should define this n considering the period of staleness,
> > maybe
> > > 10?).
> > >
> > > - Automatically free 'stale' JIRAs after a fixed time period with no
> > > active work
> > >
> > > Remember the objective is to encourage more people to contribute but
> > people
> > > won't be encouraged to contribute on subjects that other people have
> > > taken, this
> > > is a well known anti-pattern in volunteer communities, see
> > > http://communitymgt.wikia.com/wiki/Cookie_Licking
> > >
> > > On Wed, Aug 16, 2017 at 10:38 PM, Thomas Groh <tg...@google.com.invalid
> >
> > > wrote:
> > > > JIRAs should only be closed if the issue that they track is no longer
> > > > relevant (either via being fixed or being determined to not be a
> > > problem).
> > > > If a JIRA isn't being meaningfully worked on, it should be unassigned
> > (in
> > > > all cases, not just if there's an associated pull request that has
> not
> > > been
> > > > worked on).
> > > >
> > > > +1 on closing PRs with no action from the original author after some
> > > > reasonable time frame (90 days is certainly reasonable; 30 might be
> too
> > > > short) if the author has not responded to actionable feedback.
> > > >
> > > > On Wed, Aug 16, 2017 at 12:07 PM, Sourabh Bajaj <
> > > > sourabhba...@google.com.invalid> wrote:
> > > >
> > > >> Some projects I have seen close stale PRs after 30 days, saying
> > "Closing
> > > 

Re: Policy for stale PRs

2017-08-16 Thread Ted Yu
bq. JIRAs should still stay open but should become unassigned

The above would need admin privilege, right ?
Is there an automated way to do it ?

bq. Prevent contributors/committers from taking more than 'n' JIRAs at the
same time

It would be hard to determine the N above since the amount of coding /
testing varies greatly across JIRAs.
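
The staleness windows being discussed in this thread (60 or 90 days) are at least easy to check mechanically once a last-activity timestamp is available. A minimal sketch with epoch-second inputs; the `is_stale` helper and its default threshold are illustrative, not an existing tool:

```shell
# Succeeds (exit 0) when the gap between last_activity and now, both
# given in epoch seconds, exceeds the staleness threshold (default 90 days).
is_stale() {
  gap=$(( $2 - $1 ))
  days=${3:-90}
  [ "$gap" -gt $(( days * 86400 )) ]
}

# A PR last touched 120 days ago is stale under the 90-day policy:
if is_stale 0 $(( 120 * 86400 )); then
  echo stale
fi
# prints: stale
```

Passing a third argument checks the same gap against a different policy, e.g. the 60-day window.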



On Wed, Aug 16, 2017 at 3:20 PM, Ismaël Mejía <ieme...@gmail.com> wrote:

> Thanks Ahmet for bringing this subject.
>
> +1 to close the stale PRs automatically after a fixed time of inactivity.
> 90
> days is ok, but maybe a shorter period is better. If we consider that being
> stale is just not having any activity i.e., the author of the PR does not
> answer
> any message. The author can buy extra time just by adding a message to say,
> 'wait I am still working on this', and win a complete period of time, so
> the
> longer the staleness period is the longer it can eventually be extended.
>
> I agree with Thomas the JIRAs should still stay open but should become
> unassigned because the issue won't be yet fixed but we want to encourage
> people
> to work on it.
>
> Other additional subject that makes sense to discuss here is if we need
> policies
> to avoid 'stale' JIRAs (JIRAs that have been taken but that don't have
> progress)?, for example:
>
> - Prevent contributors/committers from taking more than 'n' JIRAs at the
> same
>   time (we should define this n considering the period of staleness, maybe
> 10?).
>
> - Automatically free 'stale' JIRAs after a fixed time period with no
> active work
>
> Remember the objective is to encourage more people to contribute but people
> won't be encouraged to contribute on subjects that other people have
> taken, this
> is a well known anti-pattern in volunteer communities, see
> http://communitymgt.wikia.com/wiki/Cookie_Licking
>
> On Wed, Aug 16, 2017 at 10:38 PM, Thomas Groh <tg...@google.com.invalid>
> wrote:
> > JIRAs should only be closed if the issue that they track is no longer
> > relevant (either via being fixed or being determined to not be a
> problem).
> > If a JIRA isn't being meaningfully worked on, it should be unassigned (in
> > all cases, not just if there's an associated pull request that has not
> been
> > worked on).
> >
> > +1 on closing PRs with no action from the original author after some
> > reasonable time frame (90 days is certainly reasonable; 30 might be too
> > short) if the author has not responded to actionable feedback.
> >
> > On Wed, Aug 16, 2017 at 12:07 PM, Sourabh Bajaj <
> > sourabhba...@google.com.invalid> wrote:
> >
> >> Some projects I have seen close stale PRs after 30 days, saying "Closing
> >> due to lack of activity, please feel free to re-open".
> >>
> >> On Wed, Aug 16, 2017 at 12:05 PM Ahmet Altay <al...@google.com.invalid>
> >> wrote:
> >>
> >> > Sounds like we have consensus. Since this is a new policy, I would
> >> suggest
> >> > picking the most flexible option for now (90 days) and we can tighten
> it
> >> in
> >> > the future. To answer Kenn's question, I do not know, how other
> projects
> >> > handle this. I did a basic search but could not find a good answer.
> >> >
> >> > What mechanism can we use to close PRs, assuming that author will be
> out
> >> of
> >> > communication. We can push a commit with a "This closes #xyz #abc"
> >> message.
> >> > Is there another way to do this?
> >> >
> >> > Ahmet
> >> >
> >> > On Wed, Aug 16, 2017 at 4:32 AM, Aviem Zur <aviem...@gmail.com>
> wrote:
> >> >
> >> > > Makes sense to close after a long time of inactivity and no
> response,
> >> and
> >> > > as Kenn mentioned they can always re-open.
> >> > >
> >> > > On Wed, Aug 16, 2017 at 12:20 AM Jean-Baptiste Onofré <
> j...@nanthrax.net
> >> >
> >> > > wrote:
> >> > >
> >> > > > If we consider the author, it makes sense.
> >> > > >
> >> > > > Regards
> >> > > > JB
> >> > > >
> >> > > > On Aug 15, 2017, 01:29, at 01:29, Ted Yu <yuzhih...@gmail.com>
> >> wrote:
> >> > > > >The proposal makes sense.
> >> > > > >
> >> > > > >If the author of PR doesn't respond for 90 days, the PR is likely
> >> out
> >> > > > >of
> >> > > > >sync with current repo.

Re: Policy for stale PRs

2017-08-16 Thread Ted Yu
What should be done to the JIRA associated with the PR?
 Original message From: Ahmet Altay <al...@google.com.INVALID> 
Date: 8/16/17  12:05 PM  (GMT-08:00) To: dev@beam.apache.org Subject: Re: 
Policy for stale PRs 
Sounds like we have consensus. Since this is a new policy, I would suggest
picking the most flexible option for now (90 days) and we can tighten it in
the future. To answer Kenn's question, I do not know, how other projects
handle this. I did a basic search but could not find a good answer.

What mechanism can we use to close PRs, assuming that author will be out of
communication. We can push a commit with a "This closes #xyz #abc" message.
Is there another way to do this?
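
The "This closes #xyz" mechanism above is scriptable. A minimal sketch in a throwaway repository (the PR number is a placeholder; in practice the empty commit is pushed to master of the real repository, and the ASF GitHub integration closes the referenced pull request):

```shell
# Demonstrate in a scratch repository; on the real repo you would push
# this commit to master instead of leaving it local.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git -c user.name=demo -c user.email=demo@example.com \
    commit --allow-empty -m "This closes #1464"
git log -1 --pretty=%s
# prints: This closes #1464
```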

Ahmet

On Wed, Aug 16, 2017 at 4:32 AM, Aviem Zur <aviem...@gmail.com> wrote:

> Makes sense to close after a long time of inactivity and no response, and
> as Kenn mentioned they can always re-open.
>
> On Wed, Aug 16, 2017 at 12:20 AM Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
> > If we consider the author, it makes sense.
> >
> > Regards
> > JB
> >
> > On Aug 15, 2017, 01:29, at 01:29, Ted Yu <yuzhih...@gmail.com> wrote:
> > >The proposal makes sense.
> > >
> > >If the author of PR doesn't respond for 90 days, the PR is likely out
> > >of
> > >sync with current repo.
> > >
> > >Cheers
> > >
> > >On Mon, Aug 14, 2017 at 5:27 PM, Ahmet Altay <al...@google.com.invalid>
> > >wrote:
> > >
> > >> Hi all,
> > >>
> > >> Do we have an existing policy for handling stale PRs? If not could we
> > >come
> > >> up with one. We are getting close to 100 open PRs. Some of the open
> > >PRs
> > >> have not been touched for a while, and if we exclude the pings the
> > >number
> > >> will be higher.
> > >>
> > >> For example, we could close PRs that have not been updated by the
> > >original
> > >> author for 90 days even after multiple attempts to reach them (e.g.
> > >[1],
> > >> [2] are such PRs.)
> > >>
> > >> What do you think?
> > >>
> > >> Thank you,
> > >> Ahmet
> > >>
> > >> [1] https://github.com/apache/beam/pull/1464
> > >> [2] https://github.com/apache/beam/pull/2949
> > >>
> >
>


Re: [ANNOUNCEMENT] New committers, August 2017 edition!

2017-08-11 Thread Ted Yu
Congratulations to all.


On Fri, Aug 11, 2017 at 10:40 AM, Davor Bonaci  wrote:

> Please join me and the rest of Beam PMC in welcoming the following
> contributors as our newest committers. They have significantly contributed
> to the project in different ways, and we look forward to many more
> contributions in the future.
>
> * Reuven Lax
> Reuven has been with the project since the very beginning, contributing
> mostly to the core SDK and the GCP IO connectors. He accumulated 52 commits
> (19,824 ++ / 12,039 --). Most recently, Reuven re-wrote several IO
> connectors that significantly expanded their functionality. Additionally,
> Reuven authored important new design documents relating to update and
> snapshot functionality.
>
> * Jingsong Lee
> Jingsong has been contributing to Apache Beam since the beginning of the
> year, particularly to the Flink runner. He has accumulated 34 commits
> (11,214 ++ / 6,314 --) of deep, fundamental changes that significantly
> improved the quality of the runner. Additionally, Jingsong has contributed
> to the project in other ways too -- reviewing contributions, and
> participating in discussions on the mailing list, design documents, and
> JIRA issue tracker.
>
> * Mingmin Xu
> Mingmin started the SQL DSL effort, and has driven it to the point of
> merging to the master branch. In this effort, he extended the project to
> the significant new user community.
>
> * Mingming (James) Xu
> James joined the SQL DSL effort, contributing some of the trickier parts,
> such as the Join functionality. Additionally, he's consistently shown
> himself to be an insightful code reviewer, significantly impacting the
> project’s code quality and ensuring the success of the new major component.
>
> * Manu Zhang
> Manu initiated and developed a runner for the Apache Gearpump (incubating)
> engine, and has driven it to the point of merging to the master branch. In
> this effort, he accumulated 65 commits (7,812 ++ / 4,882 --) and extended
> the project to the new user community.
>
> Congratulations to all five! Welcome!
>
> Davor
>


Re: beam-site issues with Jenkins and MergeBot

2017-08-09 Thread Ted Yu
However, the following is accessible:

https://github.com/apache/beam-site.git

Last commit was 13 days ago.

On Wed, Aug 9, 2017 at 1:12 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> For #1, under https://git-wip-us.apache.org/repos/asf , I don't see
> beam-site
>
> FYI
>
> On Wed, Aug 9, 2017 at 1:08 PM, Eugene Kirpichov <
> kirpic...@google.com.invalid> wrote:
>
>> Hello,
>>
>> I've been trying to merge a PR https://github.com/apache/beam
>> -site/pull/278
>> and ran into the following issues:
>>
>> 1) When I do "git fetch --all" on beam-site, I get an error "fatal:
>> repository 'https://git-wip-us.apache.org/repos/asf/beam-site.git/' not
>> found". Has the git address of the apache repo changed? Is it no longer
>> valid because we have MergeBot?
>>
>> 2) Precommit tests are failing nearly 100% of the time.
>> If you look at build history on
>> https://builds.apache.org/job/beam_PreCommit_Website_Test/ - 9 out of 10
>> last builds failed.
>> Failures I saw:
>>
>> 7 times:
>> + gpg --keyserver hkp://keys.gnupg.net --recv-keys
>> 409B6B1796C275462A1703113804BB82D39DC0E3
>> gpg: requesting key D39DC0E3 from hkp server keys.gnupg.net
>> ?: keys.gnupg.net: Cannot assign requested address
>>
>> 2 times:
>> - ./content/subdir/contribute/testing/index.html
>>   *  External link https://builds.apache.org/view/Beam/ failed: 404 No
>> error
>>
>> The second failure seems legit - https://builds.apache.org/view/Beam/ is
>> actually 404 right now (I'll send a separate email about this)
>>
>> The gnupg failure is not legit - I'm able to run the same command myself
>> with no issues.
>>
>> 3) I suppose because of this, I'm not able to merge my PR with "@asfgit
>> merge" command - I suppose it requires a successful test run. Would be
>> nice
>> if it posted a comment saying why it refuses to merge.
>>
>
>
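
Since the git-wip-us URL above now returns "not found" while the GitHub mirror is still reachable, an existing clone can simply be repointed. A sketch in a scratch repository (in an actual beam-site checkout only the `set-url` line is needed):

```shell
# Reproduce the broken setup in a scratch clone, then repoint it.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git remote add origin https://git-wip-us.apache.org/repos/asf/beam-site.git
# Point origin at the GitHub mirror, which still serves the repository:
git remote set-url origin https://github.com/apache/beam-site.git
git remote -v
```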


Re: beam-site issues with Jenkins and MergeBot

2017-08-09 Thread Ted Yu
For #1, under https://git-wip-us.apache.org/repos/asf , I don't see
beam-site

FYI

On Wed, Aug 9, 2017 at 1:08 PM, Eugene Kirpichov <
kirpic...@google.com.invalid> wrote:

> Hello,
>
> I've been trying to merge a PR https://github.com/apache/
> beam-site/pull/278
> and ran into the following issues:
>
> 1) When I do "git fetch --all" on beam-site, I get an error "fatal:
> repository 'https://git-wip-us.apache.org/repos/asf/beam-site.git/' not
> found". Has the git address of the apache repo changed? Is it no longer
> valid because we have MergeBot?
>
> 2) Precommit tests are failing nearly 100% of the time.
> If you look at build history on
> https://builds.apache.org/job/beam_PreCommit_Website_Test/ - 9 out of 10
> last builds failed.
> Failures I saw:
>
> 7 times:
> + gpg --keyserver hkp://keys.gnupg.net --recv-keys
> 409B6B1796C275462A1703113804BB82D39DC0E3
> gpg: requesting key D39DC0E3 from hkp server keys.gnupg.net
> ?: keys.gnupg.net: Cannot assign requested address
>
> 2 times:
> - ./content/subdir/contribute/testing/index.html
>   *  External link https://builds.apache.org/view/Beam/ failed: 404 No
> error
>
> The second failure seems legit - https://builds.apache.org/view/Beam/ is
> actually 404 right now (I'll send a separate email about this)
>
> The gnupg failure is not legit - I'm able to run the same command myself
> with no issues.
>
> 3) I suppose because of this, I'm not able to merge my PR with "@asfgit
> merge" command - I suppose it requires a successful test run. Would be nice
> if it posted a comment saying why it refuses to merge.
>
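
The gpg keyserver failures quoted above are transient network errors, so wrapping the fetch in a retry loop usually gets precommit past them. The `retry` helper below is illustrative (demonstrated with `true` so the sketch runs offline); the commented line shows how it would wrap the failing command from the build log:

```shell
# Run a command up to $1 times, sleeping briefly between failed attempts.
retry() {
  attempts=$1; shift
  n=0
  until "$@"; do
    n=$(( n + 1 ))
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    sleep 1
  done
}

# retry 3 gpg --keyserver hkp://keys.gnupg.net --recv-keys \
#     409B6B1796C275462A1703113804BB82D39DC0E3
retry 3 true && echo ok
# prints: ok
```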


Re: is it ok to have a discussion without subscribing to the list

2017-08-09 Thread Ted Yu
Derek:
You can periodically visit:

http://search-hadoop.com/Beam

where it is easy to find the thread(s) you're interested in.

The latency of indexing is very low.

FYI

On Wed, Aug 9, 2017 at 1:28 AM, derek  wrote:

> On Tue, Aug 8, 2017 at 1:23 PM, Jason Kuster 
> wrote:
> > Hi Derek,
> >
> > If you aren't subscribed to the list then people have to manually add you
> > back into the to: line in order for you to receive replies (I always do
> > anyway). Subscribing (and unsubscribing) to the list is fairly
> > straightforward, so that's what I would suggest.
>
> but a question (or problem, which applies to many other mailing lists as well) is:
>
> is it ok to subscribe only to the thread which I started? Many
> mailing lists have overwhelmingly many emails, and I have no interest
> in other threads
>
> >
> > Best,
> >
> > Jason
> >
> > On Mon, Aug 7, 2017 at 4:20 PM, derek  wrote:
> >>
> >> is it ok to have a discussion without subscribing to the list?
>


Re: Java 8?

2017-07-14 Thread Ted Yu
Recently on Spark and Flink mailing lists there have been discussions on
dropping support for Java 7.

For master branch, we should consider moving to Java 8 for compilation.


On Fri, Jul 14, 2017 at 2:41 PM, Swapnil Bawaskar 
wrote:

> Hi,
>
> I am trying to write an I/O transform for Apache Geode, which is
> compiled with java-8. When I try to compile what I have so far, I get the
> following error:
>
> [INFO] --- maven-enforcer-plugin:1.4.1:enforce (enforce) @
> beam-sdks-java-io-geode ---
> [INFO] Restricted to JDK 1.7 yet
> org.apache.geode:geode-core:jar:1.1.1:compile contains
> org/apache/geode/admin/AdminConfig$Entry.class targeted to JDK 1.8
> [INFO] Restricted to JDK 1.7 yet
> org.apache.geode:geode-json:jar:1.1.1:compile contains org/json/CDL.class
> targeted to JDK 1.8
> [INFO] Restricted to JDK 1.7 yet
> org.apache.geode:geode-common:jar:1.1.1:compile contains
> org/apache/geode/annotations/Experimental.class targeted to JDK 1.8
> [WARNING] Rule 0: org.apache.maven.plugins.enforcer.EnforceBytecodeVersion
> failed with message:
> Found Banned Dependency: org.apache.geode:geode-core:jar:1.1.1
>
> Are there any plans to move to java-8 for compiling?
>
> Thanks!
> Swapnil.
>


Re: MergeBot is here!

2017-07-07 Thread Ted Yu
For https://gitbox.apache.org/setup/ , after completing the first two
steps, is there any action needed for "MFA Status" box ?

Cheers

On Fri, Jul 7, 2017 at 1:37 PM, Lukasz Cwik 
wrote:

> for i in range(0, inf): +1
>
> Note that the URL for gitbox linking is:
> https://gitbox.apache.org/setup/ (above
> URL was missing '/' and was giving 404)
>
>
> On Fri, Jul 7, 2017 at 1:21 PM, Jason Kuster  invalid
> > wrote:
>
> > Hi Beam Community,
> >
> > Early on in the project, we had a number of discussions about creating an
> > automated tool for merging pull requests. I’m happy to announce that
> we’ve
> > developed such a tool and it is ready for experimental usage in Beam!
> >
> > The tool, MergeBot, works in conjunction with ASF’s existing GitBox tool,
> > providing numerous benefits:
> > * Automating the merge process -- instead of many manual steps with
> > multiple Git remotes, merging is as simple as commenting a specific
> command
> > in GitHub.
> > * Automatic verification of each pull request against the latest master
> > code before merge.
> > * Merge queue enforces an ordering of pull requests, which ensures that
> > pull requests that have bad interactions don’t get merged at the same
> time.
> > * GitBox-enabled features such as reviewers, assignees, and labels.
> > * Enabling enhanced use of tools like reviewable.io.
> >
> > If you are a committer, the first step is to link your Apache and GitHub
> > accounts at http://gitbox.apache.org/setup. Once the accounts are
> linked,
> > you should have immediate access to new GitHub features like labels,
> > assignees, etc., as well as the ability to merge pull requests by simply
> > commenting “@asfgit merge” on the pull request. MergeBot will communicate
> > its status back to you via the same mechanism used already by Jenkins.
> >
> > This functionally is currently enabled for the “beam-site” repository
> only.
> > In this phase, we’d like to gather feedback and improve the user
> experience
> > -- so please comment back early and often. Once we are happy with the
> > experience, we’ll deploy it on the main Beam repository, and recommend it
> > for wider adoption.
> >
> > I’d like to give a huge thank you to the Apache Infrastructure team,
> > especially Daniel Pono Takamori, Daniel Gruno, and Chris Thistlethwaite
> who
> > were instrumental in bringing this project to fruition. Additionally,
> this
> > could not have happened without the extensive work Davor put in to keep
> > things moving along. Thank you Davor.
> >
> > Looking forward to hearing your comments and feedback. Thanks.
> >
> > Jason
> >
> > --
> > ---
> > Jason Kuster
> > Apache Beam / Google Cloud Dataflow
> >
>


Re: writing to s3 in beam

2017-07-05 Thread Ted Yu
Please take a look at BEAM-2500 (and related JIRAs).

Cheers

On Wed, Jul 5, 2017 at 8:00 PM, Jyotirmoy Sundi  wrote:

> Hi Folks,
>
>  I am trying to write to s3 from beam.
>
> These are configs I am passing
>
> --hdfsConfiguration='[{"fs.default.name": "s3://xxx-output",
> "fs.s3.awsAccessKeyId" :"xxx", "fs.s3.awsSecretAccessKey":"yyy"}]'
> --input="/home/hadoop/data" --output="s3://xx-output/beam-output/"
>
> *Any idea how can I write to s3, I am using beam release-2.0.0*
>
> *Trace*
>
> 17/07/06 02:55:46 WARN TaskSetManager: Lost task 7.0 in stage 2.0 (TID 31,
> ip-10-130-237-28.vpc.internal): org.apache.beam.sdk.util.
> UserCodeException:
> java.lang.NullPointerException
>
> at
> org.apache.beam.sdk.util.UserCodeException.wrap(UserCodeException.java:36)
>
> at
> org.apache.beam.sdk.io.WriteFiles$WriteShardedBundles$auxiliary$
> TXDiaduA.invokeProcessElement(Unknown
> Source)
>
> at
> org.apache.beam.runners.core.SimpleDoFnRunner.invokeProcessElement(
> SimpleDoFnRunner.java:197)
>
> at
> org.apache.beam.runners.core.SimpleDoFnRunner.processElement(
> SimpleDoFnRunner.java:155)
>
> at
> org.apache.beam.runners.spark.translation.DoFnRunnerWithMetrics.
> processElement(DoFnRunnerWithMetrics.java:64)
>
> at
> org.apache.beam.runners.spark.translation.SparkProcessContext$
> ProcCtxtIterator.computeNext(SparkProcessContext.java:165)
>
> at
> org.apache.beam.runners.spark.repackaged.com.google.common.
> collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:145)
>
> at
> org.apache.beam.runners.spark.repackaged.com.google.common.
> collect.AbstractIterator.hasNext(AbstractIterator.java:140)
>
> at
> scala.collection.convert.Wrappers$JIteratorWrapper.
> hasNext(Wrappers.scala:41)
>
> at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
>
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>
> at scala.collection.generic.Growable$class.$plus$plus$eq(
> Growable.scala:48)
>
> at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(
> ArrayBuffer.scala:103)
>
> at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(
> ArrayBuffer.scala:47)
>
> at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
>
> at scala.collection.AbstractIterator.to(Iterator.scala:1157)
>
> at
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>
> at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
>
> at scala.collection.TraversableOnce$class.toArray(
> TraversableOnce.scala:252)
>
> at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
>
> at
> org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.
> apply(RDD.scala:927)
>
> at
> org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.
> apply(RDD.scala:927)
>
> at
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(
> SparkContext.scala:1858)
>
> at
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(
> SparkContext.scala:1858)
>
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>
> at org.apache.spark.scheduler.Task.run(Task.scala:89)
>
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
>
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1145)
>
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:615)
>
> at java.lang.Thread.run(Thread.java:745)
>
> Caused by: java.lang.NullPointerException
>
> at java.io.File.<init>(File.java:277)
>
> at
> org.apache.hadoop.fs.s3.S3OutputStream.newBackupFile(
> S3OutputStream.java:92)
>
> at org.apache.hadoop.fs.s3.S3OutputStream.<init>(S3OutputStream.java:84)
>
> at org.apache.hadoop.fs.s3.S3FileSystem.create(S3FileSystem.java:252)
>
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:915)
>
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:896)
>
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:793)
>
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:782)
>
> at
> org.apache.beam.sdk.io.hdfs.HadoopFileSystem.create(
> HadoopFileSystem.java:103)
>
> at
> org.apache.beam.sdk.io.hdfs.HadoopFileSystem.create(
> HadoopFileSystem.java:67)
>
> at org.apache.beam.sdk.io.FileSystems.create(FileSystems.java:207)
>
> at org.apache.beam.sdk.io.FileSystems.create(FileSystems.java:194)
>
> at org.apache.beam.sdk.io.FileBasedSink$Writer.open(
> FileBasedSink.java:876)
>
> at
> org.apache.beam.sdk.io.FileBasedSink$Writer.openUnwindowed(FileBasedSink.
> java:842)
>
> at
> org.apache.beam.sdk.io.WriteFiles$WriteShardedBundles.
> processElement(WriteFiles.java:362)
>
>
>
> --
> Best Regards,
> Jyotirmoy Sundi
>
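
The NPE above is thrown from S3OutputStream.newBackupFile, which creates a local scratch file before uploading to S3, so a plausible (but unconfirmed in this thread) culprit is an unset local buffer directory in the Hadoop configuration. A minimal sketch of assembling the --hdfsConfiguration JSON argument with such a property added — fs.s3.buffer.dir and the /tmp/s3 path are assumptions for illustration; the other keys come from the original message:

```python
import json

# Hadoop filesystem properties passed to Beam via --hdfsConfiguration.
# The first three keys are taken from the original message; fs.s3.buffer.dir
# is an ASSUMPTION -- it names the local directory Hadoop's old s3://
# filesystem uses to buffer output before uploading (a null buffer dir would
# explain the NPE inside S3OutputStream.newBackupFile).
hdfs_conf = [{
    "fs.default.name": "s3://xxx-output",
    "fs.s3.awsAccessKeyId": "xxx",
    "fs.s3.awsSecretAccessKey": "yyy",
    "fs.s3.buffer.dir": "/tmp/s3",  # hypothetical local buffer directory
}]

# Serialize the configuration the way it appears on the command line.
arg = "--hdfsConfiguration='%s'" % json.dumps(hdfs_conf)
print(arg)
```

Whether this is sufficient depends on the S3 filesystem work tracked in BEAM-2500, which Ted points to above.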


Re: Build Failure in * release-2.0.0

2017-07-05 Thread Ted Yu
bq. Caused by: java.net.SocketException: Too many open files

Please adjust ulimit.

FYI

On Wed, Jul 5, 2017 at 1:33 PM, Jyotirmoy Sundi  wrote:

> Hi Folks ,
>
> Any idea why the build is failing in release-2.0.0 , i did "mvn clean
> package"
>
>
> *Trace*
>
> [INFO] Running org.apache.beam.sdk.io.hbase.HBaseResultCoderTest
>
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed:
> 0.461 s - in org.apache.beam.sdk.io.hbase.HBaseResultCoderTest
>
> [INFO] Running org.apache.beam.sdk.io.hbase.HBaseIOTest
>
> [ERROR] Tests run: 17, Failures: 0, Errors: 1, Skipped: 0, Time elapsed:
> 4.504 s <<< FAILURE! - in org.apache.beam.sdk.io.hbase.HBaseIOTest
>
> [ERROR] testReadingWithKeyRange(org.apache.beam.sdk.io.hbase.HBaseIOTest)
> Time
> elapsed: 4.504 s  <<< ERROR!
>
> java.lang.RuntimeException:
>
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
> attempts=1, exceptions:
>
> Wed Jul 05 13:31:23 PDT 2017,
> RpcRetryingCaller{globalStartTime=1499286683193, pause=100, retries=1},
> java.net.SocketException: Too many open files
>
>
> at
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.
> waitUntilFinish(DirectRunner.java:330)
>
> at
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.
> waitUntilFinish(DirectRunner.java:292)
>
> at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:200)
>
> at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:63)
>
> at org.apache.beam.sdk.Pipeline.run(Pipeline.java:295)
>
> at org.apache.beam.sdk.Pipeline.run(Pipeline.java:281)
>
> at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:340)
>
> at
> org.apache.beam.sdk.io.hbase.HBaseIOTest.runReadTestLength(
> HBaseIOTest.java:418)
>
> at
> org.apache.beam.sdk.io.hbase.HBaseIOTest.testReadingWithKeyRange(
> HBaseIOTest.java:253)
>
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:
> 62)
>
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
>
> at java.lang.reflect.Method.invoke(Method.java:498)
>
> at
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(
> FrameworkMethod.java:50)
>
> at
> org.junit.internal.runners.model.ReflectiveCallable.run(
> ReflectiveCallable.java:12)
>
> at
> org.junit.runners.model.FrameworkMethod.invokeExplosively(
> FrameworkMethod.java:47)
>
> at
> org.junit.internal.runners.statements.InvokeMethod.
> evaluate(InvokeMethod.java:17)
>
> at
> org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:321)
>
> at
> org.junit.rules.ExpectedException$ExpectedExceptionStatement.
> evaluate(ExpectedException.java:239)
>
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>
> at
> org.junit.runners.BlockJUnit4ClassRunner.runChild(
> BlockJUnit4ClassRunner.java:78)
>
> at
> org.junit.runners.BlockJUnit4ClassRunner.runChild(
> BlockJUnit4ClassRunner.java:57)
>
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>
> at
> org.apache.maven.surefire.junitcore.pc.Scheduler$1.run(Scheduler.java:393)
>
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1142)
>
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:617)
>
> at java.lang.Thread.run(Thread.java:745)
>
> Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException:
> Failed
> after attempts=1, exceptions:
>
> Wed Jul 05 13:31:23 PDT 2017,
> RpcRetryingCaller{globalStartTime=1499286683193, pause=100, retries=1},
> java.net.SocketException: Too many open files
>
>
> at
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(
> RpcRetryingCaller.java:157)
>
> at
> org.apache.hadoop.hbase.client.ResultBoundedCompletionService
> $QueueingFuture.run(ResultBoundedCompletionService.java:65)
>
> ... 3 more
>
> Caused by: java.net.SocketException: Too many open files
>
> at sun.nio.ch.Net.socket0(Native Method)
>
> at sun.nio.ch.Net.socket(Net.java:411)
>
> at sun.nio.ch.Net.socket(Net.java:404)
>
> at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:105)
>
> at
> sun.nio.ch.SelectorProviderImpl.openSocketChannel(
> SelectorProviderImpl.java:60)
>
> at java.nio.channels.SocketChannel.open(SocketChannel.java:145)
>
> at
> org.apache.hadoop.net.StandardSocketFactory.createSocket(
> StandardSocketFactory.java:62)
>
> at
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.
> setupConnection(RpcClientImpl.java:410)
>
> at
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.
> setupIOstreams(RpcClientImpl.java:722)
>
> at
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.
> writeRequest(RpcClientImpl.java:906)
>
> at
> 
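
Ted's advice — adjust ulimit — normally means raising the open-file-descriptor limit (e.g. `ulimit -n 65536`) in the shell that launches the build. The limits can also be inspected and raised from inside a process; a small Unix-only sketch using Python's standard resource module (the 65536 target is an arbitrary example value):

```python
import resource

# Read the per-process open-file-descriptor limits (what `ulimit -n` shows).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft limit: %d, hard limit: %d" % (soft, hard))

# A process may raise its own soft limit up to the hard limit without
# privileges; raising the hard limit itself requires root.
desired = 65536 if hard == resource.RLIM_INFINITY else min(hard, 65536)
if soft < desired:
    resource.setrlimit(resource.RLIMIT_NOFILE, (desired, hard))
```

A too-low soft limit is what produces the `java.net.SocketException: Too many open files` in the HBaseIOTest run above.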

Re: Build failed in Jenkins: beam_PerformanceTests_Python #47

2017-07-01 Thread Ted Yu
I checked out PerfKitBenchmarker.
It seems we can get rid of the error either by defining a default value for
runner_profile_override :

+def AddRunnerProfileMvnArgument(service_type, mvn_command,
+runner_profile_override):

Or passing None in the places where AddRunnerProfileMvnArgument is called.

On Sat, Jul 1, 2017 at 9:19 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> >   File "https://builds.apache.org/job/beam_PerformanceTests_Python/ws/PerfKitBenchmarker/perfkitbenchmarker/beam_benchmark_helper.py",
> > line 136, in InitializeBeamRepo
> > mvn_command)
> > TypeError: AddRunnerProfileMvnArgument() takes exactly 3 arguments (2
> > given)
>
> Looks like 3rd parameter is needed to initialize repo.
>
> beam_benchmark_helper.py doesn't seem to be in source repo, though.
>
>
> On Fri, Jun 30, 2017 at 10:55 PM, Kenneth Knowles <k...@google.com.invalid>
> wrote:
>
>> This build has been failing for two days. Does anyone have any insights?
>>
>> On Fri, Jun 30, 2017 at 10:50 PM, Apache Jenkins Server <
>> jenk...@builds.apache.org> wrote:
>>
>> > See <https://builds.apache.org/job/beam_PerformanceTests_
>> > Python/47/display/redirect>
>> >
>> > --
>> > Started by timer
>> > [EnvInject] - Loading node environment variables.
>> > Building remotely on beam3 (beam) in workspace <
>> https://builds.apache.org/
>> > job/beam_PerformanceTests_Python/ws/>
>> >  > git rev-parse --is-inside-work-tree # timeout=10
>> > Fetching changes from the remote Git repository
>> >  > git config remote.origin.url https://github.com/apache/beam.git #
>> > timeout=10
>> > Fetching upstream changes from https://github.com/apache/beam.git
>> >  > git --version # timeout=10
>> >  > git fetch --tags --progress https://github.com/apache/beam.git
>> > +refs/heads/*:refs/remotes/origin/* +refs/pull/${ghprbPullId}/*:
>> > refs/remotes/origin/pr/${ghprbPullId}/*
>> >  > git rev-parse origin/master^{commit} # timeout=10
>> > Checking out Revision 0e429b33ff85eba08da5018c9febd0b99b44f720
>> > (origin/master)
>> >  > git config core.sparsecheckout # timeout=10
>> >  > git checkout -f 0e429b33ff85eba08da5018c9febd0b99b44f720
>> >  > git rev-list 0e429b33ff85eba08da5018c9febd0b99b44f720 # timeout=10
>> > Cleaning workspace
>> >  > git rev-parse --verify HEAD # timeout=10
>> > Resetting working tree
>> >  > git reset --hard # timeout=10
>> >  > git clean -fdx # timeout=10
>> > [EnvInject] - Executing scripts and injecting environment variables
>> after
>> > the SCM step.
>> > [EnvInject] - Injecting as environment variables the properties content
>> > SPARK_LOCAL_IP=127.0.0.1
>> >
>> > [EnvInject] - Variables injected successfully.
>> > [beam_PerformanceTests_Python] $ /bin/bash -xe /tmp/
>> > hudson1915973589795310141.sh
>> > + rm -rf PerfKitBenchmarker
>> > [beam_PerformanceTests_Python] $ /bin/bash -xe /tmp/
>> > hudson7830718553943537745.sh
>> > + git clone https://github.com/GoogleCloudPlatform/PerfKitBenchmarker.
>> git
>> > Cloning into 'PerfKitBenchmarker'...
>> > [beam_PerformanceTests_Python] $ /bin/bash -xe /tmp/
>> > hudson7887504202385599721.sh
>> > + pip install --user -r PerfKitBenchmarker/requirements.txt
>> > Requirement already satisfied (use --upgrade to upgrade):
>> > python-gflags==3.1.1 in /home/jenkins/.local/lib/pytho
>> n2.7/site-packages
>> > (from -r PerfKitBenchmarker/requirements.txt (line 14))
>> > Requirement already satisfied (use --upgrade to upgrade): jinja2>=2.7 in
>> > /usr/local/lib/python2.7/dist-packages (from -r
>> PerfKitBenchmarker/requirements.txt
>> > (line 15))
>> > Requirement already satisfied (use --upgrade to upgrade): setuptools in
>> > /usr/lib/python2.7/dist-packages (from -r
>> PerfKitBenchmarker/requirements.txt
>> > (line 16))
>> > Requirement already satisfied (use --upgrade to upgrade):
>> > colorlog[windows]==2.6.0 in /home/jenkins/.local/lib/pytho
>> n2.7/site-packages
>> > (from -r PerfKitBenchmarker/requirements.txt (line 17))
>> >   Installing extra requirements: 'windows'
>> > Requirement already satisfied (use --upgrade to upgrade): blinker>=1.3
>> in
>> > /home/jenkins/.local/lib/python2.7/site-packages (from -r
>> > PerfKitBenchmarker/r
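
The TypeError in the traceback — AddRunnerProfileMvnArgument() called with two arguments but defined with three — matches Ted's first suggestion: give the new parameter a default value. A hypothetical reduction of that fix (the function body and behavior below are invented for illustration; only the signature shape mirrors the traceback, not PerfKitBenchmarker's actual code):

```python
# Hypothetical stand-in for PerfKitBenchmarker's helper: the real function
# gained a third parameter (runner_profile_override) that older call sites
# do not pass, producing the TypeError above.
def add_runner_profile_mvn_argument(service_type, mvn_command,
                                    runner_profile_override=None):
    # With a default of None, two-argument call sites keep working.
    if runner_profile_override is not None:
        mvn_command.append("-P%s" % runner_profile_override)
    else:
        mvn_command.append("-P%s-runner" % service_type)
    return mvn_command

# The old two-argument call no longer raises TypeError.
print(add_runner_profile_mvn_argument("dataflow", ["mvn", "verify"]))
```

Ted's alternative — passing None explicitly at every call site — avoids changing the signature but touches more code.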

Re: Build failed in Jenkins: beam_PerformanceTests_Python #47

2017-07-01 Thread Ted Yu
>   File "https://builds.apache.org/job/beam_PerformanceTests_Python/ws/PerfKitBenchmarker/perfkitbenchmarker/beam_benchmark_helper.py",
> line 136, in InitializeBeamRepo
> mvn_command)
> TypeError: AddRunnerProfileMvnArgument() takes exactly 3 arguments (2
> given)

Looks like 3rd parameter is needed to initialize repo.

beam_benchmark_helper.py doesn't seem to be in source repo, though.


On Fri, Jun 30, 2017 at 10:55 PM, Kenneth Knowles 
wrote:

> This build has been failing for two days. Does anyone have any insights?
>
> On Fri, Jun 30, 2017 at 10:50 PM, Apache Jenkins Server <
> jenk...@builds.apache.org> wrote:
>
> > See <https://builds.apache.org/job/beam_PerformanceTests_Python/47/display/redirect>
> >
> > --
> > Started by timer
> > [EnvInject] - Loading node environment variables.
> > Building remotely on beam3 (beam) in workspace <
> https://builds.apache.org/
> > job/beam_PerformanceTests_Python/ws/>
> >  > git rev-parse --is-inside-work-tree # timeout=10
> > Fetching changes from the remote Git repository
> >  > git config remote.origin.url https://github.com/apache/beam.git #
> > timeout=10
> > Fetching upstream changes from https://github.com/apache/beam.git
> >  > git --version # timeout=10
> >  > git fetch --tags --progress https://github.com/apache/beam.git
> > +refs/heads/*:refs/remotes/origin/* +refs/pull/${ghprbPullId}/*:
> > refs/remotes/origin/pr/${ghprbPullId}/*
> >  > git rev-parse origin/master^{commit} # timeout=10
> > Checking out Revision 0e429b33ff85eba08da5018c9febd0b99b44f720
> > (origin/master)
> >  > git config core.sparsecheckout # timeout=10
> >  > git checkout -f 0e429b33ff85eba08da5018c9febd0b99b44f720
> >  > git rev-list 0e429b33ff85eba08da5018c9febd0b99b44f720 # timeout=10
> > Cleaning workspace
> >  > git rev-parse --verify HEAD # timeout=10
> > Resetting working tree
> >  > git reset --hard # timeout=10
> >  > git clean -fdx # timeout=10
> > [EnvInject] - Executing scripts and injecting environment variables after
> > the SCM step.
> > [EnvInject] - Injecting as environment variables the properties content
> > SPARK_LOCAL_IP=127.0.0.1
> >
> > [EnvInject] - Variables injected successfully.
> > [beam_PerformanceTests_Python] $ /bin/bash -xe /tmp/
> > hudson1915973589795310141.sh
> > + rm -rf PerfKitBenchmarker
> > [beam_PerformanceTests_Python] $ /bin/bash -xe /tmp/
> > hudson7830718553943537745.sh
> > + git clone https://github.com/GoogleCloudPlatform/
> PerfKitBenchmarker.git
> > Cloning into 'PerfKitBenchmarker'...
> > [beam_PerformanceTests_Python] $ /bin/bash -xe /tmp/
> > hudson7887504202385599721.sh
> > + pip install --user -r PerfKitBenchmarker/requirements.txt
> > Requirement already satisfied (use --upgrade to upgrade):
> > python-gflags==3.1.1 in /home/jenkins/.local/lib/python2.7/site-packages
> > (from -r PerfKitBenchmarker/requirements.txt (line 14))
> > Requirement already satisfied (use --upgrade to upgrade): jinja2>=2.7 in
> > /usr/local/lib/python2.7/dist-packages (from -r PerfKitBenchmarker/
> requirements.txt
> > (line 15))
> > Requirement already satisfied (use --upgrade to upgrade): setuptools in
> > /usr/lib/python2.7/dist-packages (from -r PerfKitBenchmarker/
> requirements.txt
> > (line 16))
> > Requirement already satisfied (use --upgrade to upgrade):
> > colorlog[windows]==2.6.0 in /home/jenkins/.local/lib/
> python2.7/site-packages
> > (from -r PerfKitBenchmarker/requirements.txt (line 17))
> >   Installing extra requirements: 'windows'
> > Requirement already satisfied (use --upgrade to upgrade): blinker>=1.3 in
> > /home/jenkins/.local/lib/python2.7/site-packages (from -r
> > PerfKitBenchmarker/requirements.txt (line 18))
> > Requirement already satisfied (use --upgrade to upgrade): futures>=3.0.3
> > in /home/jenkins/.local/lib/python2.7/site-packages (from -r
> > PerfKitBenchmarker/requirements.txt (line 19))
> > Requirement already satisfied (use --upgrade to upgrade): PyYAML==3.12 in
> > /home/jenkins/.local/lib/python2.7/site-packages (from -r
> > PerfKitBenchmarker/requirements.txt (line 20))
> > Requirement already satisfied (use --upgrade to upgrade): pint>=0.7 in
> > /home/jenkins/.local/lib/python2.7/site-packages (from -r
> > PerfKitBenchmarker/requirements.txt (line 21))
> > Requirement already satisfied (use --upgrade to upgrade): numpy in
> > /home/jenkins/.local/lib/python2.7/site-packages (from -r
> > PerfKitBenchmarker/requirements.txt (line 22))
> > Requirement already satisfied (use --upgrade to upgrade): functools32 in
> > /home/jenkins/.local/lib/python2.7/site-packages (from -r
> > PerfKitBenchmarker/requirements.txt (line 23))
> > Requirement already satisfied (use --upgrade to upgrade):
> > contextlib2>=0.5.1 in /home/jenkins/.local/lib/python2.7/site-packages
> > (from -r PerfKitBenchmarker/requirements.txt (line 24))
> > Cleaning up...
> > [beam_PerformanceTests_Python] $ /bin/bash -xe /tmp/
> > hudson8072560625143652403.sh
> > + 

Re: Java Cross-JDK Test Available on Jenkins Postcommit!

2017-06-09 Thread Ted Yu
This is nice.

Nit: I don't see a JDK 1.8 (Oracle) variant. Is this intentional?

Thanks

On Fri, Jun 9, 2017 at 3:49 PM, Mark Liu  wrote:

> Hi all,
>
> I worked on BEAM-1544 
> for
> a while, which is to bring Java cross-JDK test to Jenkins, and finished
> this task recently. Now, there is a post commit Jenkins build running on
> three different Java JDK versions (Jenkins link
> )!
> This
> test has already caught several issues, and I hope it continues to provide
> quick insights into the functionality of Beam across JDK versions.
>
> Here are some details:
>
>- Current JDK versions used in the test:
>- JDK 1.7
>   - OpenJDK 7
>   - OpenJDK 8
>- Same coverage as beam_PostCommit_Java_MavenInstall.
>- Scheduled to run every 6 hours, concurrently.
>
> Known issues:
>
>- PR-3320  (just merged):
>Compiling error in Flink runners
>- BEAM-2425 : Package
>does not exist when building beam-sdks-java-javadoc in JDK1.7
>- BEAM-2322 ,
> BEAM-2323
>, BEAM-2324
>: some Apex tests
>failed if project directory contains space
>
> Mark
>


Re: Precommit Jenkins Linkage Broken

2017-05-30 Thread Ted Yu
INFRA-14247 is currently marked Major.

Suggest raising the priority so that it gets more attention.

Cheers

On Tue, May 30, 2017 at 2:59 PM, Jason Kuster <
jasonkus...@google.com.invalid> wrote:

> Hey folks,
>
> Just wanted to mention on the dev list that Jenkins precommit breakage is a
> known issue and has been escalated to Infra (thanks JB!)[1]. I'm monitoring
> the issue and will ping back here with any updates and when it starts
> working again.
>
> Best,
>
> Jason
>
> [1] https://issues.apache.org/jira/browse/INFRA-14247
>
> --
> ---
> Jason Kuster
> Apache Beam / Google Cloud Dataflow
>


Re: How can I disable running Python SDK tests when testing my Java change?

2017-05-19 Thread Ted Yu
I logged BEAM-2335 with some of my findings.

Cheers

On Fri, May 19, 2017 at 12:38 PM, Borisa Zivkovic <
borisha.zivko...@gmail.com> wrote:

> I think it should be added. I am compiling a list of useful maven commands
> to put there. But it takes time.
>
> For example, how do I execute only one test marked as @NeedsRunner?
> How do I execute one specific test in java io?
> How to execute one specific test in any of the runners?
> How to use beamTestPipelineOptions with a few JSON examples?
> Will mvn clean verify execute ALL tests against all runners?
>
> I think this kind of information would be very useful to speed up new
> developers.
> To figure this out one has to go through pom files.
>
> Cheers
>
> On Fri 19 May 2017 at 18:41, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > Should this tip be added to https://beam.apache.org/contribute/testing/
> ?
> >
> > Cheers
> >
> > On Fri, May 19, 2017 at 10:31 AM, Lukasz Cwik <lc...@google.com.invalid>
> > wrote:
> >
> > > Developers can use *-pl \!sdks/python* to skip the python module.
> > >
> > > Breaking it up would help developers working outside of Python and
> would
> > > decrease the precommit/postcommit execution times.
> > >
> > > On Thu, May 18, 2017 at 7:24 PM, Robert Bradshaw <
> > > rober...@google.com.invalid> wrote:
> > >
> > > > We could consider splitting Python up into the four things it runs:
> > > > all tests with Cython, all tests without Cython, docs, and
> checkstyle.
> > > > However, I never use Maven when developing the python portions.
> > > >
> > > > On Thu, May 18, 2017 at 6:35 PM, Thomas Groh
> <tg...@google.com.invalid
> > >
> > > > wrote:
> > > > > Generally I pass "-am -amd -pl sdks/java/core" to my maven
> > invocation.
> > > > -pl
> > > > > is the module to build, -am indicates to also make all modules my
> > > target
> > > > > depends upon, and -amd indicates to also make all of the
> > dependencies;
> > > so
> > > > > if you're only modifying java, that should hit everything. If
> you're
> > > > making
> > > > > another module, you can specify that as the -pl target, and if you
> > > > > 'install' instead of 'verify' you can resume arbitrarily.
> > > > >
> > > > > On Thu, May 18, 2017 at 4:29 PM, Eugene Kirpichov <
> > > > > kirpic...@google.com.invalid> wrote:
> > > > >
> > > > >> I've noticed that when I run "mvn verify", most of the time when I
> > > look
> > > > at
> > > > >> the screen it's running Python tests.
> > > > >>
> > > > >> Indeed, the Reactor Summary says:
> > > > >> ...
> > > > >> [INFO] Apache Beam :: SDKs :: Python ..
> SUCCESS
> > > > [11:56
> > > > >> min]
> > > > >> ...
> > > > >> [INFO] Total time: 12:03 min (Wall Clock)
> > > > >>
> > > > >> i.e. it's clearly on the critical path. The longest other project
> is
> > > > >> 02:17min (Runners::Spark).
> > > > >>
> > > > >> Are our .pom files customizable with an option to run only Java
> > tests?
> > > > (or,
> > > > >> respectively, only Python tests)
> > > > >>
> > > > >> Thanks.
> > > > >>
> > > >
> > >
> >
>


Re: How can I disable running Python SDK tests when testing my Java change?

2017-05-19 Thread Ted Yu
Should this tip be added to https://beam.apache.org/contribute/testing/ ?

Cheers

On Fri, May 19, 2017 at 10:31 AM, Lukasz Cwik 
wrote:

> Developers can use *-pl \!sdks/python* to skip the python module.
>
> Breaking it up would help developers working outside of Python and would
> decrease the precommit/postcommit execution times.
>
> On Thu, May 18, 2017 at 7:24 PM, Robert Bradshaw <
> rober...@google.com.invalid> wrote:
>
> > We could consider splitting Python up into the four things it runs:
> > all tests with Cython, all tests without Cython, docs, and checkstyle.
> > However, I never use Maven when developing the python portions.
> >
> > On Thu, May 18, 2017 at 6:35 PM, Thomas Groh 
> > wrote:
> > > Generally I pass "-am -amd -pl sdks/java/core" to my maven invocation.
> > -pl
> > > is the module to build, -am indicates to also make all modules my
> target
> > > depends upon, and -amd indicates to also make all of the dependencies;
> so
> > > if you're only modifying java, that should hit everything. If you're
> > making
> > > another module, you can specify that as the -pl target, and if you
> > > 'install' instead of 'verify' you can resume arbitrarily.
> > >
> > > On Thu, May 18, 2017 at 4:29 PM, Eugene Kirpichov <
> > > kirpic...@google.com.invalid> wrote:
> > >
> > >> I've noticed that when I run "mvn verify", most of the time when I
> look
> > at
> > >> the screen it's running Python tests.
> > >>
> > >> Indeed, the Reactor Summary says:
> > >> ...
> > >> [INFO] Apache Beam :: SDKs :: Python .. SUCCESS
> > [11:56
> > >> min]
> > >> ...
> > >> [INFO] Total time: 12:03 min (Wall Clock)
> > >>
> > >> i.e. it's clearly on the critical path. The longest other project is
> > >> 02:17min (Runners::Spark).
> > >>
> > >> Are our .pom files customizable with an option to run only Java tests?
> > (or,
> > >> respectively, only Python tests)
> > >>
> > >> Thanks.
> > >>
> >
>
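
The Maven flags discussed in this thread (-pl to select modules, with !module to exclude one, plus -am/-amd to also build dependencies and dependents) can be combined in several ways; a small sketch that just assembles the command line, without invoking Maven:

```python
def maven_command(goal, projects=None, also_make=False,
                  also_make_dependents=False):
    """Assemble an mvn argument list from the flags discussed above."""
    cmd = ["mvn", goal]
    if projects:
        # Prefixing a module with '!' excludes it, e.g. '!sdks/python'
        # (the '!' usually needs escaping from the shell: \!sdks/python).
        cmd += ["-pl", ",".join(projects)]
    if also_make:
        cmd.append("-am")   # also build modules the targets depend on
    if also_make_dependents:
        cmd.append("-amd")  # also build modules that depend on the targets
    return cmd

# Java-only verify, skipping the Python SDK module:
print(maven_command("verify", ["!sdks/python"]))
# Build sdks/java/core together with its dependencies and dependents:
print(maven_command("verify", ["sdks/java/core"],
                    also_make=True, also_make_dependents=True))
```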


Re: [PROPOSAL] design of DSL SQL interface

2017-05-13 Thread Ted Yu
Can you fill out the Transition Plan ?

Thanks

On Fri, May 12, 2017 at 10:49 PM, Mingmin Xu  wrote:

> Hi all,
>
> As you may know, we're working on BeamSQL to execute SQL queries as a Beam
> pipeline. This is a valuable feature, not only shipped as a packaged CLI,
> but also as part of the SDK to assemble a pipeline.
>
> I prepare a document[1] to list the high level APIs, to show how SQL
> queries can be added in a pipeline. Below is a snippet of pseudocode for a
> quick reference:
>
> PipelineOptions options =  PipelineOptionsFactory...
> Pipeline pipeline = Pipeline.create(options);
>
> //prepare environment of BeamSQL
> BeamSQLEnvironment sqlEnv = BeamSQLEnvironment.create(pipeline);
> //register table metadata
> sqlEnv.addTableMetadata(String tableName, BeamSqlTable tableMetadata);
> //register UDF
>
> sqlEnv.registerUDF(String functionName, Method udfMethod);
>
>
> //explain a SQL statement, SELECT only, and return as a PCollection;
> PCollection phase1Stream = sqlEnv.explainSQL(String
> sqlStatement);
> //A PCollection explained by BeamSQL can be converted into a table, and
> apply queries on it;
> sqlEnv.registerPCollectionAsTable(String tableName, phase1Stream);
>
> //apply more queries, even based on phase1Stream
>
> pipeline.run().waitUntilFinish();
>
> Any feedback is very welcome!
>
> [1]
> https://docs.google.com/document/d/1uWXL_yF3UUO5GfCxbL6kWsmC8xCWfICU3Rw
> iQKsk7Mk/edit?usp=sharing
>
> --
> 
> Mingmin
>


Re: First stable release: version designation?

2017-05-04 Thread Ted Yu
What's the difference between the first and second, third and fourth columns?

On Thu, May 4, 2017 at 3:36 PM, María García Herrero <
mari...@google.com.invalid> wrote:

> Thanks for the suggestion, Ted. Get your vote in here
> <https://docs.google.com/document/d/1ABx3U8ojcfUkFig3hG53lOYl73tdk
> Wqz5B6eQ40TEgk/edit?usp=sharing>
> .
> I have already added all the votes that Davor compiled 3 hours ago and the
> responses afterwards.
>
> On Thu, May 4, 2017 at 12:49 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > Maybe create a google doc with columns as the camps.
> >
> > Each person can put his/her name under the camp in his/her favor.
> >
> > On Thu, May 4, 2017 at 12:32 PM, Thomas Weise <t...@apache.org> wrote:
> >
> > > I'm in the relaxed 1.0.0 camp.
> > >
> > > --
> > > sent from mobile
> > > On May 4, 2017 12:29 PM, "Mingmin Xu" <mingm...@gmail.com> wrote:
> > >
> > > > I slightly prefer 1.0.0 for the *first* stable release, but fine with
> > > 2.0.0.
> > > >
> > > > On Thu, May 4, 2017 at 12:25 PM, Lukasz Cwik
> <lc...@google.com.invalid
> > >
> > > > wrote:
> > > >
> > > > > Put me under Strongly for 2.0.0
> > > > >
> > > > > On Thu, May 4, 2017 at 12:24 PM, Kenneth Knowles
> > > <k...@google.com.invalid
> > > > >
> > > > > wrote:
> > > > >
> > > > > > I'll join Davor's group.
> > > > > >
> > > > > > On Thu, May 4, 2017 at 12:07 PM, Davor Bonaci <da...@apache.org>
> > > > wrote:
> > > > > >
> > > > > > > I don't think we have reached a consensus here yet. Let's
> > > re-examine
> > > > > this
> > > > > > > after some time has passed.
> > > > > > >
> > > > > > > If I understand everyone's opinion correctly, this is the
> > summary:
> > > > > > >
> > > > > > > Strongly for 2.0.0:
> > > > > > > * Aljoscha
> > > > > > > * Dan
> > > > > > >
> > > > > > > Slight preference toward 2.0.0, but fine with 1.0.0:
> > > > > > > * Davor
> > > > > > >
> > > > > > > Strongly for 1.0.0: none.
> > > > > > >
> > > > > > > Slight preference toward 1.0.0, but fine with 2.0.0:
> > > > > > > * Amit
> > > > > > > * Jesse
> > > > > > > * JB
> > > > > > > * Ted
> > > > > > >
> > > > > > > Any additional opinions?
> > > > > > >
> > > > > > > Thanks!
> > > > > > >
> > > > > > > Davor
> > > > > > >
> > > > > > > On Wed, Mar 8, 2017 at 12:58 PM, Amit Sela <
> amitsel...@gmail.com
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > If we were to go with a 2.0 release, we would have to be very
> > > clear
> > > > > on
> > > > > > > > maturity of different modules; for example python SDK is not
> as
> > > > > mature
> > > > > > as
> > > > > > > > Java SDK, some runners support streaming better than others,
> > some
> > > > run
> > > > > > on
> > > > > > > > YARN better than others, etc.
> > > > > > > >
> > > > > > > > My only reservation here is that the Apache community usually
> > > > expects
> > > > > > > > version 2.0 to be a mature product, so I'm OK as long as we
> do
> > > > some
> > > > > > > > "maturity-analysis" and document properly.
> > > > > > > >
> > > > > > > > On Tue, Mar 7, 2017 at 4:48 AM Ted Yu <yuzhih...@gmail.com>
> > > wrote:
> > > > > > > >
> > > > > > > > > If we end up with version 2.0, more effort (trying out more
> > use
> > > > > > > scenarios
> > > > > > > > > e.g.) should go into release process to make sure what is
> > > > released
> > > > > is
> > > > > > > > > indeed stable.
> > > 

Re: First stable release: version designation?

2017-05-04 Thread Ted Yu
Maybe create a google doc with columns as the camps.

Each person can put his/her name under the camp in his/her favor.

On Thu, May 4, 2017 at 12:32 PM, Thomas Weise <t...@apache.org> wrote:

> I'm in the relaxed 1.0.0 camp.
>
> --
> sent from mobile
> On May 4, 2017 12:29 PM, "Mingmin Xu" <mingm...@gmail.com> wrote:
>
> > I slightly prefer 1.0.0 for the *first* stable release, but fine with
> 2.0.0.
> >
> > On Thu, May 4, 2017 at 12:25 PM, Lukasz Cwik <lc...@google.com.invalid>
> > wrote:
> >
> > > Put me under Strongly for 2.0.0
> > >
> > > On Thu, May 4, 2017 at 12:24 PM, Kenneth Knowles
> <k...@google.com.invalid
> > >
> > > wrote:
> > >
> > > > I'll join Davor's group.
> > > >
> > > > On Thu, May 4, 2017 at 12:07 PM, Davor Bonaci <da...@apache.org>
> > wrote:
> > > >
> > > > > I don't think we have reached a consensus here yet. Let's
> re-examine
> > > this
> > > > > after some time has passed.
> > > > >
> > > > > If I understand everyone's opinion correctly, this is the summary:
> > > > >
> > > > > Strongly for 2.0.0:
> > > > > * Aljoscha
> > > > > * Dan
> > > > >
> > > > > Slight preference toward 2.0.0, but fine with 1.0.0:
> > > > > * Davor
> > > > >
> > > > > Strongly for 1.0.0: none.
> > > > >
> > > > > Slight preference toward 1.0.0, but fine with 2.0.0:
> > > > > * Amit
> > > > > * Jesse
> > > > > * JB
> > > > > * Ted
> > > > >
> > > > > Any additional opinions?
> > > > >
> > > > > Thanks!
> > > > >
> > > > > Davor
> > > > >
> > > > > On Wed, Mar 8, 2017 at 12:58 PM, Amit Sela <amitsel...@gmail.com>
> > > wrote:
> > > > >
> > > > > > If we were to go with a 2.0 release, we would have to be very
> clear
> > > on
> > > > > > maturity of different modules; for example python SDK is not as
> > > mature
> > > > as
> > > > > > Java SDK, some runners support streaming better than others, some
> > run
> > > > on
> > > > > > YARN better than others, etc.
> > > > > >
> > > > > > My only reservation here is that the Apache community usually
> > expects
> > > > > > version 2.0 to be a mature products, so I'm OK as long as we do
> > some
> > > > > > "maturity-analysis" and document properly.
> > > > > >
> > > > > > On Tue, Mar 7, 2017 at 4:48 AM Ted Yu <yuzhih...@gmail.com>
> wrote:
> > > > > >
> > > > > > > If we end up with version 2.0, more effort (trying out more use
> > > > > scenarios
> > > > > > > e.g.) should go into release process to make sure what is
> > released
> > > is
> > > > > > > indeed stable.
> > > > > > >
> > > > > > > Normally people would have higher expectation on 2.0 release
> > > compared
> > > > > to
> > > > > > > 1.0 release.
> > > > > > >
> > > > > > > On Mon, Mar 6, 2017 at 6:34 PM, Davor Bonaci <da...@apache.org
> >
> > > > wrote:
> > > > > > >
> > > > > > > > It sounds like we'll end up with two camps on this topic.
> This
> > > > issue
> > > > > is
> > > > > > > > probably best resolved with a vote, but I'll try to rephrase
> > the
> > > > > > question
> > > > > > > > once to see whether a consensus is possible.
> > > > > > > >
> > > > > > > > Instead of asking which option is better, does anyone think
> the
> > > > > project
> > > > > > > > would be negatively impacted if we were to decide on, in your
> > > > > opinion,
> > > > > > > the
> > > > > > > > less desirable variant? If so, can you comment on the
> negative
> > > > impact
> > > > > > of
> > > > > > > > the less desirable alternative please?
> > > > > > > &

Re: Congratulations Davor!

2017-05-04 Thread Ted Yu
Congratulations, Davor!

On Thu, May 4, 2017 at 12:45 AM, Aviem Zur  wrote:

> Congrats Davor! :)
>
> On Thu, May 4, 2017 at 10:42 AM Jean-Baptiste Onofré 
> wrote:
>
> > Congrats ! Well deserved ;)
> >
> > Regards
> > JB
> >
> > On 05/04/2017 09:30 AM, Jason Kuster wrote:
> > > Hi all,
> > >
> > > The ASF has just published a blog post[1] welcoming new members of the
> > > Apache Software Foundation, and our own Davor Bonaci is among them!
> > > Congratulations and thank you to Davor for all of your work for the
> Beam
> > > community, and the ASF at large. Well deserved.
> > >
> > > Best,
> > >
> > > Jason
> > >
> > > [1] https://blogs.apache.org/foundation/entry/the-apache-sof
> > > tware-foundation-welcomes
> > >
> > > P.S. I dug through the list to make sure I wasn't missing any other
> Beam
> > > community members; if I have, my sincerest apologies and please
> recognize
> > > them on this or a new thread.
> > >
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>


Re: git commit message

2017-04-30 Thread Ted Yu
I was actually talking about the title of the PR.

From the link, there is a guideline which some of the recent commits didn't
follow:

The title of the pull request should be strictly in the following format:


On Sun, Apr 30, 2017 at 1:22 PM, Chamikara Jayalath <chamik...@apache.org>
wrote:

> Do we have a convention on the commit message ? Seems like Contributor's
> guide only talks about the title of the PR not the commit message. I might
> be missing something.
>
> https://beam.apache.org/contribute/contribution-guide/
> #create-a-pull-request
>
> Thanks,
> Cham
>
> On Sun, Apr 30, 2017 at 12:41 PM Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
> > Hi,
> >
> > IMHO, it's very rare. That's true we did couple of "fast" commit to fix
> > issues
> > (I remember at the beginning of the project).
> >
> > However, almost all changes go via PR and most of with a Jira.
> >
> > So, if maybe you found couple of, again, it's rare and we follow the
> > Jira/PR/review/commit path.
> >
> > Regards
> > JB
> >
> > On 04/30/2017 04:59 PM, Ted Yu wrote:
> > > Hi,
> > > When I went over git commit messages, I found some without either of
> the
> > > following:
> > >
> > > JIRA number
> > > PR number
> > >
> > > It would be nice for other people to get background / goal / discussion
> > > w.r.t. any commit if either (or both) of the above is present.
> > >
> > > My two cents.
> > >
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>


Re: Status of our CI tools

2017-04-28 Thread Ted Yu
+1

On Fri, Apr 28, 2017 at 6:24 PM, Thomas Groh 
wrote:

> +1! This will be really helpful when looking at my PRs; I basically get no
> signal from the current state of the github UI, and this will restore that
> to giving me very strong positive signal.
>
> On Fri, Apr 28, 2017 at 6:22 PM, Davor Bonaci  wrote:
>
> > Early on in the project, we've discussed our CI needs and concluded to
> use
> > ASF-hosted Jenkins as our preferred tool of choice. We've also enabled
> > Travis-CI, which covered some scenarios that Jenkins couldn't do at the
> > time, but with the idea to transition to Jenkins eventually.
> >
> > Over the last few months, Travis-CI has been broken consistently, and
> > several different kinds of infrastructure breakages have been added, one
> on
> > top of another. This has caused plenty of cost and confusion. In
> > particular, contributors often get confused as to which signal they
> should
> > care about.
> >
> > At the same time, Jenkins capabilities have improved greatly: multiple
> > parallel precommits are now supported, checked-in DSL support, pipelined
> > matrix builds, Google's donation of Jenkins executors more than doubled,
> > and others.
> >
> > So, based on the previous consensus and the fact the signal was broken
> for
> > a long time, Jason and I went and asked Infra to disable Travis-CI on our
> > code repository. (Website repository was disabled months ago.)
> >
> > I believe there should be minimal impact of this. The only two elements
> of
> > the Travis matrix that were passing (still) are Python SDK on the Linux &
> > Mac. Linux one can be trivially moved to Jenkins -- and I know Jason is
> > looking at that. Mac coverage is the only loss at the moment, but is
> > something we can likely address in the (near) future.
> >
> > I'm excited that we finally managed to unify our CI tooling, and can make
> > efforts on improving and maintaining one system as opposed to two. That
> > said, please comment if you have any worries about this or ideas for
> > further CI improvements ;-)
> >
> > Davor
> >
>


Re: Community hackathon

2017-04-24 Thread Ted Yu
+1

> On Apr 24, 2017, at 12:51 AM, Jean-Baptiste Onofré  wrote:
> 
> That's a wonderful idea !
> 
> I think the easiest way to organize this event is using the Slack channels to 
> discuss, help each other, and sync together.
> 
> Regards
> JB
> 
>> On 04/24/2017 09:48 AM, Davor Bonaci wrote:
>> We've been working as a community towards the first stable release for a
>> while now, and I think we made a ton of progress across the board over the
>> last few weeks.
>> 
>> We could try to organize a community-wide hackathon to identify and fix
>> those last few issues, as well as to get a better sense of the overall
>> project quality as it stands right now.
>> 
>> This could be a self-organized event, and coordinated via the Slack
>> channel. For example, we (as a community and participants) can try out the
>> project in various ways -- quickstart, examples, different runners,
>> different platforms -- immediately fixing issues as we run into them. It
>> could last, say, 24 hours, with people from different time zones
>> participating at the time of their choosing.
>> 
>> Thoughts?
>> 
>> Davor
> 
> -- 
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com


Re: Hanging Jenkins builds.

2017-04-22 Thread Ted Yu
Looks like this might be the cause for the failed build (
https://builds.apache.org/view/Beam/job/beam_PreCommit_Java_MavenInstall/9927/console
):

/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Java_MavenInstall/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/CombineFns.java:145:
error: reference not found
   * See {@link #compose()} or {@link #composeKeyed()}) for details.
 ^
/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Java_MavenInstall/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/CombineFns.java:147:
warning - Tag @link: can't find composeKeyed() in
org.apache.beam.sdk.transforms.CombineFns.CoCombineResult

FYI

On Fri, Apr 21, 2017 at 11:17 PM, Aviem Zur  wrote:

> Hi all,
>
> Please be aware that Beam builds (precommit + postcommit validations) are
> hanging since a few hours ago.
>
> This seems to be a problem in builds of other projects as well (for
> example, Kafka).
>
> I've opened an INFRA ticket:
> https://issues.apache.org/jira/browse/INFRA-13949
>


Re: [DISCUSSION] Encouraging more contributions

2017-04-22 Thread Ted Yu
+1

On Sat, Apr 22, 2017 at 7:31 AM, Aviem Zur  wrote:

> Hi all,
>
> I wanted to start a discussion about actions we can take to encourage more
> contributions to the project.
>
> A few points I've been thinking of:
>
> 1. Have people unassign themselves from issues they're not actively working
> on.
> 2. Have the community engage more in triage, improving tickets descriptions
> and raising concerns.
> 3. Clean house - apply (2) to currently open issues (over 800). Perhaps
> some can be closed.
>
> Thoughts? Ideas?
>


Re: Build failed in Jenkins: beam_SeedJob #214

2017-04-18 Thread Ted Yu
Thanks Jason for the effort.
Looks like we hit this:

ERROR: script not yet approved for use


On Tue, Apr 18, 2017 at 10:16 AM, Jason Kuster <
jasonkus...@google.com.invalid> wrote:

> I'm looking into this currently as well; that's one of the mitigations I'm
> considering too but I'm giving the evaluate thing a try[1][2] (once it
> starts running -- executors are full currently).
>
> [1] https://builds.apache.org/view/Beam/job/beam_SeedJob/215/
> [2] https://github.com/apache/beam/pull/2578
>
> On Tue, Apr 18, 2017 at 10:12 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > To unblock the builds, how about embedding functions used by respective
> > scripts in the scripts themselves ?
> >
> > e.g. buildPerformanceTest is only used by the following scripts:
> >
> > .test-infra/jenkins/job_beam_PerformanceTests_Dataflow.groovy:
> >  common_job_properties.buildPerformanceTest(delegate, argMap)
> > .test-infra/jenkins/job_beam_PerformanceTests_JDBC.groovy:
> >  common_job_properties.buildPerformanceTest(delegate, argMap)
> > .test-infra/jenkins/job_beam_PerformanceTests_Spark.groovy:
> >  common_job_properties.buildPerformanceTest(delegate, argMap)
> >
> > On Tue, Apr 18, 2017 at 10:05 AM, Davor Bonaci <da...@apache.org> wrote:
> >
> > > Not so simple, unfortunately [1]. Ideas welcome ;-)
> > >
> > > Davor
> > >
> > > [1]
> > > https://github.com/jenkinsci/job-dsl-plugin/wiki/Migration#
> > > migrating-to-160
> > >
> > > On Tue, Apr 18, 2017 at 9:57 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >
> > > > I wonder if we should adopt the suggestion here (involving evaluate):
> > > > http://stackoverflow.com/questions/9136328/including-a-
> > > > groovy-script-in-another-groovy
> > > >
> > > > On Tue, Apr 18, 2017 at 9:45 AM, Apache Jenkins Server <
> > > > jenk...@builds.apache.org> wrote:
> > > >
> > > > > See <https://builds.apache.org/job/beam_SeedJob/214/display/
> > > > > redirect?page=changes>
> > > > >
> > > > > Changes:
> > > > >
> > > > > [jbonofre] [BEAM-59] Register standard FileSystems wherever we
> > register
> > > > >
> > > > > [iemejia] Enable flink dependency enforcement and make dependencies
> > > > > explicit
> > > > >
> > > > > [iemejia] Fix Javadoc warnings on Flink Runner
> > > > >
> > > > > [iemejia] Remove flink-annotations dependency
> > > > >
> > > > > [iemejia] [BEAM-1993] Remove special unbounded Flink source/sink
> > > > >
> > > > > [tgroh] Translate PTransforms to and from Runner API Protos
> > > > >
> > > > > [altay] Clean up DirectRunner Clock and TransformResult
> > > > >
> > > > > [altay] Remove overloading of __call__ in DirectRunner
> > > > >
> > > > > --
> > > > > [...truncated 202.75 KB...]
> > > > >  x [deleted] (none) -> origin/pr/902/merge
> > > > >  x [deleted] (none) -> origin/pr/903/head
> > > > >  x [deleted] (none) -> origin/pr/903/merge
> > > > >  x [deleted] (none) -> origin/pr/904/head
> > > > >  x [deleted] (none) -> origin/pr/904/merge
> > > > >  x [deleted] (none) -> origin/pr/905/head
> > > > >  x [deleted] (none) -> origin/pr/905/merge
> > > > >  x [deleted] (none) -> origin/pr/906/head
> > > > >  x [deleted] (none) -> origin/pr/906/merge
> > > > >  x [deleted] (none) -> origin/pr/907/head
> > > > >  x [deleted] (none) -> origin/pr/907/merge
> > > > >  x [deleted] (none) -> origin/pr/908/head
> > > > >  x [deleted] (none) -> origin/pr/909/head
> > > > >  x [deleted] (none) -> origin/pr/909/merge
> > > > >  x [deleted] (none) -> origin/pr/91/head
> > > > >  x [deleted] (none) -> origin/pr/91/merge
> > > > >  x [deleted] (none) -> origin/pr/910/head
> > > > >  x [deleted] (none) -> origin/pr/911/head
> > > > >  x [deleted] (none) -> origin/pr/911/merge
> > > > >  x [deleted]  

Re: Build failed in Jenkins: beam_SeedJob #214

2017-04-18 Thread Ted Yu
To unblock the builds, how about embedding functions used by respective
scripts in the scripts themselves ?

e.g. buildPerformanceTest is only used by the following scripts:

.test-infra/jenkins/job_beam_PerformanceTests_Dataflow.groovy:
 common_job_properties.buildPerformanceTest(delegate, argMap)
.test-infra/jenkins/job_beam_PerformanceTests_JDBC.groovy:
 common_job_properties.buildPerformanceTest(delegate, argMap)
.test-infra/jenkins/job_beam_PerformanceTests_Spark.groovy:
 common_job_properties.buildPerformanceTest(delegate, argMap)

On Tue, Apr 18, 2017 at 10:05 AM, Davor Bonaci <da...@apache.org> wrote:

> Not so simple, unfortunately [1]. Ideas welcome ;-)
>
> Davor
>
> [1]
> https://github.com/jenkinsci/job-dsl-plugin/wiki/Migration#
> migrating-to-160
>
> On Tue, Apr 18, 2017 at 9:57 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > I wonder if we should adopt the suggestion here (involving evaluate):
> > http://stackoverflow.com/questions/9136328/including-a-
> > groovy-script-in-another-groovy
> >
> > On Tue, Apr 18, 2017 at 9:45 AM, Apache Jenkins Server <
> > jenk...@builds.apache.org> wrote:
> >
> > > See <https://builds.apache.org/job/beam_SeedJob/214/display/
> > > redirect?page=changes>
> > >
> > > Changes:
> > >
> > > [jbonofre] [BEAM-59] Register standard FileSystems wherever we register
> > >
> > > [iemejia] Enable flink dependency enforcement and make dependencies
> > > explicit
> > >
> > > [iemejia] Fix Javadoc warnings on Flink Runner
> > >
> > > [iemejia] Remove flink-annotations dependency
> > >
> > > [iemejia] [BEAM-1993] Remove special unbounded Flink source/sink
> > >
> > > [tgroh] Translate PTransforms to and from Runner API Protos
> > >
> > > [altay] Clean up DirectRunner Clock and TransformResult
> > >
> > > [altay] Remove overloading of __call__ in DirectRunner
> > >
> > > --
> > > [...truncated 202.75 KB...]
> > >  x [deleted] (none) -> origin/pr/902/merge
> > > [...truncated: repeated list of deleted PR refs...]

Re: Pipeline termination in the unified Beam model

2017-04-18 Thread Ted Yu
Why is the timeout needed for Spark?

Thanks

> On Apr 18, 2017, at 3:05 AM, Etienne Chauchot  wrote:
> 
> +1 on "runners really terminate in a timely manner to easily programmatically 
> orchestrate Beam pipelines in a portable way, you do need to know whether
> the pipeline will finish without thinking about the specific runner and its 
> options"
> 
> As an example, in Nexmark, we have streaming mode tests, and for the 
> benchmark, we need all the queries to behave the same between runners towards 
> termination.
> 
> For now, to get consistent behavior in this mode, we need to set a 
> timeout (a bit random and flaky) on waitUntilFinish() for Spark, but this 
> timeout is not needed for the direct runner.
> 
> Etienne
> 
>> Le 02/03/2017 à 19:27, Kenneth Knowles a écrit :
>> Isn't this already the case? I think semantically it is an unavoidable
>> conclusion, so certainly +1 to that.
>> 
>> The DirectRunner and TestDataflowRunner both have this behavior already.
>> I've always considered that a streaming job running forever is just [very]
>> suboptimal shutdown latency :-)
>> 
>> Some bits of the discussion on the ticket seem to surround whether or how
>> to communicate this property in a generic way. Since a runner owns its
>> PipelineResult it doesn't seem necessary.
>> 
>> So is the bottom line just that you want to more strongly insist that
>> runners really terminate in a timely manner? I'm +1 to that, too, for
>> basically the reason Stas gives: In order to easily programmatically
>> orchestrate Beam pipelines in a portable way, you do need to know whether
>> the pipeline will finish without thinking about the specific runner and its
>> options (as with our RunnableOnService tests).
>> 
>> Kenn
>> 
>> On Thu, Mar 2, 2017 at 9:09 AM, Dan Halperin 
>> wrote:
>> 
>>> Note that even "unbounded pipeline in a streaming runner".waitUntilFinish()
>>> can return, e.g., if you cancel it or terminate it. It's totally reasonable
>>> for users to want to understand and handle these cases.
>>> 
>>> +1
>>> 
>>> Dan
>>> 
>>> On Thu, Mar 2, 2017 at 2:53 AM, Jean-Baptiste Onofré 
>>> wrote:
>>> 
 +1
 
 Good idea !!
 
 Regards
 JB
 
 
> On 03/02/2017 02:54 AM, Eugene Kirpichov wrote:
> 
> Raising this onto the mailing list from
> https://issues.apache.org/jira/browse/BEAM-849
> 
> The issue came up: what does it mean for a pipeline to finish, in the
>>> Beam
> model?
> 
> Note that I am deliberately not talking about "batch" and "streaming"
> pipelines, because this distinction does not exist in the model. Several
> runners have batch/streaming *modes*, which implement the same semantics
> (potentially different subsets: in batch mode typically a runner will
> reject pipelines that have at least one unbounded PCollection) but in an
> operationally different way. However we should define pipeline
>>> termination
> at the level of the unified model, and then make sure that all runners
>>> in
> all modes implement that properly.
> 
> One natural way is to say "a pipeline terminates when the output
> watermarks
> of all of its PCollection's progress to +infinity". (Note: this can be
> generalized, I guess, to having partial executions of a pipeline: if
> you're
> interested in the full contents of only some collections, then you wait
> until only the watermarks of those collections progress to infinity)
> 
> A typical "batch" runner mode does not implement watermarks - we can
>>> think
> of it as assigning watermark -infinity to an output of a transform that
> hasn't started executing yet, and +infinity to output of a transform
>>> that
> has finished executing. This is consistent with how such runners
>>> implement
> termination in practice.
> 
> Dataflow streaming runner additionally implements such termination for
> pipeline drain operation: it has 2 parts: 1) stop consuming input from
>>> the
> sources, and 2) wait until all watermarks progress to infinity.
> 
> Let us fill the gap by making this part of the Beam model and declaring
> that all runners should implement this behavior. This will give nice
> properties, e.g.:
> - A pipeline that has only bounded collections can be run by any runner
>>> in
> any mode, with the same results and termination behavior (this is
>>> actually
> my motivating example for raising this issue is: I was running
>>> Splittable
> DoFn tests
>  src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java>
> with the streaming Dataflow runner - these tests produce only bounded
> collections - and noticed that they wouldn't terminate even though all
> data
> was processed)
> - It will be possible to implement pipelines that 
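The termination rule discussed in this thread — a pipeline terminates once the output watermarks of all of its PCollections reach +infinity — can be sketched as a tiny simulation. Everything below, class and method names included, is illustrative only and not part of any Beam API:

```python
import math

class WatermarkTracker:
    """Toy model of the proposed termination rule: a pipeline is
    'terminated' once every PCollection's output watermark has
    advanced to +infinity."""

    def __init__(self, pcollections):
        # Batch-style initial state: nothing has executed yet, so
        # every watermark starts at -infinity.
        self.watermarks = {name: -math.inf for name in pcollections}

    def advance(self, pcollection, watermark):
        # Watermarks only move forward.
        self.watermarks[pcollection] = max(self.watermarks[pcollection],
                                           watermark)

    def is_terminated(self):
        return all(w == math.inf for w in self.watermarks.values())

tracker = WatermarkTracker(["words", "counts"])
tracker.advance("words", math.inf)   # upstream transform finished
print(tracker.is_terminated())       # False: "counts" is still pending
tracker.advance("counts", math.inf)  # downstream transform finished
print(tracker.is_terminated())       # True: all watermarks at +infinity
```

Under this model a "batch" runner simply jumps each watermark from -infinity to +infinity as transforms complete, which matches the behavior described in the thread.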

Re: Renaming SideOutput

2017-04-12 Thread Ted Yu
+1

> On Apr 11, 2017, at 5:34 PM, Thomas Groh  wrote:
> 
> I think that's a good idea. I would call the outputs of a ParDo the "Main
> Output" and "Additional Outputs" - it seems like an easy way to make it
> clear that there's one output that is always expected, and there may be
> more.
> 
> On Tue, Apr 11, 2017 at 5:29 PM, Robert Bradshaw <
> rober...@google.com.invalid> wrote:
> 
>> We should do some renaming in Python too. Right now we have
>> SideOutputValue which I'd propose naming TaggedOutput or something
>> like that.
>> 
>> Should the docs change too?
>> https://beam.apache.org/documentation/programming-guide/#transforms-sideio
>> 
>> On Tue, Apr 11, 2017 at 5:25 PM, Kenneth Knowles 
>> wrote:
>>> +1 ditto about sideInput and sideOutput not actually being related
>>> 
>>> On Tue, Apr 11, 2017 at 3:52 PM, Robert Bradshaw <
>>> rober...@google.com.invalid> wrote:
>>> 
 +1, I think this is a lot clearer.
 
 On Tue, Apr 11, 2017 at 2:24 PM, Stephen Sisk 
 wrote:
> strong +1 for changing the name away from sideOutput - the fact that
> sideInput and sideOutput are not really related was definitely a
>> source
 of
> confusion for me when learning beam.
> 
> S
> 
> On Tue, Apr 11, 2017 at 1:56 PM Thomas Groh >> 
> wrote:
> 
>> Hey everyone:
>> 
>> I'd like to rename DoFn.Context#sideOutput to #output (in the Java
>> SDK).
>> 
>> Having two methods, both named output, one which takes the "main
>> output
>> type" and one that takes a tag to specify the type more clearly
>> communicates the actual behavior - sideOutput isn't a "special" way
>> to
>> output, it's the same as output(T), just to a specified PCollection.
 This
>> will help pipeline authors understand the actual behavior of
>> outputting
 to
>> a tag, and detangle it from "sideInput", which is a special way to
 receive
>> input. Giving them the same name means that it's not even strange to
 call
>> output and provide the main output type, which is what we want -
>> it's a
>> more specific way to output, but does not have different
>> restrictions or
>> capabilities.
>> 
>> This is also a pretty small change within the SDK - it touches about
>> 20
>> files, and the changes are pretty automatic.
>> 
>> Thanks,
>> 
>> Thomas
>> 


Re: IO ITs: Hosting Docker images

2017-04-08 Thread Ted Yu
+1

> On Apr 7, 2017, at 10:46 PM, Jean-Baptiste Onofré  wrote:
> 
> Hi Stephen,
> 
> I think we should go to 1 and 4:
> 
> 1. Try to use existing images providing what we need. If we don't find 
> existing image, we can always ask and help other community to provide so.
> 4. If we don't find a suitable image, and waiting for this image, we can 
> store the image in our own "IT dockerhub".
> 
> Regards
> JB
> 
>> On 04/08/2017 01:03 AM, Stephen Sisk wrote:
>> Wanted to see if anyone else had opinions on this/provide a quick update.
>> 
>> I think for both elasticsearch and HIFIO that we can find existing,
>> supported images that could serve those purposes - HIFIO is looking like
>> it'll able to do so for cassandra, which was proving tricky.
>> 
>> So to summarize my current proposed solutions: (ordered by my preference)
>> 1. (new) Strongly urge people to find existing docker images that meet our
>> image criteria - regularly updated/security checked
>> 2. Start using helm
>> 3. Push our docker images to docker hub
>> 4. Host our own public container registry
>> 
>> S
>> 
>>> On Tue, Apr 4, 2017 at 10:16 AM Stephen Sisk  wrote:
>>> 
>>> I'd like to hear what direction folks want to go in, and from there look
>>> at the options. I think for some of these options (like running our own
>>> public registry), they may be able to and it's something we should look at,
>>> but I don't assume they have time to work on this type of issue.
>>> 
>>> S
>>> 
>>> On Tue, Apr 4, 2017 at 10:00 AM Lukasz Cwik 
>>> wrote:
>>> 
>>> Is this something that Apache infra could help us with?
>>> 
>>> On Mon, Apr 3, 2017 at 7:22 PM, Stephen Sisk 
>>> wrote:
>>> 
 Summary:
 
 For IO ITs that use data stores that need custom docker images in order
>>> to
 run, we can't currently use them in a kubernetes cluster (which is where
>>> we
 host our data stores.) I have a couple options for how to solve this and
>>> am
 looking for feedback from folks involved in creating IO ITs/opinions on
 kubernetes.
 
 
 Details:
 
 We've discussed in the past that we'll want to allow developers to submit
 just a dockerfile, and then we'll use that when creating the data store
>>> on
 kubernetes. This is the case for ElasticsearchIO and I assume more data
 stores in the future will want to do this. It's also looking like it'll
>>> be
 necessary to use custom docker images for the HadoopInputFormatIO's
 cassandra ITs - to run a cassandra cluster, there doesn't seem to be a
>>> good
 image you can use out of the box.
 
 In either case, in order to retrieve a docker image, kubernetes needs a
 container registry - it will read the docker images from there. A simple
 private container registry doesn't work because kubernetes config files
>>> are
 static - this means that if local devs try to use the kubernetes files,
 they point at the private container registry and they wouldn't be able to
 retrieve the images since they don't have access. They'd have to manually
 edit the files, which in theory is an option, but I don't consider that
>>> to
 be acceptable since it feels pretty unfriendly (it is simple, so if we
 really don't like the below options we can revisit it.)
 
 Quick summary of the options
 
 ===
 
 We can:
 
 * Start using something like k8 helm - this adds more dependencies, adds
>>> a
 small amount of complexity (this is my recommendation, but only by a
 little)
 
 * Start pushing images to docker hub - this means they'll be publicly
 visible and raises the bar for maintenance of those images
 
 * Host our own public container registry - this means running our own
 public service with costs, etc..
 
 Below are detailed discussions of these options. You can skip to the "My
 thoughts on this" section if you're not interested in the details.
 
 
 1. Templated kubernetes images
 
 =
 
 Kubernetes (k8) does not currently have built in support for
>>> parameterizing
 scripts - there's an issues open for this[1], but it doesn't seem to be
 very active.
 
 There are tools like Kubernetes helm that allow users to specify
>>> parameters
 when running their kubernetes scripts. They also enable a lot more
>>> (they're
 probably closer to a package manager like apt-get) - see this
 description[3] for an overview.
 
 I'm open to other options besides helm, but it seems to be the officially
 supported one.
 
 How the world would look using helm:
 
 * When developing an IO IT, someone (either the developer or one of us),
 would need to create a chart (the name for the helm script) - it's
 basically another set of config files but in theory is as simple 
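The parameterization gap described above — static kubernetes files hard-coding a registry, with helm filling values in — is at heart plain config templating. A minimal Python sketch under made-up names; the pod spec, registry, and tag are all hypothetical, and helm's real template engine is Go templates rather than this:

```python
from string import Template

# A stripped-down stand-in for a kubernetes pod spec. The image field
# hard-codes nothing and instead takes parameters, which is what a tool
# like helm layers on top of plain kubernetes configs.
POD_SPEC = Template("""\
apiVersion: v1
kind: Pod
metadata:
  name: datastore
spec:
  containers:
  - name: datastore
    image: $registry/datastore:$tag
""")

# A contributor renders the same spec against whichever registry they
# can actually pull from (both values are hypothetical).
print(POD_SPEC.substitute(registry="docker.io/example", tag="latest"))
```

With templating in place, the checked-in config no longer points local developers at a private registry they cannot access.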

Re: [PROPOSAL]: a new feature branch for SQL DSL

2017-04-05 Thread Ted Yu
Working in a feature branch is good - you may want to periodically sync up
with master.

I noticed that you are using Calcite 1.11.0.
1.12 is out, FYI.

On Wed, Apr 5, 2017 at 2:05 PM, Mingmin Xu  wrote:

> Hi all,
>
> I'm working on https://issues.apache.org/jira/browse/BEAM-301(Add a Beam
> SQL DSL). The skeleton is already in
> https://github.com/XuMingmin/beam/tree/BEAM-301, using Java SDK in the
> back-end. The goal is to provide a SQL interface over Beam, based on
> Calcite, including:
> 1). a translator to create Beam pipeline from SQL,
> (SELECT/INSERT/FILTER/GROUP-BY/JOIN/...);
> 2). an interactive client to submit queries;  (All-SQL mode)
> 3). a SQL API which reduce the work to create a Pipeline; (Semi-SQL mode)
>
> As we see many folks are interested in this feature, would like to create a
> feature branch to have more involvement.
> Looking for comments and feedback.
>
> Thanks!
> 
> Mingmin
>


Re: [DISCUSSION] Consistent use of loggers

2017-04-03 Thread Ted Yu
+1

> On Apr 3, 2017, at 11:48 AM, Aviem Zur  wrote:
> 
> Upon further inspection there seems to be an issue we may have overlooked:
> In cluster mode, some of the runners will have dependencies added directly
> to the classpath by the cluster, and since SLF4J can only work with one
> binding, the first one in the classpath will be used.
> 
> So while what we suggested would work in local mode, the user's chosen
> binding and configuration might be ignored in cluster mode, which is
> detrimental to what we wanted to accomplish.
> 
> So I believe what we should do instead is:
> 
>   1. Add better documentation regarding logging in each runner, which
>   binding is used, perhaps examples of how to configure logging for that
>   runner.
>   2. Have direct runner use the most common binding among runners (this
>   appears to be log4j which is used by Spark runner, Flink runner and Apex
>   runner).
> 
> 
>> On Mon, Apr 3, 2017 at 7:02 PM Aljoscha Krettek  wrote:
>> 
>> Yes, I think we can exclude log4j from the Flink dependencies. It’s
>> somewhat annoying that they are there in the first place.
>> 
>> The Flink doc has this to say about the topic:
>> https://ci.apache.org/projects/flink/flink-docs-release-1.3/monitoring/logging.html
 On 3. Apr 2017, at 17:56, Aviem Zur  wrote:
 
 * java.util.logging could be a good choice for the Direct Runner
>>> Yes, this will be great for users (Instead of having no logging when
>> using
>>> direct runner).
>>> 
 * Logging backend could be runner-specific, particularly if it needs to
 integrate into some other experience
>>> Good point, let's take a look at the current state of runners:
>>> Direct runner - will use JUL as suggested.
>>> Dataflow runner - looks like there is already no binding (There is a
>>> binding in tests only).
>>> Spark runner - currently uses slf4j-log4j12. does not require any
>> specific
>>> logger, we can change this to no binding.
>>> Flink runner - uses slf4j-log4j12 transitively from Flink dependencies.
>> I'm
>>> assuming this is not a must and we can default to no binding here.
>>> @aljoscha please confirm.
>>> Apex runner - uses slf4j-log4j12 transitively from Apex dependencies. I'm
>>> assuming this is not a must and we can default to no binding here. @thw
>>> please confirm.
>>> 
>>> It might be a good idea to use a consistent binding in tests (Since we'll
>>> use JUL for direct runner, let this be JUL).
>>> 
>>> On Wed, Mar 29, 2017 at 7:23 PM Davor Bonaci  wrote:
>>> 
>>> +1 on consistency across Beam modules on the logging facade
>>> +1 on enforcing consistency
>>> +1 on clearly documenting how to do logging
>>> 
>>> Mixed feelings:
>>> * Logging backend could be runner-specific, particularly if it needs to
>>> integrate into some other experience
>>> * java.util.logging could be a good choice for the Direct Runner
>>> 
>>> On Tue, Mar 28, 2017 at 6:50 PM, Ahmet Altay 
>>> wrote:
>>> 
 On Wed, Mar 22, 2017 at 10:38 AM, Tibor Kiss 
 wrote:
 
> This is a great idea!
> 
> I believe Python-SDK's logging could also be enhanced (a bit
 differently):
> Currently we are not instantiating the logger, just using the class
>> what
> logging package provides.
> A shortcoming of this approach is that the user cannot set the log level
> on a per-module basis, as all log messages
> end up at the root level.
 
 +1 to this. Python SDK needs to expand its logging capabilities. Filed [1]
 for this.
 
 Ahmet
 
 [1] https://issues.apache.org/jira/browse/BEAM-1825
 
 
> 
> On 3/22/17, 5:46 AM, "Aviem Zur"  wrote:
> 
>   +1 to what JB said.
> 
>   Will just have to be documented well, as if we provide no binding there
>   will be no logging out of the box unless the user adds a binding.
> 
>   On Wed, Mar 22, 2017 at 6:24 AM Jean-Baptiste Onofré <
 j...@nanthrax.net>
>   wrote:
> 
>> Hi Aviem,
>> 
>> Good point.
>> 
>> I think, in our dependencies set, we should just depend on slf4j-api
>> and let the user provide the binding he wants (slf4j-log4j12,
>> slf4j-simple, whatever).
>> 
>> We define a binding only with test scope in our modules.
>> 
>> Regards
>> JB
>> 
>>> On 03/22/2017 04:58 AM, Aviem Zur wrote:
>>> Hi all,
>>> 
>>> There have been a few reports lately (on JIRA [1] and on Slack) from
>>> users regarding inconsistent loggers used across Beam's modules.
>>> 
>>> While we use SLF4J, different modules use a different logger behind it
>>> (JUL, log4j, etc.).
>>> So when people add a log4j.properties file to their classpath, for
>>> instance, they expect this to affect all of their dependencies on Beam
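Picking java.util.logging for the direct runner, as suggested above, would let users raise verbosity per module without shipping any extra binding. A minimal programmatic sketch of that (the package names here are illustrative, not a prescribed Beam configuration):

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

public class JulConfigSketch {
  public static void main(String[] args) {
    // Keep the root logger quiet so third-party chatter stays at WARNING.
    Logger root = Logger.getLogger("");
    root.setLevel(Level.WARNING);

    // Give one package its own FINE-level console handler.
    Logger sdkLogger = Logger.getLogger("org.apache.beam.sdk");
    sdkLogger.setLevel(Level.FINE);
    ConsoleHandler fineHandler = new ConsoleHandler();
    fineHandler.setLevel(Level.FINE);
    sdkLogger.addHandler(fineHandler);

    sdkLogger.fine("emitted: this package is at FINE");
    Logger.getLogger("some.other.pkg").fine("dropped: root stays at WARNING");
  }
}
```

The same levels can also be set without code through a properties file passed via -Djava.util.logging.config.file, which is the kind of per-runner documentation proposed above.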

Re: [PROPOSAL] ORC support

2017-04-01 Thread Ted Yu
+1

> On Apr 1, 2017, at 8:31 AM, Tibor Kiss  wrote:
> 
> Hello,
> 
> Recently the Optimized Row Columnar (ORC) file format was spin off from Hive
> and became a top level Apache Project: https://orc.apache.org/
> 
> It is similar to Parquet in a sense that it uses column major format but
> ORC has
> a more elaborate type system and stores basic statistics about each row.
> 
> I'd be interested extending Beam with ORC support if others find it helpful
> too.
> 
> What do you think?
> 
> - Tibor


Re: Beam spark 2.x runner status

2017-03-29 Thread Ted Yu
This is what I did over HBASE-16179:

-f.call((asJavaIterator(it), conn)).iterator()
+// the return type is different in spark 1.x & 2.x, we handle both cases
+f.call(asJavaIterator(it), conn) match {
+  // spark 1.x
+  case iterable: Iterable[R] => iterable.iterator()
+  // spark 2.x
+  case iterator: Iterator[R] => iterator
+}
)

FYI
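For Java runner code, the same run-on-both-versions trick can be sketched as below. Note this is an illustration: `toIterator` and the untyped call result are assumptions for the sketch, not actual Beam or HBase APIs.

```java
import java.util.Arrays;
import java.util.Iterator;

public class SparkCompat {
  /**
   * A Spark 1.x FlatMapFunction-style call returns Iterable<R>; the Spark 2.x
   * equivalent returns Iterator<R>. Normalizing on Iterator lets one code
   * path serve both, mirroring the Scala match above.
   */
  @SuppressWarnings("unchecked")
  public static <R> Iterator<R> toIterator(Object callResult) {
    if (callResult instanceof Iterable) {
      return ((Iterable<R>) callResult).iterator(); // Spark 1.x shape
    }
    return (Iterator<R>) callResult;                // Spark 2.x shape
  }

  public static void main(String[] args) {
    Iterator<Integer> fromIterable = toIterator(Arrays.asList(1, 2, 3));
    Iterator<Integer> fromIterator = toIterator(Arrays.asList(4, 5).iterator());
    System.out.println(fromIterable.next()); // 1
    System.out.println(fromIterator.next()); // 4
  }
}
```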

On Wed, Mar 29, 2017 at 1:47 AM, Amit Sela <amitsel...@gmail.com> wrote:

> Just tried to replace dependencies and see what happens:
>
> Most required changes are about the runner using deprecated Spark APIs, and
> after fixing them the only real issue is with the Java API for
> Pair/FlatMapFunction that changed return value to Iterator (in 1.6 its
> Iterable).
>
> So I'm not sure that a profile that simply sets dependency on 1.6.3/2.1.0
> is feasible.
>
> On Thu, Mar 23, 2017 at 10:22 AM Kobi Salant <kobi.sal...@gmail.com>
> wrote:
>
> > So, if everything is in place in Spark 2.X and we use provided
> > dependencies for Spark in Beam.
> > Theoretically, you can run the same code in 2.X without any need for a
> > branch?
> >
> > 2017-03-23 9:47 GMT+02:00 Amit Sela <amitsel...@gmail.com>:
> >
> > > If StreamingContext is valid and we don't have to use SparkSession, and
> > > Accumulators are valid as well and we don't need AccumulatorsV2, I don't
> > > see a reason this shouldn't work (which means there are still tons of
> > > reasons this could break, but I can't think of them off the top of my
> > > head right now).
> > >
> > > @JB simply add a profile for the Spark dependencies and run the tests -
> > > you'll have a very definitive answer ;-) .
> > > If this passes, try on a cluster running Spark 2 as well.
> > >
> > > Let me know of I can assist.
> > >
> > > On Thu, Mar 23, 2017 at 6:55 AM Jean-Baptiste Onofré <j...@nanthrax.net>
> > > wrote:
> > >
> > > > Hi guys,
> > > >
> > > > Ismaël summarize well what I have in mind.
> > > >
> > > > I'm a bit late on the PoC around that (I started a branch already).
> > > > I will move forward over the week end.
> > > >
> > > > Regards
> > > > JB
> > > >
> > > > On 03/22/2017 11:42 PM, Ismaël Mejía wrote:
> > > > > Amit, I suppose JB is talking about the RDD based version, so no
> > > > > need to worry about SparkSession or different incompatible APIs.
> > > > >
> > > > > Remember the idea we are discussing is to have in master both the
> > > > > spark 1 and spark 2 runners using the RDD based translation. At the
> > > > > same time we can have a feature branch to evolve the DataSet based
> > > > > translator (this one will replace the RDD based translator for
> > > > > spark 2 once it is mature).
> > > > >
> > > > > The advantages have been already discussed as well as the possible
> > > > > issues so I think we have to see now if JB's idea is feasible and
> > > > > how hard it would be to live with this while the DataSet version
> > > > > evolves.
> > > > >
> > > > > I think what we are trying to avoid is to have a long living branch
> > > > > for a spark 2 runner based on RDD  because the maintenance burden
> > > > > would be even worse. We would have to fight not only with the
> double
> > > > > merge of fixes (in case the profile idea does not work), but also
> > with
> > > > > the continue evolution of Beam and we would end up in the long
> living
> > > > > branch mess that others runners have dealt with (e.g. the Apex
> > runner)
> > > > >
> > > > >
> > > > https://lists.apache.org/thread.html/12cc086f5ffe331cc70b89322ce541
> > > 6c3112b87efc3393e3e16032a2@%3Cdev.beam.apache.org%3E
> > > > >
> > > > > What do you think about this Amit ? Would you be ok to go with it
> > > > > if JB's profile idea proves to help with the maintenance issues ?
> > > > >
> > > > > Ismaël
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Mar 22, 2017 at 5:53 PM, Ted Yu <yuzhih...@gmail.com>
> wrote:
> > > > >> hbase-spark module doesn't use SparkSession. So situation there is
> > > > >> simpler :-)
> > > > >>
> > > > >> On Wed, Mar 22, 2017 at 5:35 AM, Amit Sela <amitsel...@gmail.com>
> > > > wrote:
> > > > >>
> > > > >>> I'm still wondering how we'll do this - it's not just different
> > > > >>> implementations of the same Class, but completely different
> > > > >>> concepts such as using SparkSession in Spark 2 instead of
> > > > >>> SparkContext/StreamingContext in Spark 1.
> > > > >>>
> > > > >>> On Tue, Mar 21, 2017 at 7:25 PM Ted Yu <yuzhih...@gmail.com>
> > wrote:
> > > > >>>
> > > > >>>> I have done some work over in HBASE-16179 where compatibility
> > > modules
> > > > are
> > > > >>>> created to isolate changes in Spark 2.x API so that code in
> > > > hbase-spark
> > > > >>>> module can be reused.
> > > > >>>>
> > > > >>>> FYI
> > > > >>>>
> > > > >>>
> > > >
> > > > --
> > > > Jean-Baptiste Onofré
> > > > jbono...@apache.org
> > > > http://blog.nanthrax.net
> > > > Talend - http://www.talend.com
> > > >
> > >
> >
>


Re: Beam spark 2.x runner status

2017-03-22 Thread Ted Yu
hbase-spark module doesn't use SparkSession. So situation there is simpler
:-)

On Wed, Mar 22, 2017 at 5:35 AM, Amit Sela <amitsel...@gmail.com> wrote:

> I'm still wondering how we'll do this - it's not just different
> implementations of the same Class, but a completely different concepts such
> as using SparkSession in Spark 2 instead of SparkContext/StreamingContext
> in Spark 1.
>
> On Tue, Mar 21, 2017 at 7:25 PM Ted Yu <yuzhih...@gmail.com> wrote:
>
> > I have done some work over in HBASE-16179 where compatibility modules are
> > created to isolate changes in Spark 2.x API so that code in hbase-spark
> > module can be reused.
> >
> > FYI
> >
>


Re: Beam spark 2.x runner status

2017-03-21 Thread Ted Yu
I have done some work over in HBASE-16179 where compatibility modules are
created to isolate changes in Spark 2.x API so that code in hbase-spark
module can be reused.

FYI


Re: why Source#validate() is not declared to throw any exception

2017-03-21 Thread Ted Yu
Looks like JIRA notification is temporarily not working.

I have logged BEAM-1773

FYI

On Mon, Mar 20, 2017 at 11:26 PM, Eugene Kirpichov <
kirpic...@google.com.invalid> wrote:

> I think it would make sense to allow the validate method to throw
> Exception.
>
> On Mon, Mar 20, 2017, 11:21 PM Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
> > Hi Ted,
> >
> > validate() is supposed to throw runtime exception (IllegalStateException,
> > RuntimeException, ...) to "traverse" the executor.
> >
> > Regards
> > JB
> >
> > On 03/21/2017 01:56 AM, Ted Yu wrote:
> > > Hi,
> > > I was reading HDFSFileSource.java where:
> > >
> > >   @Override
> > >   public void validate() {
> > >     try {
> > >       ...
> > >     } catch (IOException | InterruptedException e) {
> > >       throw new RuntimeException(e);
> > >     }
> > >   }
> > >
> > > Why is validate() not declared to throw any exception ?
> > > If validation doesn't pass, there is nothing to clean up ?
> > >
> > > Thanks
> > >
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>
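JB's point above — that validate() signals failure through unchecked exceptions so they "traverse" the executor — can be sketched as follows. `HypotheticalSource` and its field are illustrative for the sketch, not the real HDFSFileSource.

```java
import java.io.IOException;

public class HypotheticalSource {
  private final String path;

  public HypotheticalSource(String path) {
    this.path = path;
  }

  /** No throws clause: checked failures are rethrown unchecked. */
  public void validate() {
    try {
      if (path == null || path.isEmpty()) {
        throw new IOException("no input path configured");
      }
      // A real source would also probe the filesystem here.
    } catch (IOException e) {
      // Wrapping keeps the interface free of checked exceptions while
      // still surfacing the root cause to the caller.
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    try {
      new HypotheticalSource("").validate();
    } catch (RuntimeException e) {
      System.out.println("caught: " + e.getCause().getMessage());
    }
  }
}
```

Since there is nothing to clean up when validation fails, wrapping in a RuntimeException (or IllegalStateException) is sufficient for the runner to abort.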


why Source#validate() is not declared to throw any exception

2017-03-20 Thread Ted Yu
Hi,
I was reading HDFSFileSource.java where:

  @Override
  public void validate() {
    try {
      ...
    } catch (IOException | InterruptedException e) {
      throw new RuntimeException(e);
    }
  }

Why is validate() not declared to throw any exception ?
If validation doesn't pass, there is nothing to clean up ?

Thanks


Re: Build failed in Jenkins: beam_PostCommit_Java_RunnableOnService_Spark #1282

2017-03-19 Thread Ted Yu
Looks like Jenkins just came back (to life).

Hopefully the next precommit passes.

On Sun, Mar 19, 2017 at 11:39 AM, Amit Sela  wrote:

> I think its a Jenkins issue. Jenkins is shutting down. I'll follow and
> relate if this keeps happening.
>
> -- Forwarded message -
> From: Apache Jenkins Server 
> Date: Sun, Mar 19, 2017 at 8:21 PM
> Subject: Build failed in Jenkins:
> beam_PostCommit_Java_RunnableOnService_Spark #1282
> To: 
>
>
> See <
> https://builds.apache.org/job/beam_PostCommit_Java_
> RunnableOnService_Spark/1282/display/redirect
> >
>
> --
> [...truncated 1.69 KB...]
> maven33-agent.jar already up to date
> maven33-interceptor.jar already up to date
> maven3-interceptor-commons.jar already up to date
> [beam_PostCommit_Java_RunnableOnService_Spark] $
> /home/jenkins/tools/java/latest1.8/bin/java
> -Dorg.slf4j.simpleLogger.showDateTime=true
> -Dorg.slf4j.simpleLogger.dateTimeFormat=-MM-dd'T'HH:mm:ss.SSS
> -XX:+TieredCompilation -XX:TieredStopAtLevel=1 -cp
> /home/jenkins/jenkins-slave/maven33-agent.jar:/home/
> jenkins/tools/maven/apache-maven-3.3.3/boot/plexus-
> classworlds-2.5.2.jar:/home/jenkins/tools/maven/apache-
> maven-3.3.3/conf/logging
> jenkins.maven3.agent.Maven33Main
> /home/jenkins/tools/maven/apache-maven-3.3.3
> /home/jenkins/jenkins-slave/slave.jar
> /home/jenkins/jenkins-slave/maven33-interceptor.jar
> /home/jenkins/jenkins-slave/maven3-interceptor-commons.jar 51594
> <===[JENKINS REMOTING CAPACITY]===>   channel started
> Executing Maven:  -B -f <
> https://builds.apache.org/job/beam_PostCommit_Java_
> RunnableOnService_Spark/ws/pom.xml>
> -Dmaven.repo.local=<
> https://builds.apache.org/job/beam_PostCommit_Java_
> RunnableOnService_Spark/ws/.repository>
> -B -e clean verify -am -pl runners/spark -Prunnable-on-service-tests
> -Plocal-runnable-on-service-tests -Dspark.ui.enabled=false
> 2017-03-19T18:19:06.924 [INFO] Error stacktraces are turned on.
> 2017-03-19T18:19:07.010 [INFO] Scanning for projects...
> 2017-03-19T18:19:07.740 [INFO] Downloading: https://repo.maven.apache.org/maven2/kr/motd/maven/os-maven-plugin/1.4.0.Final/os-maven-plugin-1.4.0.Final.pom
> 2017-03-19T18:19:08.170 [INFO] Downloaded: https://repo.maven.apache.org/maven2/kr/motd/maven/os-maven-plugin/1.4.0.Final/os-maven-plugin-1.4.0.Final.pom (7 KB at 14.0 KB/sec)
> 2017-03-19T18:19:08.183 [INFO] Downloading: https://repo.maven.apache.org/maven2/org/sonatype/oss/oss-parent/9/oss-parent-9.pom
> 2017-03-19T18:19:08.224 [INFO] Downloaded: https://repo.maven.apache.org/maven2/org/sonatype/oss/oss-parent/9/oss-parent-9.pom (7 KB at 160.4 KB/sec)
> 2017-03-19T18:19:08.235 [INFO] Downloading: https://repo.maven.apache.org/maven2/org/apache/maven/maven-plugin-api/3.2.1/maven-plugin-api-3.2.1.pom
> 2017-03-19T18:19:08.270 [INFO] Downloaded: https://repo.maven.apache.org/maven2/org/apache/maven/maven-plugin-api/3.2.1/maven-plugin-api-3.2.1.pom (4 KB at 91.7 KB/sec)
> 2017-03-19T18:19:08.273 [INFO] Downloading: https://repo.maven.apache.org/maven2/org/apache/maven/maven/3.2.1/maven-3.2.1.pom
> 2017-03-19T18:19:08.330 [INFO] Downloaded: https://repo.maven.apache.org/maven2/org/apache/maven/maven/3.2.1/maven-3.2.1.pom (23 KB at 387.0 KB/sec)
> 2017-03-19T18:19:08.334 [INFO] Downloading: https://repo.maven.apache.org/maven2/org/apache/maven/maven-parent/23/maven-parent-23.pom
> 2017-03-19T18:19:08.377 [INFO] Downloaded: https://repo.maven.apache.org/maven2/org/apache/maven/maven-parent/23/maven-parent-23.pom (32 KB at 740.1 KB/sec)
> 2017-03-19T18:19:08.382 [INFO] Downloading: https://repo.maven.apache.org/maven2/org/apache/apache/13/apache-13.pom
> 2017-03-19T18:19:08.415 [INFO] Downloaded: https://repo.maven.apache.org/maven2/org/apache/apache/13/apache-13.pom (14 KB at 413.5 KB/sec)
> 2017-03-19T18:19:08.423 [INFO] Downloading: https://repo.maven.apache.org/maven2/org/apache/maven/maven-model/3.2.1/maven-model-3.2.1.pom
> 2017-03-19T18:19:08.454 [INFO] Downloaded: https://repo.maven.apache.org/maven2/org/apache/maven/maven-model/3.2.1/maven-model-3.2.1.pom

Re: [ANNOUNCEMENT] New committers, March 2017 edition!

2017-03-17 Thread Ted Yu
Congratulations!

On Fri, Mar 17, 2017 at 2:13 PM, Davor Bonaci  wrote:

> Please join me and the rest of Beam PMC in welcoming the following
> contributors as our newest committers. They have significantly contributed
> to the project in different ways, and we look forward to many more
> contributions in the future.
>
> * Chamikara Jayalath
> Chamikara has been contributing to Beam since inception, and previously to
> Google Cloud Dataflow, accumulating a total of 51 commits (8,301 ++ / 3,892
> --) since February 2016 [1]. He contributed broadly to the project, but
> most significantly to the Python SDK, building the IO framework in this SDK
> [2], [3].
>
> * Eugene Kirpichov
> Eugene has been contributing to Beam since inception, and previously to
> Google Cloud Dataflow, accumulating a total of 95 commits (22,122 ++ /
> 18,407 --) since February 2016 [1]. In recent months, he’s been driving the
> Splittable DoFn effort [4]. A true expert on IO subsystem, Eugene has
> reviewed nearly every IO contributed to Beam. Finally, Eugene contributed
> the Beam Style Guide, and is championing it across the project.
>
> * Ismaël Mejia
> Ismaël has been contributing to Beam since mid-2016, accumulating a total
> of 35 commits (3,137 ++ / 1,328 --) [1]. He authored the HBaseIO connector,
> helped on the Spark runner, and contributed in other areas as well,
> including cross-project collaboration with Apache Zeppelin. Ismaël reported
> 24 Jira issues.
>
> * Aviem Zur
> Aviem has been contributing to Beam since early fall, accumulating a total
> of 49 commits (6,471 ++ / 3,185 --) [1]. He reported 43 Jira issues, and
> resolved ~30 issues. Aviem improved the stability of the Spark runner a
> lot, and introduced support for metrics. Finally, Aviem is championing
> dependency management across the project.
>
> Congratulations to all four! Welcome!
>
> Davor
>
> [1]
> https://github.com/apache/beam/graphs/contributors?from=
> 2016-02-01&to=2017-03-17&type=c
> [2]
> https://github.com/apache/beam/blob/v0.6.0/sdks/python/
> apache_beam/io/iobase.py#L70
> [3]
> https://github.com/apache/beam/blob/v0.6.0/sdks/python/
> apache_beam/io/iobase.py#L561
> [4] https://s.apache.org/splittable-do-fn
>


Re: [VOTE] Release 0.6.0, release candidate #2

2017-03-13 Thread Ted Yu
bq.  I would prefer that we have a .tar.gz release

+1

On Mon, Mar 13, 2017 at 4:21 PM, Ismaël Mejía  wrote:

> ​+1 (non-binding)
>
> - verified signatures + checksums
> - run mvn clean install -Prelease, all artifacts build and the tests run
> smoothly (modulo some local issues I had with the installation of tox for
> the python sdk, I created a PR to fix those in case other people can have
> the same trouble).
>
> Some remarks still to fix from the release, but that I don’t consider
> blockers:
>
> 1. The section Getting Started in the main README.md needs to be updated
> with the information about creating/activating the virtualenv. At this
> moment just running mvn clean install won’t work without this.
>
> 2.  Both zip files in the current release produce a folder with the same
> name ‘apache-beam-0.6.0’. This can be messy if users unzip both files into
> the same folder (as happened to me). The compressed files should produce a
> directory with the exact same name as the file, so
> apache-beam-0.6.0-python.zip will produce apache-beam-0.6.0-python and the
> other its respective directory.
>
> 3. The name of the files of the release probably should be different:
>
> The source release could be just apache-beam-0.6.0.zip instead of
> apache-beam-0.6.0-source-release.zip considering that we don’t have binary
> artifacts, or just apache-beam-0.6.0-src.zip following the convention of
> other apache projects.
>
> The python release could also be renamed to
> apache-beam-0.6.0-bin-python.zip from apache-beam-0.6.0-python.zip
> so
> users understand that these are executable files (but well I am not sure
> about that one considering that python is a scripting language).
>
> Finally I would prefer that we have a .tar.gz release as JB mentioned in
> the previous vote, and as most apache projects do. In any case if the zip
> is somehow a requirement it would be nice to have both a .zip and a .tar.gz
> file.
>


Re: [VOTE] Release 0.6.0, release candidate #2

2017-03-12 Thread Ted Yu
I was able to run "mvn clean install -Prelease" command successfully, too.

On Sun, Mar 12, 2017 at 12:02 AM, Ahmet Altay 
wrote:

> Amit,
>
> I was able to successfully build in a clean environment with the following
> commands:
>
> git checkout tags/v0.6.0-RC2 -b RC2
> mvn clean install -Prelease
>
> I am not a very familiar with maven build process, it would be great if
> someone else can also confirm this.
>
> Ahmet
>
>
>
> On Sat, Mar 11, 2017 at 11:00 PM, Amit Sela  wrote:
>
> > Building the RC2 tag failed for me with: "mvn clean install -Prelease"
> on a
> > missing artifact "beam-sdks-java-harness" when trying to build
> > "beam-sdks-java-javadoc".
> >
> > I want to make sure It's not something local that happens in my env. so
> if
> > anyone else could validate this it would be great.
> >
> > Amit
> >
> > On Sat, Mar 11, 2017 at 9:48 PM Robert Bradshaw
> > 
> > wrote:
> >
> > > On Fri, Mar 10, 2017 at 9:05 PM, Ahmet Altay  >
> > > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > Please review and vote on the release candidate #2 for the version
> > 0.6.0,
> > > > as follows:
> > > > [ ] +1, Approve the release
> > > > [ ] -1, Do not approve the release (please provide specific comments)
> > > >
> > > >
> > > > The complete staging area is available for your review, which
> includes:
> > > > * JIRA release notes [1],
> > > > * the official Apache source release to be deployed to
> dist.apache.org
> > > > [2],
> > > > which is signed with the key with fingerprint 6096FA00 [3],
> > > > * all artifacts to be deployed to the Maven Central Repository [4],
> > > > * source code tag "v0.6.0-RC2" [5],
> > > > * website pull request listing the release and publishing the API
> > > reference
> > > > manual [6].
> > > > * python artifacts are deployed along with the source release to
> > > > dist.apache.org [2].
> > > >
> > >
> > > Are there plans also to deploy this at PyPi, and if so, what are the
> > > details?
> > >
> > >
> > > > A suite of Jenkins jobs:
> > > > * PreCommit_Java_MavenInstall [7],
> > > > * PostCommit_Java_MavenInstall [8],
> > > > * PostCommit_Java_RunnableOnService_Apex [9],
> > > > * PostCommit_Java_RunnableOnService_Flink [10],
> > > > * PostCommit_Java_RunnableOnService_Spark [11],
> > > > * PostCommit_Java_RunnableOnService_Dataflow [12]
> > > > * PostCommit_Python_Verify [13]
> > > >
> > > > Compared to release candidate #1, this candidate contains pull
> requests
> > > > #2217 [14], #2221 [15], # [16], #2224 [17], and #2225 [18]; see
> the
> > > > discussion for reasoning.
> > > >
> > > > The vote will be open for at least 72 hours. It is adopted by
> majority
> > > > approval, with at least 3 PMC affirmative votes.
> > > >
> > > > Thanks,
> > > > Ahmet
> > > >
> > > > [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?proje
> > > > ctId=12319527&version=12339256
> > > > [2] https://dist.apache.org/repos/dist/dev/beam/0.6.0/
> > > > [3] https://dist.apache.org/repos/dist/dev/beam/KEYS
> > > > [4]
> > > https://repository.apache.org/content/repositories/orgapachebeam-1013/
> > > > [5] https://git-wip-us.apache.org/repos/asf?p=beam.git;a=tag;h=r
> > > > efs/tags/v0.6.0-RC2
> > > > [6] https://github.com/apache/beam-site/pull/175
> > > > [7] https://builds.apache.org/view/Beam/job/beam_PreCommit_Java_
> > > > MavenInstall/8340/
> > > > [8] https://builds.apache.org/view/Beam/job/beam_PostCommit_
> > > > Java_MavenInstall/2877/
> > > > [9] https://builds.apache.org/view/Beam/job/beam_PostCommit_Java
> > > > _RunnableOnService_Apex/736/
> > > > [10] https://builds.apache.org/view/Beam/job/beam_PostCommit_Java
> > > > _RunnableOnService_Flink/1895/
> > > > [11] https://builds.apache.org/view/Beam/job/beam_PostCommit_Java
> > > > _RunnableOnService_Spark/1207/
> > > > [12] https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_
> > > > RunnableOnService_Dataflow/2526/
> > > > [13] https://builds.apache.org/view/Beam/job/beam_PostCommit_Pyth
> > > > on_Verify/1481/
> > > > [14] https://github.com/apache/beam/pull/2217
> > > > [15] https://github.com/apache/beam/pull/2221
> > > > [16] https://github.com/apache/beam/pull/
> > > > [17] https://github.com/apache/beam/pull/2224
> > > > [18] https://github.com/apache/beam/pull/2225
> > > >
> > >
> >
>


Re: [VOTE] Release 0.6.0, release candidate #2

2017-03-11 Thread Ted Yu
+1

Checked signature.

Ran test suite which passed.

On Fri, Mar 10, 2017 at 9:05 PM, Ahmet Altay 
wrote:

> Hi everyone,
>
> Please review and vote on the release candidate #2 for the version 0.6.0,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
> [2],
> which is signed with the key with fingerprint 6096FA00 [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v0.6.0-RC2" [5],
> * website pull request listing the release and publishing the API reference
> manual [6].
> * python artifacts are deployed along with the source release to
> dist.apache.org [2].
>
> A suite of Jenkins jobs:
> * PreCommit_Java_MavenInstall [7],
> * PostCommit_Java_MavenInstall [8],
> * PostCommit_Java_RunnableOnService_Apex [9],
> * PostCommit_Java_RunnableOnService_Flink [10],
> * PostCommit_Java_RunnableOnService_Spark [11],
> * PostCommit_Java_RunnableOnService_Dataflow [12]
> * PostCommit_Python_Verify [13]
>
> Compared to release candidate #1, this candidate contains pull requests
> #2217 [14], #2221 [15], # [16], #2224 [17], and #2225 [18]; see the
> discussion for reasoning.
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Ahmet
>
> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?proje
> ctId=12319527&version=12339256
> [2] https://dist.apache.org/repos/dist/dev/beam/0.6.0/
> [3] https://dist.apache.org/repos/dist/dev/beam/KEYS
> [4] https://repository.apache.org/content/repositories/orgapachebeam-1013/
> [5] https://git-wip-us.apache.org/repos/asf?p=beam.git;a=tag;h=r
> efs/tags/v0.6.0-RC2
> [6] https://github.com/apache/beam-site/pull/175
> [7] https://builds.apache.org/view/Beam/job/beam_PreCommit_Java_
> MavenInstall/8340/
> [8] https://builds.apache.org/view/Beam/job/beam_PostCommit_
> Java_MavenInstall/2877/
> [9] https://builds.apache.org/view/Beam/job/beam_PostCommit_Java
> _RunnableOnService_Apex/736/
> [10] https://builds.apache.org/view/Beam/job/beam_PostCommit_Java
> _RunnableOnService_Flink/1895/
> [11] https://builds.apache.org/view/Beam/job/beam_PostCommit_Java
> _RunnableOnService_Spark/1207/
> [12] https://builds.apache.org/view/Beam/job/beam_PostCommit_Java_
> RunnableOnService_Dataflow/2526/
> [13] https://builds.apache.org/view/Beam/job/beam_PostCommit_Pyth
> on_Verify/1481/
> [14] https://github.com/apache/beam/pull/2217
> [15] https://github.com/apache/beam/pull/2221
> [16] https://github.com/apache/beam/pull/
> [17] https://github.com/apache/beam/pull/2224
> [18] https://github.com/apache/beam/pull/2225
>


Re: [VOTE] Release 0.6.0, release candidate #1

2017-03-09 Thread Ted Yu
bq. ran into a known issue [14]

Currently BEAM-1674 is marked blocker. Would it be pushed to next release ?

Cheers

On Thu, Mar 9, 2017 at 4:07 PM, Ahmet Altay 
wrote:

> Hi everyone,
>
> Please review and vote on the release candidate #1 for the version 0.6.0,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
> [2],
> which is signed with the key with fingerprint 6096FA00 [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v0.6.0-RC1" [5],
> * website pull request listing the release and publishing the API reference
> manual [6].
> * python artifacts are deployed along with the source release to
> dist.apache.org [2].
>
> A suite of Jenkins jobs:
> * PreCommit_Java_MavenInstall [7],
> * PostCommit_Java_MavenInstall [8],
> * PostCommit_Java_RunnableOnService_Apex [9],
> * PostCommit_Java_RunnableOnService_Flink [10], -> ran into a known issue
> [14]
> * PostCommit_Java_RunnableOnService_Spark [11],
> * PostCommit_Java_RunnableOnService_Dataflow [12] -> timed out at 100
> minutes, the logs are good up to that point [15] is for increasing this
> timeout.
> * PostCommit_Python_Verify [13]
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Ahmet
>
> [1]
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?
> projectId=12319527&version=12339256
> [2] https://dist.apache.org/repos/dist/dev/beam/0.6.0/
> [3] https://dist.apache.org/repos/dist/dev/beam/KEYS
> [4] https://repository.apache.org/content/repositories/orgapachebeam-1012/
> [5]
> https://git-wip-us.apache.org/repos/asf?p=beam.git;a=tag;h=
> refs/tags/v0.6.0-RC1
> [6] https://github.com/apache/beam-site/pull/175
> [7]
> https://builds.apache.org/view/Beam/job/beam_PreCommit_
> Java_MavenInstall/8281/
> [8]
> https://builds.apache.org/view/Beam/job/beam_PostCommit_
> Java_MavenInstall/2858/
> [9]
> https://builds.apache.org/view/Beam/job/beam_PostCommit_
> Java_RunnableOnService_Apex/717/
> [10]
> https://builds.apache.org/view/Beam/job/beam_PostCommit_
> Java_RunnableOnService_Flink/1874/
> [11]
> https://builds.apache.org/view/Beam/job/beam_PostCommit_
> Java_RunnableOnService_Spark/1184/
> [12]
> https://builds.apache.org/view/Beam/job/beam_PostCommit_
> Java_RunnableOnService_Dataflow/2511/
> [13]
> https://builds.apache.org/view/Beam/job/beam_PostCommit_
> Python_Verify/1466/
> [14] https://issues.apache.org/jira/browse/BEAM-1674
> [15] https://github.com/apache/beam/pull/2197
>


Re: [VOTE] Release 0.6.0, release candidate #1

2017-03-09 Thread Ted Yu
+1

Checked signature

Ran test suite - all passed.

On Thu, Mar 9, 2017 at 4:07 PM, Ahmet Altay 
wrote:

> Hi everyone,
>
> Please review and vote on the release candidate #1 for the version 0.6.0,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
> [2],
> which is signed with the key with fingerprint 6096FA00 [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v0.6.0-RC1" [5],
> * website pull request listing the release and publishing the API reference
> manual [6].
> * python artifacts are deployed along with the source release to
> dist.apache.org [2].
>
> A suite of Jenkins jobs:
> * PreCommit_Java_MavenInstall [7],
> * PostCommit_Java_MavenInstall [8],
> * PostCommit_Java_RunnableOnService_Apex [9],
> * PostCommit_Java_RunnableOnService_Flink [10], -> ran into a known issue
> [14]
> * PostCommit_Java_RunnableOnService_Spark [11],
> * PostCommit_Java_RunnableOnService_Dataflow [12] -> timed out at 100
> minutes, the logs are good up to that point [15] is for increasing this
> timeout.
> * PostCommit_Python_Verify [13]
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Ahmet
>
> [1]
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?
> projectId=12319527&version=12339256
> [2] https://dist.apache.org/repos/dist/dev/beam/0.6.0/
> [3] https://dist.apache.org/repos/dist/dev/beam/KEYS
> [4] https://repository.apache.org/content/repositories/orgapachebeam-1012/
> [5]
> https://git-wip-us.apache.org/repos/asf?p=beam.git;a=tag;h=
> refs/tags/v0.6.0-RC1
> [6] https://github.com/apache/beam-site/pull/175
> [7]
> https://builds.apache.org/view/Beam/job/beam_PreCommit_
> Java_MavenInstall/8281/
> [8]
> https://builds.apache.org/view/Beam/job/beam_PostCommit_
> Java_MavenInstall/2858/
> [9]
> https://builds.apache.org/view/Beam/job/beam_PostCommit_
> Java_RunnableOnService_Apex/717/
> [10]
> https://builds.apache.org/view/Beam/job/beam_PostCommit_
> Java_RunnableOnService_Flink/1874/
> [11]
> https://builds.apache.org/view/Beam/job/beam_PostCommit_
> Java_RunnableOnService_Spark/1184/
> [12]
> https://builds.apache.org/view/Beam/job/beam_PostCommit_
> Java_RunnableOnService_Dataflow/2511/
> [13]
> https://builds.apache.org/view/Beam/job/beam_PostCommit_
> Python_Verify/1466/
> [14] https://issues.apache.org/jira/browse/BEAM-1674
> [15] https://github.com/apache/beam/pull/2197
>
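The vote email above asks reviewers to check the release artifacts on dist.apache.org [2] against the published checksums and KEYS file [3]. A minimal sketch of the checksum half of that verification — the filename and payload below are simulated so the commands run offline; a real review would download the actual artifact, its `.sha512` file, and the `.asc` signature:

```shell
#!/bin/sh
# Sketch of release-artifact checksum verification (illustrative filename;
# the payload is fabricated here so the example is self-contained).
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Stand-ins for the downloaded artifact and its published checksum file.
printf 'example release payload' > apache-beam-0.6.0-source-release.zip
sha512sum apache-beam-0.6.0-source-release.zip \
  > apache-beam-0.6.0-source-release.zip.sha512

# The verification step a reviewer actually runs:
sha512sum -c apache-beam-0.6.0-source-release.zip.sha512
```

Signature verification follows the same pattern with `gpg --import KEYS` and `gpg --verify <artifact>.asc <artifact>`.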


Re: First stable release: version designation?

2017-03-06 Thread Ted Yu
If we end up with version 2.0, more effort (e.g., trying out more use
scenarios) should go into the release process to make sure what is released
is indeed stable.

Normally people would have higher expectation on 2.0 release compared to
1.0 release.

On Mon, Mar 6, 2017 at 6:34 PM, Davor Bonaci  wrote:

> It sounds like we'll end up with two camps on this topic. This issue is
> probably best resolved with a vote, but I'll try to rephrase the question
> once to see whether a consensus is possible.
>
> Instead of asking which option is better, does anyone think the project
> would be negatively impacted if we were to decide on, in your opinion, the
> less desirable variant? If so, can you comment on the negative impact of
> the less desirable alternative please?
>
> (I understand this may be pushing it a bit, but I think a possible
> consensus on this is worth it. Personally, I'll stay away from weighing in
> on this topic.)
>
> On Thu, Mar 2, 2017 at 2:57 AM, Aljoscha Krettek 
> wrote:
>
> > I prefer 2.0.0 for the first stable release. It totally makes sense for
> > people coming from Dataflow 1.x and I can already envision the confusion
> > between Beam 1.5 and Dataflow 1.5.
> >
> > On Thu, 2 Mar 2017 at 07:42 Jean-Baptiste Onofré 
> wrote:
> >
> > > Hi Davor,
> > >
> > >
> > > From a Beam community perspective, 1.0.0 would make more sense. We have a
> > > fair number of people starting with Beam (without knowing Dataflow).
> > >
> > > However, as the Dataflow SDK (the origin of Beam) was at 1.0.0, 2.0.0
> > > could help avoid confusion for users coming to Beam from Dataflow.
> > >
> > > I have a preference for 1.0.0 anyway, but I would understand starting
> > > from 2.0.0.
> > >
> > > Regards
> > > JB
> > >
> > > On 03/01/2017 07:56 PM, Davor Bonaci wrote:
> > > > The first stable release is our next major project-wide goal; see
> > > > discussion in [1]. I've been referring to it as "the first stable release"
> > > > for a long time, not "1.0.0" or "2.0.0" or "2017" or something else, to
> > > > make sure we have an unbiased discussion and a consensus-based decision
> > > > on this matter.
> > > >
> > > > I think that now is the time to consider the appropriate designation
> > > > for our first stable release, and formally make a decision on it.
> > > > Reasonable choices could be "1.0.0" or "2.0.0"; perhaps there are others.
> > > >
> > > > 1.0.0:
> > > > * It logically comes after the current series, 0.x.y.
> > > > * Most people would expect it, I suppose.
> > > > * A possible confusion between Dataflow SDKs and Beam SDKs carrying the
> > > > same number.
> > > >
> > > > 2.0.0:
> > > > * Follows the pattern some other projects have taken -- continuing their
> > > > version numbering scheme from their previous origin.
> > > > * Better communicates project's roots, and degree of maturity.
> > > > * May be unexpected to some users.
> > > >
> > > > I'd invite everyone to share their thoughts and preferences -- names are
> > > > important and well correlated with success. Thanks!
> > > >
> > > > Davor
> > > >
> > > > [1] https://lists.apache.org/thread.html/c35067071aec9029d9100ae973c6299aa919c31d0de623ac367128e2@%3Cdev.beam.apache.org%3E
> > > >
> > >
> > > --
> > > Jean-Baptiste Onofré
> > > jbono...@apache.org
> > > http://blog.nanthrax.net
> > > Talend - http://www.talend.com
> > >
> >
>


Re: First stable release: version designation?

2017-03-01 Thread Ted Yu
The following explanation for adopting the 2.0 version should be put in the
release notes for the stable release.

Cheers

On Wed, Mar 1, 2017 at 2:03 PM, Dan Halperin <dhalp...@google.com.invalid>
wrote:

> A large set of Beam users will be coming from the pre-Apache technologies
> (aka Google Cloud Dataflow, Scio). Because Dataflow was 1.0 before Beam
> started, there is a lot of pre-existing documentation, Stack Overflow, etc.
> that refers to version 1.0 to mean what is now a year-and-a-half old
> release.
>
> I think starting Beam from "2.0.0" will be best for that set of users and
> frankly also new ones -- this will make it unambiguous whether referring to
> pre-Beam or Beam releases.
>
> I understand the 1.0 motivation -- it's cleaner in isolation -- but I think
> it would lead to long-term confusion in the user community.
>
> On Wed, Mar 1, 2017 at 1:11 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > +1 to what Jesse and Amit said.
> >
> > On Wed, Mar 1, 2017 at 12:32 PM, Amit Sela <amitsel...@gmail.com> wrote:
> >
> > > I think 1.0.0 for a couple of reasons:
> > >
> > > * It makes sense coming after 0.X (+1 Jesse).
> > > * It is the FIRST stable release as a project, regardless of its roots.
> > > * While the SDK is definitely a 2.0.0, Beam is not made only of the SDK,
> > > and I hope we'll have more mileage with users running all sorts of runners
> > > in production before our 2.0.0 release.
> > >
> > > Amit.
> > >
> > > On Wed, Mar 1, 2017 at 10:25 PM Jesse Anderson <je...@smokinghand.com>
> > > wrote:
> > >
> > > I think 1.0 makes the most sense.
> > >
> > > On Wed, Mar 1, 2017, 10:57 AM Davor Bonaci <da...@apache.org> wrote:
> > >
> > > > The first stable release is our next major project-wide goal; see
> > > > discussion in [1]. I've been referring to it as "the first stable release"
> > > > for a long time, not "1.0.0" or "2.0.0" or "2017" or something else, to
> > > > make sure we have an unbiased discussion and a consensus-based decision
> > > > on this matter.
> > > >
> > > > I think that now is the time to consider the appropriate designation
> > > > for our first stable release, and formally make a decision on it.
> > > > Reasonable choices could be "1.0.0" or "2.0.0"; perhaps there are others.
> > > >
> > > > 1.0.0:
> > > > * It logically comes after the current series, 0.x.y.
> > > > * Most people would expect it, I suppose.
> > > > * A possible confusion between Dataflow SDKs and Beam SDKs carrying the
> > > > same number.
> > > >
> > > > 2.0.0:
> > > > * Follows the pattern some other projects have taken -- continuing their
> > > > version numbering scheme from their previous origin.
> > > > * Better communicates project's roots, and degree of maturity.
> > > > * May be unexpected to some users.
> > > >
> > > > I'd invite everyone to share their thoughts and preferences -- names are
> > > > important and well correlated with success. Thanks!
> > > >
> > > > Davor
> > > >
> > > > [1] https://lists.apache.org/thread.html/c35067071aec9029d9100ae973c6299aa919c31d0de623ac367128e2@%3Cdev.beam.apache.org%3E
> > > >
> > >
> >
>