Re: Apex runner status and next steps

2016-10-26 Thread Jean-Baptiste Onofré
+1

Good idea and fully agree about the three points.

Regards
JB

On Oct 26, 2016, 19:24, at 19:24, Thomas Weise  wrote:
>Hi,
>
>The Apex runner is currently in a feature branch:
>
>https://github.com/apache/incubator-beam/tree/apex-runner
>
>The focus so far has been on functional completeness. The runner passes
>all the integration tests.
>
>Apex with its stateful stream processing architecture can support all
>of
>the concepts in the Beam model (event time, triggers, watermarks etc.).
>Most of these are already supported through the Beam SDK. The glue code
>that had to be written isn't that much, which speaks to the conceptual
>alignment in general.
>
>The runner in its current form does not leverage all the performance
>and
>scalability that Apex can deliver. We expect to address this with
>future
>contributions, leveraging things like incremental checkpointing,
>partitioning and operator affinity from Apex.
>
>From a code perspective, the runner should be close to what is needed
>for a
>merge to master (based on the contribution guidelines). The following
>items
>have been identified as prerequisite:
>
>* Add a README.md to the runner directory that summarizes its current
>state
>* Update the https://beam.apache.org/learn/runners/capability-matrix/
>to
>include the Apex info
>* Create the page under learn/runners (at least the placeholder)
>
>It should also be noted that the integration tests currently take quite
>long to run with embedded Apex (~50 minutes). Some of that has to do
>with
>how completion of the tests is determined and there are ideas to
>improve it.
>
>I have created some JIRAs from my TODO list of follow-up work for more
>contributors to get involved:
>
>https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20component%20%3D%20runner-apex
>
>Some folks on the Apex dev list have expressed interest in taking up some
>of this work. And thanks to Ismaël Mejía for BEAM-815!
>
>I'm looking forward to your comments and suggestions.
>
>Thanks,
>Thomas


Re: [DISCUSS] Using Verbs for Transforms

2016-10-26 Thread Jean-Baptiste Onofré
A -1 vote doesn't necessarily mean a veto. For instance it's not really 
possible to veto a release vote.

Anyway, whether we call it a vote or a discussion, I think a formal summary
of the different proposed approaches is a good thing.

My $0.01 ;)

Regards
JB

On Oct 27, 2016, 06:48, at 06:48, Davor Bonaci  wrote:
>In terms of reaching a decision on any code or design changes,
>including
>this one, I'd suggest going without formal votes. Voting process for
>code
>modifications between choices A and B doesn't necessarily end with a
>decision A or B -- a single (qualified) -1 vote is a veto and cannot be
>overridden [1]. Said differently, the guideline is that code changes
>should
>be made by consensus; not by one group outvoting another. I'd like to
>avoid
>setting such precedent; we should try to drive consensus, as opposed to
>attempting to outvote another part of the community.
>
>In this particular case, we have had a great discussion. Many
>contributors
>brought different perspectives. Consequently, some opinions have
>likely changed. At this point, someone should summarize the arguments,
>try
>to critique them from a neutral standpoint, and suggest a refined
>proposal
>that takes these perspectives into account. If nobody objects in a
>short
>time, we should consider this decided. [ I can certainly help here, but
>I'd
>love to see somebody else do it! ]
>
>[1] http://www.apache.org/foundation/voting.html
>
>On Wed, Oct 26, 2016 at 7:35 AM, Ben Chambers
>
>wrote:
>
>> I also like Distinct since it doesn't make it sound like it modifies
>any
>> underlying collection. RemoveDuplicates makes it sound like the
>duplicates
>> are removed, rather than a new PCollection without duplicates being
>> returned.
>>
>> On Wed, Oct 26, 2016, 7:36 AM Jean-Baptiste Onofré 
>> wrote:
>>
>> > Agree. It was more a transition proposal.
>> >
>> > Regards
>> > JB
>> >
>> > On Oct 26, 2016, 08:31, at 08:31, Robert Bradshaw
>> >  wrote:
>> > >On Mon, Oct 24, 2016 at 11:02 PM, Jean-Baptiste Onofré
>> > > wrote:
>> > >> And what about use RemoveDuplicates and create an alias Distinct
>?
>> > >
>> > >I'd really like to avoid (long term) aliases--you end up having to
>> > >document (and maintain) them both, and it adds confusion as to
>which
>> > >one to use (especially if they ever diverge), and means searching
>for
>> > >one or the other yields half the results.
>> > >
>> > >> It doesn't break the API and would address both SQL users and
>more
>> > >"big data" users.
>> > >>
>> > >> My $0.01 ;)
>> > >>
>> > >> Regards
>> > >> JB
>> > >>
>> > >> On Oct 24, 2016, 22:23, at 22:23, Dan Halperin
>> > > wrote:
>> > >>>I find "MakeDistinct" more confusing. My votes in decreasing
>> > >>>preference:
>> > >>>
>> > >>>1. Keep `RemoveDuplicates` name, ensure that important keywords
>are
>> > >in
>> > >>>the
>> > >>>Javadoc. This reduces churn on our users and is honestly pretty
>dang
>> > >>> descriptive.
>> > >>>2. Rename to `Distinct`, which is clear if you're a SQL user and
>> > >likely
>> > >>>less clear otherwise. This is a backwards-incompatible API
>change, so
>> > >>>we
>> > >>>should do it before we go stable.
>> > >>>
>> > >>>I am not super strong that 1 > 2, but I am very strong that
>> > >>>"Distinct" >>> "MakeDistinct" and "RemoveDuplicates" >>>
>> > >>>"AvoidDuplicate".
>> > >>>
>> > >>>Dan
>> > >>>
>> > >>>On Mon, Oct 24, 2016 at 10:12 AM, Kenneth Knowles
>> > >>>
>> > >>>wrote:
>> > >>>
>> >  The precedent that we use verbs has many exceptions. We have
>> >  ApproximateQuantiles, Values, Keys, WithTimestamps, and I
>would
>> > >even
>> >  include Sum (at least when I read it).
>> > 
>> >  Historical note: the predilection towards verbs is from the
>Google
>> > >>>Style
>> >  Guide for Java method names
>> > 
>> > >>>> 2.3-method-names
>> > >,
>> >  which states "Method names are typically verbs or verb
>phrases".
>> > >But
>> > >>>even
>> >  in Google code there are lots of exceptions when it makes
>sense,
>> > >like
>> >  Guava's
>> >  Iterables.any(), Iterables.all(), Iterables.toArray(), the
>entire
>> >  Predicates module, etc. Just an aside; Beam isn't Google code.
>I
>> > >>>suggest we
>> >  use our judgment rather than a policy.
>> > 
>> >  I think "Distinct" is one of those exceptions. It is a
>standard
>> > >>>widespread
>> >  name and also reads better as an adjective. I prefer it, but
>also
>> > >>>don't
>> >  care strongly enough to change it or to change it back :-)
>> > 
>> >  If we must have a verb, I like it as-is more than MakeDistinct
>and
>> >  AvoidDuplicate.
>> > 
>> >  On Mon, Oct 24, 2016 at 9:46 AM Jesse Anderson
>> > 

Re: Why does `Combine.perKey(SerializableFunction)` require same input and output type

2016-10-26 Thread Lukasz Cwik
Combine.perKey takes a single SerializableFunction which knows how to
convert from Iterable<V> to V.

It turns out that many runners implement optimizations which allow them to
run the combine operation across several machines to parallelize the work
and potentially reduce the amount of data they store during a GBK
(GroupByKey).
To be able to do such an optimization, you actually need three functions:
InputT -> AccumulatorT: Creates the intermediate representation which
allows for associative combining
Iterable<AccumulatorT> -> AccumulatorT: Performs the actual combining
AccumulatorT -> OutputT: Extracts the output

In the case of Combine.perKey with a SerializableFunction, you're providing
Iterable<AccumulatorT> -> AccumulatorT, and the other two functions are the
identity functions, so InputT, AccumulatorT and OutputT are all the same
type V.

Supporting a Combine.perKey which goes from Iterable<InputT> -> OutputT
would require the whole combine to occur on a single machine, removing the
parallelization benefits that runners provide. In almost all cases that is
not a good idea.
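
To make the three functions above concrete, here is a minimal plain-Java
sketch. It deliberately does not use the Beam API: the class CombineSketch,
the SumCount accumulator and all method names are hypothetical and chosen
only for illustration. It computes a mean (where input, accumulator and
output types differ) and a sum (where all three are the same type, which is
the SerializableFunction case).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of a three-function combiner. Not the Beam API. */
public class CombineSketch {

    /** Hypothetical accumulator for a mean: running sum and element count. */
    static final class SumCount {
        final long sum;
        final long count;

        SumCount(long sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    /** InputT -> AccumulatorT: wrap one input in the intermediate form. */
    static SumCount createAccumulator(int input) {
        return new SumCount(input, 1);
    }

    /** Iterable<AccumulatorT> -> AccumulatorT: associative merge step. */
    static SumCount mergeAccumulators(Iterable<SumCount> accumulators) {
        long sum = 0;
        long count = 0;
        for (SumCount a : accumulators) {
            sum += a.sum;
            count += a.count;
        }
        return new SumCount(sum, count);
    }

    /** AccumulatorT -> OutputT: turn the final accumulator into the result. */
    static double extractOutput(SumCount accumulator) {
        return accumulator.count == 0 ? 0.0 : (double) accumulator.sum / accumulator.count;
    }

    /** Simulates one machine pre-combining its local shard of values. */
    static SumCount combineShard(List<Integer> shard) {
        List<SumCount> accumulators = new ArrayList<>();
        for (int value : shard) {
            accumulators.add(createAccumulator(value));
        }
        return mergeAccumulators(accumulators);
    }

    /** Iterable<V> -> V: input, accumulator and output are all Integer. */
    static int sum(Iterable<Integer> values) {
        int total = 0;
        for (int value : values) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> shard1 = Arrays.asList(1, 2, 3);
        List<Integer> shard2 = Arrays.asList(4, 5);

        // Mean: each shard is combined separately, then the partials are merged.
        SumCount merged = mergeAccumulators(
                Arrays.asList(combineShard(shard1), combineShard(shard2)));
        System.out.println(extractOutput(merged)); // prints 3.0

        // Sum: the same function combines raw inputs and partial results.
        System.out.println(sum(Arrays.asList(sum(shard1), sum(shard2)))); // prints 15
    }
}
```

Because sum() maps Iterable<Integer> to Integer, its partial results can be
fed back into the same function, which is why it can be split across
shards. The mean cannot be written that way (a mean of per-shard means is
wrong when shards differ in size), which is why it needs the separate
accumulator type.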

On Wed, Oct 26, 2016 at 6:23 PM, Manu Zhang  wrote:

> Hi all,
>
> I'm wondering why `Combine.perKey(SerializableFunction)` requires input
> and
> output to be of the same type while `Combine.PerKey` doesn't have this
> restriction.
>
> Thanks,
> Manu
>


Why does `Combine.perKey(SerializableFunction)` require same input and output type

2016-10-26 Thread Manu Zhang
Hi all,

I'm wondering why `Combine.perKey(SerializableFunction)` requires input and
output to be of the same type while `Combine.PerKey` doesn't have this
restriction.

Thanks,
Manu


Re: Podling Report Reminder - November 2016

2016-10-26 Thread James Malone
Hello everyone!

Unless anyone disagrees or wants to do it, I am happy to volunteer to draft
this podling report for review before we submit it. I can get it done for a
review this Friday (US-Pacific) if that works.

Cheers!

James

On Wed, Oct 26, 2016 at 4:01 PM,  wrote:

> Dear podling,
>
> This email was sent by an automated system on behalf of the Apache
> Incubator PMC. It is an initial reminder to give you plenty of time to
> prepare your quarterly board report.
>
> The board meeting is scheduled for Wed, 16 November 2016, 10:30 am PDT.
> The report for your podling will form a part of the Incubator PMC
> report. The Incubator PMC requires your report to be submitted 2 weeks
> before the board meeting, to allow sufficient time for review and
> submission (Wed, November 02).
>
> Please submit your report with sufficient time to allow the Incubator
> PMC, and subsequently board members to review and digest. Again, the
> very latest you should submit your report is 2 weeks prior to the board
> meeting.
>
> Thanks,
>
> The Apache Incubator PMC
>
> Submitting your Report
>
> --
>
> Your report should contain the following:
>
> *   Your project name
> *   A brief description of your project, which assumes no knowledge of
> the project or necessarily of its field
> *   A list of the three most important issues to address in the move
> towards graduation.
> *   Any issues that the Incubator PMC or ASF Board might wish/need to be
> aware of
> *   How has the community developed since the last report
> *   How has the project developed since the last report.
>
> This should be appended to the Incubator Wiki page at:
>
> http://wiki.apache.org/incubator/November2016
>
> Note: This is manually populated. You may need to wait a little before
> this page is created from a template.
>
> Mentors
> ---
>
> Mentors should review reports for their project(s) and sign them off on
> the Incubator wiki page. Signing off reports shows that you are
> following the project - projects that are not signed may raise alarms
> for the Incubator PMC.
>
> Incubator PMC
>


Podling Report Reminder - November 2016

2016-10-26 Thread johndament
Dear podling,

This email was sent by an automated system on behalf of the Apache
Incubator PMC. It is an initial reminder to give you plenty of time to
prepare your quarterly board report.

The board meeting is scheduled for Wed, 16 November 2016, 10:30 am PDT.
The report for your podling will form a part of the Incubator PMC
report. The Incubator PMC requires your report to be submitted 2 weeks
before the board meeting, to allow sufficient time for review and
submission (Wed, November 02).

Please submit your report with sufficient time to allow the Incubator
PMC, and subsequently board members to review and digest. Again, the
very latest you should submit your report is 2 weeks prior to the board
meeting.

Thanks,

The Apache Incubator PMC

Submitting your Report

--

Your report should contain the following:

*   Your project name
*   A brief description of your project, which assumes no knowledge of
the project or necessarily of its field
*   A list of the three most important issues to address in the move
towards graduation.
*   Any issues that the Incubator PMC or ASF Board might wish/need to be
aware of
*   How has the community developed since the last report
*   How has the project developed since the last report.

This should be appended to the Incubator Wiki page at:

http://wiki.apache.org/incubator/November2016

Note: This is manually populated. You may need to wait a little before
this page is created from a template.

Mentors
---

Mentors should review reports for their project(s) and sign them off on
the Incubator wiki page. Signing off reports shows that you are
following the project - projects that are not signed may raise alarms
for the Incubator PMC.

Incubator PMC


Re: GitHub mirroring issue

2016-10-26 Thread Dan Halperin
(Sometimes this happens even when there is not a systemic issue: I have
seen github mirroring fail if two things are merged close together, but
usually the bot "magically" fixes it on the next commit.)

Dan

On Wed, Oct 26, 2016 at 1:40 PM, Amit Sela  wrote:

> Thanks!
>
> On Wed, Oct 26, 2016, 23:32 Suneel Marthi  wrote:
>
> > We have been seeing Github mirroring issues today on other projects too,
> > filed an Infra jira - INFRA-12830
> >
> > On Wed, Oct 26, 2016 at 4:21 PM, Amit Sela  wrote:
> >
> > > Hi all,
> > >
> > > I've merged a PR ~2 hours ago and while the apache remote seems
> > > up-to-date, GitHub didn't update, nor did the PR or JIRA.
> > > The last commit hash is: 6db9424 (9f30b21 merge commit).
> > >
> > > Hopefully this will update after the next commit but FYI I guess.
> > >
> > > Thanks,
> > > Amit
> > >
> >
>


Re: GitHub mirroring issue

2016-10-26 Thread Amit Sela
Thanks!

On Wed, Oct 26, 2016, 23:32 Suneel Marthi  wrote:

> We have been seeing Github mirroring issues today on other projects too,
> filed an Infra jira - INFRA-12830
>
> On Wed, Oct 26, 2016 at 4:21 PM, Amit Sela  wrote:
>
> > Hi all,
> >
> > I've merged a PR ~2 hours ago and while the apache remote seems
> > up-to-date, GitHub didn't update, nor did the PR or JIRA.
> > The last commit hash is: 6db9424 (9f30b21 merge commit).
> >
> > Hopefully this will update after the next commit but FYI I guess.
> >
> > Thanks,
> > Amit
> >
>


Re: GitHub mirroring issue

2016-10-26 Thread Suneel Marthi
We have been seeing Github mirroring issues today on other projects too,
filed an Infra jira - INFRA-12830

On Wed, Oct 26, 2016 at 4:21 PM, Amit Sela  wrote:

> Hi all,
>
> I've merged a PR ~2 hours ago and while the apache remote seems up-to-date,
> GitHub didn't update, nor did the PR or JIRA.
> The last commit hash is: 6db9424 (9f30b21 merge commit).
>
> Hopefully this will update after the next commit but FYI I guess.
>
> Thanks,
> Amit
>


GitHub mirroring issue

2016-10-26 Thread Amit Sela
Hi all,

I've merged a PR ~2 hours ago and while the apache remote seems up-to-date,
GitHub didn't update, nor did the PR or JIRA.
The last commit hash is: 6db9424 (9f30b21 merge commit).

Hopefully this will update after the next commit but FYI I guess.

Thanks,
Amit


Re: [DISCUSS] Merging master -> feature branch

2016-10-26 Thread Thomas Weise
+1

For a merge from master to the feature branch that does not require extra
changes, RTC does not add value. It actually delays and burns reviewer time
(even mechanics need some) that "real" PRs could benefit from. If
adjustments are needed, then the regular process kicks in.

Thanks,
Thomas


On Wed, Oct 26, 2016 at 1:33 AM, Amit Sela  wrote:

> I generally agree with Kenneth.
>
> While working on the SparkRunnerV2 branch, it was a pain - I avoided
> frequent merges to avoid trivial PRs, but it cost me very large and
> non-trivial merges later.
> I think that frequent merges for feature-branches should most of the time
> be trivial (no conflicts) and a committer should be allowed to self-merge
> once tests pass.
> As for conflicts, even for the smallest ones I'd go with review just so
> it's very clear when self-merging is OK - we can always revisit this later
> and further discuss if we think we can improve this process.
>
> I guess +1 from me.
>
> Thanks,
> Amit.
>
> On Wed, Oct 26, 2016 at 8:10 AM Frances Perry 
> wrote:
>
> > On Tue, Oct 25, 2016 at 9:44 PM, Jean-Baptiste Onofré 
> > wrote:
> >
> > > Agree. When possible it would be great to have the branch merged on
> > master
> > > quickly, even when it's not fully ready. It would give more visibility
> to
> > > potential contributors.
> > >
> >
> > This thread is about the opposite, I think -- merging master into feature
> > branches regularly to prevent them from getting out of sync.
> >
> > As for increasing the visibility of feature branches, we have these new
> > webpages:
> > http://beam.incubator.apache.org/contribute/work-in-progress/
> > http://beam.incubator.apache.org/contribute/contribution-
> > guide/#feature-branches
> > with more changes coming in the basic SDK/Runner landing pages too.
> >
>


Re: build failed with dependency problems

2016-10-26 Thread Scott Wegner
I believe this is JIRA issue BEAM-688


On Wed, Oct 26, 2016 at 1:15 AM Manu Zhang  wrote:

> Thanks, it succeeded after `maven clean`.
>
> On Wed, Oct 26, 2016 at 2:58 PM Amit Sela  wrote:
>
> I just fetched and pulled latest master and build succeeded, maybe try
> again ?
>
> On Wed, Oct 26, 2016 at 9:19 AM Manu Zhang 
> wrote:
>
> > Hi All,
> >
> > I tried to build latest master but failed with the following dependency
> > problems.
> >
> > [INFO] --- maven-dependency-plugin:2.10:analyze-only (default) @
> > beam-sdks-java-maven-archetypes-starter ---
> > [WARNING] Used undeclared dependencies found:
> > [WARNING]org.slf4j:slf4j-api:jar:1.7.14:runtime
> > [INFO]
> > 
> > [INFO] Reactor Summary:
> > [INFO]
> > [INFO] Apache Beam :: Parent .. SUCCESS [
> >  3.428 s]
> > [INFO] Apache Beam :: SDKs  SUCCESS [
> >  0.063 s]
> > [INFO] Apache Beam :: SDKs :: Java  SUCCESS [
> >  0.057 s]
> > [INFO] Apache Beam :: SDKs :: Java :: Build Tools . SUCCESS [
> >  1.070 s]
> > [INFO] Apache Beam :: SDKs :: Java :: Core  SUCCESS
> [03:49
> > min]
> > [INFO] Apache Beam :: Runners . SUCCESS [
> >  0.191 s]
> > [INFO] Apache Beam :: Runners :: Core Java  SUCCESS [
> > 43.679 s]
> > [INFO] Apache Beam :: Runners :: Direct Java .. SUCCESS [
> > 32.827 s]
> > [INFO] Apache Beam :: Runners :: Google Cloud Dataflow  SUCCESS [
> > 17.201 s]
> > [INFO] Apache Beam :: SDKs :: Java :: IO .. SUCCESS [
> >  0.071 s]
> > [INFO] Apache Beam :: SDKs :: Java :: IO :: Google Cloud Platform SUCCESS
> [
> > 17.278 s]
> > [INFO] Apache Beam :: SDKs :: Java :: IO :: HDFS .. SUCCESS [
> > 11.424 s]
> > [INFO] Apache Beam :: SDKs :: Java :: IO :: JMS ... SUCCESS [
> >  2.571 s]
> > [INFO] Apache Beam :: SDKs :: Java :: IO :: Kafka . SUCCESS [
> >  1.658 s]
> > [INFO] Apache Beam :: SDKs :: Java :: IO :: Kinesis ... SUCCESS [
> >  3.208 s]
> > [INFO] Apache Beam :: SDKs :: Java :: IO :: MongoDB ... SUCCESS [
> >  3.026 s]
> > [INFO] Apache Beam :: SDKs :: Java :: IO :: JDBC .. SUCCESS [
> >  1.732 s]
> > [INFO] Apache Beam :: SDKs :: Java :: Extensions .. SUCCESS [
> >  0.028 s]
> > [INFO] Apache Beam :: SDKs :: Java :: Extensions :: Join library SUCCESS
> [
> >  1.057 s]
> > [INFO] Apache Beam :: SDKs :: Java :: Microbenchmarks . SUCCESS [
> >  6.892 s]
> > [INFO] Apache Beam :: SDKs :: Java :: Java 8 Tests  SUCCESS [
> >  2.396 s]
> > [INFO] Apache Beam :: Runners :: Flink  SUCCESS [
> >  0.028 s]
> > [INFO] Apache Beam :: Runners :: Flink :: Core  SUCCESS [
> >  6.688 s]
> > [INFO] Apache Beam :: Runners :: Flink :: Examples  SUCCESS [
> >  2.966 s]
> > [INFO] Apache Beam :: Runners :: Spark  SUCCESS [
> > 27.347 s]
> > [INFO] Apache Beam :: SDKs :: Java :: Maven Archetypes  SUCCESS [
> >  0.040 s]
> > [INFO] Apache Beam :: SDKs :: Java :: Maven Archetypes :: Starter FAILURE
> [
> >  3.281 s]
> > [INFO] Apache Beam :: SDKs :: Java :: Maven Archetypes :: Examples
> SKIPPED
> > [INFO] Apache Beam :: Examples  SKIPPED
> > [INFO] Apache Beam :: Examples :: Java  SKIPPED
> > [INFO] Apache Beam :: Examples :: Java 8 .. SKIPPED
> > [INFO]
> > 
> > [INFO] BUILD FAILURE
> > [INFO]
> > 
> > [INFO] Total time: 07:01 min
> > [INFO] Finished at: 2016-10-26T14:10:23+08:00
> > [INFO] Final Memory: 143M/551M
> > [INFO]
> > 
> > [ERROR] Failed to execute goal
> > org.apache.maven.plugins:maven-dependency-plugin:2.10:analyze-only
> > (default) on project beam-sdks-java-maven-archetypes-starter: Dependency
> > problems found -> [Help 1]
> > [ERROR]
> > [ERROR] To see the full stack trace of the errors, re-run Maven with the
> -e
> > switch.
> > [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> > [ERROR]
> > [ERROR] For more information about the errors and possible solutions,
> > please read the following articles:
> > [ERROR] [Help 1]
> > http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
> > [ERROR]
> > [ERROR] After correcting the problems, you can resume the build with the
> > command
> > [ERROR]   mvn  -rf :beam-sdks-java-maven-archetypes-starter
> >
> > Has anyone seen the same thing ?
> >
> > Thanks,
> > Manu
> >
>


Re: [DISCUSS] Using Verbs for Transforms

2016-10-26 Thread Ben Chambers
I also like Distinct since it doesn't make it sound like it modifies any
underlying collection. RemoveDuplicates makes it sound like the duplicates
are removed, rather than a new PCollection without duplicates being
returned.

On Wed, Oct 26, 2016, 7:36 AM Jean-Baptiste Onofré  wrote:

> Agree. It was more a transition proposal.
>
> Regards
> JB
>
> On Oct 26, 2016, 08:31, at 08:31, Robert Bradshaw
>  wrote:
> >On Mon, Oct 24, 2016 at 11:02 PM, Jean-Baptiste Onofré
> > wrote:
> >> And what about use RemoveDuplicates and create an alias Distinct ?
> >
> >I'd really like to avoid (long term) aliases--you end up having to
> >document (and maintain) them both, and it adds confusion as to which
> >one to use (especially if they ever diverge), and means searching for
> >one or the other yields half the results.
> >
> >> It doesn't break the API and would address both SQL users and more
> >"big data" users.
> >>
> >> My $0.01 ;)
> >>
> >> Regards
> >> JB
> >>
> >> On Oct 24, 2016, 22:23, at 22:23, Dan Halperin
> > wrote:
> >>>I find "MakeDistinct" more confusing. My votes in decreasing
> >>>preference:
> >>>
> >>>1. Keep `RemoveDuplicates` name, ensure that important keywords are
> >in
> >>>the
> >>>Javadoc. This reduces churn on our users and is honestly pretty dang
> >>> descriptive.
> >>>2. Rename to `Distinct`, which is clear if you're a SQL user and
> >likely
> >>>less clear otherwise. This is a backwards-incompatible API change, so
> >>>we
> >>>should do it before we go stable.
> >>>
> >>>I am not super strong that 1 > 2, but I am very strong that
> >>>"Distinct" >>> "MakeDistinct" and "RemoveDuplicates" >>>
> >>>"AvoidDuplicate".
> >>>
> >>>Dan
> >>>
> >>>On Mon, Oct 24, 2016 at 10:12 AM, Kenneth Knowles
> >>>
> >>>wrote:
> >>>
>  The precedent that we use verbs has many exceptions. We have
>  ApproximateQuantiles, Values, Keys, WithTimestamps, and I would
> >even
>  include Sum (at least when I read it).
> 
>  Historical note: the predilection towards verbs is from the Google
> >>>Style
>  Guide for Java method names
> 
> >>> >,
>  which states "Method names are typically verbs or verb phrases".
> >But
> >>>even
>  in Google code there are lots of exceptions when it makes sense,
> >like
>  Guava's
>  Iterables.any(), Iterables.all(), Iterables.toArray(), the entire
>  Predicates module, etc. Just an aside; Beam isn't Google code. I
> >>>suggest we
>  use our judgment rather than a policy.
> 
>  I think "Distinct" is one of those exceptions. It is a standard
> >>>widespread
>  name and also reads better as an adjective. I prefer it, but also
> >>>don't
>  care strongly enough to change it or to change it back :-)
> 
>  If we must have a verb, I like it as-is more than MakeDistinct and
>  AvoidDuplicate.
> 
>  On Mon, Oct 24, 2016 at 9:46 AM Jesse Anderson
> >>>
>  wrote:
> 
>  > My original thought for this change was that Crunch uses the
> >class
> >>>name
>  > Distinct. SQL also uses the keyword distinct.
>  >
>  > Maybe the rule should be changed to adjectives or verbs depending
> >>>on the
>  > context.
>  >
>  > Using a verb to describe this class really doesn't connote what
> >the
> >>>class
>  > does as succinctly as the adjective.
>  >
>  > On Mon, Oct 24, 2016 at 9:40 AM Neelesh Salian
> >>>
>  > wrote:
>  >
>  > > Hello,
>  > >
>  > > First of all, thank you to Daniel, Robert and Jesse for their
> >>>review on
>  > > this: https://issues.apache.org/jira/browse/BEAM-239
>  > >
>  > > A point that came up was using verbs explicitly for Transforms.
>  > > Here is the PR:
> >>>https://github.com/apache/incubator-beam/pull/1164
>  > >
>  > > Posting it to help understand if we have a consensus for it and
> >>>if yes,
>  > we
>  > > could perhaps document it for future changes.
>  > >
>  > > Thank you.
>  > >
>  > > --
>  > > Neelesh Srinivas Salian
>  > > Engineer
>  > >
>  >
> 
>


Re: Start of release 0.3.0-incubating

2016-10-26 Thread Aljoscha Krettek
The release guide [1] has a section about that. Before doing a release we
check whether there are blocker issues or issues that have the
to-be-released version as the fix version. If there are any, they have to
be resolved before going forward with the release.

[1] http://beam.incubator.apache.org/contribute/release-guide/

On Wed, 26 Oct 2016 at 10:00 Maximilian Michels  wrote:

> For releases, legal matters have top priority, e.g. licensing issues
> can really get a project into trouble. Apart from that, what about
> testing various functionality of Beam with different runners before an
> actual release? Also, should we have a look at the list of open issues
> and decide whether we want to fix some of those for the upcoming
> release?
>
> For example, it would have been nice to update the Flink version of
> the Flink Runner to 1.1.3. Perhaps we can do that for the first minor
> release :)
>
> -Max
>
>
> On Mon, Oct 24, 2016 at 4:28 PM, Dan Halperin
>  wrote:
> > Thanks JB! (et al.) Excellent suggestions.
> >
> > Thanks,
> > Dan
> >
> > On Thu, Oct 20, 2016 at 9:32 PM, Jean-Baptiste Onofré 
> > wrote:
> >
> >> Hi Dan,
> >>
> >> No problem, MQTT and other IOs will be in the next release..
> >>
> >> IMHO, it would be great to have:
> >> 1. A release reminder a couple of days before a release. Just to ask
> >> everyone if there's no objection (something like this:
> >> https://lists.apache.org/thread.html/80de75df0115940ca402132
> >> 338b221e5dd5f669fd1bf915cd95e15c3@%3Cdev.karaf.apache.org%3E)
> >> 2. A rough release schedule on the website (something like this:
> >> http://karaf.apache.org/download.html#container-schedule for instance).
> >>
> >> Just my $0.01 ;)
> >>
> >> Regards
> >> JB
> >>
> >>
> >> On 10/20/2016 06:30 PM, Dan Halperin wrote:
> >>
> >>> Hi JB,
> >>>
> >>> This is a great discussion to have! IMO, there's no special
> functionality
> >>> requirements for these pre-TLP releases. It's more important to make
> sure
> >>> we keep the process going. (I think we should start the release as
> soon as
> >>> possible, because it's been 2 months since the last one.)
> >>>
> >>> If we hold a release a week for MQTT, we'll hold it another week for
> some
> >>> other new feature, and then hold it again for some other new feature.
> >>>
> >>> Can you make a strong argument for why MQTT in particular should be
> >>> release
> >>> blocking?
> >>>
> >>> Dan
> >>>
> >>> On Thu, Oct 20, 2016 at 9:26 AM, Jean-Baptiste Onofré  >
> >>> wrote:
> >>>
> >>> +1
> 
>  Thanks Aljosha !!
> 
>  Do you mind to wait the week end or Monday to start the release ? I
> would
>  like to include MqttIO if possible.
> 
>  Thanks !
>  Regards
>  JB
> 
>  On Oct 20, 2016, 18:07, at 18:07, Dan Halperin
>  
>  wrote:
> 
> > On Thu, Oct 20, 2016 at 12:37 AM, Aljoscha Krettek
> > 
> > wrote:
> >
> > Hi,
> >> thanks for taking the time and writing this extensive doc!
> >>
> >> If no-one is against this I would like to be the release manager for
> >>
> > the
> >
> >> next (0.3.0-incubating) release. I would work with the guide and
> >>
> > update it
> >
> >> with anything that I learn along the way. Should I open a new thread
> >>
> > for
> >
> >> this or is it ok if nobody objects here?
> >>
> >> Cheers,
> >> Aljoscha
> >>
> >>
> > Spinning this out as a separate thread.
> >
> > +1 -- Sounds great to me!
> >
> > Dan
> >
> > On Thu, Oct 20, 2016 at 12:37 AM, Aljoscha Krettek
> > 
> > wrote:
> >
> > Hi,
> >> thanks for taking the time and writing this extensive doc!
> >>
> >> If no-one is against this I would like to be the release manager for
> >>
> > the
> >
> >> next (0.3.0-incubating) release. I would work with the guide and
> >>
> > update it
> >
> >> with anything that I learn along the way. Should I open a new thread
> >>
> > for
> >
> >> this or is it ok if nobody objects here?
> >>
> >> Cheers,
> >> Aljoscha
> >>
> >> On Thu, 20 Oct 2016 at 07:10 Jean-Baptiste Onofré 
> >>
> > wrote:
> >
> >>
> >> Hi,
> >>>
> >>> well done.
> >>>
> >>> As already discussed, it looks good to me ;)
> >>>
> >>> Regards
> >>> JB
> >>>
> >>> On 10/20/2016 01:24 AM, Davor Bonaci wrote:
> >>>
>  Hi everybody,
>  As a project, I think we should have a Release Guide to document
> 
> >>> the
> >
> >> process, have consistent releases, on-board additional release
> 
> >>> managers,
> >>
> >>> and generally share knowledge. It is also one of the project
> 
> >>> 

Re: [DISCUSS] Merging master -> feature branch

2016-10-26 Thread Amit Sela
I generally agree with Kenneth.

While working on the SparkRunnerV2 branch, it was a pain - I avoided
frequent merges to avoid trivial PRs, but it cost me very large and
non-trivial merges later.
I think that frequent merges for feature-branches should most of the time
be trivial (no conflicts) and a committer should be allowed to self-merge
once tests pass.
As for conflicts, even for the smallest ones I'd go with review just so
it's very clear when self-merging is OK - we can always revisit this later
and further discuss if we think we can improve this process.

I guess +1 from me.

Thanks,
Amit.

On Wed, Oct 26, 2016 at 8:10 AM Frances Perry 
wrote:

> On Tue, Oct 25, 2016 at 9:44 PM, Jean-Baptiste Onofré 
> wrote:
>
> > Agree. When possible it would be great to have the branch merged on
> master
> > quickly, even when it's not fully ready. It would give more visibility to
> > potential contributors.
> >
>
> This thread is about the opposite, I think -- merging master into feature
> branches regularly to prevent them from getting out of sync.
>
> As for increasing the visibility of feature branches, we have these new
> webpages:
> http://beam.incubator.apache.org/contribute/work-in-progress/
> http://beam.incubator.apache.org/contribute/contribution-
> guide/#feature-branches
> with more changes coming in the basic SDK/Runner landing pages too.
>


Re: build failed with dependency problems

2016-10-26 Thread Manu Zhang
Thanks, it succeeded after `maven clean`.

On Wed, Oct 26, 2016 at 2:58 PM Amit Sela  wrote:

I just fetched and pulled latest master and build succeeded, maybe try
again ?

On Wed, Oct 26, 2016 at 9:19 AM Manu Zhang  wrote:

> Hi All,
>
> I tried to build latest master but failed with the following dependency
> problems.
>
> [INFO] --- maven-dependency-plugin:2.10:analyze-only (default) @
> beam-sdks-java-maven-archetypes-starter ---
> [WARNING] Used undeclared dependencies found:
> [WARNING]org.slf4j:slf4j-api:jar:1.7.14:runtime
> [INFO]
> 
> [INFO] Reactor Summary:
> [INFO]
> [INFO] Apache Beam :: Parent .. SUCCESS [
>  3.428 s]
> [INFO] Apache Beam :: SDKs  SUCCESS [
>  0.063 s]
> [INFO] Apache Beam :: SDKs :: Java  SUCCESS [
>  0.057 s]
> [INFO] Apache Beam :: SDKs :: Java :: Build Tools . SUCCESS [
>  1.070 s]
> [INFO] Apache Beam :: SDKs :: Java :: Core  SUCCESS [03:49
> min]
> [INFO] Apache Beam :: Runners . SUCCESS [
>  0.191 s]
> [INFO] Apache Beam :: Runners :: Core Java  SUCCESS [
> 43.679 s]
> [INFO] Apache Beam :: Runners :: Direct Java .. SUCCESS [
> 32.827 s]
> [INFO] Apache Beam :: Runners :: Google Cloud Dataflow  SUCCESS [
> 17.201 s]
> [INFO] Apache Beam :: SDKs :: Java :: IO .. SUCCESS [
>  0.071 s]
> [INFO] Apache Beam :: SDKs :: Java :: IO :: Google Cloud Platform SUCCESS [
> 17.278 s]
> [INFO] Apache Beam :: SDKs :: Java :: IO :: HDFS .. SUCCESS [
> 11.424 s]
> [INFO] Apache Beam :: SDKs :: Java :: IO :: JMS ... SUCCESS [
>  2.571 s]
> [INFO] Apache Beam :: SDKs :: Java :: IO :: Kafka . SUCCESS [
>  1.658 s]
> [INFO] Apache Beam :: SDKs :: Java :: IO :: Kinesis ... SUCCESS [
>  3.208 s]
> [INFO] Apache Beam :: SDKs :: Java :: IO :: MongoDB ... SUCCESS [
>  3.026 s]
> [INFO] Apache Beam :: SDKs :: Java :: IO :: JDBC .. SUCCESS [
>  1.732 s]
> [INFO] Apache Beam :: SDKs :: Java :: Extensions .. SUCCESS [
>  0.028 s]
> [INFO] Apache Beam :: SDKs :: Java :: Extensions :: Join library SUCCESS [
>  1.057 s]
> [INFO] Apache Beam :: SDKs :: Java :: Microbenchmarks . SUCCESS [
>  6.892 s]
> [INFO] Apache Beam :: SDKs :: Java :: Java 8 Tests  SUCCESS [
>  2.396 s]
> [INFO] Apache Beam :: Runners :: Flink  SUCCESS [
>  0.028 s]
> [INFO] Apache Beam :: Runners :: Flink :: Core  SUCCESS [
>  6.688 s]
> [INFO] Apache Beam :: Runners :: Flink :: Examples  SUCCESS [
>  2.966 s]
> [INFO] Apache Beam :: Runners :: Spark  SUCCESS [
> 27.347 s]
> [INFO] Apache Beam :: SDKs :: Java :: Maven Archetypes  SUCCESS [
>  0.040 s]
> [INFO] Apache Beam :: SDKs :: Java :: Maven Archetypes :: Starter FAILURE [
>  3.281 s]
> [INFO] Apache Beam :: SDKs :: Java :: Maven Archetypes :: Examples SKIPPED
> [INFO] Apache Beam :: Examples  SKIPPED
> [INFO] Apache Beam :: Examples :: Java  SKIPPED
> [INFO] Apache Beam :: Examples :: Java 8 .. SKIPPED
> [INFO]
> 
> [INFO] BUILD FAILURE
> [INFO]
> 
> [INFO] Total time: 07:01 min
> [INFO] Finished at: 2016-10-26T14:10:23+08:00
> [INFO] Final Memory: 143M/551M
> [INFO]
> 
> [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-dependency-plugin:2.10:analyze-only
> (default) on project beam-sdks-java-maven-archetypes-starter: Dependency
> problems found -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions,
> please read the following articles:
> [ERROR] [Help 1]
> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the
> command
> [ERROR]   mvn  -rf :beam-sdks-java-maven-archetypes-starter
>
> Has anyone seen the same thing ?
>
> Thanks,
> Manu
>


Re: [VOTE] Release 0.3.0-incubating, release candidate #1

2016-10-26 Thread Maximilian Michels
+1 (binding)

Thanks for managing the release, Aljoscha!

-Max


On Wed, Oct 26, 2016 at 6:46 AM, Jean-Baptiste Onofré  wrote:
> Agree. We already discussed that on the mailing list; I mentioned it
> some weeks ago.
>
> Regards
> JB
>
> On Oct 26, 2016, 02:26, at 02:26, Dan Halperin  
> wrote:
>>My reading of the LEGAL threads is that since we are not including
>>(shading or bundling) the ASL-licensed code, we are fine to distribute
>>the kinesis-io module. This was the original conclusion that LEGAL-198
>>got to, and that thread has not been resolved differently (even if Spark
>>went ahead and broke the assembly). The beam-sdks-java-io-kinesis module
>>is an optional part (Beam materially works just fine without it).
>>
>>So I think we're fine to keep this vote open.
>>
>>+1 (binding) on the release
>>
>>Thanks Aljoscha!
>>
>>
>>On Tue, Oct 25, 2016 at 12:07 PM, Aljoscha Krettek
>>
>>wrote:
>>
>>> Yep, I was looking at those same threads when I was reviewing the
>>> artefacts. The release was already close to being finished so I went
>>> through with it, but if we think it's not good to have them in we
>>> should quickly cancel in favour of a new RC without a published
>>> Kinesis connector.
>>>
>>> On Tue, 25 Oct 2016 at 20:46 Dan Halperin
>>
>>> wrote:
>>>
>>> > I can't tell whether it is a problem that we are distributing the
>>> > beam-sdks-java-io-kinesis module [0].
>>> >
>>> > Here is the dev@ discussion thread [1] and the (unanswered) relevant
>>> > LEGAL thread [2].
>>> > We linked through to a Spark-related discussion [3], and here is how
>>> > to disable distribution of the KinesisIO module [4].
>>> >
>>> > [0]
>>> > https://repository.apache.org/content/repositories/staging/org/apache/beam/beam-sdks-java-io-kinesis/
>>> > [1]
>>> > https://lists.apache.org/thread.html/6784bc005f329d93fd59d0f8759ed4745e72f105e39d869e094d9645@%3Cdev.beam.apache.org%3E
>>> > [2]
>>> > https://issues.apache.org/jira/browse/LEGAL-198?focusedCommentId=15471529=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15471529
>>> > [3] https://issues.apache.org/jira/browse/SPARK-17418
>>> > [4] https://github.com/apache/spark/pull/15167/files
>>> >
>>> > Dan
>>> >
>>> > On Tue, Oct 25, 2016 at 11:01 AM, Seetharam Venkatesh <
>>> > venkat...@innerzeal.com> wrote:
>>> >
>>> > > +1
>>> > >
>>> > > Thanks!
>>> > >
>>> > > On Mon, Oct 24, 2016 at 2:30 PM Aljoscha Krettek
>>
>>> > > wrote:
>>> > >
>>> > > > Hi Team!
>>> > > >
>>> > > > Please review and vote at your leisure on release candidate #1
>>> > > > for version 0.3.0-incubating, as follows:
>>> > > > [ ] +1, Approve the release
>>> > > > [ ] -1, Do not approve the release (please provide specific
>>> > > > comments)
>>> > > >
>>> > > > The complete staging area is available for your review, which
>>> > > > includes:
>>> > > > * JIRA release notes [1],
>>> > > > * the official Apache source release to be deployed to
>>> > > > dist.apache.org [2],
>>> > > > * all artifacts to be deployed to the Maven Central Repository [3],
>>> > > > * source code tag "v0.3.0-incubating-RC1" [4],
>>> > > > * website pull request listing the release and publishing the API
>>> > > > reference manual [5].
>>> > > >
>>> > > > Please keep in mind that this release is not focused on providing
>>> > > > new functionality. We want to refine the release process and make
>>> > > > stable source and binary artefacts available to our users.
>>> > > >
>>> > > > The vote will be open for at least 72 hours. It is adopted by
>>> > > > majority approval, with at least 3 PPMC affirmative votes.
>>> > > >
>>> > > > Cheers,
>>> > > > Aljoscha
>>> > > >
>>> > > > [1]
>>> > > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12338051
>>> > > > [2]
>>> > > > https://dist.apache.org/repos/dist/dev/incubator/beam/0.3.0-incubating/
>>> > > > [3]
>>> > > > https://repository.apache.org/content/repositories/staging/org/apache/beam/
>>> > > > [4]
>>> > > > https://git-wip-us.apache.org/repos/asf?p=incubator-beam.git;a=tag;h=5d86ff7f04862444c266142b0d5acecb5a6b7144
>>> > > > [5] https://github.com/apache/incubator-beam-site/pull/52
>>> > > >
>>> > >
>>> >
>>>


Re: build failed with dependency problems

2016-10-26 Thread Amit Sela
I just fetched and pulled the latest master and the build succeeded; maybe
try again?

On Wed, Oct 26, 2016 at 9:19 AM Manu Zhang  wrote:

> Hi All,
>
> I tried to build latest master but failed with the following dependency
> problems.
>
> [INFO] --- maven-dependency-plugin:2.10:analyze-only (default) @
> beam-sdks-java-maven-archetypes-starter ---
> [WARNING] Used undeclared dependencies found:
> [WARNING]    org.slf4j:slf4j-api:jar:1.7.14:runtime
> [INFO]
> 
> [INFO] Reactor Summary:
> [INFO]
> [INFO] Apache Beam :: Parent .. SUCCESS [
>  3.428 s]
> [INFO] Apache Beam :: SDKs  SUCCESS [
>  0.063 s]
> [INFO] Apache Beam :: SDKs :: Java  SUCCESS [
>  0.057 s]
> [INFO] Apache Beam :: SDKs :: Java :: Build Tools . SUCCESS [
>  1.070 s]
> [INFO] Apache Beam :: SDKs :: Java :: Core  SUCCESS [03:49
> min]
> [INFO] Apache Beam :: Runners . SUCCESS [
>  0.191 s]
> [INFO] Apache Beam :: Runners :: Core Java  SUCCESS [
> 43.679 s]
> [INFO] Apache Beam :: Runners :: Direct Java .. SUCCESS [
> 32.827 s]
> [INFO] Apache Beam :: Runners :: Google Cloud Dataflow  SUCCESS [
> 17.201 s]
> [INFO] Apache Beam :: SDKs :: Java :: IO .. SUCCESS [
>  0.071 s]
> [INFO] Apache Beam :: SDKs :: Java :: IO :: Google Cloud Platform SUCCESS [
> 17.278 s]
> [INFO] Apache Beam :: SDKs :: Java :: IO :: HDFS .. SUCCESS [
> 11.424 s]
> [INFO] Apache Beam :: SDKs :: Java :: IO :: JMS ... SUCCESS [
>  2.571 s]
> [INFO] Apache Beam :: SDKs :: Java :: IO :: Kafka . SUCCESS [
>  1.658 s]
> [INFO] Apache Beam :: SDKs :: Java :: IO :: Kinesis ... SUCCESS [
>  3.208 s]
> [INFO] Apache Beam :: SDKs :: Java :: IO :: MongoDB ... SUCCESS [
>  3.026 s]
> [INFO] Apache Beam :: SDKs :: Java :: IO :: JDBC .. SUCCESS [
>  1.732 s]
> [INFO] Apache Beam :: SDKs :: Java :: Extensions .. SUCCESS [
>  0.028 s]
> [INFO] Apache Beam :: SDKs :: Java :: Extensions :: Join library SUCCESS [
>  1.057 s]
> [INFO] Apache Beam :: SDKs :: Java :: Microbenchmarks . SUCCESS [
>  6.892 s]
> [INFO] Apache Beam :: SDKs :: Java :: Java 8 Tests  SUCCESS [
>  2.396 s]
> [INFO] Apache Beam :: Runners :: Flink  SUCCESS [
>  0.028 s]
> [INFO] Apache Beam :: Runners :: Flink :: Core  SUCCESS [
>  6.688 s]
> [INFO] Apache Beam :: Runners :: Flink :: Examples  SUCCESS [
>  2.966 s]
> [INFO] Apache Beam :: Runners :: Spark  SUCCESS [
> 27.347 s]
> [INFO] Apache Beam :: SDKs :: Java :: Maven Archetypes  SUCCESS [
>  0.040 s]
> [INFO] Apache Beam :: SDKs :: Java :: Maven Archetypes :: Starter FAILURE [
>  3.281 s]
> [INFO] Apache Beam :: SDKs :: Java :: Maven Archetypes :: Examples SKIPPED
> [INFO] Apache Beam :: Examples  SKIPPED
> [INFO] Apache Beam :: Examples :: Java  SKIPPED
> [INFO] Apache Beam :: Examples :: Java 8 .. SKIPPED
> [INFO]
> 
> [INFO] BUILD FAILURE
> [INFO]
> 
> [INFO] Total time: 07:01 min
> [INFO] Finished at: 2016-10-26T14:10:23+08:00
> [INFO] Final Memory: 143M/551M
> [INFO]
> 
> [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-dependency-plugin:2.10:analyze-only
> (default) on project beam-sdks-java-maven-archetypes-starter: Dependency
> problems found -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions,
> please read the following articles:
> [ERROR] [Help 1]
> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the
> command
> [ERROR]   mvn  -rf :beam-sdks-java-maven-archetypes-starter
>
> Has anyone seen the same thing ?
>
> Thanks,
> Manu
>


Re: [DISCUSS] Using Verbs for Transforms

2016-10-26 Thread Jean-Baptiste Onofré
Agree. It was more a transition proposal.

Regards
JB

⁣​

On Oct 26, 2016, 08:31, at 08:31, Robert Bradshaw  
wrote:
>On Mon, Oct 24, 2016 at 11:02 PM, Jean-Baptiste Onofré
> wrote:
>> And what about keeping RemoveDuplicates and creating an alias, Distinct?
>
>I'd really like to avoid (long-term) aliases: you end up having to
>document (and maintain) them both, it adds confusion as to which
>one to use (especially if they ever diverge), and it means searching
>for one or the other yields half the results.
>
>> It doesn't break the API and would address both SQL users and more
>> "big data" users.
>>
>> My $0.01 ;)
>>
>> Regards
>> JB
>>
>> On Oct 24, 2016, 22:23, at 22:23, Dan Halperin
> wrote:
>>>I find "MakeDistinct" more confusing. My votes in decreasing
>>>preference:
>>>
>>>1. Keep `RemoveDuplicates` name, ensure that important keywords are in
>>>the Javadoc. This reduces churn on our users and is honestly pretty
>>>dang descriptive.
>>>2. Rename to `Distinct`, which is clear if you're a SQL user and likely
>>>less clear otherwise. This is a backwards-incompatible API change, so
>>>we should do it before we go stable.
>>>
>>>I am not super strong that 1 > 2, but I am very strong that
>>>"Distinct" >>> "MakeDistinct" and "RemoveDuplicates" >>> "AvoidDuplicate".
>>>
>>>Dan
>>>
>>>On Mon, Oct 24, 2016 at 10:12 AM, Kenneth Knowles
>>>
>>>wrote:
>>>
>>>> The precedent that we use verbs has many exceptions. We have
>>>> ApproximateQuantiles, Values, Keys, WithTimestamps, and I would even
>>>> include Sum (at least when I read it).
>>>>
>>>> Historical note: the predilection towards verbs is from the Google
>>>> Style Guide for Java method names, which states "Method names are
>>>> typically verbs or verb phrases". But even in Google code there are
>>>> lots of exceptions when it makes sense, like Guava's Iterables.any(),
>>>> Iterables.all(), Iterables.toArray(), the entire Predicates module,
>>>> etc. Just an aside; Beam isn't Google code. I suggest we use our
>>>> judgment rather than a policy.
>>>>
>>>> I think "Distinct" is one of those exceptions. It is a standard
>>>> widespread name and also reads better as an adjective. I prefer it,
>>>> but also don't care strongly enough to change it or to change it
>>>> back :-)
>>>>
>>>> If we must have a verb, I like it as-is more than MakeDistinct and
>>>> AvoidDuplicate.
>>>>
>>>> On Mon, Oct 24, 2016 at 9:46 AM Jesse Anderson wrote:
>>>>
>>>> > My original thought for this change was that Crunch uses the class
>>>> > name Distinct. SQL also uses the keyword distinct.
>>>> >
>>>> > Maybe the rule should be changed to adjectives or verbs depending
>>>> > on the context.
>>>> >
>>>> > Using a verb to describe this class really doesn't connote what the
>>>> > class does as succinctly as the adjective.
>>>> >
>>>> > On Mon, Oct 24, 2016 at 9:40 AM Neelesh Salian wrote:
>>>> >
>>>> > > Hello,
>>>> > >
>>>> > > First of all, thank you to Daniel, Robert and Jesse for their
>>>> > > review on this: https://issues.apache.org/jira/browse/BEAM-239
>>>> > >
>>>> > > A point that came up was using verbs explicitly for Transforms.
>>>> > > Here is the PR:
>>>> > > https://github.com/apache/incubator-beam/pull/1164
>>>> > >
>>>> > > Posting it to help understand if we have a consensus for it and
>>>> > > if yes, we could perhaps document it for future changes.
>>>> > >
>>>> > > Thank you.
>>>> > >
>>>> > > --
>>>> > > Neelesh Srinivas Salian
>>>> > > Engineer
>>>> > >
>>>> >
>>>>


Re: [DISCUSS] Using Verbs for Transforms

2016-10-26 Thread Robert Bradshaw
On Mon, Oct 24, 2016 at 11:02 PM, Jean-Baptiste Onofré  
wrote:
> And what about keeping RemoveDuplicates and creating an alias, Distinct?

I'd really like to avoid (long-term) aliases: you end up having to
document (and maintain) them both, it adds confusion as to which
one to use (especially if they ever diverge), and it means searching
for one or the other yields half the results.
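
To make the maintenance concern concrete, here is a minimal sketch of what
such an alias could look like. It is plain Java over lists, not Beam's
transform API; the class and method names (AliasSketch, removeDuplicates,
distinct) are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical illustration only: this is NOT Beam's RemoveDuplicates
// transform, just two entry points that do the same thing.
final class AliasSketch {

    private AliasSketch() {}

    // The "original" name: drops repeated elements, keeping first-seen order.
    static <T> List<T> removeDuplicates(List<T> input) {
        return new ArrayList<>(new LinkedHashSet<>(input));
    }

    // The alias: it only delegates, but it still needs its own docs and
    // tests, and nothing prevents the two methods from diverging later.
    static <T> List<T> distinct(List<T> input) {
        return removeDuplicates(input);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "a", "c", "b");
        System.out.println(removeDuplicates(words)); // prints [a, b, c]
        System.out.println(distinct(words));         // prints [a, b, c]
    }
}
```

Both names return the same result today, which is exactly why a reader
cannot tell which one is preferred, and why a search for either name finds
only part of the usages.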

> It doesn't break the API and would address both SQL users and more "big data" 
> users.
>
> My $0.01 ;)
>
> Regards
> JB
>
> On Oct 24, 2016, 22:23, at 22:23, Dan Halperin  
> wrote:
>>I find "MakeDistinct" more confusing. My votes in decreasing
>>preference:
>>
>>1. Keep `RemoveDuplicates` name, ensure that important keywords are in
>>the Javadoc. This reduces churn on our users and is honestly pretty
>>dang descriptive.
>>2. Rename to `Distinct`, which is clear if you're a SQL user and likely
>>less clear otherwise. This is a backwards-incompatible API change, so
>>we should do it before we go stable.
>>
>>I am not super strong that 1 > 2, but I am very strong that
>>"Distinct" >>> "MakeDistinct" and "RemoveDuplicates" >>> "AvoidDuplicate".
>>
>>Dan
>>
>>On Mon, Oct 24, 2016 at 10:12 AM, Kenneth Knowles
>>
>>wrote:
>>
>>> The precedent that we use verbs has many exceptions. We have
>>> ApproximateQuantiles, Values, Keys, WithTimestamps, and I would even
>>> include Sum (at least when I read it).
>>>
>>> Historical note: the predilection towards verbs is from the Google
>>> Style Guide for Java method names, which states "Method names are
>>> typically verbs or verb phrases". But even in Google code there are
>>> lots of exceptions when it makes sense, like Guava's Iterables.any(),
>>> Iterables.all(), Iterables.toArray(), the entire Predicates module,
>>> etc. Just an aside; Beam isn't Google code. I suggest we use our
>>> judgment rather than a policy.
>>>
>>> I think "Distinct" is one of those exceptions. It is a standard
>>> widespread name and also reads better as an adjective. I prefer it,
>>> but also don't care strongly enough to change it or to change it
>>> back :-)
>>>
>>> If we must have a verb, I like it as-is more than MakeDistinct and
>>> AvoidDuplicate.
>>>
>>> On Mon, Oct 24, 2016 at 9:46 AM Jesse Anderson wrote:
>>>
>>> > My original thought for this change was that Crunch uses the class
>>> > name Distinct. SQL also uses the keyword distinct.
>>> >
>>> > Maybe the rule should be changed to adjectives or verbs depending
>>> > on the context.
>>> >
>>> > Using a verb to describe this class really doesn't connote what the
>>> > class does as succinctly as the adjective.
>>> >
>>> > On Mon, Oct 24, 2016 at 9:40 AM Neelesh Salian wrote:
>>> >
>>> > > Hello,
>>> > >
>>> > > First of all, thank you to Daniel, Robert and Jesse for their
>>> > > review on this: https://issues.apache.org/jira/browse/BEAM-239
>>> > >
>>> > > A point that came up was using verbs explicitly for Transforms.
>>> > > Here is the PR:
>>> > > https://github.com/apache/incubator-beam/pull/1164
>>> > >
>>> > > Posting it to help understand if we have a consensus for it and
>>> > > if yes, we could perhaps document it for future changes.
>>> > >
>>> > > Thank you.
>>> > >
>>> > > --
>>> > > Neelesh Srinivas Salian
>>> > > Engineer
>>> > >
>>> >
>>>


Re: [DISCUSS] Using Verbs for Transforms

2016-10-26 Thread Jesse Anderson
A recap of options for RemoveDuplicates:

   - Leave the name as is and update the JavaDocs
   - Rename to Distinct
   - Rename to MakeDistinct
   - Rename to Deduplicate



On Wed, Oct 26, 2016 at 8:10 AM Jean-Baptiste Onofré 
wrote:

> OK. No problem.
>
> Regards
> JB
>
> On Oct 26, 2016, 07:56, at 07:56, Kenneth Knowles 
> wrote:
> >To be clear: I am not saying that I think the discussion has concluded.
> >I think we should give some more time for different time zone rotations
> >to occur. I just meant to say that if it does come to a vote, I'd prefer
> >to keep it focused rather than generalizing.
> >
> >On Tue, Oct 25, 2016 at 10:51 PM Kenneth Knowles 
> >wrote:
> >
> >> I'd prefer to keep the vote focused on this rename, not a general
> >> policy.
> >>
> >> On Tue, Oct 25, 2016 at 10:26 PM Jean-Baptiste Onofré
> >
> >> wrote:
> >>
> >> Yes I would start a formal vote with the three proposals: descriptive
> >> verb, adjective, verbs + adjective.
> >>
> >> Regards
> >> JB
> >>
> >> On Oct 26, 2016, 07:16, at 07:16, Jesse Anderson
> >
> >> wrote:
> >> >We need to make a decision on this so Neelesh can finish his commit.
> >> >Should we take a vote or something?
> >> >
> >> >On Tue, Oct 25, 2016, 7:55 AM Jean-Baptiste Onofré 
> >> >wrote:
> >> >
> >> >> Sounds good to me.
> >> >>
> >> >> On Oct 24, 2016, 19:11, at 19:11, je...@smokinghand.com wrote:
> >> >> >I prefer MakeDistinct if we have to make it a verb.
> >> >>
> >>
> >>
>


Re: [DISCUSS] Using Verbs for Transforms

2016-10-26 Thread Jean-Baptiste Onofré
OK. No problem.

Regards
JB

⁣​

On Oct 26, 2016, 07:56, at 07:56, Kenneth Knowles  
wrote:
>To be clear: I am not saying that I think the discussion has concluded.
>I think we should give some more time for different time zone rotations
>to occur. I just meant to say that if it does come to a vote, I'd prefer
>to keep it focused rather than generalizing.
>
>On Tue, Oct 25, 2016 at 10:51 PM Kenneth Knowles 
>wrote:
>
>> I'd prefer to keep the vote focused on this rename, not a general
>> policy.
>>
>> On Tue, Oct 25, 2016 at 10:26 PM Jean-Baptiste Onofré
>
>> wrote:
>>
>> Yes I would start a formal vote with the three proposals: descriptive
>> verb, adjective, verbs + adjective.
>>
>> Regards
>> JB
>>
>> On Oct 26, 2016, 07:16, at 07:16, Jesse Anderson
>
>> wrote:
>> >We need to make a decision on this so Neelesh can finish his commit.
>> >Should we take a vote or something?
>> >
>> >On Tue, Oct 25, 2016, 7:55 AM Jean-Baptiste Onofré 
>> >wrote:
>> >
>> >> Sounds good to me.
>> >>
>> >> On Oct 24, 2016, 19:11, at 19:11, je...@smokinghand.com wrote:
>> >> >I prefer MakeDistinct if we have to make it a verb.
>> >>
>>
>>


Re: [DISCUSS] Using Verbs for Transforms

2016-10-26 Thread Jean-Baptiste Onofré
And what about keeping RemoveDuplicates and creating an alias, Distinct?

It doesn't break the API and would address both SQL users and more "big data" 
users.

My $0.01 ;)

Regards
JB

⁣​

On Oct 24, 2016, 22:23, at 22:23, Dan Halperin  
wrote:
>I find "MakeDistinct" more confusing. My votes in decreasing
>preference:
>
>1. Keep `RemoveDuplicates` name, ensure that important keywords are in
>the Javadoc. This reduces churn on our users and is honestly pretty
>dang descriptive.
>2. Rename to `Distinct`, which is clear if you're a SQL user and likely
>less clear otherwise. This is a backwards-incompatible API change, so
>we should do it before we go stable.
>
>I am not super strong that 1 > 2, but I am very strong that
>"Distinct" >>> "MakeDistinct" and "RemoveDuplicates" >>> "AvoidDuplicate".
>
>Dan
>
>On Mon, Oct 24, 2016 at 10:12 AM, Kenneth Knowles
>
>wrote:
>
>> The precedent that we use verbs has many exceptions. We have
>> ApproximateQuantiles, Values, Keys, WithTimestamps, and I would even
>> include Sum (at least when I read it).
>>
>> Historical note: the predilection towards verbs is from the Google
>> Style Guide for Java method names, which states "Method names are
>> typically verbs or verb phrases". But even in Google code there are
>> lots of exceptions when it makes sense, like Guava's Iterables.any(),
>> Iterables.all(), Iterables.toArray(), the entire Predicates module,
>> etc. Just an aside; Beam isn't Google code. I suggest we use our
>> judgment rather than a policy.
>>
>> I think "Distinct" is one of those exceptions. It is a standard
>> widespread name and also reads better as an adjective. I prefer it,
>> but also don't care strongly enough to change it or to change it
>> back :-)
>>
>> If we must have a verb, I like it as-is more than MakeDistinct and
>> AvoidDuplicate.
>>
>> On Mon, Oct 24, 2016 at 9:46 AM Jesse Anderson wrote:
>>
>> > My original thought for this change was that Crunch uses the class
>> > name Distinct. SQL also uses the keyword distinct.
>> >
>> > Maybe the rule should be changed to adjectives or verbs depending
>> > on the context.
>> >
>> > Using a verb to describe this class really doesn't connote what the
>> > class does as succinctly as the adjective.
>> >
>> > On Mon, Oct 24, 2016 at 9:40 AM Neelesh Salian wrote:
>> >
>> > > Hello,
>> > >
>> > > First of all, thank you to Daniel, Robert and Jesse for their
>> > > review on this: https://issues.apache.org/jira/browse/BEAM-239
>> > >
>> > > A point that came up was using verbs explicitly for Transforms.
>> > > Here is the PR:
>> > > https://github.com/apache/incubator-beam/pull/1164
>> > >
>> > > Posting it to help understand if we have a consensus for it and
>> > > if yes, we could perhaps document it for future changes.
>> > >
>> > > Thank you.
>> > >
>> > > --
>> > > Neelesh Srinivas Salian
>> > > Engineer
>> > >
>> >
>>