Re: [DISCUSS] Graduation to a top-level project

2016-12-08 Thread Jesse Anderson
Excellent!

On Thu, Dec 8, 2016 at 3:43 PM Davor Bonaci  wrote:

> A quick update: the Apache Incubator has adopted the proposed graduation
> resolution [1], and it is now presented to the ASF Board of Directors for
> their consideration.
>
> Davor
>
> [1]
>
> https://lists.apache.org/thread.html/71a1c63837a7d1506a10af9c70af1c24db988451ac5b53fa2467b9b8@%3Cgeneral.incubator.apache.org%3E
>
> On Mon, Dec 5, 2016 at 10:35 AM, Neelesh Salian 
> wrote:
>
> > Quite an interesting discussion. Looking forward to the graduation. :)
> > Thanks for putting this together.
> >
> > On Mon, Dec 5, 2016 at 10:30 AM, Davor Bonaci  wrote:
> >
> > > A quick update: the vote within the Incubator has been started [1].
> > >
> > > Davor
> > >
> > > [1]
> > > > https://lists.apache.org/thread.html/a8e9cecfe93f0e464cc7c1774d2761ca14326df1101b7670ca8b1dc3@%3Cgeneral.incubator.apache.org%3E
> > >
> > > On Fri, Dec 2, 2016 at 11:40 AM, Davor Bonaci 
> wrote:
> > >
> > > > > A quick update on the progress: the PPMC has nearly finished drafting
> > > > > the proposed resolution, and I've just kicked off the discussion
> > > > > within the Incubator community [1].
> > > >
> > > > > I'd encourage everyone to participate in the discussion and carry
> > > > > your enthusiasm there. Thanks!
> > > >
> > > > Davor
> > > >
> > > > > [1] https://lists.apache.org/thread.html/b9c1071b35558846836814575ada3cdca61c72dc1e672ab994a9c936@%3Cgeneral.incubator.apache.org%3E
> > > >
> > > > On Thu, Nov 24, 2016 at 1:52 AM, Maximilian Michels 
> > > > wrote:
> > > >
> > > >> +1
> > > >>
> > > >> I see a healthy project which deserves to graduate.
> > > >>
> > > >> On Wed, Nov 23, 2016 at 6:03 PM, Davor Bonaci 
> > wrote:
> > > >> > Thanks everyone for the enthusiastic support!
> > > >> >
> > > >> > Please keep the thread going, as we kick off the process on
> private@
> > .
> > > >> > Please don’t forget to bring up any data points that might help
> > > >> strengthen
> > > >> > our case.
> > > >> >
> > > >> > Thanks!
> > > >> >
> > > >> > On Wed, Nov 23, 2016 at 8:45 AM, Scott Wegner
> > > >> 
> > > >> > wrote:
> > > >> >
> > > >> >> +1 (beaming)
> > > >> >>
> > > >> >> On Wed, Nov 23, 2016 at 8:25 AM Robert Bradshaw
> > > >> >> 
> > > >> >> wrote:
> > > >> >>
> > > >> >> +1
> > > >> >>
> > > > >> >> On Wed, Nov 23, 2016 at 7:36 AM, Lukasz Cwik wrote:
> > > >> >> > +1
> > > >> >> >
> > > >> >> > On Wed, Nov 23, 2016 at 9:48 AM, Stephan Ewen <
> se...@apache.org>
> > > >> wrote:
> > > >> >> >
> > > >> >> >> +1
> > > > >> >> >> The community is doing very well and behaving very Apache
> > > >> >> >>
> > > >> >> >> On Wed, Nov 23, 2016 at 9:54 AM, Etienne Chauchot <
> > > >> echauc...@gmail.com>
> > > >> >> >> wrote:
> > > >> >> >>
> > > >> >> >> > A big +1 of course, very excited to go forward
> > > >> >> >> >
> > > >> >> >> > Etienne
> > > >> >> >> >
> > > >> >> >> >
> > > >> >> >> >
> > > > >> >> >> > On 22/11/2016 at 19:19, Davor Bonaci wrote:
> > > >> >> >> >
> > > > >> >> >> >> Hi everyone,
> > > > >> >> >> >> With all the progress we’ve had recently in Apache Beam, I think it is
> > > > >> >> >> >> time we start the discussion about graduation as a new top-level
> > > > >> >> >> >> project at the Apache Software Foundation.
> > > > >> >> >> >>
> > > > >> >> >> >> Graduation means we are a self-sustaining and self-governing community,
> > > > >> >> >> >> and ready to be a full participant in the Apache Software Foundation. It
> > > > >> >> >> >> does not imply that our community growth is complete or that a
> > > > >> >> >> >> particular level of technical maturity has been reached, rather that we
> > > > >> >> >> >> are on a solid trajectory in those areas. After graduation, we will
> > > > >> >> >> >> still periodically report to, and be overseen by, the ASF Board to
> > > > >> >> >> >> ensure continued growth of a healthy community.
> > > > >> >> >> >>
> > > > >> >> >> >> Graduation is an important milestone for the project. It is also key to
> > > > >> >> >> >> further grow the user community: many users (incorrectly) see
> > > > >> >> >> >> incubation as a sign of instability and are much less likely to
> > > > >> >> >> >> consider us for production use.
> > > > >> >> >> >>
> > > > >> >> >> >> A way to think about graduation readiness is through the Apache
> > > > >> >> >> >> Maturity Model [1]. I think we clearly satisfy all the requirements [2].
> > > > >> >> >> >> It is probably worth emphasizing the recent community growth: over
> > > >> 

Re: PCollection to PCollection Conversion

2016-11-29 Thread Jesse Anderson
I went through the string conversions. Do you have an example of writing
out XML/JSON/etc too?

On Tue, Nov 29, 2016 at 3:46 PM Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

> Hi Jesse,
>
>
> https://github.com/jbonofre/incubator-beam/tree/DATAFORMAT/sdks/java/extensions/dataformat
>
> it's very simple and stupid and of course not complete at all (I have
> other commits but not merged as they need some polishing), but as I
> said, it's a base of discussion.
>
> Regards
> JB
>
> On 11/29/2016 09:23 PM, Jesse Anderson wrote:
> > @jb Sounds good. Just let us know once you've pushed.
> >
> > On Tue, Nov 29, 2016 at 2:54 PM Jean-Baptiste Onofré <j...@nanthrax.net>
> > wrote:
> >
> >> Good point Eugene.
> >>
> >> Right now, it's a DoFn collection to experiment a bit (a pure
> >> extension). It's pretty stupid ;)
> >>
> >> But, you are right, depending on the direction of such an extension, it
> >> could cover more use cases (even if it's not my first intention ;)).
> >>
> >> Let me push the branch (pretty small) as an illustration, and in the
> >> mean time, I'm preparing a document (more focused on the use cases).
> >>
> >> WDYT ?
> >>
> >> Regards
> >> JB
> >>
> >> On 11/29/2016 08:47 PM, Eugene Kirpichov wrote:
> >>> Hi JB,
> >>> Depending on the scope of what you want to ultimately accomplish with
> >> this
> >>> extension, I think it may make sense to write a proposal document and
> >>> discuss it.
> >>> If it's just a collection of utility DoFn's for various well-defined
> >>> source/target format pairs, then that's probably not needed, but if
> it's
> >>> anything more, then I think it is.
> >>> That will help avoid a lot of churn if people propose reasonable
> >>> significant changes.
> >>>
> >>> On Tue, Nov 29, 2016 at 11:15 AM Jean-Baptiste Onofré <j...@nanthrax.net
> >
> >>> wrote:
> >>>
> >>>> By the way Jesse, I'm going to push my DATAFORMAT branch to my GitHub
> >>>> and I will post on the dev mailing list when done.
> >>>>
> >>>> Regards
> >>>> JB
> >>>>
> >>>> On 11/29/2016 07:01 PM, Jesse Anderson wrote:
> >>>>> I want to bring this thread back up since we've had time to think
> >>>>> about it more and make a plan.
> >>>>>
> >>>>> I think a format-specific converter will be a more time-consuming task
> >>>>> than we originally thought. It'd have to be a writer that takes another
> >>>>> writer as a parameter.
> >>>>>
> >>>>> I think a string converter can be done as a simple transform.
> >>>>>
> >>>>> I think we should start with a simple string converter and plan for a
> >>>>> format-specific writer.
> >>>>>
> >>>>> What are your thoughts?
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> Jesse
> >>>>>
> >>>>> On Thu, Nov 10, 2016 at 10:33 AM Jesse Anderson <
> je...@smokinghand.com
> >>>
> >>>>> wrote:
> >>>>>
> >>>>> I was thinking about what the outputs would look like last night. I
> >>>>> realized that more complex formats like JSON and XML may or may not
> >>>> output
> >>>>> the data in a valid format.
> >>>>>
> >>>>> Doing a direct conversion on unbounded collections would work just
> >> fine.
> >>>>> They're self-contained. For writing out bounded collections, that's
> >> where
> >>>>> we'll hit the issues. This changes the uber conversion transform
> into a
> >>>>> transform that needs to be a writer.
> >>>>>
> >>>>> If a transform executes a JSON conversion on a per element basis,
> we'd
> >>>> get
> >>>>> this:
> >>>>> {
> >>>>> "key": "value"
> >>>>> }, {
> >>>>> "key": "value"
> >>>>> },
> >>>>>
> >>>>> That isn't valid JSON.
> >>>>>
> >>>>> The conversion transform would need to know do sev

Re: PCollection to PCollection Conversion

2016-11-29 Thread Jesse Anderson
@jb Sounds good. Just let us know once you've pushed.

On Tue, Nov 29, 2016 at 2:54 PM Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

> Good point Eugene.
>
> Right now, it's a DoFn collection to experiment a bit (a pure
> extension). It's pretty stupid ;)
>
> But, you are right, depending on the direction of such an extension, it
> could cover more use cases (even if it's not my first intention ;)).
>
> Let me push the branch (pretty small) as an illustration, and in the
> mean time, I'm preparing a document (more focused on the use cases).
>
> WDYT ?
>
> Regards
> JB
>
> On 11/29/2016 08:47 PM, Eugene Kirpichov wrote:
> > Hi JB,
> > Depending on the scope of what you want to ultimately accomplish with
> > this extension, I think it may make sense to write a proposal document
> > and discuss it.
> > If it's just a collection of utility DoFns for various well-defined
> > source/target format pairs, then that's probably not needed, but if it's
> > anything more, then I think it is.
> > That will help avoid a lot of churn if people propose reasonable
> > significant changes.
> >
> > On Tue, Nov 29, 2016 at 11:15 AM Jean-Baptiste Onofré <j...@nanthrax.net>
> > wrote:
> >
> >> By the way Jesse, I'm going to push my DATAFORMAT branch to my GitHub
> >> and I will post on the dev mailing list when done.
> >>
> >> Regards
> >> JB
> >>
> >> On 11/29/2016 07:01 PM, Jesse Anderson wrote:
> >>> I want to bring this thread back up since we've had time to think about
> >>> it more and make a plan.
> >>>
> >>> I think a format-specific converter will be a more time-consuming task
> >>> than we originally thought. It'd have to be a writer that takes another
> >>> writer as a parameter.
> >>>
> >>> I think a string converter can be done as a simple transform.
> >>>
> >>> I think we should start with a simple string converter and plan for a
> >>> format-specific writer.
> >>>
> >>> What are your thoughts?
> >>>
> >>> Thanks,
> >>>
> >>> Jesse
> >>>
> >>> On Thu, Nov 10, 2016 at 10:33 AM Jesse Anderson <je...@smokinghand.com
> >
> >>> wrote:
> >>>
> >>> I was thinking about what the outputs would look like last night. I
> >>> realized that more complex formats like JSON and XML may or may not
> >>> output the data in a valid format.
> >>>
> >>> Doing a direct conversion on unbounded collections would work just
> >>> fine. They're self-contained. For writing out bounded collections,
> >>> that's where we'll hit the issues. This changes the uber conversion
> >>> transform into a transform that needs to be a writer.
> >>>
> >>> If a transform executes a JSON conversion on a per-element basis, we'd
> >>> get this:
> >>> {
> >>> "key": "value"
> >>> }, {
> >>> "key": "value"
> >>> },
> >>>
> >>> That isn't valid JSON.
> >>>
> >>> The conversion transform would need to do several things when writing
> >>> out a file. It would need to add brackets for an array. Now we have:
> >>> [
> >>> {
> >>> "key": "value"
> >>> }, {
> >>> "key": "value"
> >>> },
> >>> ]
> >>>
> >>> We still don't have valid JSON. We have to remove the last comma or
> >>> have the uber transform start putting in the commas, except after the
> >>> last element.
> >>>
> >>> [
> >>> {
> >>> "key": "value"
> >>> }, {
> >>> "key": "value"
> >>> }
> >>> ]
> >>>
> >>> Only by doing this do we have valid JSON.
> >>>
> >>> I'd argue we'd have a similar issue with XML. Some parsers require a
> >>> root element for everything. The uber transform would have to put the
> >>> root element tags at the beginning and end of the file.
> >>>
> >>> On Wed, Nov 9, 2016 at 11:36 PM Manu Zhang <owenzhang1...@gmail.com>
> >> wrote:
> >>>
> >>> I would love to see a lean core and abundant Transforms at the same
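
The JSON framing problem quoted above (brackets around the whole output,
commas between elements but not after the last one) can be sketched in plain
Java. This is only an illustration under stated assumptions: JsonArrayFraming
and frame() are hypothetical names, not Beam APIs, and the elements are
assumed to already be serialized JSON objects held in memory rather than
written by a distributed sink.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper, not part of Beam: frames already-serialized JSON
// objects as one valid JSON array.
class JsonArrayFraming {
  static String frame(Iterable<String> jsonObjects) {
    StringBuilder out = new StringBuilder("[");
    Iterator<String> it = jsonObjects.iterator();
    while (it.hasNext()) {
      out.append(it.next());
      // Emit the separator only between elements, never after the last one.
      if (it.hasNext()) {
        out.append(", ");
      }
    }
    return out.append("]").toString();
  }

  public static void main(String[] args) {
    List<String> elements =
        Arrays.asList("{\"key\": \"value\"}", "{\"key\": \"value\"}");
    System.out.println(frame(elements));
  }
}
```

The point of the sketch is that the opening bracket, the separators, and the
closing bracket all depend on knowing where the output starts and ends, which
is why this logic belongs in a writer rather than in a per-element transform.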

Re: [DISCUSS] Graduation to a top-level project

2016-11-22 Thread Jesse Anderson
+1

On Tue, Nov 22, 2016 at 12:35 PM Frances Perry 
wrote:

> +1  You might even say I'm beaming with pride ;-)
>
> On Tue, Nov 22, 2016 at 11:58 AM, Kenneth Knowles 
> wrote:
>
> > +1 !!!
> >
> > I especially love how the diversity of the community has contributed to
> the
> > conceptual growth and quality of Beam. I can't wait for more!
> >
> > On Tue, Nov 22, 2016 at 11:22 AM, Thomas Groh 
> > wrote:
> >
> > > +1
> > >
> > > It's been a thrilling experience thus far, and I'm excited for the
> > future.
> > >
> > > On Tue, Nov 22, 2016 at 11:07 AM, Aljoscha Krettek <
> aljos...@apache.org>
> > > wrote:
> > >
> > > > +1
> > > >
> > > > I'm quite enthusiastic about the growth of the community and the open
> > > > discussions!
> > > >
> > > > > On Tue, 22 Nov 2016 at 19:51 Jason Kuster wrote:
> > > >
> > > > > An enthusiastic +1!
> > > > >
> > > > > In particular it's been really great to see the commitment and
> > interest
> > > > of
> > > > > the community in different kinds of testing. Between what we
> > currently
> > > > have
> > > > > on Jenkins and Travis and the in-progress work on IO integration
> > tests
> > > > and
> > > > > performance tests (plus, I'm sure, other things I'm not aware of)
> > we're
> > > > in
> > > > > a really good place.
> > > > >
> > > > > On Tue, Nov 22, 2016 at 10:49 AM, Amit Sela 
> > > > wrote:
> > > > >
> > > > > > +1, super exciting!
> > > > > >
> > > > > > Thanks to JB, Davor and the whole team for creating this
> > community. I
> > > > > think
> > > > > > we've achieved a lot in a short time.
> > > > > >
> > > > > > Amit.
> > > > > >
> > > > > > > On Tue, Nov 22, 2016, 20:36 Tyler Akidau wrote:
> > > > > >
> > > > > > > +1, thanks to everyone who's invested time getting us to this
> > > point.
> > > > > :-)
> > > > > > >
> > > > > > > -Tyler
> > > > > > >
> > > > > > > On Tue, Nov 22, 2016 at 10:33 AM Jean-Baptiste Onofré <
> > > > j...@nanthrax.net
> > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > First of all, I would like to thank the whole team, and
> > > especially
> > > > > > Davor
> > > > > > > > for the great work and commitment to Apache and the
> community.
> > > > > > > >
> > > > > > > > Of course, a big +1 to move forward on graduation !
> > > > > > > >
> > > > > > > > Regards
> > > > > > > > JB
> > > > > > > >
> > > > > > > > On 11/22/2016 07:19 PM, Davor Bonaci wrote:
> > > > > > > > > > Hi everyone,
> > > > > > > > > > With all the progress we’ve had recently in Apache Beam, I think it is
> > > > > > > > > > time we start the discussion about graduation as a new top-level
> > > > > > > > > > project at the Apache Software Foundation.
> > > > > > > > > >
> > > > > > > > > > Graduation means we are a self-sustaining and self-governing community,
> > > > > > > > > > and ready to be a full participant in the Apache Software Foundation. It
> > > > > > > > > > does not imply that our community growth is complete or that a
> > > > > > > > > > particular level of technical maturity has been reached, rather that we
> > > > > > > > > > are on a solid trajectory in those areas. After graduation, we will
> > > > > > > > > > still periodically report to, and be overseen by, the ASF Board to
> > > > > > > > > > ensure continued growth of a healthy community.
> > > > > > > > > >
> > > > > > > > > > Graduation is an important milestone for the project. It is also key to
> > > > > > > > > > further grow the user community: many users (incorrectly) see
> > > > > > > > > > incubation as a sign of instability and are much less likely to
> > > > > > > > > > consider us for production use.
> > > > > > > > > >
> > > > > > > > > > A way to think about graduation readiness is through the Apache
> > > > > > > > > > Maturity Model [1]. I think we clearly satisfy all the requirements [2].
> > > > > > > > > > It is probably worth emphasizing the recent community growth: over each
> > > > > > > > > > of the past three months, no single organization contributing to Beam
> > > > > > > > > > has had more than ~50% of the unique contributors per month [2, see
> > > > > > > > > > assumptions]. That’s a great statistic that shows how much we’ve grown
> > > > > > > > > > our diversity!
> > > > > > > > > >
> > > > > > > > > > Process-wise, graduation consists of drafting a board resolution, which
> > > > > > > > > > needs to identify the full Project Management Committee,
> 

Re: PCollection to PCollection Conversion

2016-11-10 Thread Jesse Anderson
> to reside in the Beam repository at least for visibility reasons.
> > >
> > > One additional question is if these transforms represent a different
> DSL
> > or
> > > if those could be grouped with the current extensions (e.g. Join and
> > > SortValues) into something more general that we as a community could
> > > maintain, but well even if it is not the case, it would be really nice
> to
> > > start working on something like this.
> > >
> > > Ismaël Mejía​
> > >
> > >
> > > On Wed, Nov 9, 2016 at 11:59 AM, Jean-Baptiste Onofré <j...@nanthrax.net
> >
> > > wrote:
> > >
> > > > Related to spark-package, we also have Apache Bahir to host
> > > > connectors/transforms for Spark and Flink.
> > > >
> > > > IMHO, right now, Beam should host this, not sure if it makes sense
> > > > directly in the core.
> > > >
> > > > It reminds me of the "Integration" DSL we discussed in the technical
> > > > vision document.
> > > >
> > > > Regards
> > > > JB
> > > >
> > > >
> > > > On 11/09/2016 11:17 AM, Amit Sela wrote:
> > > >
> > > >> I think Jesse has a very good point on one hand, while Luke's and
> > > >> Kenneth's worries about committing users to specific implementations
> > > >> are warranted.
> > > >>
> > > >> The Spark community has a 3rd party repository for useful libraries
> > that
> > > >> for various reasons are not a part of the Apache Spark project:
> > > >> https://spark-packages.org/.
> > > >>
> > > >> Maybe a "common-transformations" package would serve both users
> quick
> > > >> ramp-up and ease-of-use while keeping Beam more "enabling" ?
> > > >>
> > > >> On Tue, Nov 8, 2016 at 9:03 PM Kenneth Knowles
> <k...@google.com.invalid
> > >
> > > >> wrote:
> > > >>
> > > >> It seems useful for small scale debugging / demoing to have
> > > >>> Dump.toString(). I think it should be named to clearly indicate its
> > > >>> limited
> > > >>> scope. Maybe other stuff could go in the Dump namespace, but
> > > >>> "Dump.toJson()" would be for humans to read - so it should be
> pretty
> > > >>> printed, not treated as a machine-to-machine wire format.
> > > >>>
> > > >>> The broader question of representing data in JSON or XML, etc, is
> > > already
> > > >>> the subject of many mature libraries which are already easy to use
> > with
> > > >>> Beam.
> > > >>>
> > > >>> The more esoteric practice of implicit or semi-implicit coercions
> > seems
> > > >>> like it is also already addressed in many ways elsewhere.
> > > >>> Transform.via(TypeConverter) is basically the same as
> > > >>> MapElements.via() and also easy to use with Beam.
> > > >>>
> > > >>> In both of the last cases, there are many reasonable approaches,
> and
> > we
> > > >>> shouldn't commit our users to one of them.
> > > >>>
> > > >>> On Tue, Nov 8, 2016 at 10:15 AM, Lukasz Cwik
> > <lc...@google.com.invalid
> > > >
> > > >>> wrote:
> > > >>>
> > > >>> The suggestions you give seem good except for the XML cases.
> > > >>>>
> > > >>>> Might want to have the XML be a document per line similar to the
> > JSON
> > > >>>> examples you have been giving.
> > > >>>>
> > > >>>> On Tue, Nov 8, 2016 at 12:00 PM, Jesse Anderson <
> > > je...@smokinghand.com>
> > > >>>> wrote:
> > > >>>>
> > > >>>>> @lukasz Agreed there would have to be KV handling. My thinking was
> > > >>>>> more that whatever the addition, it shouldn't just handle KV. It
> > > >>>>> should handle Iterables, Lists, Sets, and KVs.
> > > >>>>>
> > > >>>>> For JSON and XML, I wonder if we'd be able to give someone
> > something
> > > >>>>> 

Re: PCollection to PCollection Conversion

2016-11-08 Thread Jesse Anderson
@lukasz Agreed there would have to be KV handling. My thinking was more
that whatever the addition, it shouldn't just handle KV. It should handle
Iterables, Lists, Sets, and KVs.

For JSON and XML, I wonder whether we could give someone something
general-purpose enough, or whether you would just end up writing your own
code to handle it anyway.

Here are some ideas on what it could look like with a method and the
resulting string output:
*Stringify.toJSON()*

With KV:
{"key": "value"}

With Iterables:
["one", "two", "three"]

*Stringify.toXML("rootelement")*

With KV:


With Iterables:

  one
  two
  three


*Stringify.toDelimited(",")*

With KV:
key,value

With Iterables:
one,two,three

Do you think that would strike a good balance between reusable code and
writing your own for more difficult formatting?

Thanks,

Jesse
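
A minimal sketch of the toDelimited idea proposed in this message, in plain
Java. Everything here is an assumption made for illustration: DelimitedSketch
is a hypothetical class, java.util.Map.Entry stands in for Beam's KV, and no
quoting or escaping of the delimiter is attempted.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical stand-in for the proposed Stringify.toDelimited(",").
class DelimitedSketch {
  static String toDelimited(Object element, String delimiter) {
    if (element instanceof Map.Entry) {
      // Key/value pair (Map.Entry stands in for KV): "key<delimiter>value".
      Map.Entry<?, ?> kv = (Map.Entry<?, ?>) element;
      return kv.getKey() + delimiter + kv.getValue();
    }
    if (element instanceof Iterable) {
      // Iterable: join the string form of each item with the delimiter.
      StringJoiner joiner = new StringJoiner(delimiter);
      for (Object item : (Iterable<?>) element) {
        joiner.add(String.valueOf(item));
      }
      return joiner.toString();
    }
    // Anything else falls back to its own string form.
    return String.valueOf(element);
  }

  public static void main(String[] args) {
    System.out.println(toDelimited(new SimpleEntry<>("key", "value"), ","));
    System.out.println(toDelimited(Arrays.asList("one", "two", "three"), ","));
  }
}
```

Dispatching on the element type at runtime keeps one entry point for KVs and
Iterables, at the cost of compile-time type checking. A real transform would
more likely offer separate typed variants.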

On Tue, Nov 8, 2016 at 11:01 AM Lukasz Cwik <lc...@google.com.invalid>
wrote:

Jesse, I believe if one format gets special treatment in TextIO, people
will then ask why JSON, XML, ... are not also supported.

Also, the example that you provide relies on the input format being an
Iterable. You had posted a question about using KV with TextIO.Write,
which wouldn't align with the proposed input format and would still
require writing a type conversion function, this time from KV to
Iterable instead of KV to string.

On Tue, Nov 8, 2016 at 9:50 AM, Jesse Anderson <je...@smokinghand.com>
wrote:

> Lukasz,
>
> I don't think you'd need complicated logic for TextIO.Write. For CSV the
> call would look like:
> Stringify.to("", ",", "\n");
>
> Where the arguments would be Stringify.to(prefix, delimiter, suffix).
>
> The code would be something like:
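> StringBuilder buffer = new StringBuilder(prefix);
>
> // Walk the elements with an iterator so we know when we're on the last one.
> Iterator<Item> items = list.iterator();
> while (items.hasNext()) {
>   buffer.append(items.next().toString());
>
>   if (items.hasNext()) {
>     buffer.append(delimiter);
>   }
> }
>
> buffer.append(suffix);
>
> c.output(buffer.toString());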
>
> That would allow you to do the basic CSV, TSV, and other formats without
> complicated logic. The same sort of thing could be done for TextIO.Write.
>
> Thanks,
>
> Jesse
>
> On Tue, Nov 8, 2016 at 10:30 AM Lukasz Cwik <lc...@google.com.invalid>
> wrote:
>
> > The conversion from object to string will have uses outside of just
> > TextIO.Write so it seems logical that we would want to have a ParDo do
> > the conversion.
> >
> > Text file formats have a lot of variance, even if you consider the
> > subset of CSV like formats where it could have fixed width fields, or
> > escaping and quoting around other fields, or headers that should be
> > placed at the top.
> >
> > Having all these format conversions within TextIO.Write seems like a lot
> > of logic to contain in that transform which should just focus on writing
> > to files.
> >
> > On Tue, Nov 8, 2016 at 8:15 AM, Jesse Anderson <je...@smokinghand.com>
> > wrote:
> >
> > > This is a thread moved over from the user mailing list.
> > >
> > > I think there needs to be a way to convert a PCollection to
> > > PCollection Conversion.
> > >
> > > To do a minimal WordCount, you have to manually convert the KV to a
> > String:
> > > p
> > > .apply(TextIO.Read.from("playing_cards.tsv"))
> > > .apply(Regex.split("\\W+"))
> > > .apply(Count.perElement())
> > > *.apply(MapElements.via((KV<String, Long> count) ->*
> > > *count.getKey() + ":" + count.getValue()*
> > > *).withOutputType(TypeDescriptors.strings()))*
> > > .apply(TextIO.Write.to("output/stringcounts"));
> > >
> > > This code really should be something like:
> > > p
> > > .apply(TextIO.Read.from("playing_cards.tsv"))
> > > .apply(Regex.split("\\W+"))
> > > .apply(Count.perElement())
> > > *.apply(ToString.stringify())*
> > > .apply(TextIO.Write.to("output/stringcounts"));
> > >
> > > To summarize the discussion:
> > >
> > >- JA: Add a method to StringDelegateCoder to output any KV or list
> > >- JA and DH: Add a SimpleFunction that takes an type and runs
> > toString()
> > >on it:
> > >class ToStringFn extends SimpleFunction<InputT, String> {
> > >public static String apply(

Re: PCollection to PCollection Conversion

2016-11-08 Thread Jesse Anderson
Lukasz,

I don't think you'd need complicated logic for TextIO.Write. For CSV the
call would look like:
Stringify.to("", ",", "\n");

Where the arguments would be Stringify.to(prefix, delimiter, suffix).

The code would be something like:
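StringBuilder buffer = new StringBuilder(prefix);

// Walk the elements with an iterator so we know when we're on the last one.
Iterator<Item> items = list.iterator();
while (items.hasNext()) {
  buffer.append(items.next().toString());

  if (items.hasNext()) {
    buffer.append(delimiter);
  }
}

buffer.append(suffix);

c.output(buffer.toString());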

That would allow you to do the basic CSV, TSV, and other formats without
complicated logic. The same sort of thing could be done for TextIO.Write.

Thanks,

Jesse
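
For reference, the prefix/delimiter/suffix idea above maps closely onto
java.util.StringJoiner, which takes the same three pieces (in the order
delimiter, prefix, suffix). The sketch below is a standalone illustration, not
a Beam transform: StringifySketch and its to() method are hypothetical names
that mirror the proposed Stringify.to(prefix, delimiter, suffix) call.

```java
import java.util.Arrays;
import java.util.StringJoiner;

// Hypothetical standalone version of the proposed Stringify.to(...).
class StringifySketch {
  static String to(Iterable<?> items, String prefix, String delimiter, String suffix) {
    // StringJoiner inserts the delimiter only between elements.
    StringJoiner joiner = new StringJoiner(delimiter, prefix, suffix);
    for (Object item : items) {
      joiner.add(String.valueOf(item));
    }
    return joiner.toString();
  }

  public static void main(String[] args) {
    // CSV line: empty prefix, comma delimiter, newline suffix.
    System.out.print(to(Arrays.asList("a", "b", "c"), "", ",", "\n"));
    // TSV line: same call with a tab delimiter.
    System.out.print(to(Arrays.asList("a", "b", "c"), "", "\t", "\n"));
  }
}
```

Like the loop above, this does no quoting or escaping, so it only covers the
basic delimited formats.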

On Tue, Nov 8, 2016 at 10:30 AM Lukasz Cwik <lc...@google.com.invalid>
wrote:

> The conversion from object to string will have uses outside of just
> TextIO.Write so it seems logical that we would want to have a ParDo do the
> conversion.
>
> Text file formats have a lot of variance, even if you consider the subset
> of CSV like formats where it could have fixed width fields, or escaping and
> quoting around other fields, or headers that should be placed at the top.
>
> Having all these format conversions within TextIO.Write seems like a lot of
> logic to contain in that transform which should just focus on writing to
> files.
>
> On Tue, Nov 8, 2016 at 8:15 AM, Jesse Anderson <je...@smokinghand.com>
> wrote:
>
> > This is a thread moved over from the user mailing list.
> >
> > I think there needs to be a way to convert a PCollection to
> > PCollection Conversion.
> >
> > To do a minimal WordCount, you have to manually convert the KV to a
> String:
> > p
> > .apply(TextIO.Read.from("playing_cards.tsv"))
> > .apply(Regex.split("\\W+"))
> > .apply(Count.perElement())
> > *.apply(MapElements.via((KV<String, Long> count) ->*
> > *count.getKey() + ":" + count.getValue()*
> > *).withOutputType(TypeDescriptors.strings()))*
> > .apply(TextIO.Write.to("output/stringcounts"));
> >
> > This code really should be something like:
> > p
> > .apply(TextIO.Read.from("playing_cards.tsv"))
> > .apply(Regex.split("\\W+"))
> > .apply(Count.perElement())
> > *.apply(ToString.stringify())*
> > .apply(TextIO.Write.to("output/stringcounts"));
> >
> > To summarize the discussion:
> >
> >- JA: Add a method to StringDelegateCoder to output any KV or list
> >- JA and DH: Add a SimpleFunction that takes an type and runs
> toString()
> >on it:
> >class ToStringFn extends SimpleFunction<InputT, String> {
> >public static String apply(InputT input) {
> >return input.toString();
> >}
> >}
> >- JB: Add a general purpose type converter like in Apache Camel.
> >- JA: Add Object support to TextIO.Write that would write out the
> >toString of any Object.
> >
> > My thoughts:
> >
> > Is converting to a PCollection mostly needed when you're using
> > TextIO.Write? Will a general purpose transform only work in certain cases
> > and you'll normally have to write custom code format the strings the way
> > you want them?
> >
> > IMHO, it's yes to both. I'd prefer to add Object support to TextIO.Write
> or
> > a SimpleFunction that takes a delimiter as an argument. Making a
> > SimpleFunction that's able to specify a delimiter (and perhaps a prefix
> and
> > suffix) should cover the majority of formats and cases.
> >
> > Thanks,
> >
> > Jesse
> >
>


PCollection to PCollection Conversion

2016-11-08 Thread Jesse Anderson
This is a thread moved over from the user mailing list.

I think there needs to be a way to convert a PCollection of arbitrary
elements, such as KVs, into a PCollection of Strings.

To do a minimal WordCount, you have to manually convert the KV to a String:
p
    .apply(TextIO.Read.from("playing_cards.tsv"))
    .apply(Regex.split("\\W+"))
    .apply(Count.perElement())
    // Manual conversion of each KV to a String:
    .apply(MapElements.via((KV<String, Long> count) ->
        count.getKey() + ":" + count.getValue()
    ).withOutputType(TypeDescriptors.strings()))
    .apply(TextIO.Write.to("output/stringcounts"));

This code really should be something like:
p
    .apply(TextIO.Read.from("playing_cards.tsv"))
    .apply(Regex.split("\\W+"))
    .apply(Count.perElement())
    // Proposed built-in conversion:
    .apply(ToString.stringify())
    .apply(TextIO.Write.to("output/stringcounts"));

To summarize the discussion:

   - JA: Add a method to StringDelegateCoder to output any KV or list
   - JA and DH: Add a SimpleFunction that takes a type and runs toString()
   on it:
   class ToStringFn<InputT> extends SimpleFunction<InputT, String> {
     @Override
     public String apply(InputT input) {
       return input.toString();
     }
   }
   - JB: Add a general purpose type converter like in Apache Camel.
   - JA: Add Object support to TextIO.Write that would write out the
   toString of any Object.

My thoughts:

Is converting to a PCollection of Strings mostly needed when you're using
TextIO.Write? Will a general-purpose transform only work in certain cases,
so that you'll normally have to write custom code to format the strings the
way you want them?

IMHO, it's yes to both. I'd prefer to add Object support to TextIO.Write or
a SimpleFunction that takes a delimiter as an argument. Making a
SimpleFunction that's able to specify a delimiter (and perhaps a prefix and
suffix) should cover the majority of formats and cases.

Thanks,

Jesse
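
To make the ToStringFn option from the summary concrete without pulling in the
Beam SDK, here is a sketch that uses java.util.function.Function as a stand-in
for SimpleFunction and a list of Map.Entry values as a stand-in for a
collection of KV elements. Both substitutions are assumptions made only so the
example is self-contained.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Stand-in for the proposed ToStringFn; Function replaces Beam's SimpleFunction.
class ToStringFn<InputT> implements Function<InputT, String> {
  @Override
  public String apply(InputT input) {
    // Delegates to the element's own string form (null-safe).
    return String.valueOf(input);
  }
}

class ToStringDemo {
  public static void main(String[] args) {
    // Map.Entry stands in for KV<String, Long> word counts.
    List<Map.Entry<String, Long>> counts = new ArrayList<>();
    counts.add(new SimpleEntry<>("ace", 4L));
    counts.add(new SimpleEntry<>("king", 4L));

    List<String> lines = counts.stream()
        .map(new ToStringFn<Map.Entry<String, Long>>())
        .collect(Collectors.toList());
    System.out.println(lines);
  }
}
```

The output format is whatever each element's toString() produces, which is
the limitation behind the delimiter, prefix and suffix suggestion above.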


Re: [DISCUSS] Using Verbs for Transforms

2016-11-01 Thread Jesse Anderson
>modifications between choices A and B doesn't necessarily end with
> >> >a
> >> >>> >decision A or B -- a single (qualified) -1 vote is a veto and
> >> >cannot be
> >> >>> >overridden [1]. Said differently, the guideline is that code
> >> >changes
> >> >>> >should
> >> >>> >be made by consensus; not by one group outvoting another. I'd like
> >> >to
> >> >>> >avoid
> >> >>> >setting such precedent; we should try to drive consensus, as
> >> >opposed to
> >> >>> >attempting to outvote another part of the community.
> >> >>> >
> >> >>> >In this particular case, we have had a great discussion. Many
> >> >>> >contributors
> >> >>> >brought different perspectives. Consequently, some opinions have
> >> >been
> >> >>> >likely changed. At this point, someone should summarize the
> >> >arguments,
> >> >>> >try
> >> >>> >to critique them from a neutral standpoint, and suggest a refined
> >> >>> >proposal
> >> >>> >that takes these perspectives into account. If nobody objects in a
> >> >>> >short
> >> >>> >time, we should consider this decided. [ I can certainly help here,
> >> >but
> >> >>> >I'd
> >> >>> >love to see somebody else do it! ]
> >> >>> >
> >> >>> >[1] http://www.apache.org/foundation/voting.html
> >> >>> >
> >> >>> >On Wed, Oct 26, 2016 at 7:35 AM, Ben Chambers
> >> >>> ><bchamb...@google.com.invalid>
> >> >>> >wrote:
> >> >>> >
> >> >>> >> I also like Distinct since it doesn't make it sound like it
> >> >modifies
> >> >>> >any
> >> >>> >> underlying collection. RemoveDuplicates makes it sound like the
> >> >>> >duplicates
> >> >>> >> are removed, rather than a new PCollection without duplicates
> >> >being
> >> >>> >> returned.
> >> >>> >>
> >> >>> >> On Wed, Oct 26, 2016, 7:36 AM Jean-Baptiste Onofré
> >> ><j...@nanthrax.net>
> >> >>> >> wrote:
> >> >>> >>
> >> >>> >> > Agree. It was more a transition proposal.
> >> >>> >> >
> >> >>> >> > Regards
> >> >>> >> > JB
> >> >>> >> >
> >> >>> >> > On Oct 26, 2016, 08:31, at 08:31, Robert Bradshaw
> >> >>> >> > <rober...@google.com.INVALID> wrote:
> >> >>> >> > >On Mon, Oct 24, 2016 at 11:02 PM, Jean-Baptiste Onofré
> >> >>> >> > ><j...@nanthrax.net> wrote:
> >> >>> >> > >> And what about use RemoveDuplicates and create an alias
> >> >Distinct
> >> >>> >?
> >> >>> >> > >
> >> >>> >> > >I'd really like to avoid (long term) aliases--you end up
> >> >having to
> >> >>> >> > >document (and maintain) them both, and it adds confusion as to
> >> >>> >which
> >> >>> >> > >one to use (especially if they ever diverge), and means
> >> >searching
> >> >>> >for
> >> >>> >> > >one or the other yields half the results.
> >> >>> >> > >
> >> >>> >> > >> It doesn't break the API and would address both SQL users
> >> >and
> >> >>> >more
> >> >>> >> > >"big data" users.
> >> >>> >> > >>
> >> >>> >> > >> My $0.01 ;)
> >> >>> >> > >>
> >> >>> >> > >> Regards
> >> >>> >> > >> JB
> >> >>> >> > >>
> >> >>> >> > >> On Oct 24, 2016, 22:23, at 22:23, Dan Halperin

Re: [DISCUSS] Using Verbs for Transforms

2016-10-27 Thread Jesse Anderson
 them both, and it adds confusion as to
> > >which
> > >> > >one to use (especially if they ever diverge), and means searching
> > >for
> > >> > >one or the other yields half the results.
> > >> > >
> > >> > >> It doesn't break the API and would address both SQL users and
> > >more
> > >> > >"big data" users.
> > >> > >>
> > >> > >> My $0.01 ;)
> > >> > >>
> > >> > >> Regards
> > >> > >> JB
> > >> > >>
> > >> > >> On Oct 24, 2016, 22:23, at 22:23, Dan Halperin
> > >> > ><dhalp...@google.com.INVALID> wrote:
> > >> > >>>I find "MakeDistinct" more confusing. My votes in decreasing
> > >> > >>>preference:
> > >> > >>>
> > >> > >>>1. Keep `RemoveDuplicates` name, ensure that important keywords
> > >are
> > >> > >in
> > >> > >>>the
> > >> > >>>Javadoc. This reduces churn on our users and is honestly pretty
> > >dang
> > >> > >>> descriptive.
> > >> > >>>2. Rename to `Distinct`, which is clear if you're a SQL user and
> > >> > >likely
> > >> > >>>less clear otherwise. This is a backwards-incompatible API
> > >change, so
> > >> > >>>we
> > >> > >>>should do it before we go stable.
> > >> > >>>
> > >> > >>>I am not super strong that 1 > 2, but I am very strong that
> > >> > >>>"Distinct" >>> "MakeDistinct" and "RemoveDuplicates" >>>
> > >> > >>>"AvoidDuplicate".
> > >> > >>>
> > >> > >>>Dan
> > >> > >>>
> > >> > >>>On Mon, Oct 24, 2016 at 10:12 AM, Kenneth Knowles
> > >> > >>><k...@google.com.invalid>
> > >> > >>>wrote:
> > >> > >>>
> > >> > >>>> The precedent that we use verbs has many exceptions. We have
> > >> > >>>> ApproximateQuantiles, Values, Keys, WithTimestamps, and I
> > >would
> > >> > >even
> > >> > >>>> include Sum (at least when I read it).
> > >> > >>>>
> > >> > >>>> Historical note: the predilection towards verbs is from the
> > >Google
> > >> > >>>Style
> > >> > >>>> Guide for Java method names
> > >> > >>>>
> > >> > >>>> <https://google.github.io/styleguide/javaguide.html#s5.2.3-method-names>,
> > >> > >>>> which states "Method names are typically verbs or verb
> > >phrases".
> > >> > >But
> > >> > >>>even
> > >> > >>>> in Google code there are lots of exceptions when it makes
> > >sense,
> > >> > >like
> > >> > >>>> Guava's
> > >> > >>>> Iterables.any(), Iterables.all(), Iterables.toArray(), the
> > >entire
> > >> > >>>> Predicates module, etc. Just an aside; Beam isn't Google code.
> > >I
> > >> > >>>suggest we
> > >> > >>>> use our judgment rather than a policy.
> > >> > >>>>
> > >> > >>>> I think "Distinct" is one of those exceptions. It is a
> > >standard
> > >> > >>>widespread
> > >> > >>>> name and also reads better as an adjective. I prefer it, but
> > >also
> > >> > >>>don't
> > >> > >>>> care strongly enough to change it or to change it back :-)
> > >> > >>>>
> > >> > >>>> If we must have a verb, I like it as-is more than MakeDistinct
> > >and
> > >> > >>>> AvoidDuplicate.
> > >> > >>>>
> > >> > >>>> On Mon, Oct 24, 2016 at 9:46 AM Jesse Anderson
> > >> > >>><je...@smokinghand.com>
> > >> > >>>> wrote:
> > >> > >>>>
> > >> > >>>> > My original thought for this change was that Crunch uses the
> > >> > >class
> > >> > >>>name
> > >> > >>>> > Distinct. SQL also uses the keyword distinct.
> > >> > >>>> >
> > >> > >>>> > Maybe the rule should be changed to adjectives or verbs
> > >depending
> > >> > >>>on the
> > >> > >>>> > context.
> > >> > >>>> >
> > >> > >>>> > Using a verb to describe this class really doesn't connote
> > >what
> > >> > >the
> > >> > >>>class
> > >> > >>>> > does as succinctly as the adjective.
> > >> > >>>> >
> > >> > >>>> > On Mon, Oct 24, 2016 at 9:40 AM Neelesh Salian
> > >> > >>><nsal...@cloudera.com>
> > >> > >>>> > wrote:
> > >> > >>>> >
> > >> > >>>> > > Hello,
> > >> > >>>> > >
> > >> > >>>> > > First of all, thank you to Daniel, Robert and Jesse for
> > >their
> > >> > >>>review on
> > >> > >>>> > > this: https://issues.apache.org/jira/browse/BEAM-239
> > >> > >>>> > >
> > >> > >>>> > > A point that came up was using verbs explicitly for
> > >Transforms.
> > >> > >>>> > > Here is the PR:
> > >> > >>>https://github.com/apache/incubator-beam/pull/1164
> > >> > >>>> > >
> > >> > >>>> > > Posting it to help understand if we have a consensus for
> > >it and
> > >> > >>>if yes,
> > >> > >>>> > we
> > >> > >>>> > > could perhaps document it for future changes.
> > >> > >>>> > >
> > >> > >>>> > > Thank you.
> > >> > >>>> > >
> > >> > >>>> > > --
> > >> > >>>> > > Neelesh Srinivas Salian
> > >> > >>>> > > Engineer
> > >> > >>>> > >
> > >> > >>>> >
> > >> > >>>>
> > >> >
> > >>
> >
>


Re: Can we have more quick start examples ?

2016-10-27 Thread Jesse Anderson
Those tutorials help. I was going through the example code and had the same
thought. We need to take a pass through the examples and remove some of the
Google Cloud dependencies.

On Thu, Oct 27, 2016, 5:13 PM Thomas Weise  wrote:

> The Beam tutorials seem to address this:
>
> https://github.com/eljefe6a/beamexample/blob/master/README.md
>
>
> On Thu, Oct 27, 2016 at 8:04 AM, Manu Zhang 
> wrote:
>
> > Hey guys,
> >
> > I find Beam examples under the examples folder are not easy to run due to
> > dependency on Google specific services. Even the MinimalWordCount
> > <examples/java/src/main/java/org/apache/beam/examples/MinimalWordCount.java>
> > requires input and output to be on Google Cloud Storage. Others like
> > WindowedWordCount
> > <examples/java/src/main/java/org/apache/beam/examples/WindowedWordCount.java>
> > require BigQuery.  I wouldn't expect newcomers to tweak IO themselves.
> >
> > Can we have more quick start examples that can be run anywhere ?
> >
> > Thanks,
> > Manu Zhang
> >
>


Re: [DISCUSS] Using Verbs for Transforms

2016-10-26 Thread Jesse Anderson
A recap of options for RemoveDuplicates:

   - Leave the name as is and update the JavaDocs
   - Rename to Distinct
   - Rename to MakeDistinct
   - Rename to Deduplicate
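
Whichever name wins, the behavior is the same. A rough sketch of it in plain
Java (not Beam SDK code; DistinctSketch, distinct, and distinctBy are
hypothetical names): the input is left untouched and a new collection without
duplicates is returned, which is the point Ben made in favor of Distinct. The
distinctBy variant illustrates the key-extractor extension Eugene floated for
Deduplicate.by(...).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical illustration only; not the Beam transform itself.
class DistinctSketch {
  // Returns a new list with duplicates dropped, keeping first-seen order.
  // The input list is not modified.
  static <T> List<T> distinct(List<T> input) {
    return new ArrayList<>(new LinkedHashSet<>(input));
  }

  // Keeps the first element seen for each key produced by keyFn.
  static <T, K> List<T> distinctBy(List<T> input, Function<? super T, ? extends K> keyFn) {
    Set<K> seenKeys = new HashSet<>();
    List<T> output = new ArrayList<>();
    for (T element : input) {
      // Set.add returns false when the key was already present.
      if (seenKeys.add(keyFn.apply(element))) {
        output.add(element);
      }
    }
    return output;
  }

  public static void main(String[] args) {
    List<String> words = Arrays.asList("beam", "apex", "beam", "flink", "apex");
    System.out.println(distinct(words));                   // [beam, apex, flink]
    System.out.println(distinctBy(words, String::length)); // [beam, flink]
    System.out.println(words.size());                      // 5, input unchanged
  }
}
```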



On Wed, Oct 26, 2016 at 8:10 AM Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

> OK. No problem.
>
> Regards
> JB
>
> On Oct 26, 2016, 07:56, at 07:56, Kenneth Knowles <k...@google.com.INVALID>
> wrote:
> >To be clear: I am not saying that I think the discussion has concluded.
> >I
> >think we should give some more time for different time zone rotations
> >to
> >occur. I just meant to say that if it does come to a vote, I'd prefer
> >to
> >keep it focused rather than generalizing.
> >
> >On Tue, Oct 25, 2016 at 10:51 PM Kenneth Knowles <k...@google.com>
> >wrote:
> >
> >> I'd prefer to keep the vote focused on this rename, not a general
> >policy.
> >>
> >> On Tue, Oct 25, 2016 at 10:26 PM Jean-Baptiste Onofré
> ><j...@nanthrax.net>
> >> wrote:
> >>
> >> Yes I would start a formal vote with the three proposals: descriptive
> >> verb, adjective, verbs + adjective.
> >>
> >> Regards
> >> JB
> >>
> >> On Oct 26, 2016, 07:16, at 07:16, Jesse Anderson
> ><je...@smokinghand.com>
> >> wrote:
> >> >We need to make a decision on this so Neelesh can finish his commit.
> >> >Should
> >> >we take a vote or something?
> >> >
> >> >On Tue, Oct 25, 2016, 7:55 AM Jean-Baptiste Onofré <j...@nanthrax.net>
> >> >wrote:
> >> >
> >> >> Sounds good to me.
> >> >>
> >> >> On Oct 24, 2016, 19:11, at 19:11, je...@smokinghand.com wrote:
> >> >> >I prefer MakeDistinct if we have to make it a verb.
> >> >>
> >>
> >>
>


Re: [DISCUSS] Using Verbs for Transforms

2016-10-25 Thread Jesse Anderson
We need to make a decision on this so Neelesh can finish his commit. Should
we take a vote or something?

On Tue, Oct 25, 2016, 7:55 AM Jean-Baptiste Onofré  wrote:

> Sounds good to me.
>
> On Oct 24, 2016, 19:11, at 19:11, je...@smokinghand.com wrote:
> >I prefer MakeDistinct if we have to make it a verb.
>


Re: [DISCUSS] Using Verbs for Transforms

2016-10-24 Thread Jesse Anderson
That's how the mainframe programmers I've dealt with refer to it. I agree
with Dan. We should either not change the name or change it to Distinct.
It's just not worth the effort otherwise.

On Mon, Oct 24, 2016, 3:10 PM Eugene Kirpichov <kirpic...@google.com.invalid>
wrote:

> $0.02: Deduplicate? (lends to extensions like Deduplicate.by(some key
> extractor function))
>
> On Mon, Oct 24, 2016 at 1:22 PM Dan Halperin <dhalp...@google.com.invalid>
> wrote:
>
> > I find "MakeDistinct" more confusing. My votes in decreasing preference:
> >
> > 1. Keep `RemoveDuplicates` name, ensure that important keywords are in
> the
> > Javadoc. This reduces churn on our users and is honestly pretty dang
> >  descriptive.
> > 2. Rename to `Distinct`, which is clear if you're a SQL user and likely
> > less clear otherwise. This is a backwards-incompatible API change, so we
> > should do it before we go stable.
> >
> > I am not super strong that 1 > 2, but I am very strong that "Distinct" >>>
> > "MakeDistinct" and "RemoveDuplicates" >>> "AvoidDuplicate".
> >
> > Dan
> >
> > On Mon, Oct 24, 2016 at 10:12 AM, Kenneth Knowles <k...@google.com.invalid
> >
> > wrote:
> >
> > > The precedent that we use verbs has many exceptions. We have
> > > ApproximateQuantiles, Values, Keys, WithTimestamps, and I would even
> > > include Sum (at least when I read it).
> > >
> > > Historical note: the predilection towards verbs is from the Google
> Style
> > > Guide for Java method names
> > > <
> https://google.github.io/styleguide/javaguide.html#s5.2.3-method-names
> > >,
> > > which states "Method names are typically verbs or verb phrases". But
> even
> > > in Google code there are lots of exceptions when it makes sense, like
> > > Guava's
> > > Iterables.any(), Iterables.all(), Iterables.toArray(), the entire
> > > Predicates module, etc. Just an aside; Beam isn't Google code. I
> suggest
> > we
> > > use our judgment rather than a policy.
> > >
> > > I think "Distinct" is one of those exceptions. It is a standard
> > widespread
> > > name and also reads better as an adjective. I prefer it, but also don't
> > > care strongly enough to change it or to change it back :-)
> > >
> > > If we must have a verb, I like it as-is more than MakeDistinct and
> > > AvoidDuplicate.
> > >
> > > On Mon, Oct 24, 2016 at 9:46 AM Jesse Anderson <je...@smokinghand.com>
> > > wrote:
> > >
> > > > My original thought for this change was that Crunch uses the class
> name
> > > > Distinct. SQL also uses the keyword distinct.
> > > >
> > > > Maybe the rule should be changed to adjectives or verbs depending on
> > the
> > > > context.
> > > >
> > > > Using a verb to describe this class really doesn't connote what the
> > class
> > > > does as succinctly as the adjective.
> > > >
> > > > On Mon, Oct 24, 2016 at 9:40 AM Neelesh Salian <nsal...@cloudera.com
> >
> > > > wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > First of all, thank you to Daniel, Robert and Jesse for their
> review
> > on
> > > > > this: https://issues.apache.org/jira/browse/BEAM-239
> > > > >
> > > > > A point that came up was using verbs explicitly for Transforms.
> > > > > Here is the PR: https://github.com/apache/incubator-beam/pull/1164
> > > > >
> > > > > Posting it to help understand if we have a consensus for it and if
> > yes,
> > > > we
> > > > > could perhaps document it for future changes.
> > > > >
> > > > > Thank you.
> > > > >
> > > > > --
> > > > > Neelesh Srinivas Salian
> > > > > Engineer
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] Using Verbs for Transforms

2016-10-24 Thread Jesse Anderson
My original thought for this change was that Crunch uses the class name
Distinct. SQL also uses the keyword distinct.

Maybe the rule should be changed to adjectives or verbs depending on the
context.

Using a verb to describe this class really doesn't connote what the class
does as succinctly as the adjective.

On Mon, Oct 24, 2016 at 9:40 AM Neelesh Salian  wrote:

> Hello,
>
> First of all, thank you to Daniel, Robert and Jesse for their review on
> this: https://issues.apache.org/jira/browse/BEAM-239
>
> A point that came up was using verbs explicitly for Transforms.
> Here is the PR: https://github.com/apache/incubator-beam/pull/1164
>
> Posting it to help understand if we have a consensus for it and if yes, we
> could perhaps document it for future changes.
>
> Thank you.
>
> --
> Neelesh Srinivas Salian
> Engineer
>


Re: [ANNOUNCEMENT] New committers!

2016-10-21 Thread Jesse Anderson
Thanks for the welcomes everyone!

On Fri, Oct 21, 2016 at 4:02 PM Mark Liu <mark...@google.com.invalid> wrote:

> Congrats for all of you!
>
> Mark
>
> On Fri, Oct 21, 2016 at 3:34 PM, Kenneth Knowles <k...@google.com.invalid>
> wrote:
>
> > Huzzah!
> >
> > I've personally enjoyed working together, and I am glad to extend this
> > acknowledgement and welcome this addition to the Beam community.
> >
> > Kenn
> >
> > On Fri, Oct 21, 2016 at 3:18 PM Davor Bonaci <da...@apache.org> wrote:
> >
> > > Hi everyone,
> > > Please join me and the rest of Beam PPMC in welcoming the following
> > > contributors as our newest committers. They have significantly
> > contributed
> > > to the project in different ways, and we look forward to many more
> > > contributions in the future.
> > >
> > > * Thomas Weise
> > > Thomas authored the Apache Apex runner for Beam [1]. This is an
> exciting
> > > new runner that opens a new user base. It is a large contribution,
> which
> > > starts the whole new component with a great potential.
> > >
> > > * Jesse Anderson
> > > Jesse has contributed significantly by promoting Beam. He has
> > co-developed
> > > a Beam tutorial and delivered it at a top big data conference. He
> > published
> > > several blog posts positioning Beam, Q&A with the Apache Beam team,
> and a
> > > demo video how to run Beam on multiple runners [2]. On the side, he has
> > > authored 7 pull requests and reported 6 JIRA issues.
> > >
> > > * Thomas Groh
> > > Since starting incubation, Thomas has contributed the most commits to
> the
> > > project [3], a total of 226 commits, which is more than anybody else.
> He
> > > has contributed broadly to the project, most significantly by
> developing
> > > from scratch the DirectRunner that supports the full model semantics.
> > > Additionally, he has contributed a new set of APIs for testing
> unbounded
> > > pipelines. He published a blog highlighting this work.
> > >
> > > Congratulations to all three! Welcome!
> > >
> > > Davor
> > >
> > > [1] https://github.com/apache/incubator-beam/tree/apex-runner
> > > [2] http://www.smokinghand.com/
> > > [3] https://github.com/apache/incubator-beam/graphs/contributors
> > > ?from=2016-02-01=2016-10-14=c
> > >
> >
>


Re: Start of release 0.3.0-incubating

2016-10-20 Thread Jesse Anderson
+1 to Davor's. I'd really like to see an 0.3.0 release because there have
been big API changes between 0.2.0 and 0.3.0 like the DoFn changes. It'd be
nice to stop pointing people to HEAD and point them back to a release.

On Thu, Oct 20, 2016 at 10:17 AM Davor Bonaci 
wrote:

> It's been a while since the last release, and I think we have accumulated
> plenty of improvements across the board [1]. There are new IOs to be
> released, performance improvements, and a ton of fixes.
>
> As a general principle, I'm always advocating for delaying releases when
> there are outstanding bug fixes. For new features, however, I'm usually on
> the fence. It happens sometimes that new features are rushed to make a
> release, then we discover important issues later on, and sometimes regret
> the decision.
>
> Of course, UnboundedSource for the SparkRunner and MqttIo would be
> additional great improvements, and we should get that out to our users as
> soon as possible too.
>
> In this particular case, I think it is perfectly reasonable either to:
> * try to get 0.3.0 out now and follow it quickly with 0.4.0, as soon as
> these improvements are ready, or
> * delay the release, but with a specific time box of a few days.
>
> I'd give some preference to the first option now, since it is important to
> keep a cadence of releases during incubation and build experience with the
> process. If we were post-graduation, I'd almost certainly give a preference
> to the second approach.
>
> Davor
>
> [1]
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12338051
>
> On Thu, Oct 20, 2016 at 9:32 AM, Amit Sela  wrote:
>
> > +1
> >
> > I would like to have my standing PRs merged please - they should provide
> > support for UnboundedSource for the SparkRunner.
> > If it won't be ready for merge at the beginning of next week, don't hold
> > for me.
> >
> > Thanks,
> > Amit
> >
> > On Thu, Oct 20, 2016 at 7:27 PM Jean-Baptiste Onofré 
> > wrote:
> >
> > > +1
> > >
> > > Thanks Aljosha !!
> > >
> > > Do you mind to wait the week end or Monday to start the release ? I
> would
> > > like to include MqttIO if possible.
> > >
> > > Thanks !
> > > Regards
> > > JB
> > >
> > > On Oct 20, 2016, 18:07, at 18:07, Dan Halperin
> > 
> > > wrote:
> > > >On Thu, Oct 20, 2016 at 12:37 AM, Aljoscha Krettek
> > > >
> > > > wrote:
> > > >
> > > >> Hi,
> > > >> thanks for taking the time and writing this extensive doc!
> > > >>
> > > >> If no-one is against this I would like to be the release manager for
> > > >the
> > > >> next (0.3.0-incubating) release. I would work with the guide and
> > > >update it
> > > >> with anything that I learn along the way. Should I open a new thread
> > > >for
> > > >> this or is it ok of nobody objects here?
> > > >>
> > > >> Cheers,
> > > >> Aljoscha
> > > >>
> > > >
> > > >Spinning this out as a separate thread.
> > > >
> > > >+1 -- Sounds great to me!
> > > >
> > > >Dan
> > > >
> > > >On Thu, Oct 20, 2016 at 12:37 AM, Aljoscha Krettek
> > > >
> > > >wrote:
> > > >
> > > >> Hi,
> > > >> thanks for taking the time and writing this extensive doc!
> > > >>
> > > >> If no-one is against this I would like to be the release manager for
> > > >the
> > > >> next (0.3.0-incubating) release. I would work with the guide and
> > > >update it
> > > >> with anything that I learn along the way. Should I open a new thread
> > > >for
> > > >> this or is it ok of nobody objects here?
> > > >>
> > > >> Cheers,
> > > >> Aljoscha
> > > >>
> > > >> On Thu, 20 Oct 2016 at 07:10 Jean-Baptiste Onofré 
> > > >wrote:
> > > >>
> > > >> > Hi,
> > > >> >
> > > >> > well done.
> > > >> >
> > > >> > As already discussed, it looks good to me ;)
> > > >> >
> > > >> > Regards
> > > >> > JB
> > > >> >
> > > >> > On 10/20/2016 01:24 AM, Davor Bonaci wrote:
> > > >> > > Hi everybody,
> > > >> > > As a project, I think we should have a Release Guide to document
> > > >the
> > > >> > > process, have consistent releases, on-board additional release
> > > >> managers,
> > > >> > > and generally share knowledge. It is also one of the project
> > > >graduation
> > > >> > > guidelines.
> > > >> > >
> > > >> > > Dan and I wrote a draft version, documenting the process we did
> > > >for the
> > > >> > > first two releases. It is currently in a pull request [1]. I'd
> > > >invite
> > > >> > > everyone interested to take a peek and comment, either on the
> > > >pull
> > > >> > request
> > > >> > > itself or here on mailing list, as appropriate.
> > > >> > >
> > > >> > > Thanks,
> > > >> > > Davor
> > > >> > >
> > > >> > > [1] https://github.com/apache/incubator-beam-site/pull/49
> > > >> > >
> > > >> >
> > > >> > --
> > > >> > Jean-Baptiste Onofré
> > > >> > jbono...@apache.org
> > > >> > http://blog.nanthrax.net
> > > >> > Talend - http://www.talend.com
> > > >> >
> > > >>
> > >
> >
>


Re: Exploring Performance Testing

2016-10-18 Thread Jesse Anderson
@Dan before starting with Beam, I'd want to know how much performance I'm
giving up by not programming directly to the runner's native API.

On Tue, Oct 18, 2016 at 10:03 AM Dan Halperin <dhalp...@google.com.invalid>
wrote:

> I think there are lots of excellent one-off performance studies, but I'm
> not sure how useful that is to Beam.
>
> From a test infra point of view, I'm wondering more about tracking of
> performance over time, identifying regressions, etc.
>
> Google has some tools like PerfKit
> <https://github.com/GoogleCloudPlatform/PerfKitBenchmarker> which is
> basically a skin on a database + some scripts to load and query data; but I
> don't love it. Do other Apache projects do public, long-term benchmarking
> and performance regression testing?
>
> Dan
>
> On Tue, Oct 18, 2016 at 8:52 AM, Jesse Anderson <je...@smokinghand.com>
> wrote:
>
> > I found data Artisan's benchmarking post
> > <http://data-artisans.com/high-throughput-low-latency-and-
> > exactly-once-stream-processing-with-apache-flink/>.
> > They also shared the code <https://github.com/dataArtisans/performance>.
> I
> > didn't dig in much, but they did a wide range of algorithms. They have
> the
> > native code, so you write the Beam code and check against the native
> > performance.
> >
> > On Mon, Oct 17, 2016 at 5:14 PM amir bahmanyari
> > <amirto...@yahoo.com.invalid>
> > wrote:
> >
> > > Hi Jason,I have been busy bench-marking Flink Cluster (Spark next)
> under
> > > Beam.I can share my experience. Can you list items of interest to know
> > so I
> > > can answer them to the best of my knowledge.Cheers
> > >
> > >   From: Jason Kuster <jasonkus...@google.com.INVALID>
> > >  To: dev@beam.incubator.apache.org
> > >  Sent: Monday, October 17, 2016 5:06 PM
> > >  Subject: Exploring Performance Testing
> > >
> > > Hey all,
> > >
> > > Now that we've covered some of the initial ground with regard to
> > > correctness testing, I'm going to be starting work on performance
> testing
> > > and benchmarking. I wanted to reach out and see what people's
> experiences
> > > have been with performance testing and benchmarking
> > > frameworks, particularly in other Apache projects. Anyone have any
> > > experience or thoughts?
> > >
> > > Best,
> > >
> > > Jason
> > >
> > > --
> > > ---
> > > Jason Kuster
> > > Apache Beam (Incubating) / Google Cloud Dataflow
> > >
> > >
> > >
> >
>


Re: Exploring Performance Testing

2016-10-18 Thread Jesse Anderson
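I found data Artisans' benchmarking post
<http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/>.
They also shared the code <https://github.com/dataArtisans/performance>. I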
didn't dig in much, but they did a wide range of algorithms. They have the
native code, so you write the Beam code and check against the native
performance.

On Mon, Oct 17, 2016 at 5:14 PM amir bahmanyari 
wrote:

> Hi Jason, I have been busy bench-marking Flink Cluster (Spark next) under
> Beam. I can share my experience. Can you list items of interest to know so I
> can answer them to the best of my knowledge. Cheers
>
>   From: Jason Kuster 
>  To: dev@beam.incubator.apache.org
>  Sent: Monday, October 17, 2016 5:06 PM
>  Subject: Exploring Performance Testing
>
> Hey all,
>
> Now that we've covered some of the initial ground with regard to
> correctness testing, I'm going to be starting work on performance testing
> and benchmarking. I wanted to reach out and see what people's experiences
> have been with performance testing and benchmarking
> frameworks, particularly in other Apache projects. Anyone have any
> experience or thoughts?
>
> Best,
>
> Jason
>
> --
> ---
> Jason Kuster
> Apache Beam (Incubating) / Google Cloud Dataflow
>
>
>


Re: Documentation for IDE setup

2016-10-17 Thread Jesse Anderson
That was the compilation error I got with Eclipse too. Thanks for
sorting it out.
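
As I read the fix Dan describes below, the value class redeclares the
properties it inherits from the interface, so the annotation processor sees
them in one fixed order regardless of compiler. A minimal sketch with
hypothetical names (HasEndpoint, Endpoint, HandWrittenEndpoint); the
@AutoValue annotation is left out so this compiles without the dependency,
and the last class is a hand-written stand-in for the generated code.

```java
// Properties as an interface might declare them. Per the note below, the
// order in which these reach the processor can differ between compilers.
interface HasEndpoint {
  String host();
  int port();
}

// In real code this class would carry @AutoValue (omitted here on purpose).
// Redeclaring the properties pins their order: host first, then port.
abstract class Endpoint implements HasEndpoint {
  @Override public abstract String host();
  @Override public abstract int port();

  static Endpoint create(String host, int port) {
    return new HandWrittenEndpoint(host, port);
  }

  public static void main(String[] args) {
    Endpoint endpoint = Endpoint.create("localhost", 8080);
    System.out.println(endpoint.host() + ":" + endpoint.port()); // localhost:8080
  }
}

// Hand-written stand-in for the generated subclass; its constructor
// parameters follow the declaration order on Endpoint.
final class HandWrittenEndpoint extends Endpoint {
  private final String host;
  private final int port;

  HandWrittenEndpoint(String host, int port) {
    this.host = host;
    this.port = port;
  }

  @Override public String host() { return host; }
  @Override public int port() { return port; }
}
```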

On 10/17/16, Daniel Kulp <dk...@apache.org> wrote:
> Just a follow up based on some discoveries while trying to rebase my branch
> on master this morning.
>
> Eclipse JDT outputs methods/fields into class files in a different order
> than Oracle compiler.   That’s perfectly acceptable from a “binary
> compatibility” standpoint, but it has a side effect of causing potential
> problems with AutoValue.   If the AutoValue class tries to get its values
> from an interface, the methods on the interface will come in a different
> order than with Oracle and the resulting constructor/fields/etc… will be
> different.   Based on some experiments and back and forth with Dan H., I
> believe the best fix is to explicitly define the properties on the AutoValue
> class as if it didn’t pull those via the interface.   Thus, the APT
> processing gets the attributes in the order intended and generates the right
> code. The alternative would be to use the Builder pattern instead of the
> constructor, but that requires more code to be written than just defining
> the attributes in the right order.   However, if you are already defining a
> Builder, that might be the best option.
>
> Anyway, something to be aware of when using the AutoValue things.   Once we
> get the branch merged, travis should automatically pick this up.
>
> Dan
>
>
>
>
>> On Oct 14, 2016, at 11:37 AM, Daniel Kulp <dk...@apache.org> wrote:
>>
>>
>>> On Oct 14, 2016, at 10:06 AM, Jesse Anderson <je...@smokinghand.com>
>>> wrote:
>>>
>>> Last week I imported Beam with IntelliJ and everything worked.
>>>
>>> That said, I tried to import the Eclipse project and that doesn't
>>> compile
>>> anymore. I didn't have time to figure out what happened though.
>>>
>>
>> I have a pull request https://github.com/apache/incubator-beam/pull/1094
>> that fixes the compile issues.  It has two LGTM’s, just needs someone to
>> merge it.
>>
>> With eclipse, you need to have all the needed m2e connectors.   Some of
>> them (find bugs, check style) can be auto-detected and installed when beam
>> is first imported.   The apt one doesn’t.   You need to go to the eclipse
>> marketplace, install it, then configure it in the Eclipse properties to
>> turn on the “experimental” m2e-apt processing.   Once you do that, a
>> refresh of the maven projects should result in it building/compiling.
>>
>> Running tests is another matter.   Since eclipse compiles everything in a
>> module in one pass (instead of two like maven), one of the apt processors
>> doesn’t know where to output files and always dumps the files in /classes
>> instead of /test-classes.   Thus, any test that relies on a runner will
>> likely fail as it results in the “test” versions of various services from
>> core being picked up.  A simple:
>>
>> rm sdks/java/core/target/classes/META-INF/services/*
>>
>> From the command line will fix that.   That should also be documented on
>> the IDE page until someone can figure out how to work around it.
>>
>> Dan
>>
>>
>>
>>> On Fri, Oct 14, 2016 at 1:21 AM Jean-Baptiste Onofré <j...@nanthrax.net>
>>> wrote:
>>>
>>>> Hi Christian,
>>>>
>>>> IntelliJ doesn't need any special config (maybe the code style can be
>>>> documented or imported).
>>>>
>>>> Anyway, agree to add such on website in the contribute directory. I
>>>> think it could be part of the contribution-guide as it's first setup
>>>> step.
>>>>
>>>> Regards
>>>> JB
>>>>
>>>> On 10/14/2016 10:17 AM, Christian Schneider wrote:
>>>>> Hello all,
>>>>>
>>>>> I am new to the beam community and currently start making myself
>>>>> familiar with the code.  I quickly found the contribution guide and
>>>>> was
>>>>> able to clone the code and build beam using maven.
>>>>>
>>>>> The first obstacle I faced was getting the code build in eclipse. I
>>>>> naively imported as existing maven projects but got lots of compile
>>>>> errors. After talking to Dan Kulp we found that this is due to the apt
>>>>> annotation processing for auto value types. Dan explained me how I
>>>>> need
>>>>> to setup eclipse to make it work.
>>>>>
>>>>> I still got 5 compile errors (Some bound mismat

Re: Introduction

2016-10-17 Thread Jesse Anderson
Neelesh,

I saw you talked about the Hadoop MapReduce runner support too. I'd love to
see that happen. When Tyler and I spoke at Strata NYC, I was surprised how
many people were there with only MR code.

This would definitely ease the testing burden if they can port to Beam and
run on MR before going to another runner.

Thanks,

Jesse

On Mon, Oct 17, 2016 at 11:28 AM Amit Sela  wrote:

> Done.
>
> Feel free to take a peek at the Spark runner since you have Spark
> experience and that's great!
>
> Most open issues are usually automatically assigned to me, but ping me (dev
> list/Slack) if you want to work on something and not sure what's the status
> there.
>
> Thanks,
> Amit
>
>
> On Mon, Oct 17, 2016 at 9:14 PM Neelesh Salian 
> wrote:
>
> > Hello folks,
> >
> > I am Neelesh Salian; I recently joined the Beam community and I wanted to
> > take this opportunity to formally introduce myself.
> >
> > I have been working with the Hadoop and Spark ecosystems over the past
> two
> > years and started working on Flink over the past few weeks as well.
> >
> >
> > If someone in the community could please add me to the list of
> contributors
> > to help assign JIRAs to myself to work on, that would be super helpful.
> >
> >
> > Excited to start working and help build the community. :)
> > Thank you.
> >
> > --
> > Neelesh Srinivas Salian
> > Engineer
> >
>


Re: [KUDOS] Contributed runner: Apache Apex!

2016-10-17 Thread Jesse Anderson
Awesome!

On Mon, Oct 17, 2016 at 10:41 AM Thomas Weise  wrote:

> Thanks to Kenn for helping with the review and many questions!
>
> The focus till here has been on making the runner functional. I will start
> creating JIRAs for follow-up work.
>
> Looking forward to the next steps to make it a top-level runner and input
> from the community on the same.
>
> Thanks!
> Thomas
>
>
> On Mon, Oct 17, 2016 at 10:35 AM, Amit Sela  wrote:
>
> > Congrats and thanks to everyone who was involved in this effort!
> >
> > On Mon, Oct 17, 2016 at 8:07 PM Neelesh Salian 
> > wrote:
> >
> > > Awesome. Great work.
> > >
> > > On Mon, Oct 17, 2016 at 10:03 AM, Aljoscha Krettek <
> aljos...@apache.org>
> > > wrote:
> > >
> > > > Congrats! :-)
> > > >
> > > > On Mon, 17 Oct 2016 at 18:55 Kenneth Knowles  >
> > > > wrote:
> > > >
> > > > > *I would like to :-)
> > > > >
> > > > > On Mon, Oct 17, 2016 at 9:51 AM Kenneth Knowles 
> > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I would to, once again, call attention to a great addition to
> > Beam: a
> > > > > > runner for Apache Apex.
> > > > > >
> > > > > > After lots of review and much thoughtful revision, pull request
> > #540
> > > > has
> > > > > > been merged to the apex-runner feature branch today. Please do
> > take a
> > > > > look,
> > > > > > and help us put the finishing touches on it to get it ready for
> the
> > > > > master
> > > > > > branch.
> > > > > >
> > > > > > And please also congratulate and thank Thomas Weise for this
> large
> > > > > > endeavor, Vlad Rosov who helped get the integration tests
> working,
> > > and
> > > > > > Guarav Gupta who contributed review comments.
> > > > > >
> > > > > > Kenn
> > > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Neelesh Srinivas Salian
> > > Customer Operations Engineer
> > >
> >
>


Re: Documentation for IDE setup

2016-10-14 Thread Jesse Anderson
Last week I imported Beam with IntelliJ and everything worked.

That said, I tried to import the Eclipse project and that doesn't compile
anymore. I didn't have time to figure out what happened though.

On Fri, Oct 14, 2016 at 1:21 AM Jean-Baptiste Onofré 
wrote:

> Hi Christian,
>
> IntelliJ doesn't need any special config (maybe the code style can be
> documented or imported).
>
> Anyway, agree to add such on website in the contribute directory. I
> think it could be part of the contribution-guide as it's first setup step.
>
> Regards
> JB
>
> On 10/14/2016 10:17 AM, Christian Schneider wrote:
> > Hello all,
> >
> > I am new to the beam community and currently start making myself
> > familiar with the code.  I quickly found the contribution guide and was
> > able to clone the code and build beam using maven.
> >
> > The first obstacle I faced was getting the code to build in Eclipse. I
> > naively imported as existing maven projects but got lots of compile
> > errors. After talking to Dan Kulp we found that this is due to the apt
> > annotation processing for auto value types. Dan explained to me how I
> > need to set up Eclipse to make it work.
> >
> > I still got 5 compile errors (Some bound mismatch at Read.bounded, and
> > one ambiguous method empty). These errors seem to be present for
> > everyone using eclipse and Dan works on it. So I think this is not a
> > permanent problem.
> >
> > To make it easier for new people I would like to write a documentation
> > about the IDE setup. I can cover the eclipse part but I think intellij
> > should also be described.
> >
> > I already started with it and placed it in /contribute/ide-setup. Does
> > that make sense?
> >
> > I currently did not link to it from anywhere. I think it should be
> > linked in the contribute/index and in the Contribute menu.
> >
> > Christian
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Strata+Hadoop World

2016-09-29 Thread Jesse Anderson
Tyler and I did a Beam tutorial at Strata+Hadoop World. It was 3 hours long
and covered some of the basics of windowing and triggers.

We had 66 people in attendance. That's a really good attendance given the
newness of Beam. We demonstrated the Spark, Flink, Direct and Dataflow
runners.

The code and slides are up on my GitHub repository. We even
had exercises and sample solutions you can do on your own.

Thanks,

Jesse


Re: Preferred locations (or data locality) for batch pipelines.

2016-09-22 Thread Jesse Anderson
I've only ever seen that being used to figure out which file the
runner/mapper/operation is working on. Otherwise, I haven't seen those
operations care where in the file they're working.

On Thu, Sep 22, 2016 at 5:57 AM Amit Sela  wrote:

> Wouldn't it force all runners to implement this for all distributed
> filesystems? It's true that each runner has its own "partitioning"
> mechanism, but I assume (maybe I'm wrong) that open-source runners use the
> Hadoop InputFormat/InputSplit for that, along with the proper connectors
> to run on top of s3/gs.
>
> If this is wrong, each runner should take care of its own, but if not, we
> could have a generic solution for runners, no?
>
> Thanks,
> Amit
>
> On Thu, Sep 22, 2016 at 3:30 PM Jean-Baptiste Onofré 
> wrote:
>
> > Hi Amit,
> >
> > as the purpose is to remove IOChannelFactory, then I would suggest it's
> > a runner concern. The Read.Bounded should "locate" the bundles on an
> > executor close to the read data (even if it's not always possible
> > depending on the source).
> >
> > My $0.01
> >
> > Regards
> > JB
> >
> > On 09/22/2016 02:26 PM, Amit Sela wrote:
> > > It's not new that batch pipeline can optimize on data locality, my
> > question
> > > is regarding this responsibility in Beam.
> > > If runners should implement a generic Read.Bounded support, should they
> > > also implement locating the input blocks ? or should it be a part
> > > of IOChannelFactory implementations ? or another way to go at it that
> I'm
> > > missing ?
> > >
> > > Thanks,
> > > Amit.
> > >
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>


Re: CEP / Pattern matching on top of Beam pipeline

2016-09-21 Thread Jesse Anderson
Still a feature that some would really like to see
http://www.jesse-anderson.com/2016/07/question-and-answers-with-the-apache-beam-team/

On Wed, Sep 21, 2016 at 4:56 PM Aparup Banerjee (apbanerj) <
apban...@cisco.com> wrote:

> Hi Folks,
>
> Is anyone familiar with a CEP / Pattern matching library / framework on
> top of Beam pipeline?
>
> Thanks,
> Aparup
>


IntervalWindow toString()

2016-09-19 Thread Jesse Anderson
The toString() of IntervalWindow starts with a square bracket and ends with
a parenthesis. Is this a type of notation or a bug? Code:

  @Override
  public String toString() {
    return "[" + start + ".." + end + ")";
  }

Thanks,

Jesse
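
For reference, "[start..end)" reads as the usual notation for a half-open
interval: the start is included and the end is excluded. A minimal sketch of
that reading follows. HalfOpenInterval is a hypothetical stand-in written for
this note, not Beam's IntervalWindow, and the long bounds are an assumption.

```java
// Hypothetical stand-in for illustration only; this is not Beam's IntervalWindow.
final class HalfOpenInterval {
    private final long start; // inclusive lower bound
    private final long end;   // exclusive upper bound

    HalfOpenInterval(long start, long end) {
        if (end < start) {
            throw new IllegalArgumentException("end must not precede start");
        }
        this.start = start;
        this.end = end;
    }

    /** True when start <= t < end. */
    boolean contains(long t) {
        return t >= start && t < end;
    }

    /** Mixed brackets mirror the mixed inclusivity of the two bounds. */
    @Override
    public String toString() {
        return "[" + start + ".." + end + ")";
    }
}
```

Under this reading, two adjacent intervals such as [0..10) and [10..20) never
both contain the boundary value 10, which is why the mixed brackets are
convenient for back-to-back windows.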


Maven Compile Fails

2016-09-16 Thread Jesse Anderson
Is anyone else experiencing this while building with Maven? I'm having to
clean each time. It only happens on beam-sdks-java-core.

[INFO] --- maven-compiler-plugin:3.3:compile (default-compile) @
beam-sdks-java-core ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 378 source files to
/home/vmuser/host/repos/incubator-beam/sdks/java/core/target/classes
An exception has occurred in the compiler (1.8.0_101). Please file a bug
against the Java compiler via the Java bug reporting page (
http://bugreport.java.com) after checking the Bug Database (
http://bugs.java.com) for duplicates. Include your program and the
following diagnostic in your report. Thank you.
java.lang.IllegalStateException: endPosTable already set
at
com.sun.tools.javac.util.DiagnosticSource.setEndPosTable(DiagnosticSource.java:136)
at com.sun.tools.javac.util.Log.setEndPosTable(Log.java:350)
at com.sun.tools.javac.main.JavaCompiler.parse(JavaCompiler.java:667)
at com.sun.tools.javac.main.JavaCompiler.parseFiles(JavaCompiler.java:950)
at
com.sun.tools.javac.processing.JavacProcessingEnvironment$Round.<init>(JavacProcessingEnvironment.java:892)
at
com.sun.tools.javac.processing.JavacProcessingEnvironment$Round.next(JavacProcessingEnvironment.java:921)
at
com.sun.tools.javac.processing.JavacProcessingEnvironment.doProcessing(JavacProcessingEnvironment.java:1187)
at
com.sun.tools.javac.main.JavaCompiler.processAnnotations(JavaCompiler.java:1170)
at com.sun.tools.javac.main.JavaCompiler.compile(JavaCompiler.java:856)
at com.sun.tools.javac.main.Main.compile(Main.java:523)
at com.sun.tools.javac.api.JavacTaskImpl.doCall(JavacTaskImpl.java:129)
at com.sun.tools.javac.api.JavacTaskImpl.call(JavacTaskImpl.java:138)
at
org.codehaus.plexus.compiler.javac.JavaxToolsCompiler.compileInProcess(JavaxToolsCompiler.java:125)
at
org.codehaus.plexus.compiler.javac.JavacCompiler.performCompile(JavacCompiler.java:169)
at
org.apache.maven.plugin.compiler.AbstractCompilerMojo.execute(AbstractCompilerMojo.java:825)
at
org.apache.maven.plugin.compiler.CompilerMojo.execute(CompilerMojo.java:129)
at
org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
at
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:207)
at
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
at
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
at
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
at
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
at
org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
at
org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:307)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:193)
at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:106)
at org.apache.maven.cli.MavenCli.execute(MavenCli.java:863)
at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:288)
at org.apache.maven.cli.MavenCli.main(MavenCli.java:199)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
at
org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
at
org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
[INFO] -
[ERROR] COMPILATION ERROR :
[INFO] -
[ERROR] An unknown compilation problem occurred
[INFO] 1 error

vmuser@packer-virtualbox-iso:~/host/repos/incubator-beam$ mvn --version
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5;
2015-11-10T11:41:47-05:00)
Maven home: /usr/share/maven3
Java version: 1.8.0_101, vendor: Oracle Corporation
Java home: /usr/lib/jvm/java-8-oracle/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "4.2.0-27-generic", arch: "amd64", family: "unix"


Re: JavaDoc

2016-09-16 Thread Jesse Anderson
@Ismael here's a pom.xml with the repository specified
https://github.com/eljefe6a/beamexample/blob/master/BeamTutorial/pom.xml

On Thu, Sep 15, 2016 at 11:38 PM Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

> Hi Jesse,
>
> good point, we gonna fix that.
>
> Anyway, you have the nightly javadoc on the Apache SNAPSHOT repo:
>
>
> https://repository.apache.org/content/groups/snapshots/org/apache/beam/beam-sdks-java-core/0.3.0-incubating-SNAPSHOT/*javadoc.jar
>
> Regards
> JB
>
> On 09/15/2016 05:26 PM, Jesse Anderson wrote:
> > Only the 0.1.0 JavaDoc is on the website
> > <http://beam.incubator.apache.org/learn/sdks/javadoc/>. It should have
> > 0.2.0.
> >
> > Thanks,
> >
> > Jesse
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: JavaDoc

2016-09-15 Thread Jesse Anderson
Do you know if that includes putting the 0.3.0 nightlies up? Right now,
only 0.2.0 is there
<https://repository.apache.org/content/repositories/snapshots/org/apache/beam/apache-beam/>
.

On Thu, Sep 15, 2016 at 9:10 AM Frances Perry <f...@google.com.invalid>
wrote:

> + Dan
>
> Thanks, Jesse. I believe Dan has pending PRs (pull/38) to update the site
> after 0.2.0.
>
> On Thu, Sep 15, 2016 at 8:26 AM, Jesse Anderson <je...@smokinghand.com>
> wrote:
>
> > Only the 0.1.0 JavaDoc is on the website
> > <http://beam.incubator.apache.org/learn/sdks/javadoc/>. It should have
> > 0.2.0.
> >
> > Thanks,
> >
> > Jesse
> >
>


JavaDoc

2016-09-15 Thread Jesse Anderson
Only the 0.1.0 JavaDoc is on the website
<http://beam.incubator.apache.org/learn/sdks/javadoc/>. It should have
0.2.0.

Thanks,

Jesse


Re: Remove legacy import-order?

2016-08-23 Thread Jesse Anderson
Please. That's the one that always trips me up.

On Tue, Aug 23, 2016, 4:10 PM Ben Chambers  wrote:

> When Beam was contributed it inherited an import order [1] that was pretty
> arbitrary. We've added org.apache.beam [2], but continue to use this
> ordering.
>
> Both Eclipse and IntelliJ default to grouping imports into alphabetic
> order. I think it would simplify development if we switched our checkstyle
> ordering to agree with these IDEs. This also removes special treatment for
> specific packages.
>
> If people agree, I'll send out a PR that changes the checkstyle
> configuration and runs IntelliJ's sort-imports on the existing files.
>
> -- Ben
>
> [1]
> org.apache.beam,com.google,android,com,io,Jama,junit,net,org,sun,java,javax
> [2] com.google,android,com,io,Jama,junit,net,org,sun,java,javax
>
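
To make the difference concrete, here is a rough sketch comparing a
group-based ordering with a plain alphabetical one. The group list is the
footnote list above that includes org.apache.beam. The class name, the
prefix-matching rule, and the sample import names are assumptions made for
illustration; this is not how checkstyle or either IDE implements ordering.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative only: not checkstyle's or any IDE's actual algorithm.
final class ImportOrderDemo {
    // Group prefixes taken from the footnote list that includes org.apache.beam.
    static final List<String> LEGACY_GROUPS = Arrays.asList(
        "org.apache.beam", "com.google", "android", "com", "io", "Jama",
        "junit", "net", "org", "sun", "java", "javax");

    /** Index of the first group whose prefix matches; unmatched names sort last. */
    static int legacyGroup(String name) {
        for (int i = 0; i < LEGACY_GROUPS.size(); i++) {
            String group = LEGACY_GROUPS.get(i);
            if (name.equals(group) || name.startsWith(group + ".")) {
                return i;
            }
        }
        return LEGACY_GROUPS.size();
    }

    /** Orders by group first, then alphabetically inside each group. */
    static List<String> legacyOrder(List<String> imports) {
        List<String> out = new ArrayList<>(imports);
        out.sort((a, b) -> {
            int byGroup = Integer.compare(legacyGroup(a), legacyGroup(b));
            return byGroup != 0 ? byGroup : a.compareTo(b);
        });
        return out;
    }

    /** Orders purely alphabetically, with no special-cased packages. */
    static List<String> alphabeticalOrder(List<String> imports) {
        List<String> out = new ArrayList<>(imports);
        out.sort(Comparator.<String>naturalOrder());
        return out;
    }
}
```

With the group-based order, a contributor has to remember where each package
prefix falls in the list; with the alphabetical order, the position of an
import follows from its name alone.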


Re: Beam Interview

2016-07-13 Thread Jesse Anderson
A really big thanks to everyone for making this Q and A such a success. I'm
really surprised by how many people (20 to be exact) took the time to
answer it.

This is a really good way to educate other developers and managers on Beam.
It answers the pressing questions people ask before using Beam. Please
promote it as much as you can. Here's some sample copy to tweet or put on
LinkedIn:

Q and A with 20 committers and users of Apache Beam. See how to use it and
whether you should use it in production.
http://www.jesse-anderson.com/2016/07/question-and-answers-with-the-apache-beam-team/

Thanks,

Jesse

On Tue, Jul 12, 2016 at 1:51 PM Jesse Anderson <je...@smokinghand.com>
wrote:

> Last call. If you want your words of wisdom forever kept in the annals of
> Apache Beam lore, I'm publishing tomorrow (7-13) at 9 AM PT.
>
>
> On Mon, Jul 11, 2016 at 11:13 PM Tyler Akidau <taki...@apache.org> wrote:
>
>> +1. Thanks a lot for putting this together. :-)
>>
>> On Mon, Jul 11, 2016 at 9:33 PM Frances Perry <f...@google.com.invalid>
>> wrote:
>>
>> > Love this, Jesse! And pretty inspired reading the answers so far ;-)
>> >
>> > On Mon, Jul 11, 2016 at 1:42 PM, Jesse Anderson <je...@smokinghand.com>
>> > wrote:
>> >
>> > > Thanks!
>> > >
>> > > On Mon, Jul 11, 2016 at 1:02 PM Ismaël Mejía <ieme...@gmail.com>
>> wrote:
>> > >
>> > > > Great Idea, I just added my answers, English is not my native
>> language,
>> > > so
>> > > > feel free to edit if you find any grammatical mistakes, sorry.
>> > > >
>> > > > Ismael
>> > > >
>> > > > On Mon, Jul 11, 2016 at 7:12 PM, Jesse Anderson <
>> je...@smokinghand.com
>> > >
>> > > > wrote:
>> > > >
>> > > > > I really appreciate the turnout. I'm pleasantly surprised with the
>> > > varied
>> > > > > responses I've received.
>> > > > >
>> > > > > I plan to publish this post on July 13 at 9 AM PT. If you'd like
>> to
>> > add
>> > > > > your input, please do it before that time.
>> > > > >
>> > > > > Thanks,
>> > > > >
>> > > > > Jesse
>> > > > >
>> > > > > On Fri, Jul 8, 2016 at 1:30 PM Amit Sela <amitsel...@gmail.com>
>> > wrote:
>> > > > >
>> > > > > > That's great Jesse!
>> > > > > >
>> > > > > > Added my comments.
>> > > > > >
>> > > > > > Thanks,
>> > > > > > Amit
>> > > > > >
>> > > > > > On Fri, Jul 8, 2016 at 8:56 PM Shiv Shankar <
>> > > > shiv.shivshan...@gmail.com>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > Hi,
>> > > > > > > I am a User and learner. I just added my view points.
>> > > > > > >
>> > > > > > > Thanks
>> > > > > > > SV
>> > > > > > >
>> > > > > > >
>> > > > > > > On Fri, Jul 8, 2016 at 1:51 AM, Sergio Fernández <
>> > > wik...@apache.org>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > Great idea!
>> > > > > > > >
>> > > > > > > > On Fri, Jul 8, 2016 at 7:44 AM, Jean-Baptiste Onofré <
>> > > > > j...@nanthrax.net>
>> > > > > > > > wrote:
>> > > > > > > >
>> > > > > > > > > Hi Jesse,
>> > > > > > > > >
>> > > > > > > > > good idea. Just complete the doc.
>> > > > > > > > >
>> > > > > > > > > Regards
>> > > > > > > > > JB
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > On 07/08/2016 02:18 AM, Jesse Anderson wrote:
>> > > > > > > > >
>> > > > > > > > >> I've been thinking about ways to get more Beam
>> information
>> > out
>> > > > > there
>> > > > > > > > >> without too much fuss over getting everything right. I
>> came
>

Re: Beam Interview

2016-07-12 Thread Jesse Anderson
Last call. If you want your words of wisdom forever kept in the annals of
Apache Beam lore, I'm publishing tomorrow (7-13) at 9 AM PT.

On Mon, Jul 11, 2016 at 11:13 PM Tyler Akidau <taki...@apache.org> wrote:

> +1. Thanks a lot for putting this together. :-)
>
> On Mon, Jul 11, 2016 at 9:33 PM Frances Perry <f...@google.com.invalid>
> wrote:
>
> > Love this, Jesse! And pretty inspired reading the answers so far ;-)
> >
> > On Mon, Jul 11, 2016 at 1:42 PM, Jesse Anderson <je...@smokinghand.com>
> > wrote:
> >
> > > Thanks!
> > >
> > > On Mon, Jul 11, 2016 at 1:02 PM Ismaël Mejía <ieme...@gmail.com>
> wrote:
> > >
> > > > Great Idea, I just added my answers, English is not my native
> language,
> > > so
> > > > feel free to edit if you find any grammatical mistakes, sorry.
> > > >
> > > > Ismael
> > > >
> > > > On Mon, Jul 11, 2016 at 7:12 PM, Jesse Anderson <
> je...@smokinghand.com
> > >
> > > > wrote:
> > > >
> > > > > I really appreciate the turnout. I'm pleasantly surprised with the
> > > varied
> > > > > responses I've received.
> > > > >
> > > > > I plan to publish this post on July 13 at 9 AM PT. If you'd like to
> > add
> > > > > your input, please do it before that time.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jesse
> > > > >
> > > > > On Fri, Jul 8, 2016 at 1:30 PM Amit Sela <amitsel...@gmail.com>
> > wrote:
> > > > >
> > > > > > That's great Jesse!
> > > > > >
> > > > > > Added my comments.
> > > > > >
> > > > > > Thanks,
> > > > > > Amit
> > > > > >
> > > > > > On Fri, Jul 8, 2016 at 8:56 PM Shiv Shankar <
> > > > shiv.shivshan...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > > I am a User and learner. I just added my view points.
> > > > > > >
> > > > > > > Thanks
> > > > > > > SV
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Jul 8, 2016 at 1:51 AM, Sergio Fernández <
> > > wik...@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Great idea!
> > > > > > > >
> > > > > > > > On Fri, Jul 8, 2016 at 7:44 AM, Jean-Baptiste Onofré <
> > > > > j...@nanthrax.net>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Jesse,
> > > > > > > > >
> > > > > > > > > good idea. Just complete the doc.
> > > > > > > > >
> > > > > > > > > Regards
> > > > > > > > > JB
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On 07/08/2016 02:18 AM, Jesse Anderson wrote:
> > > > > > > > >
> > > > > > > > >> I've been thinking about ways to get more Beam information
> > out
> > > > > there
> > > > > > > > >> without too much fuss over getting everything right. I
> came
> > up
> > > > > with
> > > > > > a
> > > > > > > > >> written Q and A that represents the most common questions
> I
> > > get.
> > > > > > > > >>
> > > > > > > > >> Answering the questions should take 5-10 minutes. I think
> it
> > > > will
> > > > > > go a
> > > > > > > > >> long
> > > > > > > > >> ways towards getting more Beam users.
> > > > > > > > >>
> > > > > > > > >> 1. Here is the Google Doc link:
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1IQt6FfQI7W4d2QxZm6WwGnZFdA8JmaseKZrMGPu8zgY/edit#
> > > > > > > > >> 2. Add your name and initials.
> > > > > > > > >> 3. When you answer a question, just prefix it with
> your
> > > > > > initials.
> > > > > > > > >>
> > > > > > > > >> I really appreciate you taking the time to answer things.
> > I'll
> > > > > > publish
> > > > > > > > the
> > > > > > > > >> results of the Q and A on my blog and email out the link
> > once
> > > > it's
> > > > > > up
> > > > > > > > >> there.
> > > > > > > > >>
> > > > > > > > >> Thanks,
> > > > > > > > >>
> > > > > > > > >> Jesse
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > > --
> > > > > > > > > Jean-Baptiste Onofré
> > > > > > > > > jbono...@apache.org
> > > > > > > > > http://blog.nanthrax.net
> > > > > > > > > Talend - http://www.talend.com
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Sergio Fernández
> > > > > > > > Partner Technology Manager
> > > > > > > > Redlink GmbH
> > > > > > > > m: +43 6602747925
> > > > > > > > e: sergio.fernan...@redlink.co
> > > > > > > > w: http://redlink.co
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Beam Interview

2016-07-11 Thread Jesse Anderson
Thanks!

On Mon, Jul 11, 2016 at 1:02 PM Ismaël Mejía <ieme...@gmail.com> wrote:

> Great Idea, I just added my answers, English is not my native language, so
> feel free to edit if you find any grammatical mistakes, sorry.
>
> Ismael
>
> On Mon, Jul 11, 2016 at 7:12 PM, Jesse Anderson <je...@smokinghand.com>
> wrote:
>
> > I really appreciate the turnout. I'm pleasantly surprised with the varied
> > responses I've received.
> >
> > I plan to publish this post on July 13 at 9 AM PT. If you'd like to add
> > your input, please do it before that time.
> >
> > Thanks,
> >
> > Jesse
> >
> > On Fri, Jul 8, 2016 at 1:30 PM Amit Sela <amitsel...@gmail.com> wrote:
> >
> > > That's great Jesse!
> > >
> > > Added my comments.
> > >
> > > Thanks,
> > > Amit
> > >
> > > On Fri, Jul 8, 2016 at 8:56 PM Shiv Shankar <
> shiv.shivshan...@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > > I am a User and learner. I just added my view points.
> > > >
> > > > Thanks
> > > > SV
> > > >
> > > >
> > > > On Fri, Jul 8, 2016 at 1:51 AM, Sergio Fernández <wik...@apache.org>
> > > > wrote:
> > > >
> > > > > Great idea!
> > > > >
> > > > > On Fri, Jul 8, 2016 at 7:44 AM, Jean-Baptiste Onofré <
> > j...@nanthrax.net>
> > > > > wrote:
> > > > >
> > > > > > Hi Jesse,
> > > > > >
> > > > > > good idea. Just complete the doc.
> > > > > >
> > > > > > Regards
> > > > > > JB
> > > > > >
> > > > > >
> > > > > > On 07/08/2016 02:18 AM, Jesse Anderson wrote:
> > > > > >
> > > > > >> I've been thinking about ways to get more Beam information out
> > there
> > > > > >> without too much fuss over getting everything right. I came up
> > with
> > > a
> > > > > >> written Q and A that represents the most common questions I get.
> > > > > >>
> > > > > >> Answering the questions should take 5-10 minutes. I think it
> will
> > > go a
> > > > > >> long
> > > > > >> ways towards getting more Beam users.
> > > > > >>
> > > > > >> 1. Here is the Google Doc link:
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1IQt6FfQI7W4d2QxZm6WwGnZFdA8JmaseKZrMGPu8zgY/edit#
> > > > > >> 2. Add your name and initials.
> > > > > >> 3. When you answer a question, just prefix it with your
> > > initials.
> > > > > >>
> > > > > >> I really appreciate you taking the time to answer things. I'll
> > > publish
> > > > > the
> > > > > >> results of the Q and A on my blog and email out the link once
> it's
> > > up
> > > > > >> there.
> > > > > >>
> > > > > >> Thanks,
> > > > > >>
> > > > > >> Jesse
> > > > > >>
> > > > > >>
> > > > > > --
> > > > > > Jean-Baptiste Onofré
> > > > > > jbono...@apache.org
> > > > > > http://blog.nanthrax.net
> > > > > > Talend - http://www.talend.com
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Sergio Fernández
> > > > > Partner Technology Manager
> > > > > Redlink GmbH
> > > > > m: +43 6602747925
> > > > > e: sergio.fernan...@redlink.co
> > > > > w: http://redlink.co
> > > > >
> > > >
> > >
> >
>


Re: Beam Interview

2016-07-11 Thread Jesse Anderson
I really appreciate the turnout. I'm pleasantly surprised with the varied
responses I've received.

I plan to publish this post on July 13 at 9 AM PT. If you'd like to add
your input, please do it before that time.

Thanks,

Jesse

On Fri, Jul 8, 2016 at 1:30 PM Amit Sela <amitsel...@gmail.com> wrote:

> That's great Jesse!
>
> Added my comments.
>
> Thanks,
> Amit
>
> On Fri, Jul 8, 2016 at 8:56 PM Shiv Shankar <shiv.shivshan...@gmail.com>
> wrote:
>
> > Hi,
> > I am a User and learner. I just added my view points.
> >
> > Thanks
> > SV
> >
> >
> > On Fri, Jul 8, 2016 at 1:51 AM, Sergio Fernández <wik...@apache.org>
> > wrote:
> >
> > > Great idea!
> > >
> > > On Fri, Jul 8, 2016 at 7:44 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
> > > wrote:
> > >
> > > > Hi Jesse,
> > > >
> > > > good idea. Just complete the doc.
> > > >
> > > > Regards
> > > > JB
> > > >
> > > >
> > > > On 07/08/2016 02:18 AM, Jesse Anderson wrote:
> > > >
> > > >> I've been thinking about ways to get more Beam information out there
> > > >> without too much fuss over getting everything right. I came up with
> a
> > > >> written Q and A that represents the most common questions I get.
> > > >>
> > > >> Answering the questions should take 5-10 minutes. I think it will
> go a
> > > >> long
> > > >> ways towards getting more Beam users.
> > > >>
> > > >> 1. Here is the Google Doc link:
> > > >>
> > > >>
> > >
> >
> https://docs.google.com/document/d/1IQt6FfQI7W4d2QxZm6WwGnZFdA8JmaseKZrMGPu8zgY/edit#
> > > >> 2. Add your name and initials.
> > > >> 3. When you answer a question, just prefix it with your
> initials.
> > > >>
> > > >> I really appreciate you taking the time to answer things. I'll
> publish
> > > the
> > > >> results of the Q and A on my blog and email out the link once it's
> up
> > > >> there.
> > > >>
> > > >> Thanks,
> > > >>
> > > >> Jesse
> > > >>
> > > >>
> > > > --
> > > > Jean-Baptiste Onofré
> > > > jbono...@apache.org
> > > > http://blog.nanthrax.net
> > > > Talend - http://www.talend.com
> > > >
> > >
> > >
> > >
> > > --
> > > Sergio Fernández
> > > Partner Technology Manager
> > > Redlink GmbH
> > > m: +43 6602747925
> > > e: sergio.fernan...@redlink.co
> > > w: http://redlink.co
> > >
> >
>


Re: Beam Interview

2016-07-07 Thread Jesse Anderson
Thanks Aparup

On Thu, Jul 7, 2016, 9:26 PM Aparup Banerjee (apbanerj) <apban...@cisco.com>
wrote:

> I have just given my thoughts in it. We at Cisco are using Beam in
> several of our projects.
>
> Thanks,
> Aparup
>
>
>
>
> On 7/7/16, 5:18 PM, "Jesse Anderson" <je...@smokinghand.com> wrote:
>
> >I've been thinking about ways to get more Beam information out there
> >without too much fuss over getting everything right. I came up with a
> >written Q and A that represents the most common questions I get.
> >
> >Answering the questions should take 5-10 minutes. I think it will go a
> long
> >ways towards getting more Beam users.
> >
> >   1. Here is the Google Doc link:
> >
> https://docs.google.com/document/d/1IQt6FfQI7W4d2QxZm6WwGnZFdA8JmaseKZrMGPu8zgY/edit#
> >   2. Add your name and initials.
> >   3. When you answer a question, just prefix it with your initials.
> >
> >I really appreciate you taking the time to answer things. I'll publish the
> >results of the Q and A on my blog and email out the link once it's up
> there.
> >
> >Thanks,
> >
> >Jesse
>


Re: Beam Interview

2016-07-07 Thread Jesse Anderson
Neville,

Thanks for responding to the interview. I've changed it to say both
committers and users.

Thanks,

Jesse

On Thu, Jul 7, 2016 at 9:19 PM Neville Li <neville@gmail.com> wrote:

> Not a committer but I shared some thoughts since we (Spotify) are heavy
> users of Dataflow/Beam and contribute back to the code base.
>
> On Thu, Jul 7, 2016 at 8:18 PM Jesse Anderson <je...@smokinghand.com>
> wrote:
>
> > I've been thinking about ways to get more Beam information out there
> > without too much fuss over getting everything right. I came up with a
> > written Q and A that represents the most common questions I get.
> >
> > Answering the questions should take 5-10 minutes. I think it will go a
> long
> > ways towards getting more Beam users.
> >
> >1. Here is the Google Doc link:
> >
> >
> https://docs.google.com/document/d/1IQt6FfQI7W4d2QxZm6WwGnZFdA8JmaseKZrMGPu8zgY/edit#
> >2. Add your name and initials.
> >3. When you answer a question, just prefix it with your initials.
> >
> > I really appreciate you taking the time to answer things. I'll publish
> the
> > results of the Q and A on my blog and email out the link once it's up
> > there.
> >
> > Thanks,
> >
> > Jesse
> >
>


Beam Interview

2016-07-07 Thread Jesse Anderson
I've been thinking about ways to get more Beam information out there
without too much fuss over getting everything right. I came up with a
written Q and A that represents the most common questions I get.

Answering the questions should take 5-10 minutes. I think it will go a long
ways towards getting more Beam users.

   1. Here is the Google Doc link:
      https://docs.google.com/document/d/1IQt6FfQI7W4d2QxZm6WwGnZFdA8JmaseKZrMGPu8zgY/edit#
   2. Add your name and initials.
   3. When you answer a question, just prefix it with your initials.

I really appreciate you taking the time to answer things. I'll publish the
results of the Q and A on my blog and email out the link once it's up there.

Thanks,

Jesse


Re: Talking About Beam

2016-06-15 Thread Jesse Anderson
Amit,

I've written that piece too, but I haven't published it yet.

Thanks,

Jesse

On Wed, Jun 15, 2016, 3:38 PM Amit Sela <amitsel...@gmail.com> wrote:

> Great writing Jesse!
>
> From my experience in the last year, working on a stream processing (and
> generally data processing) platform at PayPal, Beam could also offer a
> great approach for large projects - up until now (and in my case as well),
> the process was:
>
>1. Research and paper analysis of existing frameworks.
>2. Understand your needs.
>3. Choose (and commit to) a specific technology - example: Spark.
>4. Get to work..
>
> I believe Beam could change this into something better, such as:
>
>1. Understand your needs, and start working on them.
>2. Combine your research with actually running (your) same code on
>different frameworks - probably better then "WordCount" benchmarks.
>3. Choose the best framework for you, or choose more than one if the
>benefit is worth the overhead.
>4. While working on 2 & 3, you keep going forward with your project!
>
> I talked about Beam in Barclays-Techstars Accelerator in Israel last month
> because I totally agree that it's a great starting point for startups, but
> I think this is an example why not just startups :)
>
> Thanks,
> Amit
>
> On Wed, Jun 15, 2016 at 9:58 AM Jesse Anderson <je...@smokinghand.com>
> wrote:
>
> > I wrote a piece published on O'Reilly about Beam
> >
> >
> https://www.oreilly.com/ideas/future-proof-and-scale-proof-your-code
> > .
> > It gives some of the thoughts and ideas that will help Beam adoption. I
> > suggest reading it to get some ideas for how to talk about Beam at talks
> > and conferences.
> >
> > Before writing the piece, I tested how it resonates with people. These
> > really help people understand why Beam is used and how it solves the
> future
> > proofing and scale proofing problems small companies face.
> >
> > Thanks,
> >
> > Jesse
> >
>


Talking About Beam

2016-06-15 Thread Jesse Anderson
I wrote a piece about Beam, published on O'Reilly:
https://www.oreilly.com/ideas/future-proof-and-scale-proof-your-code
It gives some of the thoughts and ideas that will help Beam adoption. I
suggest reading it to get some ideas for how to talk about Beam at talks
and conferences.

Before writing the piece, I tested how its ideas resonate with people. They
really help people understand why Beam is used and how it solves the
future-proofing and scale-proofing problems small companies face.

Thanks,

Jesse


Re: One more streaming engine in OSS

2016-06-07 Thread Jesse Anderson
Here's a writeup I did on Heron.
http://www.jesse-anderson.com/2016/06/the-case-for-heron/

@nitin are you going to write a Concord runner?

On Tue, Jun 7, 2016 at 12:37 PM Nitin Lamba  wrote:

> It gets better:
> http://concord.io
>
> :)
>
> On Tue, Jun 7, 2016 at 9:28 AM, Dan Halperin 
> wrote:
>
> > Yep! Without having done any analysis of Heron itself, I'd say that we'd
> > love to have a Beam-on-Heron runner as well!
> >
> > On Wed, May 25, 2016 at 2:30 PM, Seetharam Venkatesh <
> > venkat...@innerzeal.com> wrote:
> >
> > > https://blog.twitter.com/2016/open-sourcing-twitter-heron
> > >
> > > More the merrier for Beam? :-)
> > >
> > > Venkatesh
> > >
> >
>


Re: Add Sorting Class?

2016-05-26 Thread Jesse Anderson
Another perspective is to look at other projects in the Hadoop ecosystem.

Impala used to require a LIMIT any time you did an ORDER BY. They've since
removed this limitation.

Hive has two sorting options. ORDER BY does a global order. SORT BY orders
everything in that partition.
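
To make the difference concrete, here is a minimal plain-Java sketch (not
Hive or Beam code; the class and method names are made up for illustration).
It assumes the data is already split into partitions and contrasts sorting
each partition on its own with producing one total order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SortScopes {
    // SORT BY-like behaviour: each partition is sorted independently, so the
    // concatenation of the partitions is not necessarily ordered overall.
    public static List<List<Integer>> sortWithinPartitions(List<List<Integer>> partitions) {
        List<List<Integer>> result = new ArrayList<>();
        for (List<Integer> partition : partitions) {
            List<Integer> sorted = new ArrayList<>(partition);
            Collections.sort(sorted);
            result.add(sorted);
        }
        return result;
    }

    // ORDER BY-like behaviour: one total order over every row. This sketch
    // simply gathers all rows in one place, which is the part that is hard to scale.
    public static List<Integer> sortGlobally(List<List<Integer>> partitions) {
        List<Integer> all = new ArrayList<>();
        for (List<Integer> partition : partitions) {
            all.addAll(partition);
        }
        Collections.sort(all);
        return all;
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions =
            Arrays.asList(Arrays.asList(5, 1, 9), Arrays.asList(4, 8, 2));
        System.out.println(sortWithinPartitions(partitions)); // [[1, 5, 9], [2, 4, 8]]
        System.out.println(sortGlobally(partitions));         // [1, 2, 4, 5, 8, 9]
    }
}
```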

On Thu, May 26, 2016 at 12:35 PM Jesse Anderson <je...@smokinghand.com>
wrote:

> I had a similar thought, but wasn't sure if that violated a tenet of Beam.
>
> I'm thinking an ordered sink could wrap around another sink. I could see
> something like:
> collection.apply(OrderedSink.Timestamp.write(TextIO.Write.to(...)));
>
> On Thu, May 26, 2016 at 12:26 PM Robert Bradshaw
> <rober...@google.com.invalid> wrote:
>
>> As Frances alluded to, it's also really hard to reconcile the notion
>> of a globally ordered PCollection in the context of a streaming
>> pipeline. Sorting also imposes conditions on partitioning, which we
>> intentionally leave unspecified for maximum flexibility in the
>> runtime. One also gets into the question of whether particular
>> operations are order-creating, order-preserving, or order-destroying
>> and how much extra overhead is required to maintain these properties
>> for intermediate collections.
>>
>> Your mention of sorting by time is interesting, as this is the
>> inherent sort dimension in streaming (and we use features like
>> windowing and triggering to do correct time-based grouping despite
>> real-time skew). Other than that, all the uses of sorting I've seen
>> have been limited to portions of data small enough to be produced by
>> (and consumed by) a single machine (so tops GBs, not TBs or PBs).
>>
>> All that aside, I could see a more tractable case being made for
>> ordering (partitioning, etc.) a particular materialization of a
>> PCollection, i.e. being sorted would not be a property of a
>> PCollection itself, but could be provided by a sink (e.g. one could
>> have a sink that promises to write its records in a particular order
>> within and across shards). It's not inconceivable that this could be
>> done in a way that is composable with (a large class of) existing
>> sinks, e.g. given a FileBasedSink and intra/inter-shard-sorting
>> specifications, one could produce a bounded sink that writes "sorted"
>> files. Lots of design work TBD...
>>
>> - Robert
>>
>>
>>
>>
>> On Thu, May 26, 2016 at 11:32 AM, Jesse Anderson <je...@smokinghand.com>
>> wrote:
>> > @frances great analysis. I'm hoping this serves as the starting point
>> for
>> > the discussion.
>> >
>> > It really comes down to: is this a nice to have or a show stopping
>> > requirement? As you mention, it comes down to the use case. I've taught
>> at
>> > large financial companies where (global) sorting was a real and show
>> > stopping use case. Theirs was for a large end of day report that had to
>> be
>> > globally sorted and consumed by many other groups. Sorry, I can't be
>> more
>> > specific.
>> >
>> > Thanks,
>> >
>> > Jesse
>> >
>> > On Thu, May 26, 2016 at 10:19 AM Frances Perry <f...@google.com.invalid>
>> > wrote:
>> >
>> >> Currently the Beam model doesn't provide the functionality to do
>> sorting,
>> >> so this is a pretty deep feature request. Let's separate the discussion
>> >> into value sorting and global sorting.
>> >>
>> >> For value sorting, you need to be able to specify some property of the
>> >> value (often called a secondary key) and have the GroupByKey/shuffle
>> >> implementation sort values for a given key by the secondary key. This
>> is a
>> >> pretty common use case, and I think exposing this in Beam would make a
>> lot
>> >> of sense. The Hadoop and the Cloud Dataflow shuffle implementations
>> support
>> >> this, for example. So it may just be a matter of figuring out how best
>> to
>> >> expose it to users. In FlumeJava we had you explicitly ParDo to pair
>> values
>> >> with a string "sort key" so you'd GroupByKey on a PCollection<KV<Key,
>> >> KV<String, Value>> and get back the Values sorted lexicographically by
>> >> String. It's a bit gross for users to think about a way to order things
>> >> that sorts lexicographically. Looks like Crunch[1] uses a general sort
>> key
>> >> -- but that likely won't interact cleanly with Beam's use of encoded
>> keys
>> >> for comparisons. Would

Re: Add Sorting Class?

2016-05-26 Thread Jesse Anderson
I had a similar thought, but wasn't sure if that violated a tenet of Beam.

I'm thinking an ordered sink could wrap around another sink. I could see
something like:
collection.apply(OrderedSink.Timestamp.write(TextIO.Write.to(...)));
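
As a rough illustration of the wrapping idea, here is a plain-Java sketch. It
does not use Beam's Sink API: Record, Sink, InMemoryTextSink, and OrderedSink
are all hypothetical names invented for this example. It also buffers
everything in memory before sorting, so it only shows the shape of the
decorator, not how a scalable version would work.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class OrderedSinkSketch {
    // Hypothetical record: a payload plus an event timestamp in milliseconds.
    public static final class Record {
        public final long timestampMillis;
        public final String payload;

        public Record(long timestampMillis, String payload) {
            this.timestampMillis = timestampMillis;
            this.payload = payload;
        }
    }

    // Hypothetical minimal sink interface; not Beam's Sink.
    public interface Sink {
        void write(Record record);
        void close();
    }

    // Stand-in for a text sink: keeps payloads in the order they were written.
    public static final class InMemoryTextSink implements Sink {
        public final List<String> lines = new ArrayList<>();

        @Override
        public void write(Record record) {
            lines.add(record.payload);
        }

        @Override
        public void close() {
        }
    }

    // Wraps another sink: buffers records, sorts them by timestamp on close,
    // then forwards them to the wrapped sink in that order.
    public static final class OrderedSink implements Sink {
        private final Sink delegate;
        private final List<Record> buffer = new ArrayList<>();

        public OrderedSink(Sink delegate) {
            this.delegate = delegate;
        }

        @Override
        public void write(Record record) {
            buffer.add(record);
        }

        @Override
        public void close() {
            buffer.sort(Comparator.comparingLong((Record r) -> r.timestampMillis));
            for (Record record : buffer) {
                delegate.write(record);
            }
            delegate.close();
        }
    }

    public static void main(String[] args) {
        InMemoryTextSink text = new InMemoryTextSink();
        Sink ordered = new OrderedSink(text);
        ordered.write(new Record(30L, "third"));
        ordered.write(new Record(10L, "first"));
        ordered.write(new Record(20L, "second"));
        ordered.close();
        System.out.println(text.lines); // [first, second, third]
    }
}
```

The wrapped sink never needs to know about ordering, which is the appeal of
the approach; the open question is how to do the buffering and sorting step
across shards.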

On Thu, May 26, 2016 at 12:26 PM Robert Bradshaw
<rober...@google.com.invalid> wrote:

> As Frances alluded to, it's also really hard to reconcile the notion
> of a globally ordered PCollection in the context of a streaming
> pipeline. Sorting also imposes conditions on partitioning, which we
> intentionally leave unspecified for maximum flexibility in the
> runtime. One also gets into the question of whether particular
> operations are order-creating, order-preserving, or order-destroying
> and how much extra overhead is required to maintain these properties
> for intermediate collections.
>
> Your mention of sorting by time is interesting, as this is the
> inherent sort dimension in streaming (and we use features like
> windowing and triggering to do correct time-based grouping despite
> real-time skew). Other than that, all the uses of sorting I've seen
> have been limited to portions of data small enough to be produced by
> (and consumed by) a single machine (so tops GBs, not TBs or PBs).
>
> All that aside, I could see a more tractable case being made for
> ordering (partitioning, etc.) a particular materialization of a
> PCollection, i.e. being sorted would not be a property of a
> PCollection itself, but could be provided by a sink (e.g. one could
> have a sink that promises to write its records in a particular order
> within and across shards). It's not inconceivable that this could be
> done in a way that is composable with (a large class of) existing
> sinks, e.g. given a FileBasedSink and intra/inter-shard-sorting
> specifications, one could produce a bounded sink that writes "sorted"
> files. Lots of design work TBD...
>
> - Robert
>
>
>
>
> On Thu, May 26, 2016 at 11:32 AM, Jesse Anderson <je...@smokinghand.com>
> wrote:
> > @frances great analysis. I'm hoping this serves as the starting point for
> > the discussion.
> >
> > It really comes down to: is this a nice to have or a show stopping
> > requirement? As you mention, it comes down to the use case. I've taught
> at
> > large financial companies where (global) sorting was a real and show
> > stopping use case. Theirs was for a large end of day report that had to
> be
> > globally sorted and consumed by many other groups. Sorry, I can't be more
> > specific.
> >
> > Thanks,
> >
> > Jesse
> >
> > On Thu, May 26, 2016 at 10:19 AM Frances Perry <f...@google.com.invalid>
> > wrote:
> >
> >> Currently the Beam model doesn't provide the functionality to do
> sorting,
> >> so this is a pretty deep feature request. Let's separate the discussion
> >> into value sorting and global sorting.
> >>
> >> For value sorting, you need to be able to specify some property of the
> >> value (often called a secondary key) and have the GroupByKey/shuffle
> >> implementation sort values for a given key by the secondary key. This
> is a
> >> pretty common use case, and I think exposing this in Beam would make a
> lot
> >> of sense. The Hadoop and the Cloud Dataflow shuffle implementations
> support
> >> this, for example. So it may just be a matter of figuring out how best
> to
> >> expose it to users. In FlumeJava we had you explicitly ParDo to pair
> values
> >> with a string "sort key" so you'd GroupByKey on a PCollection<KV<Key,
> >> KV<String, Value>> and get back the Values sorted lexicographically by
> >> String. It's a bit gross for users to think about a way to order things
> >> that sorts lexicographically. Looks like Crunch[1] uses a general sort
> key
> >> -- but that likely won't interact cleanly with Beam's use of encoded
> keys
> >> for comparisons. Would be nice to think about if there's a cleaner way.
> >>
> >> For global sorting, you need to be able to generate and
> maintain
> >> orderedness across the elements in a PCollection and have a way to know
> how
> >> to partition the PCollection into balanced, sorted subchunks. This would
> >> have a pretty large impact on the Beam model and potentially on many of
> the
> >> runners. Looking at the Crunch sort [1], it requires users to provide
> the
> >> partitioning function if they want it to scale beyond a single reduce.
> I'd
> >> love to see if there's a way to do better. It also can have a pretty big
> >> imp

Re: Add Sorting Class?

2016-05-26 Thread Jesse Anderson
@frances great analysis. I'm hoping this serves as the starting point for
the discussion.

It really comes down to: is this a nice to have or a show stopping
requirement? As you mention, it comes down to the use case. I've taught at
large financial companies where (global) sorting was a real and show
stopping use case. Theirs was for a large end of day report that had to be
globally sorted and consumed by many other groups. Sorry, I can't be more
specific.

Thanks,

Jesse

On Thu, May 26, 2016 at 10:19 AM Frances Perry <f...@google.com.invalid>
wrote:

> Currently the Beam model doesn't provide the functionality to do sorting,
> so this is a pretty deep feature request. Let's separate the discussion
> into value sorting and global sorting.
>
> For value sorting, you need to be able to specify some property of the
> value (often called a secondary key) and have the GroupByKey/shuffle
> implementation sort values for a given key by the secondary key. This is a
> pretty common use case, and I think exposing this in Beam would make a lot
> of sense. The Hadoop and the Cloud Dataflow shuffle implementations support
> this, for example. So it may just be a matter of figuring out how best to
> expose it to users. In FlumeJava we had you explicitly ParDo to pair values
> with a string "sort key" so you'd GroupByKey on a PCollection<KV<Key,
> KV<String, Value>> and get back the Values sorted lexicographically by
> String. It's a bit gross for users to think about a way to order things
> that sorts lexicographically. Looks like Crunch[1] uses a general sort key
> -- but that likely won't interact cleanly with Beam's use of encoded keys
> for comparisons. Would be nice to think about if there's a cleaner way.
>
> For global sorting, you need to be able to generate and maintain
> orderedness across the elements in a PCollection and have a way to know how
> to partition the PCollection into balanced, sorted subchunks. This would
> have a pretty large impact on the Beam model and potentially on many of the
> runners. Looking at the Crunch sort [1], it requires users to provide the
> partitioning function if they want it to scale beyond a single reduce. I'd
> love to see if there's a way to do better. It also can have a pretty big
> impact on the ability to efficiently parallelize execution (things like
> dynamic work rebalancing [2] become trickier). Within Google [3], we've
> found that this tends to be something that users ask for, but don't really
> have a strong use case for. It's usually the case that Top suffices or that
> they would rather redo the algorithm into something that can parallelize
> more efficiently without relying on a global sort. Though of course,
> without this, we can't actually do the TeraSort benchmark in Beam. ;-)
>
> And of course there's the impact of the unified model on all this ;-) I
> think these ideas would translate to windowed PCollections ok, but would
> want to think carefully about it.
>
> [1] https://crunch.apache.org/user-guide.html#sorting
> [2]
>
> https://cloud.google.com/blog/big-data/2016/05/no-shard-left-behind-dynamic-work-rebalancing-in-google-cloud-dataflow
>
> [3]
>
> https://cloud.google.com/blog/big-data/2016/02/history-of-massive-scale-sorting-experiments-at-google
>
>
> On Thu, May 26, 2016 at 8:56 AM, Jesse Anderson <je...@smokinghand.com>
> wrote:
>
> > This is somewhat the continuation of my thread "Writing Out
> List."
> >
> > Right now, the only way to do sorting is with the Top class. This works
> > well, but has the constraint of fitting in memory.
> >
> > A common batch use case is to take a large file and sort it. For example,
> > this would be sorting a large report (several GB) file by timestamp. As
> of
> > right now, this isn't built into Beam. I think it should be.
> >
> > I'll hold out Crunch's Sort
> > <
> https://crunch.apache.org/apidocs/0.11.0/org/apache/crunch/lib/Sort.html>
> > class as an example of what this class could look like.
> >
> > Thanks,
> >
> > Jesse
> >
>


Add Sorting Class?

2016-05-26 Thread Jesse Anderson
This is somewhat the continuation of my thread "Writing Out List."

Right now, the only way to do sorting is with the Top class. This works
well, but has the constraint of fitting in memory.
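
To show where the memory constraint comes from, here is a plain-Java sketch of
the general top-N idea. This is not Beam's Top implementation; TopNSketch and
largest are hypothetical names. It holds at most n values at a time, so using
it as a full sort means n is the size of the input and everything has to fit
in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class TopNSketch {
    // Returns the n largest values in descending order.
    public static List<Long> largest(Iterable<Long> values, int n) {
        // Min-heap: the smallest retained value sits at the head.
        PriorityQueue<Long> heap = new PriorityQueue<>();
        for (Long value : values) {
            heap.add(value);
            if (heap.size() > n) {
                heap.poll(); // evict the smallest so only n values are kept
            }
        }
        List<Long> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        List<Long> input = Arrays.asList(5L, 1L, 9L, 4L, 8L, 2L);
        System.out.println(largest(input, 3)); // [9, 8, 5]
    }
}
```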

A common batch use case is to take a large file and sort it. For example,
this would be sorting a large report (several GB) file by timestamp. As of
right now, this isn't built into Beam. I think it should be.

I'll hold out Crunch's Sort
<https://crunch.apache.org/apidocs/0.11.0/org/apache/crunch/lib/Sort.html>
class as an example of what this class could look like.

Thanks,

Jesse


Re: Process / contribution guide

2016-05-09 Thread Jesse Anderson
I'd add some more information about checkstyle. That's the one that trips
me up. I haven't dealt with checkstyle before, and the rule-violation
output from mvn isn't very clear.

Right now it's discussed in the committer section, but it affects
contributors too.

I'd add some discussion of the common ones I hit, like extra whitespace and
import order. I think the other ones were pretty clear. I'd also add the
mvn command to run only the checkstyle check, and the command to skip
checkstyle (mvn clean install -Dcheckstyle.skip=true).

Thanks,

Jesse

On Sun, May 8, 2016 at 12:52 PM Jean-Baptiste Onofré 
wrote:

> Hi Davor,
>
> My bad: I did a cherry-pick instead of a merge on a PR.
>
> I will now be careful to apply a merge.
>
> Sorry about that.
>
> Regards
> JB
>
> On 05/08/2016 09:08 PM, Davor Bonaci wrote:
> > Hi everyone,
> > I wanted to send a quick reminder that we should all try to follow our
> own
> > contribution guide.
> >
> > Recently, there have been several cases where commits didn't go through
> the
> > pull requests / review, pull requests that were merged differently, not
> > closed automatically by tooling, etc.
> >
> > I'd kindly ask to try your best to follow our own process. That said, we
> > now have more experience in this type of development -- if there's any
> > point that should be re-discussed, please bring it up for consideration.
> >
> > Thanks!
> >
> > Davor
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>