Re: (Virtual) Beam Developers Meetup - 5/4 @ 8am PDT

2016-05-04 Thread Jesse Anderson
Is the link still coming out? On Tue, May 3, 2016 at 10:55 PM Jean-Baptiste Onofré wrote: > Hi Frances, > > thanks for the update. It looks good to me. > > Regards > JB > > On 05/04/2016 12:40 AM, Frances Perry wrote: > > Current plan is to stick with Google Hangouts. We'll send a link 10 > minu

Re: (Virtual) Beam Developers Meetup - 5/4 @ 8am PDT

2016-05-04 Thread Jesse Anderson
LID wrote: > > Hi JB, > > Did not see the link in the mailing list. was it shared somewhere else? > > K > > > > On Wed, May 4, 2016 at 8:05 AM, Jean-Baptiste Onofré > wrote: > > > > Hi Jesse, > > > > Frances already sent the link. > > >

Re: Process / contribution guide

2016-05-09 Thread Jesse Anderson
I'd add some more information about checkstyle. That's the one that trips me up. I haven't dealt with checkstyle before and the rules violations output in mvn isn't very clear. Right now it's discussed in the committer section, but it affects contributors too. I'd add some discussion of the commo

TypeDescriptors Example Code

2016-05-16 Thread Jesse Anderson
Does anyone have any thoughts or concerns with me changing the example code to use the new TypeDescriptors class from the inline creation of a TypeDescriptor? For example, MinimalWordCountJava8 would change from: p.apply(TextIO.Read.from("gs://dataflow-samples/shakespeare/*")) .apply(FlatMa

Re: [PROPOSAL] Beam FAQ

2016-05-23 Thread Jesse Anderson
I think Josh's Crunch User Guide is a great example of what a user guide should cover. https://crunch.apache.org/user-guide.html On Mon, May 23, 2016 at 2:00 AM Ismaël Mejía wrote: > Ok, I agree Davor for end users a getting started guide is not only > important but I would say critical at this

Re: [PROPOSAL] IRC or slack channel for Apache Beam

2016-05-24 Thread Jesse Anderson
Me too On Tue, May 24, 2016, 7:37 AM Jean-Baptiste Onofré wrote: > Done > > Regards > JB > > On 05/24/2016 04:31 PM, Simone Robutti wrote: > > I would like to join, if it's possible. Thanks :) > > > > 2016-05-24 14:55 GMT+02:00 Jean-Baptiste Onofré : > > > >> Good idea ! > >> > >> Thanks ! > >>

Add Sorting Class?

2016-05-26 Thread Jesse Anderson
This is somewhat the continuation of my thread "Writing Out List." Right now, the only way to do sorting is with the Top class. This works well, but has the constraint of fitting in memory. A common batch use case is to take a large file and sort it. For example, this would be sorting a large rep

Re: Add Sorting Class?

2016-05-26 Thread Jesse Anderson
odel on all this ;-) I > think these ideas would translated to windowed PCollections ok, but would > want to think carefully about it. > > [1] https://crunch.apache.org/user-guide.html#sorting > [2] > > https://cloud.google.com/blog/big-data/2016/05/no-shard-left-behind-dynamic-

Re: Add Sorting Class?

2016-05-26 Thread Jesse Anderson
s, e.g. given a FileBasedSink and intra/inter-shard-sorting > specifications, one could produce a bounded sink that writes "sorted" > files. Lots of design work TBD... > > - Robert > > > > > On Thu, May 26, 2016 at 11:32 AM, Jesse Anderson > wrote: > > @franc

Re: Add Sorting Class?

2016-05-26 Thread Jesse Anderson
2016 at 12:35 PM Jesse Anderson wrote: > I had a similar thought, but wasn't sure if that violated a tenet of Beam. > > I'm thinking an ordered sink could wrap around another sink. I could see > something like: > collection.apply(OrderedSink.Timestamp.write(TextIO.Write

Re: One more streaming engine in OSS

2016-06-07 Thread Jesse Anderson
Here's a writeup I did on Heron. http://www.jesse-anderson.com/2016/06/the-case-for-heron/ @nitin are you going to write a Concord runner? On Tue, Jun 7, 2016 at 12:37 PM Nitin Lamba wrote: > It gets better: > http://concord.io > > :) > > On Tue, Jun 7, 2016 at 9:28 AM, Dan Halperin > wrote: >

Re: [RESULT] [VOTE] Release version 0.1.0-incubating

2016-06-11 Thread Jesse Anderson
Congrats on the first release! On Sun, Jun 12, 2016, 7:50 AM Davor Bonaci wrote: > I'm happy to announce that we have unanimously approved this release. > > There are 10 approving votes, 9 of which are binding: > * Davor Bonaci > * Robert Bradshaw > * Ben Chambers > * Dan Halperin > * Kenneth Kn

Talking About Beam

2016-06-14 Thread Jesse Anderson
I wrote a piece published on O'Reilly about Beam https://www.oreilly.com/ideas/future-proof-and-scale-proof-your-code?utm_medium=social&utm_source=twitter.com&utm_campaign=lgen&utm_content=data+article+ki&cmp=tw-data-na-article-lgen_tw_article. It gives some of the thoughts and ideas that will help

Re: Talking About Beam

2016-06-15 Thread Jesse Anderson
Israel last month > because I totally agree that it's a great starting point for startups, but > I think this is an example why not just startups :) > > Thanks, > Amit > > On Wed, Jun 15, 2016 at 9:58 AM Jesse Anderson > wrote: > > > I wrote a piece p

Beam Interview

2016-07-07 Thread Jesse Anderson
I've been thinking about ways to get more Beam information out there without too much fuss over getting everything right. I came up with a written Q and A that represents the most common questions I get. Answering the questions should take 5-10 minutes. I think it will go a long ways towards getti

Re: Beam Interview

2016-07-07 Thread Jesse Anderson
to the code base. > > On Thu, Jul 7, 2016 at 8:18 PM Jesse Anderson > wrote: > > > I've been thinking about ways to get more Beam information out there > > without too much fuss over getting everything right. I came up with a > > written Q and A t

Re: Beam Interview

2016-07-07 Thread Jesse Anderson
Thanks Aparup On Thu, Jul 7, 2016, 9:26 PM Aparup Banerjee (apbanerj) wrote: > I have just given my thoughts in it. We at Cisco are using beam at > multiple of our projects. > > Thanks, > Aparup > > > > > On 7/7/16, 5:18 PM, "Jesse Anderson" wrote: >

Re: Beam Interview

2016-07-11 Thread Jesse Anderson
> > > > On Fri, Jul 8, 2016 at 1:51 AM, Sergio Fernández > > wrote: > > > > > Great idea! > > > > > > On Fri, Jul 8, 2016 at 7:44 AM, Jean-Baptiste Onofré > > > wrote: > > > > > > > Hi Jesse, > > > > > > > >

Re: Beam Interview

2016-07-11 Thread Jesse Anderson
Thanks! On Mon, Jul 11, 2016 at 1:02 PM Ismaël Mejía wrote: > Great Idea, I just added my answers, English is not my native language, so > feel free to edit if you find any grammatical mistakes, sorry. > > Ismael > > On Mon, Jul 11, 2016 at 7:12 PM, Jesse Anderson >

Re: Beam Interview

2016-07-12 Thread Jesse Anderson
Frances Perry > wrote: > > > Love this, Jesse! And pretty inspired reading the answers so far ;-) > > > > On Mon, Jul 11, 2016 at 1:42 PM, Jesse Anderson > > wrote: > > > > > Thanks! > > > > > > On Mon, Jul 11, 2016 at 1:02 PM Ismaël Mejía

Re: Beam Interview

2016-07-13 Thread Jesse Anderson
ks, Jesse On Tue, Jul 12, 2016 at 1:51 PM Jesse Anderson wrote: > Last call. If you want your words of wisdom forever kept in the annals of > Apache Beam lore, I'm publishing tomorrow (7-13) at 9 AM PT. > > > On Mon, Jul 11, 2016 at 11:13 PM Tyler Akidau wrote: > >

Re: [KUDOS] Contributed runner: Gearpump!

2016-07-20 Thread Jesse Anderson
Thanks and congrats! On Wed, Jul 20, 2016, 7:27 PM Kenneth Knowles wrote: > Hi all, > > I would like to call attention to a huge contribution to Beam: a runner for > Apache Gearpump (incubating). > > The runner landed on the gearpump-runner feature branch today. Check it > out! And contribute to

DoFN Lambda

2016-08-08 Thread Jesse Anderson
Resurrecting a thread from the users list of the same name. I hacked together an example of what this code could look like. I created a modified MapElements

Re: Remove legacy import-order?

2016-08-23 Thread Jesse Anderson
Please. That's the one that always trips me up. On Tue, Aug 23, 2016, 4:10 PM Ben Chambers wrote: > When Beam was contributed it inherited an import order [1] that was pretty > arbitrary. We've added org.apache.beam [2], but continue to use this > ordering. > > Both Eclipse and IntelliJ default

JavaDoc

2016-09-15 Thread Jesse Anderson
Only the 0.1.0 JavaDoc is on the website <http://beam.incubator.apache.org/learn/sdks/javadoc/>. It should have 0.2.0. Thanks, Jesse

Re: JavaDoc

2016-09-15 Thread Jesse Anderson
has pending PRs (pull/38) to update the site > after 0.2.0. > > On Thu, Sep 15, 2016 at 8:26 AM, Jesse Anderson > wrote: > > > Only the 0.1.0 JavaDoc is on the website > > <http://beam.incubator.apache.org/learn/sdks/javadoc/>. It should have > > 0.2.0. > > > > Thanks, > > > > Jesse > > >

Re: FYI: All Runners Tested In Precommit

2016-09-15 Thread Jesse Anderson
Excellent! On Thu, Sep 15, 2016 at 12:18 PM Frances Perry wrote: > Awesome! Strong tests are hugely important in a project with so many > diverse components. > > On Thu, Sep 15, 2016 at 12:16 PM, Jason Kuster wrote: > > > Hi all, > > > > Just a quick update -- as of yesterday all new P

Re: JavaDoc

2016-09-15 Thread Jesse Anderson
Never mind on the nightlies. They were there. I had the wrong URL. On Thu, Sep 15, 2016 at 10:48 AM Jesse Anderson wrote: > Do you know if that includes putting the 0.3.0 nightlies up? Right now, > only 0.2.0 is there > <https://repository.apache.org/content/repositories/s

Re: JavaDoc

2016-09-16 Thread Jesse Anderson
avadoc on the Apache SNAPSHOT repo: > > > https://repository.apache.org/content/groups/snapshots/org/apache/beam/beam-sdks-java-core/0.3.0-incubating-SNAPSHOT/*javadoc.jar > > Regards > JB > > On 09/15/2016 05:26 PM, Jesse Anderson wrote: > > Only the 0.1.0 JavaDoc is on

Maven Compile Fails

2016-09-16 Thread Jesse Anderson
Is anyone else experiencing this while building with Maven? I'm having to clean each time. It only happens on beam-sdks-java-core. [INFO] --- maven-compiler-plugin:3.3:compile (default-compile) @ beam-sdks-java-core --- [INFO] Changes detected - recompiling the module! [INFO] Compiling 378 source

Re: IntervalWindow toString()

2016-09-19 Thread Jesse Anderson
TIL, thanks. On Mon, Sep 19, 2016 at 9:00 AM Ben Chambers wrote: > I think this is using http://www.mathwords.com/i/interval_notation.htm to > indicate that the interval includes the start time but not the end time. > > On Mon, Sep 19, 2016, 8:56 AM Jesse Anderson > wrote: >

IntervalWindow toString()

2016-09-19 Thread Jesse Anderson
The toString() of IntervalWindow starts with a square bracket and ends with a parenthesis. Is this a type of notation or a bug? Code: @Override public String toString() { return "[" + start + ".." + end + ")"; } Thanks, Jesse
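
For readers following this thread, here is a minimal sketch of why a half-open interval would print as "[start..end)". It is not Beam's IntervalWindow: the class name HalfOpenInterval, the long timestamps, and the contains method are assumptions made only for illustration. The point is that the start is included and the end is excluded, so adjacent windows never both claim the same timestamp.

```java
// Hypothetical stand-in for a window; NOT Beam's IntervalWindow.
final class HalfOpenInterval {
    private final long start; // inclusive lower bound
    private final long end;   // exclusive upper bound

    HalfOpenInterval(long start, long end) {
        if (end < start) {
            throw new IllegalArgumentException("end must not precede start");
        }
        this.start = start;
        this.end = end;
    }

    // A timestamp belongs to the interval if start <= t < end.
    boolean contains(long t) {
        return t >= start && t < end;
    }

    // Same shape as the toString() quoted in the thread:
    // "[" marks the included start, ")" marks the excluded end.
    @Override
    public String toString() {
        return "[" + start + ".." + end + ")";
    }

    public static void main(String[] args) {
        HalfOpenInterval first = new HalfOpenInterval(0, 60);
        HalfOpenInterval second = new HalfOpenInterval(60, 120);

        // The boundary timestamp 60 falls in the second interval only.
        System.out.println(first + " contains 60: " + first.contains(60));
        System.out.println(second + " contains 60: " + second.contains(60));
    }
}
```

With this convention, back-to-back intervals such as [0..60) and [60..120) partition the timeline without overlap, which is consistent with the interval-notation explanation given in the reply.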

Re: CEP / Pattern matching on top of Beam pipeline

2016-09-21 Thread Jesse Anderson
Still a feature that some would really like to see http://www.jesse-anderson.com/2016/07/question-and-answers-with-the-apache-beam-team/ On Wed, Sep 21, 2016 at 4:56 PM Aparup Banerjee (apbanerj) < apban...@cisco.com> wrote: > Hi Folks, > > Is anyone familiar with a CEP / Pattern matching library

Re: Preferred locations (or data locality) for batch pipelines.

2016-09-22 Thread Jesse Anderson
I've only ever seen that being used to figure out which file the runner/mapper/operation is working on. Otherwise, I haven't seen those operations care where in the file they're working. On Thu, Sep 22, 2016 at 5:57 AM Amit Sela wrote: > Wouldn't it force all runners to implement this for all di

Re: Preferred locations (or data locality) for batch pipelines.

2016-09-22 Thread Jesse Anderson
ion/data_locality.html > . > > As for Flink, it's a streaming-first engine (sort of the opposite of Spark, > being a batch-first engine) so I *assume* they don't have this notion and > simply "stream" input. > > Dataflow - no idea... > > On Thu, Sep 22

Strata+Hadoop World

2016-09-29 Thread Jesse Anderson
Tyler and I did a Beam tutorial at Strata+Hadoop World. It was 3 hours long and covered some of the basics of windowing and triggers. We had 66 people in attendance. That's a really good attendance given the newness of Beam. We demonstrated the Spark, Flink, Direct and DataFlow runners. The code

Command Line for Testing beamUseDummyRunner

2016-10-07 Thread Jesse Anderson
I've been trying to figure out the correct command to run the beamUseDummyRunner tests from the command line. I tried: mvn -DbeamUseDummyRunner=false -pl runners/direct-java,sdks/java/core/ test But that still doesn't put the DirectRunner on the classpath: java.lang.IllegalArgumentException: No R

Re: Documentation for IDE setup

2016-10-14 Thread Jesse Anderson
Last week I imported Beam with IntelliJ and everything worked. That said, I tried to import the Eclipse project and that doesn't compile anymore. I didn't have time to figure out what happened though. On Fri, Oct 14, 2016 at 1:21 AM Jean-Baptiste Onofré wrote: > Hi Christian, > > IntelliJ doesn

Re: Documentation for IDE setup

2016-10-14 Thread Jesse Anderson
of days ago. > > But IntelliJ is less worrisome than Eclipse. > > > > Straight Import. No Hassle. > > +1 to docs, though. > > > > On Fri, Oct 14, 2016 at 7:19 AM, Jean-Baptiste Onofré > > wrote: > > > > > [Troll] Who's using Eclipse

Re: [KUDOS] Contributed runner: Apache Apex!

2016-10-17 Thread Jesse Anderson
Awesome! On Mon, Oct 17, 2016 at 10:41 AM Thomas Weise wrote: > Thanks to Kenn for helping with the review and many questions! > > The focus till here has been on making the runner functional. I will start > creating JIRAs for follow-up work. > > Looking forward to the next steps to make it a to

Re: Introduction

2016-10-17 Thread Jesse Anderson
Neelesh, I saw you talked about the Hadoop MapReduce runner support too. I'd love to see that happen. When Tyler and I spoke at Strata NYC, I was surprised how many people were there with only MR code. This would definitely ease the testing burden if they can port to Beam and run on MR before goi

Re: Documentation for IDE setup

2016-10-17 Thread Jesse Anderson
n Oct 14, 2016, at 11:37 AM, Daniel Kulp wrote: >> >> >>> On Oct 14, 2016, at 10:06 AM, Jesse Anderson >>> wrote: >>> >>> Last week I imported Beam with IntelliJ and everything worked. >>> >>> That said, I tried to import the

Re: Exploring Performance Testing

2016-10-18 Thread Jesse Anderson
I found data Artisans' benchmarking post. They also shared the code. I didn't dig in much, but they did a wide range of algorithms. They have th

Re: Exploring Performance Testing

2016-10-18 Thread Jesse Anderson
cally a skin on a database + some scripts to load and query data; but I > don't love it. Do other Apache projects do public, long-term benchmarking > and performance regression testing? > > Dan > > On Tue, Oct 18, 2016 at 8:52 AM, Jesse Anderson > wrote: > > > I

Re: Start of release 0.3.0-incubating

2016-10-20 Thread Jesse Anderson
+1 to Davor's. I'd really like to see an 0.3.0 release because there have been big API changes between 0.2.0 and 0.3.0 like the DoFN changes. It'd be nice to stop pointing people to HEAD and back to a release. On Thu, Oct 20, 2016 at 10:17 AM Davor Bonaci wrote: > It's been a while since the las

Re: Introduction

2016-10-20 Thread Jesse Anderson
) > > Thanks, > Regards > JB > > On 10/17/2016 08:36 PM, Jesse Anderson wrote: > > Neelesh, > > > > I saw you talked about the Hadoop MapReduce runner support too. I'd love > to > > see that happen. When Tyler and I spoke at Strata NYC, I was surprised

Re: [ANNOUNCEMENT] New committers!

2016-10-21 Thread Jesse Anderson
> Thomas authored the Apache Apex runner for Beam [1]. This is an > exciting > > > new runner that opens a new user base. It is a large contribution, > which > > > starts the whole new component with a great potential. > > > > > > * Jesse Anderson > >

Re: [ANNOUNCEMENT] New committers!

2016-10-24 Thread Jesse Anderson
I suggest a seven feats model for becoming a committer https://en.m.wikipedia.org/wiki/Labours_of_Hercules. On Mon, Oct 24, 2016, 7:10 AM Christian Schneider wrote: > I was just a bit concerned that new committers might have to do > comparable contributions like the current three new committers

Re: [DISCUSS] Using Verbs for Transforms

2016-10-24 Thread Jesse Anderson
My original thought for this change was that Crunch uses the class name Distinct. SQL also uses the keyword distinct. Maybe the rule should be changed to adjectives or verbs depending on the context. Using a verb to describe this class really doesn't connote what the class does as succinctly as t

Re: [DISCUSS] Using Verbs for Transforms

2016-10-24 Thread Jesse Anderson
> > Predicates module, etc. Just an aside; Beam isn't Google code. I > suggest > > we > > > use our judgment rather than a policy. > > > > > > I think "Distinct" is one of those exceptions. It is a standard > > widespread > > > name and als

Re: [DISCUSS] Using Verbs for Transforms

2016-10-25 Thread Jesse Anderson
We need to make a decision on this so Neelesh can finish his commit. Should we take a vote or something? On Tue, Oct 25, 2016, 7:55 AM Jean-Baptiste Onofré wrote: > Sounds good to me. > > > > On Oct 24, 2016, 19:11, at 19:11, je...@smokinghand.com wrote: > >I prefer MakeDistinct if we have to

Re: [DISCUSS] Using Verbs for Transforms

2016-10-25 Thread Jesse Anderson
. > >> > >> On Tue, Oct 25, 2016 at 10:26 PM Jean-Baptiste Onofré > > > >> wrote: > >> > >> Yes I would start a formal vote with the three proposals: descriptive > >> verb, adjective, verbs + adjective. > >> > >> Regards >

Re: Can we have more quick start examples ?

2016-10-27 Thread Jesse Anderson
Those tutorials help. I was going through the example code and had the same thought. We need to take a pass through the examples and remove some of the Google Cloud dependencies. On Thu, Oct 27, 2016, 5:13 PM Thomas Weise wrote: > The Beam tutorials seem to address this: > > https://github.com/e

Re: [DISCUSS] Using Verbs for Transforms

2016-10-27 Thread Jesse Anderson
before we go stable. > > >> > >>> > > >> > >>>I am not super strong that 1 > 2, but I am very strong that > > >> > >"Distinct" > > >> > >>>>>> > > >> > >>>"

Re: [DISCUSS] Using Verbs for Transforms

2016-11-01 Thread Jesse Anderson
>> > >> It doesn't break the API and would address both SQL users > >> >and > >> >>> >more > >> >>> >> > >"big data" users. > >> >>> >> > >> > >> >>> >>

Tutorials

2016-11-04 Thread Jesse Anderson
Has anyone done a full day (~6 hours) tutorial on Beam yet? Thanks, Jesse

PCollection to PCollection Conversion

2016-11-08 Thread Jesse Anderson
This is a thread moved over from the user mailing list. I think there needs to be a way to convert a PCollection<KV> to a PCollection<String>. To do a minimal WordCount, you have to manually convert the KV to a String: p .apply(TextIO.Read.from("playing_cards.tsv"))

Re: PCollection to PCollection Conversion

2016-11-08 Thread Jesse Anderson
ems like a lot of > logic to contain in that transform which should just focus on writing to > files. > > On Tue, Nov 8, 2016 at 8:15 AM, Jesse Anderson > wrote: > > > This is a thread moved over from the user mailing list. > > > > I think there needs to be a way

Re: PCollection to PCollection Conversion

2016-11-08 Thread Jesse Anderson
had posted a question about using KV with TextIO.Write which wouldn't align with the proposed input format and still would require to write a type conversion function, this time from KV to Iterable instead of KV to string. On Tue, Nov 8, 2016 at 9:50 AM, Jesse Anderson wrote: > Lukasz, > >

Re: PCollection to PCollection Conversion

2016-11-10 Thread Jesse Anderson
>> > > > >> It seems useful for small scale debugging / demoing to have > > > >>> Dump.toString(). I think it should be named to clearly indicate its > > > >>> limited > > > >>> scope. Maybe other stuff co

Re: [DISCUSS] Graduation to a top-level project

2016-11-22 Thread Jesse Anderson
+1 On Tue, Nov 22, 2016 at 12:35 PM Frances Perry wrote: > +1 You might even say I'm beaming with pride ;-) > > On Tue, Nov 22, 2016 at 11:58 AM, Kenneth Knowles > wrote: > > > +1 !!! > > > > I especially love how the diversity of the community has contributed to > the > > conceptual growth an

Re: PCollection to PCollection Conversion

2016-11-29 Thread Jesse Anderson
ter can be done as a simple transform. I think we should start with a simple string converter and plan for a format-specific writer. What are your thoughts? Thanks, Jesse On Thu, Nov 10, 2016 at 10:33 AM Jesse Anderson wrote: I was thinking about what the outputs would look like last night.

Re: PCollection to PCollection Conversion

2016-11-29 Thread Jesse Anderson
> significant changes. > > > > On Tue, Nov 29, 2016 at 11:15 AM Jean-Baptiste Onofré > > wrote: > > > >> By the way Jesse, I gonna push my DATAFORMAT branch on my github and I > >> will post on the dev mailing list when done. > >> > >> R

Re: PCollection to PCollection Conversion

2016-11-29 Thread Jesse Anderson
very simple and stupid and of course not complete at all (I have > other commits but not merged as they need some polishing), but as I > said, it's a base of discussion. > > Regards > JB > > On 11/29/2016 09:23 PM, Jesse Anderson wrote: > > @jb Sounds good. Just

Re: [DISCUSS] Graduation to a top-level project

2016-12-08 Thread Jesse Anderson
Excellent! On Thu, Dec 8, 2016 at 3:43 PM Davor Bonaci wrote: > A quick update: the Apache Incubator has adopted the proposed graduation > resolution [1], and it is now presented to the ASF Board of Directors for > their consideration. > > Davor > > [1] > > https://lists.apache.org/thread.html/7