Re: Jacek's new Apache Beam Internals Project

2020-05-03 Thread Holden Karau
over time? > > On Tue, Apr 28, 2020 at 1:48 PM Ismaël Mejía wrote: > >> The tweet URL for ref in case someone wants to like/RT >> >> https://twitter.com/jaceklaskowski/status/1255046717277376512?s=19 >> >> On Tue, Apr 28, 2020, 8:04 PM Holden Karau wrote: >>

Jacek's new Apache Beam Internals Project

2020-04-28 Thread Holden Karau
Hi Folks, I just saw Jacek's tweet about his new Beam Internals project (he's done a great job on his Spark Internals documentation and blog posts) and I figured I'd share the link https://leanpub.com/the-internals-of-apache-beam in case folks are interested :) Cheers, Holden :) -- Twitter:

Re: Reference to Beam in upcoming Kubeflow Book

2020-04-17 Thread Holden Karau
On Fri, Apr 17, 2020 at 3:52 PM Robert Bradshaw wrote: > On Fri, Apr 17, 2020 at 2:56 PM Holden Karau wrote: > >> >> On Fri, Apr 17, 2020 at 2:45 PM Robert Bradshaw >> wrote: >> >>> Hi Holden! >>> >>> I agree with Kyle that it makes se

Re: Reference to Beam in upcoming Kubeflow Book

2020-04-17 Thread Holden Karau
want to over-promise. >> >> I'm not so sure about the status of Dataflow here, perhaps someone else >> can comment on that. >> >> Looking forward to the book :) >> >> Kyle >> >> On Fri, Apr 17, 2020 at 1:14 PM Holden Karau >> wrote: >>

Re: Reference to Beam in upcoming Kubeflow Book

2020-04-17 Thread Holden Karau
ponents that’s not supported. > >> Looking forward to the book :) >> >> Kyle >> >> On Fri, Apr 17, 2020 at 1:14 PM Holden Karau >> wrote: >> >>> Hi Apache Beam Developers, >>> >>> I'm working on a book about Kubeflow,

Reference to Beam in upcoming Kubeflow Book

2020-04-17 Thread Holden Karau
Hi Apache Beam Developers, I'm working on a book about Kubeflow, which naturally has a section on TFX. I want to set users' expectations correctly so I wanted to know what y'all thought of this NOTE we were thinking of including in the early release: Apache Beam’s Python support outside of Google

Re: [ANNOUNCE] New committer: Robert Burke

2019-07-16 Thread Holden Karau
Congratulations! :) On Tue, Jul 16, 2019 at 10:50 AM Mikhail Gryzykhin wrote: > Congratulations! > > On Tue, Jul 16, 2019 at 10:36 AM Ankur Goenka wrote: > >> Congratulations Robert! >> >> Go GO! >> >> On Tue, Jul 16, 2019 at 10:34 AM Rui Wang wrote: >> >>> Congrats! >>> >>> >>> -Rui >>> >>>

Re: New Edit button on beam.apache.org pages

2018-10-29 Thread Holden Karau
So awesome :) On Mon, Oct 29, 2018, 7:50 AM Etienne Chauchot Cool ! > Thanks > Etienne > Le mercredi 24 octobre 2018 à 14:24 -0700, Alan Myrvold a écrit : > > To make small documentation changes easier, there is now an Edit button at > the top right of the pages on https://beam.apache.org. This

Re: Live coding & reviewing adventures

2018-08-02 Thread Holden Karau
Ok Gris has an even more delayed laptop so I'm going to push it out a week and hope it shows up by then. Sorry about that one and thanks to everyone who tuned in for the Go SDK one :) On Mon, Jul 30, 2018 at 1:54 PM, Holden Karau wrote: > So small schedule changes. > I’ll be doing some

Re: Live coding & reviewing adventures

2018-07-30 Thread Holden Karau
, 2018 at 8:41 PM Holden Karau wrote: > I'll be doing this again this week & next looking at a few different > topics. > > Tomorrow (July 25th @ 10am pacific) Gris & I will be updating the PR from > my last live stream (adding Python dependency handling) - > ht

Re: Live coding & reviewing adventures

2018-07-24 Thread Holden Karau
outube.com/user/holdenkarau>. On Fri, Jul 13, 2018 at 11:54 AM, Holden Karau wrote: > Hi folks! I've been doing some live coding in my other projects and I > figured I'd do some with Apache Beam as well. > > Today @ 3pm pacific I'm going to be doing some impromptu exploration bett

Re: Proof-of-concept Beam PR dashboard (based off of Spark's PR dashboard) to improve discoverability

2018-07-24 Thread Holden Karau
6d164689a@%3Cdev.beam.apache.org%3E>, > "whose turn" feature was a popular request for the dashboard because it is > hard to know whose attention is needed at any moment. > How much effort is needed to implement such feature on top of the > dashboard? > > On Fri, Jul

Re: Live coding & reviewing adventures

2018-07-18 Thread Holden Karau
urllib3==1.23 wcwidth==0.1.7 Werkzeug==0.14.1 widgetsnbextension==3.2.1 On Wed, Jul 18, 2018 at 8:19 AM, Holden Karau wrote: > That’s a thing I’ve been thinking about but haven’t had the time to do > yet. It’s a bit tricky because I don’t always know what I’m doing before I > start and re

Re: Live coding & reviewing adventures

2018-07-18 Thread Holden Karau
ght about creating some sort of index page for your past > live streams? > At least for the non-review ones it can provide great value given that > searching videos is not the easiest thing to do. > On Wed, Jul 18, 2018 at 12:51 AM Holden Karau > wrote: > > > > Sure! I’ll

Re: Live coding & reviewing adventures

2018-07-17 Thread Holden Karau
Sure! I’ll respond with a pip freeze when I land. On Tue, Jul 17, 2018 at 2:28 PM Suneel Marthi wrote: > Could u publish the python transitive deps some place that have the > Beam-Flink runner working ? > > On Tue, Jul 17, 2018 at 5:26 PM, Holden Karau > wrote: > >

Re: Live coding & reviewing adventures

2018-07-17 Thread Holden Karau
er if > you use local file system. > > > > > On Tue, Jul 17, 2018 at 2:27 PM Holden Karau wrote: > >> And I've got an hour to kill @ SFO today so at the suggestion of some >> folks I'm going to do a more user-focused one, trying to get the TFT >>

Re: Live coding & reviewing adventures

2018-07-17 Thread Holden Karau
And I've got an hour to kill @ SFO today so at the suggestion of some folks I'm going to do a more user-focused one, trying to get the TFT demo to work with the portable Flink runner (hopefully) - https://www.youtube.com/watch?v=wL9mvQeN36E On Fri, Jul 13, 2018 at 11:54 AM, Holden Karau

Re: CODEOWNERS for apache/beam repo

2018-07-17 Thread Holden Karau
> > >> This sounds like a good plan. Did we want to rename the CODEOWNERS file > to disable github's mass adding of reviewers while we figure this out? > >> > >> Andrew > >> > >> On Mon, Jul 16, 2018 at 10:20 AM Jean-Baptiste Onofré > wrote: > >

Re: CODEOWNERS for apache/beam repo

2018-07-16 Thread Holden Karau
Ok if no one objects I'll create the INFRA ticket after OSCON and we can test it for a week and decide if it helps or hinders. On Mon, Jul 16, 2018, 7:12 PM Jean-Baptiste Onofré wrote: > Agree to test it for a week. > > Regards > JB > Le 16 juil. 2018, à 18:59, Holden

Re: CODEOWNERS for apache/beam repo

2018-07-16 Thread Holden Karau
t; (path_pattern: /sdks/java/core*) >>> Suggested reviewers: @echauchot, @lukecwik, @pabloem >>> >>> Script is in: https://github.com/apache/beam/pull/5951 >>> >>> >>> What does the community think? Do you prefer blame-based or rules-based >>> rev

Proof-of-concept Beam PR dashboard (based off of Spark's PR dashboard) to improve discoverability

2018-07-13 Thread Holden Karau
Took me waaay longer than planned, and the regexes and components could use some work, but I've got a quick Beam PR dashboard up at https://boos-demo-projects-are-rad.appspot.com/. The code is a fork of the Spark one, and it's at https://github.com/holdenk/spark-pr-dashboard/tree/support-beam in the

Re: Live coding & reviewing adventures

2018-07-13 Thread Holden Karau
Jul 13, 2018 at 12:33 PM Innocent Djiofack > wrote: > >> Thanks I think this will be super useful. I will tune in. >> >> On Fri, Jul 13, 2018 at 2:54 PM Holden Karau >> wrote: >> >>> Hi folks! I've been doing some live coding in my other projects and I

Live coding & reviewing adventures

2018-07-13 Thread Holden Karau
Hi folks! I've been doing some live coding in my other projects and I figured I'd do some with Apache Beam as well. Today @ 3pm pacific I'm going to be doing some impromptu exploration of better review tooling possibilities (looking at forking spark-pr-dashboard for other projects like beam and setting

Re: CODEOWNERS for apache/beam repo

2018-07-13 Thread Holden Karau
I'm looking at something similar in the Spark project, and while it's now archived by FB it seems like something like https://github.com/facebookarchive/mention-bot might do what we want. I'm going to spin up a version on my K8 cluster and see if I can ask infra to add a webhook and if it works

Re: Python Development Environments for Apache Beam

2018-06-20 Thread Holden Karau
Do you happen to have a tweet we should RT for reach? On Wed, Jun 20, 2018, 11:26 AM Josh McGinley wrote: > Beam Users and Dev - > > I recently published a medium article >

Re: [ANNOUNCEMENT] New committers, May 2018 edition!

2018-06-01 Thread Holden Karau
Congrats all! On Fri, Jun 1, 2018 at 12:12 AM Ismaël Mejía wrote: > Congratulations! > > On Fri, Jun 1, 2018 at 8:26 AM Pei HE wrote: > > > > Congrats! > > > > On Fri, Jun 1, 2018 at 2:12 PM, Charles Chen wrote: > > > Congratulations everyone! > > > > > > > > > On Thu, May 31, 2018, 10:14 PM

Re: [VOTE] Go SDK

2018-05-22 Thread Holden Karau
+1 (non-binding), I've had a chance to work with the SDK and it's pretty neat to see Beam add support for a language before most of the big data ecosystem. On Mon, May 21, 2018 at 10:29 PM, Jean-Baptiste Onofré wrote: > Hi Henning, > > SGA has been filed for the entire

Re: Splittable DoFN in Spark discussion

2018-04-26 Thread Holden Karau
a Kafka topic on which names of Kafka topics >>>>> arrive, and we may end up concurrently reading a continuously growing >>>>> number of topics. >>>>> - 2: The work per element is not necessarily infinite, it's just *not >>>>> guaranteed

Re: Add a (temporary) Portable Flink branch to the ASF repo?

2018-04-12 Thread Holden Karau
So I would be strongly in favour of adding it as a branch on the Apache repo. This way other folks are more likely to be able to help with the splitting up and merging process and also, while Flink Forward is behind us, getting in the practice of doing feature branches on the ASF repo for

Re: [PROPOSAL] Python 3 support

2018-03-27 Thread Holden Karau
>> >>> We have drawn up a document [1] with a high level outline of the >>> proposed approach and would like to get your feedback on this. >>> >>> The main Jira issue [2] for python 3 support has been mostly inactive >>> for the past year. Othe

Re: Splittable DoFN in Spark discussion

2018-03-25 Thread Holden Karau
rk will enable SDF >> implementation (and real streaming)? >> >> Thanks, >> Thomas >> >> >> On Sat, Mar 24, 2018 at 3:22 PM, Holden Karau <hol...@pigscanfly.ca> >> wrote: >> >>> >>> On Sat, Mar 24, 2018 at 1:23 PM Eugene Kirp

Re: Splittable DoFN in Spark discussion

2018-03-25 Thread Holden Karau
That would certainly be good. On Sun, Mar 25, 2018 at 9:01 PM, Thomas Weise <t...@apache.org> wrote: > Hopefully the new "continuous processing mode" in Spark will enable SDF > implementation (and real streaming)? > > Thanks, > Thomas > > > On Sat, Ma

Re: Splittable DoFN in Spark discussion

2018-03-24 Thread Holden Karau
On Sat, Mar 24, 2018 at 1:23 PM Eugene Kirpichov <kirpic...@google.com> wrote: > > > On Fri, Mar 23, 2018, 11:17 PM Holden Karau <hol...@pigscanfly.ca> wrote: > >> On Fri, Mar 23, 2018 at 7:00 PM Eugene Kirpichov <kirpic...@google.com> >> wrote: >>

Re: Splittable DoFN in Spark discussion

2018-03-23 Thread Holden Karau
On Fri, Mar 23, 2018 at 7:00 PM Eugene Kirpichov <kirpic...@google.com> wrote: > On Fri, Mar 23, 2018 at 6:49 PM Holden Karau <hol...@pigscanfly.ca> wrote: > >> On Fri, Mar 23, 2018 at 6:20 PM Eugene Kirpichov <kirpic...@google.com> >> wrote: >> >>

Re: Splittable DoFN in Spark discussion

2018-03-23 Thread Holden Karau
On Fri, Mar 23, 2018 at 6:20 PM Eugene Kirpichov <kirpic...@google.com> wrote: > On Fri, Mar 23, 2018 at 6:12 PM Holden Karau <hol...@pigscanfly.ca> wrote: > >> On Fri, Mar 23, 2018 at 5:58 PM Eugene Kirpichov <kirpic...@google.com> >> wrote: >> >>

Re: Splittable DoFN in Spark discussion

2018-03-14 Thread Holden Karau
wrote: > >> Could we alternatively use a state mapping function to keep track of the >> computation so far instead of outputting V each time? (also the progress so >> far is probably of a different type R rather than V). >> >> >> On Wed, Mar 14, 2018 at 4:28 PM Ho

Splittable DoFN in Spark discussion

2018-03-14 Thread Holden Karau
So we had a quick chat about what it would take to add something like SplittableDoFns to Spark. I'd done some sketchy thinking about this last year but didn't get very far. My back-of-the-envelope design was as follows: for input type T and output type V, implement a mapper which outputs type (T, V)

Re: Merging Python code? Help avoid Python 3 regressions with these two simple steps :)

2018-03-02 Thread Holden Karau
at is >> the best way to approach this? >> >> On Fri, Mar 2, 2018 at 9:50 AM, Holden Karau <holden.ka...@gmail.com> >> wrote: >> >>> I agree, however I'm of the impression it's blocked on infra? (e.g. it's >>> important but out of my hands). >

Re: Merging Python code? Help avoid Python 3 regressions with these two simple steps :)

2018-03-02 Thread Holden Karau
e.org/jira/browse/BEAM-3671). I would > appreciate if folks pay attention to these 2 steps but I am worried that it > will be easily forgotten. > > On Thu, Mar 1, 2018 at 6:51 PM, Holden Karau <hol...@pigscanfly.ca> wrote: > >> I may have watched too many buzzfeed video

Merging Python code? Help avoid Python 3 regressions with these two simple steps :)

2018-03-01 Thread Holden Karau
I may have watched too many BuzzFeed videos this week but the steps are: 1) git checkout the PR in question 2) Run tox -e lint_py2,lint_py3. This is important since Python 3 isn't installed on the Jenkins workers just yet and we have some tests to catch basic invalid Python 3 which we can slowly
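The two steps in this message could be scripted roughly as below. This is only a sketch under stated assumptions, not something taken from the thread: the PR number 1234 and the helper names `lint_commands` and `run_lint` are hypothetical, the remote named `origin` is assumed to point at the GitHub repository (so that GitHub's `pull/<N>/head` ref is available), and `tox` is assumed to be run from the directory that holds the relevant `tox.ini`. Only the `tox -e lint_py2,lint_py3` invocation comes from the message itself.

```python
import subprocess
from typing import List


def lint_commands(pr_number: int, remote: str = "origin") -> List[List[str]]:
    """Build the commands for the two steps: check out the PR, then lint it.

    The local branch name "pr-<N>" is an arbitrary, hypothetical choice.
    """
    branch = f"pr-{pr_number}"
    return [
        # Step 1: fetch the PR head from GitHub and check it out locally.
        ["git", "fetch", remote, f"pull/{pr_number}/head:{branch}"],
        ["git", "checkout", branch],
        # Step 2: run both lint environments named in the message.
        ["tox", "-e", "lint_py2,lint_py3"],
    ]


def run_lint(pr_number: int, remote: str = "origin", cwd: str = ".") -> None:
    """Run the commands in order, stopping at the first failure."""
    for cmd in lint_commands(pr_number, remote):
        subprocess.run(cmd, check=True, cwd=cwd)


if __name__ == "__main__":
    # Print the commands for a hypothetical PR instead of executing them.
    for cmd in lint_commands(1234):
        print(" ".join(cmd))
```

Calling `run_lint(1234, cwd=...)` from inside a checkout would execute the same commands; `check=True` makes a lint failure raise `subprocess.CalledProcessError` rather than pass silently.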

Re: Python 3 flake 8: splitting up on the errors?

2018-02-28 Thread Holden Karau
a list, could you either share the list or create individual > JIRAs so that we can track the work among us. > > On Tue, Feb 27, 2018 at 4:53 PM, Holden Karau <hol...@pigscanfly.ca> > wrote: > >> How would folks feel about splitting up some of the Python 3 migration >

Python 3 flake 8: splitting up on the errors?

2018-02-27 Thread Holden Karau
How would folks feel about splitting up some of the Python 3 migration work by the different flake8 errors in Py3? This might allow us to parallelize some of the work while still keeping things fairly small? -- Twitter: https://twitter.com/holdenkarau

Re: Python 3 reviewers

2018-02-22 Thread Holden Karau
d >> have been reviewing some (simple) PRs in this direction. As long as >> they're broken up into small enough chunks, feel free to send some my >> way. >> >> On Thu, Feb 22, 2018 at 3:59 PM, Holden Karau <hol...@pigscanfly.ca> >> wrote: >> > Hi Y'a

Python 3 reviewers

2018-02-22 Thread Holden Karau
Hi Y'all, I'm trying to make some progress on Python 3 support for Beam but I'm having a bit of difficulty finding people with review bandwidth. Are there any committers with time to spare who would be willing to work on this? If not no worries I'll refocus my efforts elsewhere :) Cheers,

Re: [PROPOSAL] Switch from Guava futures vs Java 8 futures

2018-02-02 Thread Holden Karau
For what it's worth there exists a relatively easy Java8 to Scala future conversion so this shouldn't cause an issue on the Spark runner. On Thu, Feb 1, 2018 at 11:22 PM, Alexey Romanenko wrote: > +1, sounds great! > > Regards, > Alexey > > > On 2 Feb 2018, at 07:14,

FOSDEM mini office hour?

2018-01-31 Thread Holden Karau
Hi BEAM Friends, If any folks are around for FOSDEM this year I was planning on doing a coffee office hour on the last day after my talks. Maybe like 6pm? I'm also going to see if any Spark folks are around and interested :) Cheers,

Re: Strata Conference this March 6-8

2018-01-16 Thread Holden Karau
: > > +1 to BoF > > On Tue, Jan 16, 2018 at 5:00 PM, Dmitry Demeshchuk <dmi...@postmates.com> > wrote: > > Probably won't be attending the conference, but totally down for a BoF. > > On Tue, Jan 16, 2018 at 4:58 PM, Holden Karau <hol...@pigscanfly.ca> > wrote:

Re: Strata Conference this March 6-8

2018-01-16 Thread Holden Karau
Do interested folks have any timing constraints around a BoF? On Tue, Jan 16, 2018 at 4:30 PM, Jesse Anderson wrote: > +1 to BoF. I don't know if any Beam talks will be on the schedule. > > > We could do an informal BoF at the Philz nearby or similar? > --

Re: Strata Conference this March 6-8

2018-01-16 Thread Holden Karau
We could do an informal BoF at the Philz nearby or similar? On Wed, Jan 17, 2018 at 11:23 AM Eugene Kirpichov wrote: > I'm giving a talk about splittable DoFn's > https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/63696?locale=zh > > There are no other

Re: Pushing daily/test containers for python

2017-12-21 Thread Holden Karau
So I think we (or more accurately the PMC) need to be careful with how we post the container artifacts from an Apache POV since they most likely contain non-Apache licensed code (and also posting dailies can be complicated since the PMC hasn’t voted on each one). For just testing it should

Re: Introduction + interest in helping Beam builds, tests, and releases

2017-12-07 Thread Holden Karau
Also, and I know this is maybe a bit beyond the scope of what would make sense initially, but if you wanted to set up something to test BEAM against the new Spark/Flink RCs we could give feedback about any breaking changes we see in upstream projects and I’d be happy to help with that :) On Fri,

Re: Schema-Aware PCollections

2017-11-30 Thread Holden Karau
the community to define these APIs, and to make sure we're covering all >> relevant use cases. >> > > Thanks for sharing this Reuven, I'm excited to see this being discussed. > One global comment: all of the existing examples are in Java. It would be > great if we could des

Re: [VOTE] Fixing @yyy.com.INVALID mailing addresses

2017-11-22 Thread Holden Karau
+1 (non-binding) On Wed, Nov 22, 2017 at 4:06 PM Kenneth Knowles wrote: > +1 > > On Wed, Nov 22, 2017 at 3:43 PM, Lukasz Cwik > wrote: > > > +1 > > > > On Wed, Nov 22, 2017 at 3:35 PM, Reuven Lax > > wrote: > > > > >

Re: [VOTE] Choose the "new" Spark runner

2017-11-20 Thread Holden Karau
[ ] Use Spark 1 & Spark 2 Support Branch [ X ] Use Spark 2 Only Branch non-binding On Mon, Nov 20, 2017 at 1:00 AM, Etienne Chauchot wrote: > [ ] Use Spark 1 & Spark 2 Support Branch > [X] Use Spark 2 Only Branch > > Best > Etienne > > > > Le 19/11/2017 à 13:56,

Re: Questions with containerized runners plans?

2017-11-18 Thread Holden Karau
>https://s.apache.org/beam-mixed-language-pipelines. > > It is also linked from design section in the portability page. > > Thanks, > Henning > > > On Sat, Nov 18, 2017 at 6:33 AM, Holden Karau <hol...@pigscanfly.ca> > wrote: > > > So I was looking th

Questions with containerized runners plans?

2017-11-18 Thread Holden Karau
So I was looking through https://beam.apache.org/contribute/portability/ which led me to BEAM-2900, and then to https://docs.google.com/document/d/1n6s3BOxOPct3uF4UgbbI9O9rpdiKWFH9R6mtVmR7xp0/edit# . I was wondering if there are any considerations being given to native dependencies that user code

Re: [VOTE] Drop Spark 1.x support to focus on Spark 2.x

2017-11-08 Thread Holden Karau
phase. Maybe even just flag spark 1 as deprecated > and just being maintained might be enough. > > On Wed, Nov 8, 2017 at 10:25 PM, Holden Karau <hol...@pigscanfly.ca> > wrote: > > > Also, upgrading Spark 1 to 2 is generally easier than changing JVM > > versions. Fo

Re: Portability overview webpage

2017-11-08 Thread Holden Karau
Awesome! Out of interest is there any discussion around common formats for interchange going on? On Tue, Nov 7, 2017 at 9:15 AM, Henning Rohde wrote: > Thanks everyone! The page is now live at: > >https://beam.apache.org/contribute/portability/ > > Henning > > On

Re: [VOTE] Drop Spark 1.x support to focus on Spark 2.x

2017-11-08 Thread Holden Karau
Also, upgrading Spark 1 to 2 is generally easier than changing JVM versions. For folks using YARN or the hosted environments it's pretty much trivial since you can effectively have distinct Spark clusters for each job. On Wed, Nov 8, 2017 at 9:19 PM, Holden Karau <hol...@pigscanfly.ca>

Re: python3 support schedule

2017-11-05 Thread Holden Karau
If anyone wants to help on the inference stuff: https://issues.apache.org/jira/browse/BEAM-3143 + WIP PR @ https://github.com/apache/beam/pull/4079 . On Sat, Nov 4, 2017 at 11:31 PM, Holden Karau <hol...@pigscanfly.ca> wrote: > I've got some work progress, although right now

Re: python3 support schedule

2017-11-05 Thread Holden Karau
ython3 > binary on its workers) once the core work is completed. > > Ahmet > > On Thu, Nov 2, 2017 at 12:58 PM, Jesse Anderson <je...@bigdatainstitute.io > > > wrote: > > > Holden is being modest in her contributions to Python frameworks, > > especially Apache Sp

Re: python3 support schedule

2017-11-02 Thread Holden Karau
Hi! So this is something I'm currently working on (e.g. in between checking my e-mails :p). If you want to join in and help, we can split up the work into smaller components and parallelize the process a bit :) Always happy to see more folks who care about Python 3 support. On Thu, Nov 2, 2017 at

Re: [VOTE] Migrate to gitbox

2017-10-11 Thread Holden Karau
+1 (non-binding) On Wed, Oct 11, 2017 at 12:25 PM, Robert Bradshaw < rober...@google.com.invalid> wrote: > +1 > > On Wed, Oct 11, 2017 at 10:53 AM, Lukasz Cwik > wrote: > > > +1 > > > > On Tue, Oct 10, 2017 at 12:55 PM, Jason Kuster < > >

Re: Beam spark 2.x runner status

2017-08-21 Thread Holden Karau
I'd love to take a look at the PR when it comes in (<3 BEAM + SPARK :)). On Mon, Aug 21, 2017 at 11:33 AM, Jean-Baptiste Onofré wrote: > Hi > > I did a new runner supporting spark 2.1.x. I changed code for that. > > I'm still in vacation this week. I will send an update when