Re: Can apache beam be used for control flow (ETL workflow)

2023-12-17 Thread Austin Bennett
https://beamsummit.org/sessions/event-driven-movie-magic/

^^ The question made me think of that use case, though it's unclear how close
it is to what you're thinking about.

Cheers -

On Fri, Dec 15, 2023 at 7:01 AM Byron Ellis via user 
wrote:

> As Jan says, theoretically possible? Sure. That particular set of
> operations? Overkill. If you don't have it already set up I'd say even
> something like Airflow is overkill here. If all you need to do is "launch
> job and wait" when a file arrives... that's a small script and not
> something that particularly requires a distributed data processing system.
>
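Byron's "small script" suggestion can be sketched in plain Python under a few assumptions: the upstream system drops a file into a local directory, the job is started by a caller-supplied function that returns an exit code, and the downstream notification is another callable. The function name and parameters below are hypothetical; they are not part of Beam, Airflow, or any scheduler.

```python
import time
from pathlib import Path
from typing import Callable, Optional


def watch_launch_notify(
    watch_dir: Path,
    pattern: str,
    launch_job: Callable[[Path], int],
    notify: Callable[[Path, int], None],
    poll_seconds: float = 5.0,
    timeout_seconds: Optional[float] = None,
) -> Optional[Path]:
    """Wait for a file matching `pattern`, run one job on it, then notify.

    Returns the file that triggered the run, or None if `timeout_seconds`
    elapses before any matching file appears.
    """
    deadline = (None if timeout_seconds is None
                else time.monotonic() + timeout_seconds)
    while True:
        # Step 1: upstream file watching, done by simple polling.
        matches = sorted(watch_dir.glob(pattern))
        if matches:
            trigger = matches[0]
            # Step 2: launch the external job and wait for its exit code.
            exit_code = launch_job(trigger)
            # Step 3: tell the downstream workflow what happened.
            notify(trigger, exit_code)
            return trigger
        if deadline is not None and time.monotonic() >= deadline:
            return None
        time.sleep(poll_seconds)
```

For example, `launch_job` could be `lambda path: subprocess.run(["python", "my_job.py", str(path)]).returncode`, where `my_job.py` is a placeholder for the notebook or script being triggered. Polling keeps the sketch dependency-free; a real setup would also need to avoid re-processing the same file, e.g. by moving it aside after the run.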
> On Fri, Dec 15, 2023 at 4:58 AM Jan Lukavský  wrote:
>
>> Hi,
>>
>> Apache Beam describes itself as "Apache Beam is an open-source, unified
>> programming model for batch and streaming data processing pipelines, ...".
>> As such, it is possible to use it to express essentially arbitrary logic
>> and run it as a streaming pipeline. A streaming pipeline processes input
>> data and produces output data and/or actions. Given these assumptions, it
>> is technically feasible to use Apache Beam for orchestrating other
>> workflows, but the problem is that it would very likely not be efficient.
>> Apache Beam does a lot of heavy lifting related to the fact that it is
>> designed to process large volumes of data in a scalable way, which is
>> probably not what one would need for workflow orchestration. So my two
>> cents would be that, although it _could_ be done, it probably _should not_
>> be done.
>>
>> Best,
>>
>>  Jan
>> On 12/15/23 13:39, Mikhail Khludnev wrote:
>>
>> Hello,
>> I think this page https://beam.apache.org/documentation/ml/orchestration/
>> might answer your question.
>> Frankly speaking: GCP Workflows and Apache Airflow.
>> But Beam itself is a data-stream/flow or batch processor; not a workflow
>> engine (IMHO).
>>
>> On Fri, Dec 15, 2023 at 3:13 PM data_nerd_666 
>> wrote:
>>
>>> I know it is technically possible, but my case may be a little special.
>>> Say I have 3 steps for my control flow (ETL workflow):
>>> Step 1. upstream file watching
>>> Step 2. call some external service to run one job, e.g. run a notebook,
>>> run a python script
>>> Step 3. notify downstream workflow
>>> Can I use apache beam to build a DAG with 3 nodes and run this as either a
>>> flink or spark job? It might be a little weird, but I just want to
>>> learn from the community whether this is the right way to use apache beam,
>>> and whether anyone has done this before. Thanks
>>>
>>>
>>>
>>> On Fri, Dec 15, 2023 at 10:28 AM Byron Ellis via user <
>>> user@beam.apache.org> wrote:
>>>
 It’s technically possible but the closest thing I can think of would be
 triggering things based on things like file watching.

 On Thu, Dec 14, 2023 at 2:46 PM data_nerd_666 
 wrote:

> Not using beam as a time-based scheduler, but just using it to control the
> execution order of an ETL workflow DAG, because beam's abstraction is also
> a DAG.
> I know it is a little weird; I just want to confirm with the community:
> has anyone used beam like this before?
>
>
>
> On Thu, Dec 14, 2023 at 10:59 PM Jan Lukavský  wrote:
>
>> Hi,
>>
>> can you give an example of what you mean for better understanding? Do
>> you mean using Beam as a scheduler of other ETL workflows?
>>
>>   Jan
>>
>> On 12/14/23 13:17, data_nerd_666 wrote:
>> > Hi all,
>> >
>> > I am new to apache beam, and am very excited to find beam in the apache
>> > community. I see lots of use cases of using apache beam for data flow
>> > (processing large amounts of batch/streaming data). I am just wondering
>> > whether I can use apache beam for control flow (ETL workflow). I don't
>> > mean the spark/flink job in the ETL workflow, I mean the ETL workflow
>> > itself. An ETL workflow is also a DAG, very similar to the abstraction
>> > of apache beam, but unfortunately I didn't find such use cases on the
>> > internet. So I'd like to ask this question in the beam community to
>> > confirm whether I can use apache beam for control flow (ETL workflow).
>> > If yes, please let me know some success stories of this. Thanks
>> >
>> >
>> >
>>
>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>>
>>


Re: [Request for Feedback] Swift SDK Prototype

2023-08-26 Thread Austin Bennett
It's great that this is coming together, and I'm glad for the messages along
the way explaining the process, choices, ...!



On Fri, Aug 25, 2023, 2:04 PM Byron Ellis via user 
wrote:

> Okay, after a brief detour through "get this working in the Flink Portable
> Runner" I think I have something pretty workable.
>
> PInput and POutput can actually be structs rather than protocols, which
> simplifies things quite a bit. It also allows us to use them with property
> wrappers for a SwiftUI-like experience if we want when defining DoFns
> (which is what I was originally intending to use them for). That also means
> the function signature you use for closures would match full-fledged DoFn
> definitions for the most part, which is satisfying.
>
>
>
> On Thu, Aug 24, 2023 at 5:55 PM Byron Ellis  wrote:
>
>> Okay, I tried a couple of different things.
>>
>> Implicitly passing the timestamp and window during iteration did not go
>> well. While physically possible, it introduces an invisible side effect
>> into loop iteration, which confused me when I tried to use it, and I'm the
>> one who implemented it. Also, I'm pretty sure there'd end up being some
>> sort of race condition nightmare continuing down that path.
>>
>> What I decided to do instead was the following:
>>
>> 1. Rename the existing "pardo" functions to "pstream" and require that
>> they always emit a window and timestamp along with their value. This
>> eliminates the side effect but lets us keep iteration in a bundle where
>> that might be convenient. For example, in my cheesy GCS implementation it
>> means that I can keep an OAuth token around for the lifetime of the bundle
>> as a local variable, which is convenient. It's a bit more typing for users
>> of pstream, but the expectation here is that if you're using pstream
>> functions You Know What You Are Doing and most people won't be using it
>> directly.
>>
>> 2. Introduce a new set of pardo functions (I didn't do all of them yet,
>> but enough to test the functionality and decide I liked it) which take a
>> function signature of (any PInput, any POutput).
>> PInput takes the (InputType, Date, Window) tuple and converts it into a
>> struct with friendlier names. Not strictly necessary, but it makes the code
>> nicer to read, I think. POutput introduces emit functions that optionally
>> allow you to specify a timestamp and a window. If you don't specify either
>> one, it will use the timestamp and/or window of the input.
>>
>> Using that was pretty pleasant, so I think we should continue down that
>> path. If you'd like to see it in use, I reimplemented map() and flatMap()
>> in terms of this new pardo functionality.
>>
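To make the element-wise design above concrete, here is a rough Python analogue. It is not the Swift SDK's API: `PInput`, `POutput`, `pardo`, `pstream` and `flat_map` are illustrative names borrowed from the thread, and windows are modeled as plain strings. It shows the property Byron describes: `emit` falls back to the input's timestamp and window unless you pass your own, while a stream-wise function must produce full (value, timestamp, window) triples itself.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Callable, Iterable, Iterator, List, Optional, Tuple

# A windowed element as (value, timestamp, window); windows are plain strings
# here purely for illustration.
Element = Tuple[Any, datetime, str]


@dataclass(frozen=True)
class PInput:
    """Friendlier, named view of one (value, timestamp, window) tuple."""
    value: Any
    timestamp: datetime
    window: str


class POutput:
    """Collects outputs; metadata not given to emit() defaults to the input's."""

    def __init__(self, source: PInput) -> None:
        self._source = source
        self.emitted: List[Element] = []

    def emit(self, value: Any, timestamp: Optional[datetime] = None,
             window: Optional[str] = None) -> None:
        self.emitted.append((
            value,
            self._source.timestamp if timestamp is None else timestamp,
            self._source.window if window is None else window,
        ))


def pardo(fn: Callable[[PInput, POutput], None],
          bundle: Iterable[Element]) -> List[Element]:
    """Element-wise: each element gets its own POutput tied to its metadata."""
    results: List[Element] = []
    for value, timestamp, window in bundle:
        element = PInput(value, timestamp, window)
        output = POutput(element)
        fn(element, output)
        results.extend(output.emitted)
    return results


def pstream(fn: Callable[[Iterator[Element]], Iterable[Element]],
            bundle: Iterable[Element]) -> List[Element]:
    """Stream-wise: fn sees the whole bundle and must emit full triples."""
    return list(fn(iter(bundle)))


def flat_map(f: Callable[[Any], Iterable[Any]]) -> Callable[[PInput, POutput], None]:
    """flatMap expressed on top of pardo, mirroring the approach in the thread."""
    def fn(element: PInput, output: POutput) -> None:
        for item in f(element.value):
            output.emit(item)
    return fn
```

Creating one `POutput` per element is what makes the input-to-output metadata correlation automatic; a stream-wise function over the whole bundle has no such link, which is why it must supply the timestamp and window explicitly.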
>> Code has been pushed to the branch/PR if you're interested in taking a
>> look.
>>
>>
>> On Thu, Aug 24, 2023 at 2:15 PM Byron Ellis 
>> wrote:
>>
>>> Gotcha, I think there's a fairly easy solution to link input and output
>>> streams. Let me try it out... It might even be possible to have both
>>> element-wise and stream-wise closure pardos. That's definitely possible at
>>> the DoFn level (called SerializableFn in the SDK because I want to
>>> use @DoFn as a macro).
>>>
>>> On Thu, Aug 24, 2023 at 1:09 PM Robert Bradshaw 
>>> wrote:
>>>
 On Thu, Aug 24, 2023 at 12:58 PM Chamikara Jayalath <
 chamik...@google.com> wrote:

>
>
> On Thu, Aug 24, 2023 at 12:27 PM Robert Bradshaw 
> wrote:
>
>> I would like to figure out a way to get the stream-y interface to
>> work, as I think it's more natural overall.
>>
>> One hypothesis is that if any elements are carried over loop
>> iterations, there will likely be some that are carried over beyond the
>> loop (after all, the callee doesn't know when the loop is supposed to end).
>> We could reject "plain" elements that are emitted after this point,
>> requiring one to emit timestamp-windowed-values.
>>
>
> Are you assuming that the same stream (or overlapping sets of data)
> is pushed to multiple workers? I thought that the data streamed here is
> the data that belongs to the current bundle (hence already assigned to
> the current worker), so any output from the current bundle invocation
> would be a valid output of that bundle.
>
>>
 Yes, the content of the stream is exactly the contents of the bundle.
 The question is how to do the input_element:output_element correlation for
 automatically propagating metadata.


>> Related to this, we could enforce that the only (user-accessible) way
>> to get such a timestamped value is to start with one, e.g. a
>> WindowedValue.withValue(O) produces a WindowedValue with the same
>> metadata but a new value. Thus a user wanting to do anything "fancy" would
>> have to explicitly request iteration over these windowed values rather than
>> over the raw elements. (This is also forward compatible with expanding

Re: Missing Beam Katas in Intellij >=2023.3

2023-07-31 Thread Austin Bennett
Hi Bartosz,

Yes, you've flagged the exact issue.

For the people I've recently done the Katas with, we've just downloaded
older versions of IntelliJ so that they work with the older plugin
that supports Stepik.  That's not suitable longer term.

I haven't had time to dig into which new supported platform to upload these
to [ I imagine ANY will work pretty seamlessly ].

While I don't think there are any specific/concrete plans [ since someone
needs to do the actual work ], I believe the code is still good, this is a
pretty straightforward task, and it would be supported [ it's a good idea ].

I wrote up this ticket --> https://github.com/apache/beam/issues/27765

Are you interested/willing to take this on?  I would be happy to
collaborate, chat, etc. to help you feel comfortable with the direction.
Otherwise, I can look to address it eventually.  Feel free to email me
off-list OR tag me in that GH issue to discuss [ @brucearctor ].

Cheers,
Austin



On Mon, Jul 31, 2023 at 12:10 PM Ahmet Altay  wrote:

> Hi Bartosz,
>
> Thanks for flagging this.
>
> Adding @Austin Bennett  and @Israel Herraiz
>  -- They were the two people who maintained beam katas or
> helped with related questions before.
>
> Ahmet
>
> On 2023/07/27 10:21:24 Bartosz Zabłocki via user wrote:
> > Hi all,
> > I'd like to bring to your attention that Beam Katas are no longer
> > available in IntelliJ with the JetBrains Academy plugin >=2023.3, as far
> > as I understand.
> >
> > The katas are hosted on the Stepik website [1], and since JetBrains
> > Academy plugin 2023.3, Stepik courses are not available in JetBrains IDEs
> > (source [2]).
> >
> > The only available out-of-the-box platforms are Intellij's Marketplace,
> > Hyperskill, Coursera, CheckiO and Codeforces. Is there a plan to migrate
> > Beam Katas to one of the platforms?
> >
> > [1] https://stepik.org/course/54532
> > [2]
> > https://blog.jetbrains.com/education/2023/03/30/jetbrains-academy-plugin-2023-3-is-available/
> > section "Discontinuing Stepik integration"
> >
> > Cheers,
> > Bartosz Zablocki
> >
>


Re: PubSub Lite IO & Python?

2022-08-04 Thread Austin Bennett
@cham thanks for bringing the conversation back to the list ( esp. for
anyone else searching/wondering in the future )!

From what I understand, in summary: Python should be able to call the
[ Java ] PubSubLite IO via X-Lang for use with any underlying runner ( well,
any portable runner, ex: Spark, Flink, DataflowV2, etc. )



On Thu, Aug 4, 2022 at 5:49 PM Chamikara Jayalath via user <
user@beam.apache.org> wrote:

>
>
> On Thu, Aug 4, 2022 at 5:29 PM Daniel Collins 
> wrote:
>
>> Hello Drew,
>>
>> > I upgraded to apache-beam 2.40.0 and tried to access
>> apache_beam.io.gcp.pubsublite.ReadFromPubSubLite
>>
>> Make sure to import `apache_beam.io.gcp.pubsublite.*`. I have no
>> idea why the specific import isn't working, but that should work. If
>> it's not, I'll look into it more.
>>
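To narrow down an import failure like the one Drew describes, it can help to surface the actual exception rather than just "there wasn't anything to access". The helper below is a generic stdlib sketch (the name `check_transform` is hypothetical, not part of Beam); it only reports what Python raises and does not assume a particular cause, though a missing optional dependency is one plausible culprit worth ruling out.

```python
import importlib
from typing import Optional, Tuple


def check_transform(module_name: str, attr: str) -> Tuple[bool, Optional[str]]:
    """Try to import `module_name` and look up `attr` on it.

    Returns (True, None) on success, otherwise (False, reason), where reason
    is the text of the underlying error (which usually names what is missing).
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
        return False, f"{type(exc).__name__}: {exc}"
    if not hasattr(module, attr):
        return False, f"{module_name} has no attribute {attr!r}"
    return True, None


if __name__ == "__main__":
    ok, reason = check_transform("apache_beam.io.gcp.pubsublite", "ReadFromPubSubLite")
    print("import OK" if ok else f"import failed -> {reason}")
```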
>> > writing native Spark code to pull from PubSub Lite
>>
>> Note that we have a Spark-native source you can use. I'm unsure whether
>> Spark works with Beam Python, however; Chamikara would know that better.
>> https://github.com/googleapis/java-pubsublite-spark
>>
>
> It should be supported. See instructions here under "Portable
> (Java/Python/Go)": https://beam.apache.org/documentation/runners/spark/
>
>
>>
>>
>> -Daniel
>>
>> On Thu, Aug 4, 2022 at 7:48 PM Drew Forbes <
>> drew.for...@thatgamecompany.com> wrote:
>>
>>> I've actually not used PyBeam, I just meant writing Beam code with
>>> Python. Didn't realize there was a whole separate PyBeam package.
>>>
>>
> Thanks for clarifying.
>
> Thanks,
> Cham
>
>
>>
>>> I feel dumb asking, but basically we just couldn't get the import to
>>> work. I upgraded to apache-beam 2.40.0 and tried to access
>>> apache_beam.io.gcp.pubsublite.ReadFromPubSubLite through various means
>>> (regular import, proto_api, something like .external., etc) within Python
>>> and determined that there just wasn't anything to access. We could
>>> definitely have been wrong about that but it wasn't clear how to move
>>> forward so we just switched our focus to writing native Spark code to pull
>>> from PubSub Lite.
>>>
>>> On Thu, Aug 4, 2022 at 6:46 PM Chamikara Jayalath 
>>> wrote:
>>>
>>>> I believe this should be fully working. I'm not familiar with PyBeam
>>>> though. Is the execution mechanism the same as running a regular Beam
>>>> pipeline? Also, note that for multi-language, you need to use a portable
>>>> Beam runner.
>>>>
>>>> +Daniel Collins  who implemented this.
>>>>
>>>> Thanks,
>>>> Cham
>>>>
>>>> On Thu, Aug 4, 2022 at 11:24 AM Austin Bennett <
>>>> whatwouldausti...@gmail.com> wrote:
>>>>
>>>>> Hi Users/Devs,
>>>>>
>>>>> Drew (copied) reported having trouble with PubSub Lite:
>>>>>
>>>>> "we just weren’t able to get PubSub Lite working with PyBeam. It’s
>>>>> been a few weeks since we last tried, but we were just trying to use
>>>>> `apache_beam.io.gcp.pubsublite.ReadFromPubSubLite` (here
>>>>> <https://beam.apache.org/releases/pydoc/current/apache_beam.io.gcp.pubsublite.html>
>>>>> ) in PyBeam and couldn’t get it to import so we just gave up. From the
>>>>> looks of the repo we couldn’t tell if it was ever actually fully
>>>>> implemented and published"
>>>>>
>>>>> I haven't used it myself, and figured others might be able to
>>>>> comment/share whether any have had success using it and/or whether
>>>>> the IO is fully tested/implemented ( whether available via
>>>>> cross-language or 'native' python ).
>>>>>
>>>>> Please share any thoughts here.
>>>>>
>>>>> Cheers,
>>>>> Austin
>>>>>
>>>>>


PubSub Lite IO & Python?

2022-08-04 Thread Austin Bennett
Hi Users/Devs,

Drew (copied) reported having trouble with PubSub Lite:

"we just weren’t able to get PubSub Lite working with PyBeam. It’s been a
few weeks since we last tried, but we were just trying to use
`apache_beam.io.gcp.pubsublite.ReadFromPubSubLite` (here
<https://beam.apache.org/releases/pydoc/current/apache_beam.io.gcp.pubsublite.html>
) in PyBeam and couldn’t get it to import so we just gave up. From the
looks of the repo we couldn’t tell if it was ever actually fully
implemented and published"

I haven't used it myself, and figured others might be able to comment/share
whether any have had success using it and/or whether the IO is fully
tested/implemented ( whether available via cross-language or 'native'
python ).

Please share any thoughts here.

Cheers,
Austin


Re: Apache Beam London meetup 9: recordings

2022-06-19 Thread Austin Bennett
Great!

On Sun, Jun 19, 2022 at 11:18 AM Matthias Baetens 
wrote:

> Hi all
>
> The recordings from last year's Apache Beam meetup London are now
> available on the YouTube channel (apologies for the delay):
> - Apache Beam meetup 9: BBC's journey with Apache Beam
> 
> - Apache Beam meetup 9: Apache Beam + Apache Druid
> 
> - Apache Beam meetup 9: Attack detection use case at Fastly
> 
>
> Enjoy!
> Matthias
>


Re: [PROPOSAL] Stop Spark 2 support in Spark Runner

2022-04-29 Thread Austin Bennett
https://spark.apache.org/releases/spark-release-3-0-0.html

Since Spark 3 has been out for almost 2 years, this seems increasingly
reasonable.

On Fri, Apr 29, 2022 at 4:04 AM Jean-Baptiste Onofré 
wrote:

> +1, it makes sense to me. Users wanting an "old" Spark version can use
> previous Beam releases.
>
> Regards
> JB
>
> On Fri, Apr 29, 2022 at 12:39 PM Alexey Romanenko
>  wrote:
> >
> > Any objections or comments from Spark 2 users on this topic?
> >
> > —
> > Alexey
> >
> >
> > On 20 Apr 2022, at 19:17, Alexey Romanenko 
> wrote:
> >
> > Hi everyone,
> >
> > A while ago, we already discussed on dev@ that there are several reasons
> > to stop supporting Spark 2 in the Spark Runner (in all the variants we
> > currently have: RDD, Dataset, Portable) [1]. In short, it adds a burden
> > to Spark runner maintenance that we would like to avoid in the future.
> >
> > From the devs' perspective, I don’t see any objections to this. So I’d
> > like to know whether there are users who still use Spark 2 for their Beam
> > pipelines and for whom it would be critical to keep using it.
> >
> > Please share your opinion on this!
> >
> > —
> > Alexey
> >
> > [1] https://lists.apache.org/thread/opfhg3xjb9nptv878sygwj9gjx38rmco
> >
> > > On 31 Mar 2022, at 17:51, Alexey Romanenko 
> wrote:
> > >
> > > Hi everyone,
> > >
> > > For the moment, the Beam Spark Runner supports two versions of Spark:
> > > 2.x and 3.x.
> > >
> > > Taking into account that:
> > > - almost all cloud providers have already mostly moved to Spark 3.x as
> > >   their main supported version;
> > > - the latest Spark 2.x release (Spark 2.4.8, a maintenance release) was
> > >   done almost a year ago;
> > > - Spark 3 is considered the mainstream Spark version for development
> > >   and bug fixing;
> > > - it is better to avoid the burden of maintaining two versions (there
> > >   are some incompatibilities between Spark 2 and 3);
> > >
> > > I’d suggest dropping Spark 2 support in the Spark Runner in one of the
> > > next Beam releases.
> > >
> > > What are your thoughts on this? Are there any major objections or
> > > reasons for not doing this that I may have missed?
> > >
> > > —
> > > Alexey
> > >
> > >
>


Re: JdbcIO

2022-04-22 Thread Austin Bennett
Without getting into the super specifics of your use case, it sounds like
you might want to check out DebeziumIO for CDC ( Change Data Capture ).
I think DebeziumIO can generally handle even much more complex use cases
than it sounds like you are trying for.

Some pointers/talks from last year's beam summit:
https://www.youtube.com/watch?v=hu5FacAeQ-8
https://www.youtube.com/watch?v=U_RshngpxLc



On Fri, Apr 22, 2022 at 4:41 AM Eric Berryman 
wrote:

> Does an unbounded JdbcIO exist, or would I need to wrap the existing one
> in a splittable DoFn? Or maybe there is an easier way to do it?
>
> Thank you again,
> Eric
>
>
>
> On Wed, Apr 20, 2022, 21:59 Ahmet Altay  wrote:
>
>> /cc @Pablo Estrada  @John Casey
>> 
>>
>> On Wed, Apr 20, 2022 at 6:29 PM Eric Berryman 
>> wrote:
>>
>>> Hello,
>>>
>>> I have a rather simple use case where I would like to read a db table,
>>> which acts as a queue (~ hundreds of millions of events in the initial
>>> load, but only thousands of events per day), and write that data out to a
>>> sink. This pipeline would be unbounded.
>>>
>>> I'm looking for reading material and/or code that shows reading
>>> from the JdbcIO API with checkpoints. I would like to avoid the initial
>>> load on restarts, upgrades, etc. :)
>>>
>>> Thank you for your time!
>>> Eric
>>>
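The "avoid the initial load on restarts" requirement is essentially a persisted high-watermark. The sketch below is not the JdbcIO API; it is a plain-Python illustration of the pattern an unbounded reader (for example, a splittable DoFn) would need to implement, with `sqlite3` standing in for the real database. It assumes a hypothetical `events` table with a monotonically increasing `id` column; if ids can commit out of order, a pure high-watermark like this can skip rows.

```python
import sqlite3
from typing import Callable, List, Tuple

Row = Tuple[int, str]


def poll_new_rows(conn: sqlite3.Connection, last_id: int,
                  batch_size: int = 1000) -> List[Row]:
    """Read the next batch of rows whose id is above the checkpoint."""
    cursor = conn.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, batch_size),
    )
    return cursor.fetchall()


def drain_once(conn: sqlite3.Connection,
               load_checkpoint: Callable[[], int],
               save_checkpoint: Callable[[int], None],
               sink: Callable[[Row], None],
               batch_size: int = 1000) -> int:
    """Process everything past the stored checkpoint; return rows handled."""
    last_id = load_checkpoint()
    handled = 0
    while True:
        rows = poll_new_rows(conn, last_id, batch_size)
        if not rows:
            return handled
        for row in rows:
            sink(row)
        last_id = rows[-1][0]
        # Persist only after the batch reached the sink, so a restart resumes
        # here instead of repeating the initial load (at-least-once delivery).
        save_checkpoint(last_id)
        handled += len(rows)
```

Calling `drain_once` on a timer gives the unbounded behavior; where the checkpoint lives (a file, a table, runner state) is left open here on purpose.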
>>


Re: Beam Summit is looking for speakers!

2022-03-10 Thread Austin Bennett
I would also add -- if in doubt/concerned, don't hesitate to reach out to
me. I'd be happy to talk through potential submission ideas if that'd be
helpful for anyone.

On Thu, Mar 10, 2022 at 11:25 AM Pablo Estrada  wrote:

> Thanks for sharing Danielle!
>
> And to our users - please do submit your proposals : ) no topic is too
> small, too weird, too complex, too low nor high-level if it uses Beam.
> Best
> -P.
>
> On Thu, Mar 10, 2022 at 11:19 AM Danielle Syse  wrote:
>
>> Hi all,
>>
>> I hope you're having a great week before the long weekend! I'm reaching
>> out to remind you to submit your CFPs for our annual Beam Summit, which
>> are due next Tuesday!
>>
>> Beam Summit is coming back in 2022 with a hybrid format (onsite+online).
>>
>> We’ll host sessions to share use cases from companies using Apache Beam,
>> as well as community driven talks, technical deep dives and in-depth
>> workshops.
>>
>> Beam Summit will take place in Austin, TX on July 18-20, 2022. While we
>> would love for you to join us in person, we will also be streaming all
>> sessions (except workshops) live for an online audience. Please use the
>> links below to get involved!
>> Beam Summit: https://2022.beamsummit.org/
>> Beam Summit Registration: https://2022.beamsummit.org/tickets/
>> CFP Submission: https://bit.ly/3o2D9FL
>>
>> Thanks,
>>
>> Danielle Syse
>>
>


Re: Spark Structured Streaming runner migrated to Spark 3

2021-08-05 Thread Austin Bennett
Hooray!  Thanks, Etienne!

On Thu, Aug 5, 2021 at 3:11 AM Etienne Chauchot 
wrote:

> Hi all,
>
> Just to let you know that Spark Structured Streaming runner was migrated
> to Spark 3.
>
> Enjoy !
>
> Etienne
>
>


Re: Allyship workshops for open source contributors

2021-06-03 Thread Austin Bennett
+1, assuming timing can work.

On Wed, Jun 2, 2021 at 2:07 PM Aizhamal Nurmamat kyzy 
wrote:

> If we have a good number of people who express interest in this thread, I
>> will set up training for the Airflow community.
>>
>
> I meant Beam ^^' I am organizing it for the Airflow community as well.
>


Re: UX Research Findings Readout for Apache Beam Community

2021-01-30 Thread Austin Bennett
Is it possible to write up/share results for those not able to attend and/or
to digest ahead of attending?



On Thu, Jan 28, 2021, 10:46 AM Carlos Camacho Frausto <
carlos.cama...@wizeline.com> wrote:

> Hello,
> Some weeks ago, our firm conducted a User Experience Research Study for
> Google Apache Beam to identify users’ needs and pain points when learning
> and using Apache Beam.
>
> *Today, we are glad to invite you to a Readout session where we will
> present the key findings and a list of recommendations in order to improve
> the learning experience for Apache Beam users. This session will include a
> Q&A where you’ll be able to interact with the community.*
>
> We are considering a session of 60 minutes on any of these possible dates:
>
>- Thursday, February 11th at 11:00 AM CST / 6:00 PM CET
>- Thursday, February 11th at 2:00 PM CST / 9:00 PM CET
>- Friday, February 12th at 11:00 AM CST / 6:00 PM CET
>- Friday, February 12th at 2:00 PM CST / 9:00 PM CET
>
>
> If you would like to attend the session, *please let us know which of
> the date/time options work best for you by filling out this form*:
> https://forms.gle/LHjB3uYiJ35BFcbM6
>
> --
>
> Carlos Camacho | WIZELINE
>
> UX Designer
>
> carlos.cama...@wizeline.com
>
> Amado Nervo 2200, Esfera P6, Col. Jardines del Sol, 45050 Zapopan, Jal.


BeamSQL and Beam equivalent -- examples?

2020-11-01 Thread Austin Bennett
Hi All,

For something I am currently writing -- I am seeking any examples of
BeamSQL and Beam that take the same input and produce the same output.  I
can't recall, off the top of my head, any examples/slides/writeups.  Do any
exist?

I would like to show:

(a) that BeamSQL is a real thing :-)
(b) that Beam can express the same as BeamSQL
(c) that Beam can be more expressive than just SQL concepts.

I imagine such examples can help with points a and b.

Thanks,
Austin


Re: Ability to link to "latest" of python docs

2020-09-08 Thread Austin Bennett
+dev 

Lynn,

Seems totally doable.  If others don't speak up with a good way to do this
(or in opposition), I'm sure we can sort something out to accomplish this
(will dig into intersphinx mapping tomorrow).

Cheers,
Austin




On Tue, Sep 8, 2020, 5:19 PM Lynn Root  wrote:

> Hey folks -
>
> I'm wondering if there's a way to link to the latest SDK version of the
> Python documentation. I see that if I go here
> , it lists all the available
> documented SDK versions. But it'd be really nice to go to a link like
> "https://beam.apache.org/releases/pydoc/latest" and be automatically
> pointed to the latest one. This is particularly handy for documenting
> libraries that use beam via intersphinx mapping
> .
>
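For the intersphinx use case, a downstream project's Sphinx `conf.py` only needs a version-less base URL. The sketch below assumes that an alias like `https://beam.apache.org/releases/pydoc/current/` (the form linked from the PubSub Lite thread elsewhere in this archive) resolves to the newest release and serves an `objects.inv` inventory; if it does not, a pinned version directory would have to be used instead.

```python
# conf.py of a downstream project's Sphinx docs (sketch).
extensions = [
    "sphinx.ext.intersphinx",  # enables cross-references into other projects' docs
]

# Map the "apache_beam" inventory name to a version-less Beam docs URL.
# Assumption: this alias tracks the latest release and hosts objects.inv.
# None tells intersphinx to fetch objects.inv from that same base URL.
intersphinx_mapping = {
    "apache_beam": ("https://beam.apache.org/releases/pydoc/current/", None),
}
```

With that in place, a role such as ``:class:`apache_beam.pipeline.Pipeline` `` in the downstream docs would resolve against whichever release the alias points to, with no config change per Beam release.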
> Thanks!
>
> --
> Lynn Root
> Staff Engineer, Spotify
>


Intro to Beam and Contributing Workshops

2020-07-19 Thread Austin Bennett
Hi All,

I'm a huge fan of HOPE.

In the virtual edition this year, I am giving 2 talks.

* a 2hr introduction to Beam.
* a 1hr introduction to contributing to open source (with specific examples
from Beam).

These will occur on 30/31 July; the schedule can be found at:
https://scheduler.hope.net/hope2020/schedule/

I can see whether there are additional passes available for me as a speaker
to share with the community (not sure on this point).

Cheers,
Austin


Re: Can SpannerIO read data from different GCP project?

2020-06-28 Thread Austin Bennett
I haven't tried it yet, but it looks like the connection string asks for the
project to be specified.  Based on that (and cross-project access working in
other circumstances), I would imagine it will work, but...?  Give it a try!

One tricky place might be ensuring proper permissions, in both projects
(and without being too open).

On Sat, Jun 27, 2020, 5:46 AM Sheng Yang  wrote:

> Hi,
>
> I am working on Beam using the Dataflow engine. Recently I have been working
> on reading Spanner data from a different project. Say I run my Beam Dataflow
> job in GCP project A, but the Spanner instance is in GCP project B. I
> searched all the documentation, but can't find anything about SpannerIO
> reading data with custom credential key files. Right now I am considering
> JdbcIO because it accepts custom credentials as parameters and Spanner also
> has a JDBC API [1].
> Do I have something wrong in my description? Or am I considering the
> correct approach?
>
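> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.sql.ResultSet;
> import java.sql.SQLException;
> import java.sql.Statement;
>
> public class SpannerJdbcRead {
>   public static void main(String[] args) throws SQLException {
>     String url = "jdbc:cloudspanner:/projects/my_project_id/"
>         + "instances/my_instance_id/"
>         + "databases/my_database_name"
>         + "?credentials=/home/cloudspanner-keys/my-key.json"
>         + ";autocommit=false";
>     // try-with-resources closes the result set, statement and connection.
>     try (Connection connection = DriverManager.getConnection(url);
>          Statement statement = connection.createStatement();
>          ResultSet rs = statement.executeQuery(
>              "SELECT SingerId, AlbumId, MarketingBudget FROM Albums")) {
>       while (rs.next()) {
>         long singerId = rs.getLong(1);  // first column: SingerId
>       }
>     }
>   }
> }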
>
>
> [1]: https://github.com/googleapis/java-spanner-jdbc
>
> Thanks,
> Sheng
>
>
>
>


Re: How to safely update jobs in-flight using Apache Beam on AWS EMR?

2020-06-10 Thread Austin Bennett
Hi Dan,

AWS EMR generally runs Flink and/or Spark, both of which are supported Beam
runners.  For EMR, you might want to check which Beam/Flink versions it can
run together, and the status of Beam pipelines using either of those runners.

On running Beam in AWS, have you seen:
https://www.youtube.com/watch?v=eCgZRJqdt_I



Cheers,
Austin

On Wed, Jun 10, 2020 at 2:02 PM Dan Hill  wrote:

> No.  I just sent AWS Support a message.
>
> On Wed, Jun 10, 2020 at 1:00 PM Luke Cwik  wrote:
>
>> The runner needs to support it and I'm not aware of an EMR runner for
>> Apache Beam let alone one that supports pipeline update. Have you tried
>> reaching out to AWS?
>>
>> On Wed, Jun 10, 2020 at 11:14 AM Dan Hill  wrote:
>>
>>> Hi!  I found great docs about Apache Beam on Dataflow (which makes
>>> sense).  I was not able to find this about AWS EMR.
>>>
>>> https://cloud.google.com/dataflow/docs/guides/updating-a-pipeline
>>>
>>>
>>> https://medium.com/google-cloud/restarting-cloud-dataflow-in-flight-9c688c49adfd
>>>
>>


Re: Writing pipeline output to google sheet in google drive

2020-06-08 Thread Austin Bennett
@OrielResearch Eila Arich-Landkof   Depending on
your needs, I wonder about setting up a sheet (or sheets, as needed) that
uses a BQ connector as its data source.  If you use Dataflow to
write/create a BQ table, that would then hydrate the sheet (not sure about
the ordering -- maybe you'd need to create the BQ table before creating the
sheet)...?  An extra step, and perhaps a bit convoluted.

Another idea would be to write to some sort of sheet-compatible file-type
and then upload that to the folder.  There then *might* be something like a
cli call to turn the (ex: csv) file into a sheet?

Neither seem as clean as what you're looking for :-/


On Mon, Jun 8, 2020 at 8:26 AM Luke Cwik  wrote:

> It doesn't look like BigQuery supports exporting to Google Sheets [1]; maybe
> you can invoke this BQ connector directly by adding a transform that
> follows the BQ sink.
>
> 1:
> https://cloud.google.com/bigquery/docs/exporting-data#export_limitations
>
> On Sat, Jun 6, 2020 at 8:31 PM OrielResearch Eila Arich-Landkof <
> e...@orielresearch.org> wrote:
>
>> Hello,
>>
>> Is it possible to have the pipeline sink to a google sheet within a
>> specific google drive directory? Something like this:
>>
>> import apache_beam as beam
>>
>> p = beam.Pipeline(options=options)
>> (p | 'Step 1: read file' >> beam.io.ReadFromText('path/to/file')
>>    | 'Step 2: process data' >> beam.ParDo(GetData())  # user-defined DoFn
>>    | 'Step 3: write data to gsheet' >> beam.io.WriteToXXX('GSHEET_PATH'))  # placeholder sink
>>
>>
>> I know that BQ has a connector to Google sheet. Is it possible to use
>> this connector from the BQ sink? Other way?
>>
>> Thanks,
>> Eila
>>
>>


Beam First Steps Workshop - 9 June

2020-06-02 Thread Austin Bennett
Hi Beam Users,

Wanted to share the workshop that I'll give at Berlin Buzzwords next week:

https://berlinbuzzwords.de/session/first-steps-apache-beam-writing-portable-pipelines-using-java-python-go

Do consider joining if you are able and interested (if you're here and
already using Beam, the workshop would likely be too basic).  Should you
not be able to find a way to get a pass, do feel free to write and I'll see
what can be done (no promises, unsure what/whether anything).  If there is
eventually unmet demand, we can see about offering a free/public event
(like the Beam Learning month(s) events), and may have something similar at
https://beamsummit.org/.

Cheers,
Austin


Re: Try Beam Katas Today

2020-05-14 Thread Austin Bennett
It looks like there are instructions online for writing exercises/Katas:
https://www.jetbrains.com/help/education/educator-start-guide.html

Do we have a guide for contributing, and for how publication/releases occur
(publishing to Stepik)?  Although the code lives in the main repo
(and is therefore subject to those contrib guidelines), I think the
release/publication schedule is distinct?

This hopefully will help illustrate that we are able to contribute to Katas
(PRs welcome?), and not just consume them!



On Thu, May 14, 2020 at 1:41 AM Henry Suryawirawan 
wrote:

> Yeah, certainly we can expand it further.
> There are definitely more lessons that can be added.
>
> >Eg more the write side windowing interactions?
> Are you referring to Write IOs?
>
>
>
> On Wed, May 13, 2020 at 11:56 PM Nathan Fisher 
> wrote:
>
>> I went through them earlier this week! Definitely helpful.
>>
>> Is it possible to expand the katas available in the IO section? Eg more
>> the write side windowing interactions?
>>
>> On Wed, May 13, 2020 at 11:36, Luke Cwik  wrote:
>>
>>> These are an excellent learning tool.
>>>
>>> On Tue, May 12, 2020 at 11:02 PM Pablo Estrada 
>>> wrote:
>>>
 Sharing Damon's email with the user@ list as well. Thanks Damon!

 On Tue, May 12, 2020 at 9:02 PM Damon Douglas 
 wrote:

> Hello Everyone,
>
> If you don't already know, there are helpful instructional tools for
> learning the Apache Beam SDKs called Beam Katas, hosted on
> https://stepik.org.  Similar to traditional kata, they are meant to be
> repeated as practice.  Before practicing the katas myself, I found myself
> copy/pasting code (please accept my confession).  Now I find myself
> actually composing pipelines.  Just like kata forms, you find them
> becoming part of you.  If you are interested, below are listed the
> currently available katas:
>
> 1.  Java - https://stepik.org/course/54530
>
> 2.  Python -  https://stepik.org/course/54532
>
> 3.  Go (in development) - https://stepik.org/course/70387
>
> If you are absolutely brand new to Beam and it scares you like it
> scared me, come talk to me.
>
> Best,
>
> Damon
>
 --
>> Nathan Fisher
>>  w: http://junctionbox.ca/
>>
>


Beam Digital Summit 2020 -- JUNE 2020!

2020-04-22 Thread Austin Bennett
Hi All,

We are excited to announce the Beam Digital Summit 2020!

This will occur for partial days during the week of 15-19 June.

CfP is open and can be found at: https://sessionize.com/beam-digital-summit-2020/

CfP closes on 20 May 2020.  Do not hesitate to reach out to the organizers
with any questions.

See you there (online)!
Austin, on behalf of the Beam Summit Steering Committee


Meetups

2020-03-23 Thread Austin Bennett
Seems we won't be convening in person in just about any city anytime soon.

Seems like a chance to come together virtually.

WHO CAN SHARE?

Seeking:
* Use Cases
* Developing Beam/Components
* Other

Also, if there is anything in particular you would like to hear about, let
us know -- we can see if we can track such speakers down.


Bay Area Beam Meetup 19 Feb (Last Wednesday).

2020-02-21 Thread Austin Bennett
Hi All,

We had a meetup @Sentry.io on Wednesday -- with a solid 40+ engaged
attendees.

Thanks to those who joined in person; for those who were unable, the
talks can be found online:
Syd's talk (real time data warehouse): https://youtu.be/rFK6drAWN40
Mike's talk (beam in production): https://youtu.be/GOQVTr5hBoQ

Cheers,
Austin


P.S.  The event page for more info
https://www.meetup.com/San-Francisco-Apache-Beam/events/268363008/


Re: Help needed on a problem statement

2020-02-19 Thread Austin Bennett
I'd disentangle Dataflow from Beam.  Beam can help you.  Dataflow might be
useful too, though, yes, for batch jobs the spin-up cost might be a lot for
small files.

There are potentially lots of ways to do this.

An idea (that I haven't seen used anywhere): have a streaming Beam
pipeline (that can autoscale if needed) running persistently.  Have another
process take each record from the files as they are dropped and put it in a
message queue for Beam to process (you'd include both the data 'record' and
metadata about the source file).
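A minimal Python sketch of the producer side of that idea, assuming CSV input. The envelope layout (`source_file` plus `record`) and both helper names are hypothetical, and publishing to an actual queue such as Pub/Sub is left out. Each row carries the file it came from, so one long-running pipeline can still route each file's records to its own output location.

```python
import csv
import io
import json
import posixpath
from typing import Iterator


def envelopes_from_csv(source_file: str, text: str) -> Iterator[bytes]:
    """Yield one JSON message per CSV row, tagged with the file it came from.

    Hypothetical envelope layout: {"source_file": ..., "record": {...}}.
    """
    for row in csv.DictReader(io.StringIO(text)):
        envelope = {"source_file": source_file, "record": dict(row)}
        # Message queues such as Pub/Sub carry bytes, so encode the JSON text.
        yield json.dumps(envelope, sort_keys=True).encode("utf-8")


def output_prefix_for(message: bytes, output_root: str) -> str:
    """Map a message back to a per-source-file output location.

    e.g. source "gs://in/file1.csv" -> "<output_root>/file1/"
    """
    envelope = json.loads(message)
    filename = posixpath.basename(envelope["source_file"])
    stem = filename.rsplit(".", 1)[0]  # drop the extension, if any
    return posixpath.join(output_root, stem) + "/"
```

Inside the pipeline, keying elements with something like `beam.Map(lambda m: (output_prefix_for(m, output_root), m))` is one way to keep the per-file boundary without launching one job per file.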

* The Dataflow spin-up is heavy: I'm wondering about the suitability of running
something like this using the Direct runner (or even single-node Flink) on
Cloud Run (with an event notification from GCS to kick off the job):
https://cloud.google.com/run/quotas <-- looks like it can handle up to 2GB of
memory.  If that is not enough, have some logic for when to launch Dataflow
vs. when to run a lighter-weight Beam job.
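The "when to launch Dataflow vs. a lighter-weight job" logic could be as small as the sketch below. It assumes the trigger hands over the GCS object metadata as JSON (which includes `bucket`, `name`, and `size`, with `size` encoded as a string); the 500 MB cutoff is an arbitrary placeholder, to be tuned against the memory actually available to the container.

```python
import json
from typing import Tuple

# Placeholder cutoff (an assumption): above this size, pay Dataflow's spin-up
# cost; at or below it, run in-process. Tune against the container's memory.
DEFAULT_THRESHOLD_BYTES = 500 * 1024 * 1024


def choose_runner(notification: str,
                  threshold_bytes: int = DEFAULT_THRESHOLD_BYTES) -> Tuple[str, str]:
    """Return (runner_name, input_path) for one GCS object-notification payload."""
    obj = json.loads(notification)
    size = int(obj["size"])  # GCS object metadata encodes size as a string
    input_path = f"gs://{obj['bucket']}/{obj['name']}"
    if size > threshold_bytes:
        return "DataflowRunner", input_path
    # Small enough to process inside the Cloud Run container itself.
    return "DirectRunner", input_path
```

The returned name would be passed as `--runner` when building the job's pipeline options; the Dataflow branch additionally needs the usual GCP project and staging options.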

I have not faced your problem; I'm merely making up what might be
interesting solutions :-)  Good luck!



On Wed, Feb 19, 2020 at 11:10 AM subham agarwal 
wrote:

> Hi Team,
>
> I was working on a problem statement and I came across beam. Being very
> new to beam I am not sure if my use case can be solved by beam. Can you
> please help me here.
>
> Use case:
>
> I have a list of CSV and JSON files arriving every minute in Google Cloud
> Storage. The files can range from KB to GB. I need to parse each file and
> process the records in each file independently, which means file 1 records
> should be parsed, enriched, and stored in one output location, while file 2
> goes into a different location.
>
> I started by launching a separate Dataflow job for each file, but that is
> overkill for small files. So, I thought I could batch files every 15 mins
> and process them together in a single job, but I need to maintain the above
> boundary of data processing.
>
>
> Can anyone please tell me whether there is a solution to my problem, or
> whether Beam is not meant for this problem statement?
>
> Thanks in advance.
>
> Looking forward to a reply.
>
> Regards,
> Subham Agarwal
>


Re: Link to Flink on K8S Webinar

2020-02-19 Thread Austin Bennett
Cool; @aniket and @dagang,

As someone who hasn't dug into the code of either (will go through your
recording) -- might you share any thoughts on differences between:
https://github.com/googlecloudplatform/flink-on-k8s-operator
and
https://github.com/lyft/flinkk8soperator
??


Also, for those in the Bay Area (or attending GCP NEXT), we'll have an
in-person talk on related topics in April:
https://www.meetup.com/San-Francisco-Apache-Beam/events/268674177/



On Tue, Feb 18, 2020 at 11:48 AM Aizhamal Nurmamat kyzy 
wrote:

> Hi folks,
>
> Recently Aniket Mokashi and Dagang Wei hosted a webinar on how to use the
> flink k8s operator they have developed. The operator also supports working
> with Beam.
>
> If you think that this may be helpful to you, you may access the recording
> and slides via this link:
> https://www.cncf.io/webinars/operating-os-flink-beam-runtime-kubernetes/
>
> Thanks,
> Aizhamal
>


Beam Meetup LA -- KICKOFF (10 March)

2020-01-27 Thread Austin Bennett
Come join the community kicking off in LA (in person) on 10 March:
https://www.meetup.com/Los-Angeles-Apache-Beam/events/268207085/


Re: NYC ? (or more generally East Coast)

2020-01-26 Thread Austin Bennett
We did get 2 awesome speakers for an event at Spotify in NYC on 25
Feb.  If you're interested, come join!

https://www.meetup.com/New-York-Apache-Beam/events/268153356/


On Thu, Jan 23, 2020 at 8:08 AM Austin Bennett
 wrote:
>
> Hi Jennifer,
>
> I'd defer to your local expertise, knowledge, and especially inclination.
>
> Space: I'm sure we can track some down even without Georgetown, but it is
> great to know what is available -- we will also look to move Beam
> Summit (NA) around over the years, so ... :-)
>
> There is value in many approaches -- and certainly we wind up with
> larger audiences when speaking in front of existing meetups.  That
> being said, there is also much value in having a community where we are able
> to reach people more directly (via a meetup that someone active in the
> community can message -- since not everyone will sign up for and follow
> the dev/user lists or other communication channels).  The latter point is
> why I tend to push hosting our own -- this seems quite valuable to us
> at this stage.
>
> A more recent model being experimented with is having the Beam
> meetup put on the event, and then letting other groups know about it so
> they can announce/share it with their meetups and/or co-host to reach a
> wider audience (most meetups are quite happy to have content to share
> with their members).
>
> Cheers,
> Austin
>
> On Thu, Jan 23, 2020 at 5:54 AM Jennifer Melot
>  wrote:
> >
> > Sorry I've been so slow to get back to you. I'm looking into whether there 
> > is a Georgetown space we could use and will get back to you about that soon.
> >
> > The idea of a regular meetup is exciting, although I do wonder if doing 
> > talks at other more general meetups might be a good idea to help raise 
> > enough interest/awareness here for a regular Beam-only meetup (although I 
> > personally would love to hear from all the people you mentioned!). For 
> > example, there's a newish "DataOps DC" group that might be interested in 
> > someone giving a talk, etc (I've never been though - still relatively new 
> > to the area and learning about the local tech groups). If it would be 
> > helpful, I could contact a few groups and see if they'd be interested in 
> > someone giving a Beam talk.
> >
> > On Mon, Jan 20, 2020 at 12:46 PM Austin Bennett 
> >  wrote:
> >>
> >> Looks like you found the group:
> >> https://www.meetup.com/DC-Apache-Beam/
> >>
> >> !!
> >>
> >> Members of the group and event sign-ups grow substantially with each event that is held.
> >>
> >> Between you, Suneel, and me, we have at least 1 event, and I can also
> >> get some of the committers/PMC members out as well.  So, probably 2 or 3,
> >> and then ideally we find others who are local.
> >>
> >> Would love to have you at least share, and it is fantastic to have local
> >> involvement in organizing (I'm based in SF, though happy to hop on
> >> flights as sensible).
> >>
> >>
> >>
> >>
> >>
> >> On Wed, Jan 15, 2020 at 6:57 PM Austin Bennett
> >>  wrote:
> >> >
> >> > Hi Jennifer,
> >> >
> >> > Great!  Happy to gather use cases; we are also on the lookout for
> >> > spaces for events -- if you're affiliated with the university, might
> >> > that be something you'd have access to?
> >> >
> >> > See the other thread; we got a meetup page online some time ago.
> >> >
> >> > Cheers,
> >> > Austin
> >> >
> >> > On Wed, Jan 15, 2020 at 4:10 PM Jennifer Melot
> >> >  wrote:
> >> > >
> >> > > I'd be more than happy to contribute in any way to a DC meetup (would 
> >> > > be awesome if that existed!), including a talk on how we've been using 
> >> > > Beam at my org if that would be useful. I'm less confident about 
> >> > > making an NYC meetup but would love to stay in the loop anyway.
> >> > >
> >> > > Jennifer
> >> > >
> >> > > On Wed, Jan 15, 2020 at 6:09 PM Austin Bennett 
> >> > >  wrote:
> >> > >>
> >> > >> Awesome; writing directly to get down to specifics.
> >> > >>
> >> > >> Anyone else?
> >> > >>
> >> > >> On Mon, Jan 13, 2020 at 1:51 PM Suneel Marthi  
> >> > >> wrote:
> >> > >> >
> >> > >> > I can do talks in either DC or NYC meetups.  I can coordinate with 
> >> > >> >

Los Angeles Beam Meetup kickoff on 27 January

2020-01-17 Thread Austin Bennett
We're kicking off the Beam Community at community member Chad
Dombrova's place (Luma Pictures) in Santa Monica.

Come join us!

https://www.meetup.com/Los-Angeles-Apache-Beam/events/267812648/


Bangalore / Bengaluru Meetup

2020-01-16 Thread Austin Bennett
Hi Dev and Users,

Also, we hope to kick off a meetup in India this year.
https://www.meetup.com/Bangalore-Apache-Beam/

Please let us know if you'd like to get involved in speaking, hosting,
etc.  Reply to me, privately or on the thread, and/or use this survey link:
https://forms.gle/cud39eh3FA1em7EU7 (thanks @Tanay Tummalapalli for
compiling).

And, naturally, sign up on Meetup if you're interested in attending, as that
is where most of the messages on that topic will appear.

Cheers,
Austin


Re: NYC ? (or more generally East Coast)

2020-01-15 Thread Austin Bennett
Awesome; writing directly to get down to specifics.

Anyone else?

On Mon, Jan 13, 2020 at 1:51 PM Suneel Marthi  wrote:
>
> I can do talks in either DC or NYC meetups.  I can coordinate with CapitalOne 
> to see if they would be willing to host the DC meetup.
>
> On Mon, Jan 13, 2020 at 4:02 PM Austin Bennett  
> wrote:
>>
>> Hi Devs and Users,
>>
>> We are looking for speakers for future Meetups and Events.  Who is
>> building cool things with Beam?  We are looking at hosting a Meetup at
>> Spotify in February, and would ideally keep some meetups going throughout
>> the year.  For this to occur, we need to hear about what people are
>> working on!  Even if only a small/lightning talk, etc, do reach out!
>> Let's figure something out.
>>
>> Cheers,
>> Austin
>>
>> P.S.  https://www.meetup.com/New-York-Apache-Beam/
>>
>> P.P.S.  We also have budding communities in DC and Boston, and will
>> eventually write in separate threads about those.


NYC ? (or more generally East Coast)

2020-01-13 Thread Austin Bennett
Hi Devs and Users,

We are looking for speakers for future Meetups and Events.  Who is
building cool things with Beam?  We are looking at hosting a Meetup at
Spotify in February, and would ideally keep some meetups going throughout
the year.  For this to occur, we need to hear about what people are
working on!  Even if only a small/lightning talk, etc, do reach out!
Let's figure something out.

Cheers,
Austin

P.S.  https://www.meetup.com/New-York-Apache-Beam/

P.P.S.  We also have budding communities in DC and Boston, and will
eventually write in separate threads about those.


Re: proto in pubsub

2019-12-16 Thread Austin Bennett
I got my issue sorted out; it was user ignorance/error!

Thanks, Robert!  Yes, I see that my slightly cryptic message left much
to be desired.  It was getting late-ish my time after several hours of
trying to crack things, and I sent it off as I called it a night, which
wasn't quite the right move: it doesn't leave much for anyone to help
with, nor does it help others surface the info in future searches.  :-)

Agreed that things work.  I should actually put together a working
example/demo, as extensive searching didn't yield much that was usable.
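For anyone landing here from a search, a minimal sketch of the read side, under stated assumptions: Pub/Sub delivers the serialized proto as raw bytes, and `struct_pb2.Struct` stands in for your own generated `*_pb2` message class. `ReadFromPubSub` hands the pipeline plain `bytes`, so one approach is to decode them explicitly in a `Map` rather than relying on a coder. The BigQuery table is assumed to already exist (hence `CREATE_NEVER`).

```python
from google.protobuf import json_format, struct_pb2

# Stand-in for your generated message class (an assumption for this sketch).
MESSAGE_CLS = struct_pb2.Struct


def parse_payload(data: bytes, message_cls=MESSAGE_CLS) -> dict:
    """Decode raw Pub/Sub bytes into a proto, then into a BigQuery-style row."""
    message = message_cls.FromString(data)
    return json_format.MessageToDict(message, preserving_proto_field_name=True)


def run(subscription: str, table: str) -> None:
    """Wire the parser between Pub/Sub and BigQuery (sketch; not run here)."""
    # Imported lazily so parse_payload can be tried without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as p:
        _ = (
            p
            # ReadFromPubSub emits each message payload as bytes.
            | "Read" >> beam.io.ReadFromPubSub(subscription=subscription)
            | "Parse" >> beam.Map(parse_payload)
            | "Write" >> beam.io.WriteToBigQuery(
                table,
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
            )
        )
```

`run` would need a real subscription path (`projects/<project>/subscriptions/<name>`) and table spec; only `parse_payload` is exercised locally.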


On Mon, Dec 16, 2019 at 12:06 PM Robert Bradshaw  wrote:
>
> This should work. An example of what you're trying to do and what
> errors/unexpected behavior you're getting would be helpful.
>
> On Sun, Dec 15, 2019 at 10:13 PM Austin Bennett
>  wrote:
> >
> > Hi All,
> >
> > Struggling with reading a proto message from pubsub and writing to
> > BigQuery in Beam (Direct Runner -- though will then use Dataflow
> > runner.  Hoping that distinction doesn't make a difference).  Probably
> > something I'm doing wrong (or not doing) with the proto coder.  The
> > output (BigQuery) isn't particularly an issue, as I'm still working on
> > getting the message properly read.
> >
> > Anyone have experience/example/tip(s)?  Attempting to do in Python,
> > but no problem if moving to Go or Java.
> >
> > Thanks,
> > Austin


proto in pubsub

2019-12-15 Thread Austin Bennett
Hi All,

Struggling with reading a proto message from pubsub and writing to
BigQuery in Beam (Direct Runner -- though will then use Dataflow
runner.  Hoping that distinction doesn't make a difference).  Probably
something I'm doing wrong (or not doing) with the proto coder.  The
output (BigQuery) isn't particularly an issue, as I'm still working on
getting the message properly read.

Anyone have experience/example/tip(s)?  Attempting to do in Python,
but no problem if moving to Go or Java.

Thanks,
Austin


slides?

2019-11-14 Thread Austin Bennett
Hi Dev and User,

Wondering if people would find a benefit from collecting slides from
Meetups/Talks?

Seems that this could be appropriate on the website, for instance.  Not
sure whether this has been asked previously, so bringing it to the group.

Cheers,
Austin


Kicking off Beam Meetup NYC

2019-09-27 Thread Austin Bennett
On the heels of the new Seattle Meetup (yesterday's event), announcing the
kickoff of the first event in NYC.

https://www.meetup.com/New-York-Apache-Beam/events/265128669/

We'll have Tyler Akidau sharing on Streaming SQL, and some talks from Oden
Technologies (a fantastic example of Beam, used both on Dataflow in the cloud
and on-site in factories on Flink, where connectivity is limited).

Please consider joining and helping start this new local community!


Re: Beam/flink/kubernetes/minikube/wordcount example

2019-09-12 Thread Austin Bennett
I got hung up on that issue earlier this week.  I was using Flink 1.7 and
Beam v2.15, without Kubernetes.

Then I gave up, so I don't have a solution :-/

I don't understand the job server well enough, but I think I was getting the
error when I did not have it running.

(I still don't understand portability well enough, so I might not be using
this terminology correctly.)
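Robert's question further down this thread (is the input on a file system the workers can reach, including from inside Docker?) suggests a cheap preflight check before submitting. A minimal heuristic sketch: the set of schemes treated as worker-visible is an assumption, and a local path is fine if the same directory is mounted into every worker container.

```python
from typing import Iterable, List
from urllib.parse import urlparse

# Assumption: schemes whose data lives outside the submitting machine.
# Extend this for whatever filesystems your deployment actually provides.
REMOTE_SCHEMES = {"gs", "s3", "hdfs", "http", "https"}


def worker_visible(path: str) -> bool:
    """Heuristic: could workers running in Docker plausibly read this path?

    A bare local path (no scheme) or a file:// URL exists only on the machine
    that submitted the job, unless the directory is mounted into each container.
    """
    return urlparse(path).scheme.lower() in REMOTE_SCHEMES


def check_inputs(paths: Iterable[str]) -> List[str]:
    """Return the inputs the workers will probably not be able to find."""
    return [p for p in paths if not worker_visible(p)]
```

This does not diagnose the `FAILED_TO_UNCOMPRESS` error itself; it only rules out the inaccessible-input case Robert asked about.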


On Wed, Sep 11, 2019 at 1:26 PM Matthew Patterson 
wrote:

> Nope: dang, thanks.
>
> On 9/11/19, 3:49 PM, "Robert Bradshaw"  wrote:
>
>
> Is your input on a file system accessible to the workers? (Including,
> from within Docker, if the workers are running in docker.)
>
> On Wed, Sep 11, 2019 at 12:03 PM Matthew Patterson
>  wrote:
> >
> > Hi Beamers,
> >
> >
> >
> > I am running the `wordcount` example, but following the example from
> > https://beam.apache.org/documentation/runners/flink/, that is, I change
> > the pipeline initialization as follows.
> >
> >
> >
> > ```
> > import apache_beam as beam
> > from apache_beam.options.pipeline_options import PipelineOptions
> >
> > options = PipelineOptions(["--runner=FlinkRunner",
> >                            "--flink_version=1.8",
> >                            "--flink_master_url=localhost:8081"])
> >
> > # Pass options by keyword; the runner is then taken from --runner.
> > with beam.Pipeline(options=options) as p:
> >     …
> > ```
> >
> >
> >
> > Running against my minikube cluster, I get:
> >
> >
> >
> > “RuntimeError: Pipeline
> BeamApp-mpatterson-0911164258-7ef8768c_71984a02-5036-421e-9754-b57dbc628d3f
> failed in state FAILED: java.io.IOException: FAILED_TO_UNCOMPRESS(5)
> >
> > ”
> >
> >
> >
> > Any ideas?
> >
> >
> >
> > Thanks,
> >
> > Matt
> >
> >
> >
> > (minikube version: v1.3.1
> >
> > commit: ca60a424ce69a4d79f502650199ca2b52f29e631
> >
> >
> >
> > bash-3.2$ kubectl version
> >
> > Client Version: version.Info{Major:"1", Minor:"14",
> GitVersion:"v1.14.6", GitCommit:"96fac5cd13a5dc064f7d9f4f23030a6aeface6cc",
> GitTreeState:"clean", BuildDate:"2019-08-19T11:13:49Z",
> GoVersion:"go1.12.9", Compiler:"gc", Platform:"darwin/amd64"}
> >
> > Server Version: version.Info{Major:"1", Minor:"15",
> GitVersion:"v1.15.2", GitCommit:"f6278300bebbb750328ac16ee6dd3aa7d3549568",
> GitTreeState:"clean", BuildDate:"2019-08-05T09:15:22Z",
> GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
> >
> >
> >
> > Flink job- and task-manager containers both built from flink:1.8
> >
> > )
> >
> >
> >
> > Full output
> >
> > >>>
> >
> >
> >
> > /anaconda3/envs/aws/bin/python
> /Users/mpatterson/Library/Preferences/PyCharmCE2019.2/scratches/beam_me_up.py
> >
> > /Users/mpatterson/dev/beam/sdks/python/apache_beam/__init__.py:84:
> UserWarning: Some syntactic constructs of Python 3 are not yet fully
> supported by Apache Beam.
> >
> >   'Some syntactic constructs of Python 3 are not yet fully supported
> by '
> >
> > executable: /anaconda3/envs/aws/bin/python
> >
> > beam.__version__: 2.15.0
> >
> > WARNING:root:Make sure that locally built Python SDK docker image
> has Python 3.7 interpreter. See also: BEAM-7474.
> >
> > INFO:root:Using latest locally built Python SDK docker image:
> mpatterson-docker-apache.bintray.io/beam/python3:latest.
> >
> > INFO:root:  0x11850b200> 
> >
> > INFO:root: 
> 
> >
> > WARNING:root:Downloading job server jar from
> https://repo.maven.apache.org/maven2/org/apache/beam/beam-runners-flink-1.8-job-server/2.15.0/beam-runners-flink-1.8-job-server-2.15.0.jar
> >
> > [main] INFO
> org.apache.beam.runners.fnexecution.jobsubmission.JobServerDriver -
> ArtifactStagingService started on localhost:57443
> >
> > [main] INFO
> org.apache.beam.runners.fnexecution.jobsubmission.JobServerDriver - Java
> ExpansionService started on localhost:57444
> >
> > [main] INFO
> org.apache.beam.runners.fnexecution.jobsubmission.JobServerDriver -
> JobService started on localhost:57439
> >
> > [grpc-default-executor-0] ERROR
> 

Re: Hackathon @BeamSummit @ApacheCon

2019-09-06 Thread Austin Bennett
+user@beam.apache.org 

On Fri, Sep 6, 2019 at 5:24 PM Austin Bennett 
wrote:

> Ah, yes.  We'll definitely be in Hackathon space 2-3p on Monday and
> Tuesday (and can stay longer if needed).  We aren't scheduling anything
> official on Wed and Thurs, given the multiple Beam tracks that are
> occurring.
>
> On Fri, Sep 6, 2019 at 4:46 PM Mikhail Gryzykhin 
> wrote:
>
>> I'll be in most of the week and will join gladly.
>>
>> On Thu, Sep 5, 2019, 14:32 Chad Dombrova  wrote:
>>
>>> Has a date and time been picked for this?  I'll be there for part of the
>>> week and would love to join.
>>>
>>> On Tue, Sep 3, 2019 at 11:31 AM Brian Hulette 
>>> wrote:
>>>
>>>> I will be around all week as well and would love to help with a Beam
>>>> hackathon in any way :)
>>>>
>>>> On Thu, Aug 29, 2019 at 9:46 AM Maximilian Michels 
>>>> wrote:
>>>>
>>>>> Hey,
>>>>>
>>>>> I'm in as well! Austin and I recently talked about how we could organize
>>>>> the hackathon. Likely it will be an hour per day for exchanging ideas
>>>>> and learning about Beam. For example, there has been interest from the
>>>>> Apache Streams project to discuss points for collaboration.
>>>>>
>>>>> We will soon announce the exact hours.
>>>>>
>>>>> Cheers,
>>>>> Max
>>>>>
>>>>> On 23.08.19 05:06, Kenneth Knowles wrote:
>>>>> > I will be at Beam Summit / ApacheCon NA and would love to drop by a
>>>>> > hackathon room if one is arranged. Really excited for both my first
>>>>> > ApacheCon and Beam Summit (finally!)
>>>>> >
>>>>> > Kenn
>>>>> >
>>>>> > On Thu, Aug 22, 2019 at 10:18 AM Austin Bennett
>>>>> > mailto:whatwouldausti...@gmail.com>>
>>>>> wrote:
>>>>> >
>>>>> > And, for clarity, especially focused on Hackathon times on Monday
>>>>> > and/or Tuesday of ApacheCon, to not conflict with BeamSummit sessions.
>>>>> >
>>>>> > On Thu, Aug 22, 2019 at 9:47 AM Austin Bennett
>>>>> > mailto:whatwouldausti...@gmail.com
>>>>> >>
>>>>> > wrote:
>>>>> >
>>>>> > Less than 3 weeks till Beam Summit @ApacheCon!
>>>>> >
>>>>> > We are to be in Vegas for BeamSummit and ApacheCon in a few weeks.
>>>>> >
>>>>> > Likely to reserve space in the Hackathon Room to accomplish some
>>>>> > tasks:
>>>>> > * Help Users
>>>>> > * Build Beam
>>>>> > * Collaborate with other projects
>>>>> > * etc
>>>>> >
>>>>> > If you're to be around (or not) let us know how you'd like to be
>>>>> > involved.  Also, please share and surface anything that would be
>>>>> > good for us to look at (and, esp. any beginner tasks, in case we
>>>>> > can entice some new contributors).
>>>>> >
>>>>> >
>>>>> > P.S.  See BeamSummit.org, if you're thinking of attending -
>>>>> > there's a discount code.
>>>>> >
>>>>>
>>>>


Re: Hackathon @BeamSummit @ApacheCon

2019-08-22 Thread Austin Bennett
And, for clarity, especially focused on Hackathon times on Monday and/or
Tuesday of ApacheCon, to not conflict with BeamSummit sessions.

On Thu, Aug 22, 2019 at 9:47 AM Austin Bennett 
wrote:

> Less than 3 weeks till Beam Summit @ApacheCon!
>
> We are to be in Vegas for BeamSummit and ApacheCon in a few weeks.
>
> Likely to reserve space in the Hackathon Room to accomplish some tasks:
> * Help Users
> * Build Beam
> * Collaborate with other projects
> * etc
>
> If you're to be around (or not) let us know how you'd like to be
> involved.  Also, please share and surface anything that would be good for
> us to look at (and, esp. any beginner tasks, in case we can entice some new
> contributors).
>
>
> P.S.  See BeamSummit.org, if you're thinking of attending - there's a
> discount code.
>


Hackathon @BeamSummit @ApacheCon

2019-08-22 Thread Austin Bennett
Less than 3 weeks till Beam Summit @ApacheCon!

We are to be in Vegas for BeamSummit and ApacheCon in a few weeks.

Likely to reserve space in the Hackathon Room to accomplish some tasks:
* Help Users
* Build Beam
* Collaborate with other projects
* etc

If you're to be around (or not) let us know how you'd like to be involved.
Also, please share and surface anything that would be good for us to look
at (and, esp. any beginner tasks, in case we can entice some new
contributors).


P.S.  See BeamSummit.org, if you're thinking of attending - there's a
discount code.


Re: Live fixing of a Beam bug on July 25 at 3:30pm-4:30pm PST

2019-07-23 Thread Austin Bennett
Pablo,

Assigned  https://issues.apache.org/jira/browse/BEAM-7607 to you, to make it
even more likely that it is still around on the 25th :-)

Cheers,
Austin

On Tue, Jul 23, 2019 at 11:24 AM Pablo Estrada  wrote:

> Hi all,
> I've just realized that https://issues.apache.org/jira/browse/BEAM-7607 is
> a single-line change - and we'd spend 40 minutes chitchatting, so I'll also
> be working on https://jira.apache.org/jira/browse/BEAM-7803, which is a
> Python issue (also for the BigQuery sink!).
> Thanks!
> -P.
>
> On Sat, Jul 20, 2019 at 2:05 PM Pablo Estrada  wrote:
>
>> Hello all,
>>
>> This will be streamed on youtube on this link:
>> https://www.youtube.com/watch?v=xpIpEO4PUDo
>>
>> I think there will be a live chat, so I will hopefully be available to
>> answer questions. To be honest, my workflow is not super efficient, but...
>> oh well, hopefully it will be at least somewhat helpful to others : )
>> Best
>> -P.
>>
>> On Thu, Jul 18, 2019 at 12:59 AM Tim Sell  wrote:
>>
>>> +1, I'd love to see this as a recording. Will you stick it up on youtube
>>> afterwards?
>>>
>>> On Thu, Jul 18, 2019 at 4:00 AM sridhar inuog 
>>> wrote:
>>>
 Thanks, Pablo! Looking forward to it! Hopefully, it will be
 recorded as well.

 On Wed, Jul 17, 2019 at 2:50 PM Pablo Estrada 
 wrote:

> Yes! So I will be working on a small feature request for Java's
> BigQueryIO: https://issues.apache.org/jira/browse/BEAM-7607
>
> Maybe I'll do something for Python next month. : )
> Best
> -P.
>
> On Wed, Jul 17, 2019 at 12:32 PM Rakesh Kumar 
> wrote:
>
>> +1, I really appreciate this initiative. It would be really helpful for
>> newbies like me.
>>
>> Is it possible to list out what are the things that you are planning
>> to cover?
>>
>>
>>
>>
>> On Tue, Jul 16, 2019 at 11:19 AM Yichi Zhang 
>> wrote:
>>
>>> Thanks for organizing this Pablo, it'll be very helpful!
>>>
>>> On Tue, Jul 16, 2019 at 10:57 AM Pablo Estrada 
>>> wrote:
>>>
 Hello all,
 I'll be having a session where I live-fix a Beam bug for 1 hour
 next week. Everyone is invited.

 It will be on July 25, between 3:30pm and 4:30pm PST. Hopefully I
 will finish a full change in that time frame, but we'll see.

 I have not yet decided if I will do this via hangouts, or via a
 youtube livestream. In any case, I will share the link here in the next
 few days.

 I will most likely work on the Java SDK (I have a little feature
 request in mind).

 Thanks!
 -P.

>>>


Re: Beam Summit at ApacheCon

2019-05-11 Thread Austin Bennett
The paper submission deadline doesn't have a concrete time.  "Morning
Pacific time" is all that is on their website; we're doing this
collaboratively, so we (Beam) don't have full control over everything.

From what I have seen elsewhere, it will be cut when the guy managing this
comes into the office for the day, not necessarily at a specific time on the
clock.



On Sat, May 11, 2019 at 9:54 AM Suneel Marthi 
wrote:

> Could u please further quantify the 'morning pacific time' part of it?
> It's just not clear what the deadline is now from that.
>
> On Sat, May 11, 2019 at 12:47 PM Austin Bennett <
> whatwouldausti...@gmail.com> wrote:
>
>> Hi All,
>>
>> Deadline for CfP is the morning of 13 May (this Monday)  Pacific Time, as
>> decided by ApacheCon.  Please submit if you have anything.  Also, do write
>> if you have questions/concerns, etc.
>>
>> Cheers,
>> Austin
>>
>>
>>
>> On Tue, Apr 30, 2019 at 7:59 AM Austin Bennett <
>> whatwouldausti...@gmail.com> wrote:
>>
>>> Hi Users and Devs,
>>>
>>> The CfP deadline approaches.  Do submit your technical and/or use case
>>> talks, etc etc.  Feel free to reach out if you have any questions.
>>>
>>> Cheers,
>>> Austin
>>>
>>> On Tue, Apr 23, 2019 at 2:49 AM Maximilian Michels 
>>> wrote:
>>>
>>>> Hi Austin,
>>>>
>>>> Thanks for the heads-up! I just want to highlight that this is a great
>>>> chance for Beam. There will be a _dedicated_ Beam track which means that
>>>> there is potential for lots of new people to learn about Beam. Of
>>>> course, there will also be many people already involved in Beam.
>>>>
>>>> -Max
>>>>
>>>> On 23.04.19 02:47, Austin Bennett wrote:
>>>> > Beam Summit will be at ApacheCon this year -- please consider submitting!
>>>> >
>>>> > Dates for Beam Summit 11 and 12 September 2019.  There are other tracks
>>>> > at ApacheCon during this and on other dates too.
>>>> >
>>>> > https://www.apachecon.com/acna19/cfp.html
>>>> >
>>>> >
>>>>
>>>


Re: Beam Summit at ApacheCon

2019-05-11 Thread Austin Bennett
Hi All,

Deadline for CfP is the morning of 13 May (this Monday)  Pacific Time, as
decided by ApacheCon.  Please submit if you have anything.  Also, do write
if you have questions/concerns, etc.

Cheers,
Austin



On Tue, Apr 30, 2019 at 7:59 AM Austin Bennett 
wrote:

> Hi Users and Devs,
>
> The CfP deadline approaches.  Do submit your technical and/or use case
> talks, etc etc.  Feel free to reach out if you have any questions.
>
> Cheers,
> Austin
>
> On Tue, Apr 23, 2019 at 2:49 AM Maximilian Michels  wrote:
>
>> Hi Austin,
>>
>> Thanks for the heads-up! I just want to highlight that this is a great
>> chance for Beam. There will be a _dedicated_ Beam track which means that
>> there is potential for lots of new people to learn about Beam. Of
>> course, there will also be many people already involved in Beam.
>>
>> -Max
>>
>> On 23.04.19 02:47, Austin Bennett wrote:
>> > Beam Summit will be at ApacheCon this year -- please consider
>> submitting!
>> >
>> > Dates for Beam Summit 11 and 12 September 2019.  There are other tracks
>> > at ApacheCon during this and on other dates too.
>> >
>> > https://www.apachecon.com/acna19/cfp.html
>> >
>> >
>>
>


Re: Apache BEAM on Flink in production

2019-05-07 Thread Austin Bennett
On the Beam YouTube channel:
https://www.youtube.com/channel/UChNnb_YO_7B0HlW6FhAXZZQ you can see two
talks from people at Lyft; they use Beam on Flink.

Other users can also chime in about how they are running it.

I would also suggest coming to BeamSummit.org in Berlin in June and/or
sharing your experiences, or coming to ApacheCon in September, where we are
to have two tracks on each of two days focused on Beam:
https://www.apachecon.com/acna19/index.html




On Tue, May 7, 2019 at 6:52 AM  wrote:

> Hi all,
>
>
>
> We currently run Apache Flink based data load processes (fairly simple
> streaming ETL jobs) and are looking at converting to Apache BEAM to give
> more flexibility on the runner.
>
>
>
> Is anyone aware of any organisations running Apache BEAM on Flink in
> production? Does anyone have any case studies they would be able to share?
>
>
>
> Many thanks,
>
>
>
> Steve
>
>


Re: Beam Summit at ApacheCon

2019-04-30 Thread Austin Bennett
Hi Users and Devs,

The CfP deadline is approaching. Do submit your technical and/or use-case
talks. Feel free to reach out if you have any questions.

Cheers,
Austin

On Tue, Apr 23, 2019 at 2:49 AM Maximilian Michels  wrote:

> Hi Austin,
>
> Thanks for the heads-up! I just want to highlight that this is a great
> chance for Beam. There will be a _dedicated_ Beam track which means that
> there is potential for lots of new people to learn about Beam. Of
> course, there will also be many people already involved in Beam.
>
> -Max
>
> On 23.04.19 02:47, Austin Bennett wrote:
> > Beam Summit will be at ApacheCon this year -- please consider submitting!
> >
> > Dates for Beam Summit: 11 and 12 September 2019. There are other tracks
> > at ApacheCon during these dates and on other dates too.
> >
> > https://www.apachecon.com/acna19/cfp.html
> >
> >
>


Beam Summit at ApacheCon

2019-04-22 Thread Austin Bennett
Beam Summit will be at ApacheCon this year -- please consider submitting!

Dates for Beam Summit: 11 and 12 September 2019. There are other tracks at
ApacheCon during these dates and on other dates too.

https://www.apachecon.com/acna19/cfp.html


Re: kafka 0.9 support

2019-04-02 Thread Austin Bennett
I withdraw my concern: I checked the cluster I will eventually access, and
it is on 0.8, so I spoke too soon. I can't speak for the rest of the user
base.

On Tue, Apr 2, 2019 at 11:03 AM Raghu Angadi  wrote:

> Thanks to David Morávek for pointing out a possible improvement to KafkaIO
> from dropping support for 0.9: it avoids having a second consumer just to
> fetch the latest offsets for backlog.
>
> Ideally we should drop 0.9 support in the next major release; in fact, it
> would be better to drop versions before 0.10.1 at the same time. This would
> further reduce reflection-based calls for supporting multiple versions. If
> users still on 0.9 can stay on the current stable release of Beam, dropping
> it would not affect them. Otherwise, it would be good to hear from them about
> how long we need to keep support for old versions.
>
> I don't think it is a good idea to have multiple forks of KafkaIO in the
> same repo. If we do go that route, we should fork the entire kafka
> directory and rename the main class KafkaIO_Unmaintained :).
>
> IMHO, so far, the additional complexity of supporting these versions is not
> that bad. Most of it is isolated to ConsumerSpEL.java & ProducerSpEL.java.
> My first preference is dropping support for deprecated versions (and
> deprecating a few more versions, maybe up to the version that added
> transactions, around 0.11.x I think).
>
> I haven't looked into what's new in Kafka 2.x. Are there any features that
> KafkaIO should take advantage of? I have not noticed our existing code
> breaking. We should certainly support the latest releases of Kafka.
>
> Raghu.
>
> On Tue, Apr 2, 2019 at 10:27 AM Mingmin Xu  wrote:
>
>>
>> We're still using Kafka 0.10 a lot, similar to 0.9 IMO. Supporting
>> multiple versions in KafkaIO is quite complex now, and it confuses users
>> about which versions are supported and which are not. I would prefer to
>> support only Kafka 2.0+ in the latest version. For old versions, there are
>> some options:
>> 1) document the supported Kafka-Beam versions, as we do for FlinkRunner;
>> 2) maintain separate KafkaIOs for old versions.
>>
>> 1) would be easy to maintain, and I assume there should be no issue using
>> Beam-Core 3.0 together with KafkaIO 2.0.
>>
>> Any thoughts?
>>
>> Mingmin
>>
>> On Tue, Apr 2, 2019 at 9:56 AM Reuven Lax  wrote:
>>
>>> KafkaIO is marked as Experimental, and the comment already warns that
>>> 0.9 support might be removed. I think that if users still rely on Kafka 0.9
>>> we should leave a fork (renamed) of the IO in the tree for 0.9, but we can
>>> definitely remove 0.9 support from the main IO if we want, especially if
>>> it complicates changes to that IO. If we do, though, we should fail with a
>>> clear error message telling users to use the Kafka 0.9 IO.
>>>
>>> On Tue, Apr 2, 2019 at 9:34 AM Alexey Romanenko <
>>> aromanenko@gmail.com> wrote:
>>>
>>>> > How are multiple versions of Kafka supported? Are they all in one
>>>> client, or is there a case for forks like ElasticSearchIO?
>>>>
>>>> They are supported in one client, but we have an additional “ConsumerSpEL”
>>>> adapter which unifies interface differences among different Kafka client
>>>> versions (mostly to support the old ones, 0.9-0.10.0).
>>>>
>>>> On the other hand, we warn users in the KafkaIO Javadoc (KafkaIO is
>>>> Unstable, btw) with the following:
>>>> *“KafkaIO relies on kafka-clients for all its interactions with the
>>>> Kafka cluster. kafka-clients versions 0.10.1 and newer are supported
>>>> at runtime. The older versions 0.9.x - 0.10.0.0 are also supported,
>>>> but are deprecated and likely be removed in near future.”*
>>>>
>>>> Although I'd personally prefer to have only one unified client
>>>> interface, people still use Beam with old Kafka instances, so we should
>>>> likely stick with it until Beam 3.0.
>>>>
>>>> WDYT?
>>>>
>>>> On 2 Apr 2019, at 02:27, Austin Bennett 
>>>> wrote:
>>>>
>>>> FWIW --
>>>>
>>>> On my (desired, though not explicitly job-function) roadmap is tapping
>>>> into a bunch of our corporate Kafka queues to ingest that data into
>>>> places I can use. Those are 'stuck' on 0.9 with no upgrade in sight (I am
>>>> told the upgrade path isn't trivial, these are very critical flows, and
>>>> they are scared of breaking them, so it just sits behind firewalls, etc.).
>>>> But I probably wouldn't begin that for at least another quarter.
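The support policy quoted from the KafkaIO Javadoc in this thread (kafka-clients 0.10.1 and newer supported; 0.9.x - 0.10.0.0 supported but deprecated), combined with Reuven's suggestion to fail with a clear error message, could be expressed as a small version gate. The Python sketch below is hypothetical: `check_kafka_clients_version` and its messages are illustrative and are not part of KafkaIO, and treating anything older than 0.9 as an error is an assumption.

```python
import warnings

# Thresholds taken from the KafkaIO Javadoc text quoted in this thread.
SUPPORTED_MIN = (0, 10, 1)   # 0.10.1 and newer: supported
DEPRECATED_MIN = (0, 9, 0)   # 0.9.x - 0.10.0.x: supported but deprecated


def parse_version(version):
    """Turn '0.10.1.0' or '2.2.0' into a tuple of ints, e.g. (0, 10, 1, 0)."""
    return tuple(int(part) for part in version.split("."))


def check_kafka_clients_version(version):
    """Hypothetical gate: return 'supported' or 'deprecated', or raise.

    Older clients fail fast with a message that says what to do next,
    instead of breaking later with an obscure error.
    """
    v = parse_version(version)
    if v >= SUPPORTED_MIN:
        return "supported"
    if v >= DEPRECATED_MIN:
        warnings.warn(
            f"kafka-clients {version} is deprecated and may be removed",
            DeprecationWarning,
        )
        return "deprecated"
    raise ValueError(
        f"kafka-clients {version} is not supported; "
        "upgrade to 0.10.1 or newer"
    )
```

Tuples compare element by element, so `(0, 10, 1, 0)` counts as at least `(0, 10, 1)`, while `(0, 10, 0, 0)` falls into the deprecated band.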

Re: kafka 0.9 support

2019-04-01 Thread Austin Bennett
FWIW --

On my (desired, though not explicitly job-function) roadmap is tapping into
a bunch of our corporate Kafka queues to ingest that data into places I can
use. Those are 'stuck' on 0.9 with no upgrade in sight (I am told the
upgrade path isn't trivial, these are very critical flows, and they are
scared of breaking them, so it just sits behind firewalls, etc.). But I
probably wouldn't begin that for at least another quarter.

I don't contribute to, nor understand the burden of, maintaining support
for the older version, so I can't reasonably lobby for that continued pain.

Anecdotally, many enterprises could be in this same place (though I also
wonder whether many of the people 'stuck' on such versions would have Beam
on their current radar).


On Mon, Apr 1, 2019 at 2:29 PM Kenneth Knowles  wrote:

> This could be a backward-incompatible change, though that notion has many
> interpretations. What matters is user pain. Technically if we don't break
> the core SDK, users should be able to use Java SDK >=2.11.0 with KafkaIO
> 2.11.0 forever.
>
> How are multiple versions of Kafka supported? Are they all in one client,
> or is there a case for forks like ElasticSearchIO?
>
> Kenn
>
> On Mon, Apr 1, 2019 at 10:37 AM Jean-Baptiste Onofré 
> wrote:
>
>> +1 to remove 0.9 support.
>>
>> I think it's more interesting to test and verify Kafka 2.2.0 than 0.9 ;)
>>
>> Regards
>> JB
>>
>> On 01/04/2019 19:36, David Morávek wrote:
>> > Hello,
>> >
>> > is there still a reason to keep Kafka 0.9 support? This unfortunately
>> > adds a lot of complexity to the KafkaIO implementation.
>> >
>> > Kafka 0.9 was released in November 2015.
>> >
>> > My first attempt at removing Kafka 0.9 support would remove the second
>> > consumer, which is used for fetching offsets.
>> >
>> > WDYT? Is this support worth keeping?
>> >
>> > https://github.com/apache/beam/pull/8186
>> >
>> > D.
>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>
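David's point about the second consumer can be illustrated with a small, self-contained Python sketch. The classes `FakeConsumer09` and `FakeConsumer0101` are hypothetical in-memory stand-ins for a single partition; they only mimic the relevant behaviour of the Java `KafkaConsumer` methods `seekToEnd()`/`position()` (available since 0.9) and `endOffsets()` (added in 0.10.1). With only the former, learning the latest offset means seeking to the end, which moves the read position, so a separate consumer is needed to measure backlog without skipping unread records. This is not KafkaIO's actual code; the `hasattr` check merely stands in for the reflection-based version checks mentioned in the thread.

```python
class FakeConsumer09:
    """Single-partition stand-in for a 0.9-era consumer (hypothetical)."""

    def __init__(self, log_end_offset, position=0):
        self._log_end_offset = log_end_offset
        self._position = position

    def seek_to_end(self):
        # Like KafkaConsumer.seekToEnd(): this moves the read position.
        self._position = self._log_end_offset

    def position(self):
        return self._position


class FakeConsumer0101(FakeConsumer09):
    """Stand-in for 0.10.1+, which can report end offsets directly."""

    def end_offsets(self):
        # Like KafkaConsumer.endOffsets(): the read position is unchanged.
        return self._log_end_offset


def latest_offset(reader, make_offset_consumer):
    """Latest offset of the partition, leaving `reader`'s position untouched."""
    if hasattr(reader, "end_offsets"):
        return reader.end_offsets()
    # 0.9 path: seeking the reader itself would skip unread records,
    # so a second, throwaway consumer does the seek instead.
    probe = make_offset_consumer()
    probe.seek_to_end()
    return probe.position()


def backlog(reader, make_offset_consumer):
    """Number of records the reader has not consumed yet."""
    return latest_offset(reader, make_offset_consumer) - reader.position()


if __name__ == "__main__":
    old = FakeConsumer09(log_end_offset=100, position=40)
    print(backlog(old, lambda: FakeConsumer09(log_end_offset=100)))  # 60
    print(old.position())  # 40: the reader did not move
```

In this sketch, dropping 0.9 support would amount to deleting the fallback branch and the `make_offset_consumer` parameter.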


Re: Beam Meetups Feb 2019

2019-03-12 Thread Austin Bennett
Hi Teja and All,

The video recordings from the recent SF meetup have been posted to the Beam
YouTube channel (thanks, Matthias!).

Links:
General Beam YouTube:
https://www.youtube.com/channel/UChNnb_YO_7B0HlW6FhAXZZQ

*Beam Introduction*:  https://www.youtube.com/watch?v=Ao2NM8rvKZY
*TFX*:  https://www.youtube.com/watch?v=2MDEkn6_Mig
*Python Streaming Pipelines with Beam on Flink*:
https://www.youtube.com/watch?v=4jXQtt1McvM
*Dynamic Pricing of Lyft Rides using Streaming*:
https://www.youtube.com/watch?v=oQHyLfiv8Aw

Cheers,
Austin

On Mon, Feb 11, 2019 at 10:32 PM Teja MVSR  wrote:

> Hi,
>
> Can you please provide any video recordings if they are available?
>
> Thanks,
> Teja
>
> On Mon, Feb 11, 2019, 4:51 PM Austin Bennett  wrote:
>
>> The slides from Tyler's presentation found:
>> http://s.apache.org/beam-intro-feb-2019
>>
>> I'll also send out links to videos once I get my hands on them (@Mark
>> Grover  ).
>>
>> On Mon, Feb 11, 2019 at 9:48 AM Thomas Weise  wrote:
>>
>>> Here are slides for 2 of the presentations from the Lyft meetup:
>>>
>>> Python/Flink/Streaming: http://go.lyft.com/python-flink-beam-meetup-2019
>>> Use Case:
>>> https://www.slideshare.net/AmarPai2/dynamic-pricing-of-lyft-rides-using-streaming
>>>
>>> +Tyler Akidau  do you have pointers for the others
>>> by chance?
>>>
>>>
>>> On Fri, Feb 8, 2019 at 4:22 PM Kenneth Knowles  wrote:
>>>
>>>> Yea, wow. 300 is huge! Nice. Looking forward to the Feb 21 meetup.
>>>>
>>>> Kenn
>>>>
>>>> On Fri, Feb 8, 2019 at 3:02 PM Matthias Baetens <
>>>> baetensmatth...@gmail.com> wrote:
>>>>
>>>>> Wow, that is awesome, Joana! Great job to everyone involved! :-)
>>>>>
>>>>> On Fri, 8 Feb 2019 at 22:42, Joana Filipa Bernardo Carrasqueira <
>>>>> joanafil...@google.com> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I would like to take a moment to acknowledge the fact that yesterday
>>>>>> we had nearly 300 people at the Beam Meetup at Lyft!
>>>>>>
>>>>>> It was a remarkable event with great presentations and engagement
>>>>>> from the audience! It's great to see the community growing!
>>>>>>
>>>>>> [image: Lyft.jpg]
>>>>>>
>>>>>> For those in the Seattle area on Feb 21st, we will host another Beam
>>>>>> Meetup, so help us spread the word! Check the details here
>>>>>> <https://www.meetup.com/seattle-apache-flink/events/258723322/>.
>>>>>>
>>>>>> Have a great weekend!
>>>>>>
>>>>>> --
>>>>>>
>>>>>> *Joana Carrasqueira*
>>>>>>
>>>>>> Cloud Developer Relations Events Manager


Re: Beam Summits!

2019-01-23 Thread Austin Bennett
Hi All,

PMC approval is still pending for the Summit in SF (so things may change),
but I wanted to get a preliminary CfP out there to start gauging interest,
given that the targeted dates are approaching. Much of this
delay/uncertainty is my fault; I should have done more before the holidays
and my long vacation from the end of December through mid-January. This
CfP will remain open for some time, and upon approval I will make sure to
give notice of a CfP deadline.

Please submit talks via:
https://docs.google.com/forms/d/e/1FAIpQLSfD0qhoS2QrDbtK1E85gATGQCgRGKhQcLIkiiAsPW9G_7Um_Q/viewform?usp=sf_link

I would very much encourage anyone who can lead hands-on tutorials/workshops
(full day, half day, a focused couple of hours, etc.) to apply, as well as
anyone with technical talks and/or use cases. Again, the tentative dates are
3 and 4 April 2019.

Thanks,
Austin


On Mon, Jan 21, 2019 at 7:58 PM Austin Bennett 
wrote:

> Hi All,
>
> Other projects/Summits like Kafka and Spark offer add-on training days at
> their summits. I'm wondering about the appetite/interest for hands-on
> sessions on working with Beam, and whether we think that would be helpful.
> Are there people who would benefit from a getting-started-with-Beam day, or
> a more advanced/specialized session? This was on the original agenda for
> London but didn't materialize; I'm seeing whether there is enough interest
> to make this worth putting together and making available.
>
> Furthermore, it had been mentioned that an introduction to contributing to
> Beam might also be beneficial. I'm also curious to hear whether that would
> be of interest to people here (or to people you know who aren't following
> these distribution channels themselves, since those following dev@ or even
> user@ are likely a more focused selection of people interested in Beam).
>
> Thanks,
> Austin
>
>
>
> On Wed, Dec 19, 2018 at 3:05 PM Austin Bennett <
> whatwouldausti...@gmail.com> wrote:
>
>> Hi All,
>>
>> I really enjoyed Beam Summit in London (Thanks Matthias!), and there was
>> much enthusiasm for follow-up events. We had selected that location in
>> large part due to the growing community there, and we have users in a
>> variety of locations. In our 2019 calendar,
>> https://docs.google.com/spreadsheets/d/1CloF63FOKSPM6YIuu8eExjhX6xrIiOp5j4zPbSg3Apo/
>> shared in the past weeks, three Summits are tentatively slotted for this
>> year. I want to start running this by the group to get input.
>>
>> * Beam Summit NA, in San Francisco, approx. 3 April 2019 (following Flink
>> Forward). I can organize.
>> * Beam Summit Europe, in Stockholm (the runner-up to London in voting), or
>> perhaps Berlin? October-ish 2019
>> * Beam Summit Asia, in Tokyo ??
>>
>> What are general thoughts on locations/dates?
>>
>> Looking forward to convening in person soon.
>>
>> Cheers,
>> Austin
>>
>


Re: Beam Summits!

2019-01-21 Thread Austin Bennett
Hi All,

Other projects/Summits like Kafka and Spark offer add-on training days at
their summits. I'm wondering about the appetite/interest for hands-on
sessions on working with Beam, and whether we think that would be helpful.
Are there people who would benefit from a getting-started-with-Beam day, or
a more advanced/specialized session? This was on the original agenda for
London but didn't materialize; I'm seeing whether there is enough interest
to make this worth putting together and making available.

Furthermore, it had been mentioned that an introduction to contributing to
Beam might also be beneficial. I'm also curious to hear whether that would
be of interest to people here (or to people you know who aren't following
these distribution channels themselves, since those following dev@ or even
user@ are likely a more focused selection of people interested in Beam).

Thanks,
Austin



On Wed, Dec 19, 2018 at 3:05 PM Austin Bennett 
wrote:

> Hi All,
>
> I really enjoyed Beam Summit in London (Thanks Matthias!), and there was
> much enthusiasm for follow-up events. We had selected that location in
> large part due to the growing community there, and we have users in a
> variety of locations. In our 2019 calendar,
> https://docs.google.com/spreadsheets/d/1CloF63FOKSPM6YIuu8eExjhX6xrIiOp5j4zPbSg3Apo/
> shared in the past weeks, three Summits are tentatively slotted for this
> year. I want to start running this by the group to get input.
>
> * Beam Summit NA, in San Francisco, approx. 3 April 2019 (following Flink
> Forward). I can organize.
> * Beam Summit Europe, in Stockholm (the runner-up to London in voting), or
> perhaps Berlin? October-ish 2019
> * Beam Summit Asia, in Tokyo ??
>
> What are general thoughts on locations/dates?
>
> Looking forward to convening in person soon.
>
> Cheers,
> Austin
>


Re: Beam courses

2019-01-14 Thread Austin Bennett
Hi Alex,

I'm certainly interested in helping more people use Beam (including beyond
beginner level). I believe there are people who can help, as already
mentioned in this thread, and I am also happy to help create training
materials as we identify areas in need. We have discussed a cookbook (and I
started drafting what would be needed for a 'Beam: Up and Running' tome),
but what you mention might need to be done by (hopefully not).

We have also discussed having some hands-on training at Beam Summits, so
perhaps your need can help motivate getting that kicked off here in SF, and
then more substantially at the EU Summit, which is local for you. I know
that helped with my Spark usage (attending training tied to various Spark
Summits).

Cheers,
Austin


On Mon, Jan 14, 2019, 7:31 AM Maximilian Michels wrote:

> Hi Alex,
>
> I know of
> http://www.bigdatainstitute.io/courses/data-engineering-with-apache-beam/
>
> There are also some public materials by Jesse (in CC):
> https://github.com/eljefe6a/beamexample
> This training uses the above exercises:
>
> https://docs.google.com/presentation/d/1ln5KndBTiskEOGa1QmYSCq16YWO9Dtmj7ZwzjU7SsW4
>
> Overall, this is more for beginners, and some of the newer features like
> user state and timers are missing. I think it could be interesting to
> collaborate on an updated in-depth training.
>
> Best,
> Max
>
> On 14.01.19 01:47, Davor Bonaci wrote:
> > I'll introduce you to folks who can do this for you off-list.
> >
> > On Sun, Jan 13, 2019 at 12:28 PM Vikram Tiwari wrote:
> >
> > Hey! I think he mentioned to me once that they do trainings for Beam,
> > etc. Might want to talk to him.
> > https://www.linkedin.com/in/dbonaci
> >
> >
> > On Sun, Jan 13, 2019, 12:08 PM Alex Van Boxel wrote:
> >
> > Hey all,
> >
> > Our team had the luxury of growing with Beam; we were Dataflow users
> > before it was GA. But now our team has grown, due to a merger.
> >
> > As we will continue using Beam, but across different sites, I'm
> > thinking about training. The question is: should I create trainings
> > myself, or do people specialise in Beam training? I'm not talking about
> > some simple getting-started training... I want deep training.
> >
> > Any suggestions on how people in this group do trainings?
> >
>


Re: Beam Summits!

2019-01-03 Thread Austin Bennett
Hi Matthias, etc,

Trying to get thoughts on formalizing a process for getting proposals
together. I look forward to the potential day when there are many people
who want (rather than are just willing) to host a summit in a given region
in a given year. Perhaps that's too forward-looking.

Also, you mentioned that planning London wound up with a tight time window.
If we are shooting for April in SF, it seems the clock might be starting to
tick. Any advice on how much time is needed? And any guidance on getting
whatever formal approval is needed through Apache? Does this also
necessarily involve a Beam PMC or community vote (probably more related to
the first paragraph)?

Thanks,
Austin

On Thu, Dec 20, 2018, 1:09 AM Matthias Baetens wrote:

> Great stuff, thanks for the overview, Austin.
>
> For EU, there are things to say for both Stockholm and Berlin, but I think
> it makes sense to do it on the back of another conference (larger chance of
> people being in town with the same interest). I like Thomas's comment - we
> will attract more people from the US if we don't let it conflict with the
> big events there. +1 for doing it around the time of Berlin Buzzwords.
>
> For Asia, I'd imagine Singapore would be an option as well. I'll reach out
> to some people that are based there to get a grasp on the size of the
> community there.
>
> Best,
> -M
>
>
>
> On Thu, 20 Dec 2018 at 05:08, Thomas Weise  wrote:
>
>> I think for EU there is a proposal to have it next to Berlin Buzzwords in
>> June. That would provide better spacing and avoid conflict with ApacheCon.
>>
>> Thomas
>>
>>
>> On Wed, Dec 19, 2018 at 3:09 PM Suneel Marthi  wrote:
>>
>>> How about Beam Summit in Berlin on Sep 6 immediately following Flink
>>> Forward Berlin on the previous 2 days.
>>>
>>> Maybe the same for Asia, following Flink Forward Asia wherever and
>>> whenever it happens.
>>>
>>> On Wed, Dec 19, 2018 at 6:06 PM Austin Bennett <
>>> whatwouldausti...@gmail.com> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I really enjoyed Beam Summit in London (Thanks Matthias!), and there
>>>> was much enthusiasm for follow-up events. We had selected that location
>>>> in large part due to the growing community there, and we have users in a
>>>> variety of locations. In our 2019 calendar,
>>>> https://docs.google.com/spreadsheets/d/1CloF63FOKSPM6YIuu8eExjhX6xrIiOp5j4zPbSg3Apo/
>>>> shared in the past weeks, three Summits are tentatively slotted for this
>>>> year. I want to start running this by the group to get input.
>>>>
>>>> * Beam Summit NA, in San Francisco, approx. 3 April 2019 (following
>>>> Flink Forward). I can organize.
>>>> * Beam Summit Europe, in Stockholm (the runner-up to London in voting),
>>>> or perhaps Berlin? October-ish 2019
>>>> * Beam Summit Asia, in Tokyo ??
>>>>
>>>> What are general thoughts on locations/dates?
>>>>
>>>> Looking forward to convening in person soon.
>>>>
>>>> Cheers,
>>>> Austin
>>>>
>>>


Re: 2019 Beam Events

2018-12-04 Thread Austin Bennett
I already got that process kicked off with the NY and LA meetups; now that
SF is about to be inaugurated, the goal will be to get these moving as well.

For anyone that is in (or goes to) those areas:
https://www.meetup.com/New-York-Apache-Beam/
https://www.meetup.com/Los-Angeles-Apache-Beam/

Please reach out to get involved!



On Tue, Dec 4, 2018 at 3:13 PM Griselda Cuevas  wrote:

> +1 to Pablo's suggestion: if there's interest in founding a Meetup group
> in a particular city, let's create the Meetup page and start getting
> sign-ups. Joana will be reaching out with a comprehensive list of how to
> get started, and we're hoping to compile a high-level calendar of
> launches/announcements to feed into your meetup.
>
> G
>
> On Tue, 4 Dec 2018 at 12:04, Daniel Salerno  wrote:
>
>> =)
>> What good news!
>> Okay, I'll set up the group and try to get people interested.
>> Thank you
>>
>>
>> On Tue, Dec 4, 2018 at 17:19, Pablo Estrada
>> wrote:
>>
>>> FWIW, for some of these places that have interest (e.g. Brazil, Israel),
>>> it's possible to create a group in meetup.com, and start gauging
>>> interest, and looking for organizers.
>>> Once a group of people with interest exists, it's easier to get interest
>>> / sponsorship to bring speakers.
>>> So if you are willing to create the group in meetup, Daniel, we can
>>> monitor it and try to plan something as it grows : )
>>> Best
>>> -P.
>>>
>>> On Tue, Dec 4, 2018 at 10:55 AM Daniel Salerno 
>>> wrote:
>>>

 It's a shame that there are no events in Brazil ...

 =(

 On Tue, Dec 4, 2018 at 13:12, OrielResearch Eila Arich-Landkof <
 e...@orielresearch.org> wrote:

> agree 
>
> On Tue, Dec 4, 2018 at 5:41 AM Chaim Turkel  wrote:
>
>> It would be nice to have one in Israel
>> chaim
>> On Tue, Dec 4, 2018 at 12:33 AM Griselda Cuevas 
>> wrote:
>> >
>> > Hi Beam Community,
>> >
>> > I started curating industry conferences, meetups and events that
>> > are relevant for Beam; this is the initial list I came up with. I'd love
>> > your help adding others that I might have overlooked. Once we're
>> > satisfied with the list, let's re-share it so we can coordinate proposal
>> > submissions, attendance and community meetups there.
>> >
>> >
>> > Cheers,
>> >
>> > G
>> >
>> >
>> >
>>
>>
>
>
> --
> Eila
> www.orielresearch.org
> https://www.meetup.com/Deep-Learning-In-Production/
>
>
>


Bay Area Apache Beam Kickoff!

2018-11-19 Thread Austin Bennett
We have our first meetup scheduled for December 12th in San Francisco.

Andrew Pilloud, a software engineer at Google and Beam committer, will demo
the latest feature in Beam SQL: a standalone SQL shell. The talk will cover
why SQL is a good fit for streaming data processing, the technical details
of the Beam SQL engine, and a peek into our future plans.

Kenn Knowles, a founding PMC member and incoming PMC Chair for the Apache
Beam project, as well as a computer scientist and engineer at Google, will
share all things Beam: where it is, where it's been, and where it's going.

More info:
https://www.meetup.com/San-Francisco-Apache-Beam/events/256348972/

For those in/around town (or that can be) come join in the fun!


Re: FlinkRunner JAAS verify failed in Flink cluster

2018-11-06 Thread Austin Bennett
Related to another thread:

Is there value in also posting issues raised here (with their follow-up
solutions, as in this thread, where it was excellent to have the solution
shared with the list) on Stack Overflow? Again, for ease of discoverability
for those who face similar issues. I'm not sure how we would formalize
this, but I'm bringing it up nonetheless.



On Tue, Nov 6, 2018 at 1:49 AM Maximilian Michels  wrote:

> Hi Fred,
>
> I see! Thanks for posting your solution here.
>
> Best,
> Max
>
> On 06.11.18 03:49, K Fred wrote:
> > Hi Max,
> >
> > I have resolved this issue. It was caused by the Flink cluster's Kerberos
> > configuration. Just setting some options in flink-conf.yaml makes it
> > work fine!
> >
> > The settings are below:
> >
> > security.kerberos.login.use-ticket-cache: false
> > security.kerberos.login.keytab: /etc/kafka/kafka.keytab
> > security.kerberos.login.principal: ka...@hadoop.com
> > 
> > security.kerberos.login.contexts: Client,KafkaClient
> >
> >
> > Thanks,
> > Fred.
> >
> > On Tue, Nov 6, 2018 at 2:56 AM Maximilian Michels wrote:
> >
> > Hi Fred,
> >
> > Just to double check: Are you running this from a cluster or your local
> > machine? Asking because the stack trace indicates that the exception
> > occurs during job submission through the Flink command-line client. So
> > the machine you're running this on should also have the file located in
> > /etc.
> >
> > Thanks,
> > Max
> >
> > On 05.11.18 12:26, K Fred wrote:
> >  > Hi Max,
> >  >
> >  > Yeah, the config is always located on the remote cluster. The
> >  > exception suggests that my application can find the config file, but
> >  > cannot find the config's KafkaClient entry. So I guess the reason may
> >  > be related to some Flink cluster settings!
> >  >
> >  > /The stack trace is below:/
> >  >
> >  > -
> >  > The program finished with the following exception:
> >  >
> >  > org.apache.flink.client.program.ProgramInvocationException: The main method caused an error.
> >  > at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:545)
> >  > at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:420)
> >  > at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:404)
> >  > at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:785)
> >  > at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:279)
> >  > at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:214)
> >  > at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1025)
> >  > at org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1101)
> >  > at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
> >  > at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1101)
> >  > Caused by: java.lang.RuntimeException: Error while translating UnboundedSource: org.apache.beam.sdk.io.kafka.KafkaUnboundedSource@7bc6d27a
> >  > at org.apache.beam.runners.flink.FlinkStreamingTransformTranslators$UnboundedReadSourceTranslator.translateNode(FlinkStreamingTransformTranslators.java:225)
> >  > at org.apache.beam.runners.flink.FlinkStreamingTransformTranslators$ReadSourceTranslator.translateNode(FlinkStreamingTransformTranslators.java:273)
> >  > at org.apache.beam.runners.flink.FlinkStreamingPipelineTranslator.applyStreamingTransform(FlinkStreamingPipelineTranslator.java:122)
> >  > at org.apache.beam.runners.flink.FlinkStreamingPipelineTranslator.visitPrimitiveTransform(FlinkStreamingPipelineTranslator.java:101)
> >  > at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657)
> >  > at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:649)
> >  > at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:649)
> >  > at org.apache.beam.sdk.runners.TransformHierarchy$Node.access$600(TransformHierarchy.java:311)
> >  > at org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:245)
> >  > at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:458)
> >  > at
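Fred's fix comes down to a handful of `security.kerberos.login.*` keys in flink-conf.yaml, with `KafkaClient` listed among the login contexts (Flink's documentation describes `security.kerberos.login.contexts` as a comma-separated list such as `Client,KafkaClient`, covering ZooKeeper and Kafka authentication). A quick pre-submission check along those lines might look like the Python sketch below. It is an illustration under stated assumptions: `parse_flink_conf` only handles flat `key: value` lines rather than full YAML, `kerberos_problems` checks only the keys from Fred's message (not everything a Kerberos setup may need), and the principal `kafka@EXAMPLE.COM` is a hypothetical placeholder because the archive obscured the real one.

```python
def parse_flink_conf(text):
    """Parse flat 'key: value' lines as used in flink-conf.yaml.

    Not a real YAML parser: blank lines and '#' comments are skipped.
    """
    conf = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        conf[key.strip()] = value.strip()
    return conf


def kerberos_problems(conf):
    """List missing pieces of a keytab-based Kerberos login for Kafka."""
    problems = []
    for key in ("security.kerberos.login.keytab",
                "security.kerberos.login.principal"):
        if not conf.get(key):
            problems.append(f"missing {key}")
    contexts = {c.strip() for c in
                conf.get("security.kerberos.login.contexts", "").split(",")}
    if "KafkaClient" not in contexts:
        problems.append("security.kerberos.login.contexts lacks KafkaClient")
    return problems


if __name__ == "__main__":
    sample = """
    security.kerberos.login.use-ticket-cache: false
    security.kerberos.login.keytab: /etc/kafka/kafka.keytab
    security.kerberos.login.principal: kafka@EXAMPLE.COM
    security.kerberos.login.contexts: Client,KafkaClient
    """
    print(kerberos_problems(parse_flink_conf(sample)))  # []
```

Since Max notes that the exception was raised during job submission through the Flink command-line client, it may be worth running such a check against the configuration on the submitting machine as well as on the cluster.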

Growing Beam -- A call for ideas? What is missing? What would be good to see?

2018-10-25 Thread Austin Bennett
Hi Beam Devs and Users,

Trying to get a sense from the community of the sorts of things we think
would be useful to build the community (I am thinking not from the angle of
specific code/implementation/functionality, but from a user/usability one.
I want to dive in and make real contributions to the code, too, but I know I
also have the interest and skills to help with education and community
aspects, hence my focus on this).

I had previously suggested a sort of cookbook of focused and curated
examples (code and explanation) to help people get started with Beam:
getting up and running and accomplishing something worthwhile, quickly.
That seems like one way to help grow our user base (and maybe our future
dev base, once those users become enamored), and it did get some positive
feedback when first put out there.

There are many other areas where featuring others' successes with Beam, or
little tips, can be valuable. Pablo's Awesome Beam is one example of such a
collection (https://github.com/pabloem/awesome-beam). Another would be
centralizing a general place to find any and all Beam blogs, shared code,
write-ups, etc.

Certainly there is a place for all sorts of contributions and resources.
What do people on these lists think would be particularly useful? I'm trying
to get a sharper sense of where our efforts might be best focused.

Please share anything (even semi-)related!?

Thanks,
Austin


P.S. I realize that those following this list are rather self-selecting as
well, so this might not be the best forum to figure out what new/novice
users need, but I would like to hear what everyone else here thinks could
be useful.


Re: SF Meetup(s)

2018-10-08 Thread Austin Bennett
Great!  Given the responses, there seems to be a wealth of suitable
locations.  It sounds like this would work well as a roaming meetup, not too
tied to an SF or Peninsula location nor to a specific company/office.  I'll
be in touch with the individuals who responded to me (both on-list and
off), with the aim of getting this off the ground in the not too distant
future.

Especially @users: who is willing to speak and share what fantastic things
are getting accomplished with Beam!?

On Fri, Oct 5, 2018, 1:43 PM Ahmet Altay  wrote:

> I checked. We can host a meetup in Google's San Francisco or Sunnyvale
> offices. I can help with planning while Gris is out.
>
> On Fri, Oct 5, 2018 at 1:07 PM, Ahmet Altay  wrote:
>
>> Gris is out of office for a week I believe. We should be able to help
>> with hosting a meetup in Sunnyvale, not sure about the city. I will check
>> and update here.
>>
>> On Fri, Oct 5, 2018 at 1:02 PM, Thomas Weise  wrote:
>>
>>> Thanks for the initiative.
>>>
>>> Lyft may be able to help with hosting and I can help with talks. I will
>>> check and circle back.
>>>
>>> Thomas
>>>
>>>
>>> On Fri, Oct 5, 2018 at 8:48 AM Austin Bennett <
>>> whatwouldausti...@gmail.com> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I'm looking to start organizing events for Beam around San Francisco.
>>>> I'm on the lookout for space -- does anyone work for a company that
>>>> could offer space around the city (my company's offices are in Foster
>>>> City, a wholly undesirable Meetup location)?
>>>>
>>>> Also, is anyone using Beam who would be happy to share use cases?  And
>>>> are there devs who could talk about experiences building things, etc?
>>>>
>>>> I imagine more prompts will follow as we get this off the ground.  This
>>>> is just the beginning!
>>>>
>>>> Thanks,
>>>> Austin
>>>>
>>>>
>>>>
>>
>


Re: SF Meetup(s)

2018-10-05 Thread Austin Bennett
Ah, and here's the link to the meetup, so you can join or stay aware:
https://www.meetup.com/San-Francisco-Apache-Beam/



On Fri, Oct 5, 2018 at 8:47 AM Austin Bennett 
wrote:

> Hi All,
>
> I'm looking to start organizing events for Beam around San Francisco.  I'm
> on the lookout for space -- does anyone work for a company that could offer
> space around the city (my company's offices are in Foster City, a wholly
> undesirable Meetup location)?
>
> Also, is anyone using Beam who would be happy to share use cases?  And are
> there devs who could talk about experiences building things, etc?
>
> I imagine more prompts will follow as we get this off the ground.  This is
> just the beginning!
>
> Thanks,
> Austin
>
>
>


SF Meetup(s)

2018-10-05 Thread Austin Bennett
Hi All,

I'm looking to start organizing events for Beam around San Francisco.  I'm
on the lookout for space -- does anyone work for a company that could offer
space around the city (my company's offices are in Foster City, a wholly
undesirable Meetup location)?

Also, is anyone using Beam who would be happy to share use cases?  And are
there devs who could talk about experiences building things, etc?

I imagine more prompts will follow as we get this off the ground.  This is
just the beginning!

Thanks,
Austin


Re: [ACTION REQUESTED] What do you use Beam for?

2018-10-01 Thread Austin Bennett
Hi All,

In honor of today's Beam Summit (where we'll gather feedback in person,
too), I'm looking for input.  How can we make Beam more usable?  Are you
struggling with anything?  Did you struggle with anything previously that
you could share to make things clearer/easier for people in the future?
Your feedback is valuable!

Cheerio,
Austin


On Thu, Sep 13, 2018 at 6:16 PM Rose Nguyen  wrote:

> Hi Beamers,
>
> As described in the Sep newsletter and initiated in this thread by
> Austin, we are starting efforts to create and curate Beam cookbook
> examples.
>
> To help us *identify* which content we should put our energy into
> creating for Beam users, we're focusing on collecting feedback on *common*
> /*tricky*/*difficult* tasks. Please fill out this survey; your responses
> will impact the direction of this community resource. You can submit
> multiple times about content that you believe is missing, but that you do
> not personally need. I'll send out another call for collating existing
> content and put down ideas we have about organization in a later thread.
>
> *Find the survey here: Beam User Tasks*
>
> Cheers!
> --
> Rose Thị Nguyễn
>


Re: [Discuss] Upgrade story for Beam's execution engines

2018-09-16 Thread Austin Bennett
Do we currently maintain a finer-grained list of compatibility between
execution engine/runner versions and Beam versions?  Is this really only a
concern with recent Flink (it sounded like the Spark 1.x to 2.x jump is one,
too)?  I see the capability matrix:
https://beam.apache.org/documentation/runners/capability-matrix/, but some
sort of compatibility matrix between runner versions and Beam releases might
be useful.

The capability matrix covers Beam features, but not the underlying runner
versions.  Ex: something like this would save a user from trying to get
Beam working on recent Flink 1.6 and then hitting a (potentially not well
documented) wall given known issues.



On Sun, Sep 16, 2018 at 3:59 AM Maximilian Michels  wrote:

> > If I understand the LTS proposal correctly, then it will be a release
> line that continues to receive patches (as in semantic versioning), but no
> new features as that would defeat the purpose (stability).
>
> It matters insofar, as execution engine upgrades could be performed in
> the master but the LTS version won't receive them. So LTS is the go-to
> if you want to ensure compatibility with your existing setup.
>
> > To limit the pain of dealing with incompatible runner changes and copies
> within Beam, we should probably also work with the respective community to
> improve the compatibility story.
>
> Absolutely. If we find that we can improve compatibility with upstream
> changes, we should go that path. Even if we don't have a dedicated
> compatibility layer upstream yet.
>
> On 13.09.18 19:34, Thomas Weise wrote:
> >
> > On Thu, Sep 13, 2018 at 9:49 AM Maximilian Michels wrote:
> >
> > Thank you for your comments. Let me try to summarize what has been
> > discussed so far:
> >
> > 1. The Beam LTS version will ensure a stable execution engine for the
> > length of the LTS life span.
> >
> >
> > If I understand the LTS proposal correctly, then it will be a release
> > line that continues to receive patches (as in semantic versioning), but
> > no new features as that would defeat the purpose (stability).
> >
> > If so, then I don't think LTS matters for this discussion.
> >
> > 2. We agree that pushing updates to the execution engine for the Runners
> > is only desirable if it results in a better integration with the Beam
> > model or if it is necessary due to security or performance reasons.
> >
> > 3. We might have to consider adding additional build targets for a
> > Runner for whenever the execution engine gets upgraded. This might be
> > really easy if the engine's API remains stable. It might also be
> > desirable if the upgrade path is not easy and not completely
> > foreseeable, e.g. Etienne mentioned Spark 1.x vs Spark 2.x Runner. The
> > Beam feature set could vary depending on the version.
> >
> >
> > To limit the pain of dealing with incompatible runner changes and copies
> > within Beam, we should probably also work with the respective community
> > to improve the compatibility story.
> >
> >
> > 4. In the long run, we want a stable abstraction layer for each Runner
> > that, ideally, is maintained by the upstream of the execution engine. In
> > the short run, this is probably not realistic, as the shared libraries
> > of Beam are not stable enough.
> >
> >
> > Yes, that will only become an option once we reach interface stability.
> > Similar to how the runner projects maintain their IO connectors.
> >
> > On 13.09.18 14:39, Robert Bradshaw wrote:
> >  > The ideal long-term solution is, as Romain mentions, pushing the
> >  > runner-specific code up to be maintained by each runner with a stable
> >  > API to use to talk to Beam. Unfortunately, I think we're still a long
> >  > way from having this Stable API, or having the clout for
> >  > non-beam-developers to maintain these bindings externally (though
> >  > hopefully we'll get there).
> >  >
> >  > In the short term, we're stuck with either hurting users that want to
> >  > stick with Flink 1.5, hurting users that want to upgrade to Flink 1.6,
> >  > or supporting both. Is Beam's interaction with Flink such that we can't
> >  > simply have separate targets linking the same Beam code against one or
> >  > the other? (I.e. are code changes needed?) If so, we'll probably need a
> >  > flink-runner-1.5 module, a flink-runner-1.6, and a flink-runner-common
> >  > module. Or we hope that all users are happy with 1.5 until a certain
> >  > point in time when they all want to simultaneously jump to 1.6 and Beam
> >  > at the same time. Maybe that's enough in the short term, but longer term
> >  > we need a more sustainable solution.
> >  >
> >  >
> >  > On Thu, Sep 13, 2018 at 7:13 AM Romain Manni-Bucau
> >  > mailto:rmannibu...@gmail.com>
> > 

Re: delayed emit (timer) in py-beam?

2018-07-30 Thread Austin Bennett
Fantastic; thanks, Charles!



On Mon, Jul 30, 2018 at 3:49 PM, Charles Chen  wrote:

> Hey Austin,
>
> This API is not yet implemented in the Python SDK.  I am working on this
> feature:  the next step from my end is to finish a reference implementation
> in the local DirectRunner.  As you note, the doc at
> https://s.apache.org/beam-python-user-state-and-timers describes the
> design.
>
> You can track progress on the mailing list thread here:
> https://lists.apache.org/thread.html/51ba1a00027ad8635bc1d2c0df805ce873995170c75d6a08dfe21997@%3Cdev.beam.apache.org%3E
>
> Best,
> Charles
>
> On Mon, Jul 30, 2018 at 3:34 PM Austin Bennett <
> whatwouldausti...@gmail.com> wrote:
>
>> What's going on with timers and Python?
>>
>> I'm looking at building a pipeline (assuming another group in my company
>> will grant access to the Kafka topic):
>>
>> Kafka -> beam -> have beam wait 24 hours -> do transform(s) and emit a
>> record.  If I read things correctly, that's not currently possible in
>> Python on Beam.  What is needed?  (I'm trying to figure out whether that is
>> something I am capable of, and whether there is room for me to help.)
>> Looking for similar functionality to
>> https://www.rabbitmq.com/blog/2015/04/16/scheduling-messages-with-rabbitmq/
>> (though I don't need alternate routing, nor is that example in Python).
>>
>>
>> For example, I see:  https://beam.apache.org/blog/2017/08/28/timely-processing.html
>>
>> and tickets like:  https://issues.apache.org/jira/browse/BEAM-4594
>>
>>
>>


delayed emit (timer) in py-beam?

2018-07-30 Thread Austin Bennett
What's going on with timers and Python?

I'm looking at building a pipeline (assuming another group in my company
will grant access to the Kafka topic):

Kafka -> beam -> have beam wait 24 hours -> do transform(s) and emit a
record.  If I read things correctly, that's not currently possible in
Python on Beam.  What is needed?  (I'm trying to figure out whether that is
something I am capable of, and whether there is room for me to help.)
Looking for similar functionality to
https://www.rabbitmq.com/blog/2015/04/16/scheduling-messages-with-rabbitmq/
(though I don't need alternate routing, nor is that example in Python).


For example, I see:  https://beam.apache.org/blog/2017/08/28/timely-processing.html

and tickets like:  https://issues.apache.org/jira/browse/BEAM-4594
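At the time of this thread the Python SDK had no timers; it has since gained user state and timers (`apache_beam.transforms.userstate`, with `TimerSpec` and the `on_timer` decorator), which is the Beam-native way to express "wait 24 hours, then emit". The sketch below does not use Beam at all. It is a hypothetical, plain-Python model of the semantics, assuming one event-time timer per record: each record sets a timer at its timestamp plus a delay, and is released once the watermark reaches that time. The names `DelayedEmitter`, `add` and `advance_watermark` are illustrative, not Beam APIs.

```python
import heapq
from typing import Any, List, Tuple

DAY_SECONDS = 24 * 60 * 60


class DelayedEmitter:
    """Toy model of "hold each record, emit it `delay` seconds of event time later".

    Not Beam's API: a min-heap stands in for buffered state plus one
    event-time timer per record.
    """

    def __init__(self, delay: float = DAY_SECONDS) -> None:
        self.delay = delay
        # Entries are (fire_time, insertion_seq, record); the sequence number
        # breaks ties so the records themselves never need to be comparable.
        self._pending: List[Tuple[float, int, Any]] = []
        self._seq = 0

    def add(self, record: Any, event_time: float) -> None:
        """Buffer a record and 'set a timer' for event_time + delay."""
        heapq.heappush(self._pending, (event_time + self.delay, self._seq, record))
        self._seq += 1

    def advance_watermark(self, watermark: float) -> List[Any]:
        """Fire, in timestamp order, every timer the watermark has reached."""
        fired = []
        while self._pending and self._pending[0][0] <= watermark:
            _, _, record = heapq.heappop(self._pending)
            fired.append(record)
        return fired
```

For the Kafka case, `event_time` would be the record's timestamp, and the transform(s) would run on whatever `advance_watermark` returns. The heap keeps firing in timestamp order even when records arrive out of order; Beam itself scopes state and timers per key and window, which this toy ignores.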


Re: Cloud Next 2018 : Catch Up

2018-07-12 Thread Austin Bennett
Hi Gaurav,

Yes, I'll be around there, happy to meet up.  You can follow up with me
directly once we get closer, or perhaps we'll figure out a larger thing if
more people chime in on this thread that they'll be around.

Sessions: I didn't see many sessions on Beam/Dataflow at this conference.

Best,
Austin



On Thu, Jul 12, 2018 at 5:05 PM, Gaurav Thakur  wrote:

> Hi Everyone,
>
> Are any of us going to be at CloudNext 2018? Would you want to point out
> any interesting sessions?
>
> I think it would be a great idea to catch up in person if a few of us are
> going to be there?
>
> Excuse me, if this is not the right forum for something like this.
>
> Thanks, gaurav
>


Re: CSVSplitter - Splittable DoFn

2018-06-18 Thread Austin Bennett
Hi Beam Users/Dev,

How are people currently handling CSVs as input to Beam (or are they not
really doing so)?

I see the things listed at the start of this thread -- any others?

I have many batch workflows that involve getting multi-GB CSV files from
third-party data aggregators (ex: hourly) and ingesting them.  Currently
this goes to S3/Redshift, and I have written some Spark to go to
S3/Parquet.  It'd be great to take the csv.gz and write to BigQuery.  Is
Beam not up to the task yet (so that I should use something else, transform
to newline-delimited JSON, Avro, or Parquet on GCS, and run bq load from
there)?  Is there much thought on development to support/formalize these
workflows?

Thanks for any additional info beyond what is already in this thread (and
thanks to Peter for prelim conversation),

Austin
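For the fallback route mentioned above (convert to newline-delimited JSON and `bq load` it), the single-process conversion is short with the Python standard library. A minimal sketch, assuming a UTF-8, gzip-compressed file whose first row is a header; every value stays a string, so a BigQuery schema would still have to be supplied at load time. Because it reads the file sequentially, it sidesteps rather than solves the parallel-splitting problem this thread is about. `csv_gz_to_ndjson` is a hypothetical helper name.

```python
import csv
import gzip
import io
import json
from typing import Iterator


def csv_gz_to_ndjson(gz_bytes: bytes) -> Iterator[str]:
    """Yield one JSON line per CSV record, keyed by the header row."""
    # newline="" hands line endings to the csv module, so quoted fields that
    # contain newlines are parsed as a single value instead of being split.
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", encoding="utf-8", newline="") as text:
        for row in csv.DictReader(text):
            # json.dumps escapes embedded newlines, so each record is one line.
            yield json.dumps(row)


if __name__ == "__main__":
    sample = gzip.compress(b'id,note\n1,"multi\nline"\n2,plain\n')
    print("\n".join(csv_gz_to_ndjson(sample)))
```

For a file on disk, `gzip.open(path, "rt", encoding="utf-8", newline="")` works the same way as the in-memory `BytesIO` used here.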




On Wed, Apr 25, 2018 at 1:01 PM, Peter Brumblay 
wrote:

> This blog post was an excellent find. If I had infinite time I'd take a
> stab at implementing this. They basically outline an algorithm which
> *might* be appropriate for a generalized solution. It certainly beats my
> "try to parse 3 records and if you do pretend you're good" method.
>
> Peter
>
> On Tue, Apr 24, 2018 at 4:46 PM, Eugene Kirpichov 
> wrote:
>
>> Actually, you're right, this is not a pathological case. If we take a
>> regular 1TB-sized CSV file that actually doesn't have any quotes, and start
>> looking somewhere in the middle of it, there is no way to know whether
>> we're currently inside or outside quotes without scanning the whole file -
>> in theory there might be a quote lurking a few GB back. I suppose this can
>> be addressed by specifying limits on field sizes in bytes: e.g. with a
>> limit of 1kb, if there are no quotes in the preceding 1kb, then we're
>> definitely in an unquoted context. However, if there is a quote, it may be
>> either opening or closing the quoted context. There might be some way to
>> resolve the ambiguity;
>> https://blog.etleap.com/2016/11/27/distributed-csv-parsing/ seems to
>> discuss this in detail.
>>
>> On Tue, Apr 24, 2018 at 3:26 PM Eugene Kirpichov 
>> wrote:
>>
>>> Robert - you're right, but this is a pathological case. It signals that
>>> there *might* be cases where we'll need to scan the whole file, however for
>>> practical purposes it's more important whether we need to scan the whole
>>> file in *all* (or most) cases - i.e. whether no amount of backward scanning
>>> of a non-pathological file can give us confidence that we're truly located
>>> at a record boundary.
>>>
>>> On Tue, Apr 24, 2018 at 3:21 PM Robert Bradshaw 
>>> wrote:
>>>
 On Tue, Apr 24, 2018 at 3:18 PM Eugene Kirpichov 
 wrote:

 > I think the first question that has to be answered here is: Is it
 > possible *at all* to implement parallel reading of RFC 4180?

 No. Consider a multi-record CSV file with no quotes. Placing a quote at
 the start and end gives a new CSV file with exactly one element.

 > I.e., given a start byte offset, is it possible to reliably locate the
 > first record boundary at or after that offset while scanning only a
 > small amount of data?
 > If it is possible, then that's what the SDF (or BoundedSource, etc.)
 > should do - split into blind byte ranges, and use this algorithm to
 > assign consistent meaning to byte ranges.

 > To answer your questions 2 and 3: think of it this way.
 > The SDF's ProcessElement takes an element and a restriction.
 > ProcessElement must make only one promise: that it will correctly
 > perform exactly the work associated with this element and restriction.
 > The challenge is that the restriction can become smaller while
 > ProcessElement runs - in which case, ProcessElement must also do less
 > work. This can happen concurrently with ProcessElement running, so
 > really the guarantee should be rephrased as "By the time ProcessElement
 > completes, it should have performed exactly the work associated with the
 > element and tracker.currentRestriction() at the moment of completion".

 > This is all that is asked of ProcessElement. If Beam decides to ask the
 > tracker to split itself into two ranges (making the current one -
 > "primary" - smaller, and producing an additional one - "residual"), Beam
 > of course takes the responsibility for executing the residual
 > restriction somewhere else: it won't be lost.

 > E.g. if ProcessElement was invoked with [a, b), but while it was invoked
 > it was split into [a, b-100) and [b-100, b), then the current
 > ProcessElement call must process [a, b-100), and Beam guarantees that it
 > will fire up another ProcessElement call for [b-100, b) (Of course, both
 > of these calls may end up being recursively split further).

 > I'm not quite sure what you mean by "recombining" - please let me know
 > if the explanation above makes things clear enough or not.

 > On 
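Eugene's field-size bound can be turned into a small boundary finder. A minimal sketch, assuming (hypothetically) RFC 4180-style quoting with `"` as the quote character and `""` as its escape, `\n` as the record terminator, and a known upper bound `max_field_len` on the byte length of any quoted field, quotes included. If no quote appears in the preceding `max_field_len` bytes, the offset cannot be inside a quoted field, so a forward scan that tracks quote state finds a real boundary; if a quote does appear, the function gives up and returns `None`, which is exactly the ambiguous case the etleap post tries to resolve. Robert's counterexample shows why some such bound is needed. `next_record_start` is a hypothetical name, not part of any Beam source.

```python
from typing import Optional

QUOTE = ord('"')
NEWLINE = ord("\n")


def next_record_start(data: bytes, offset: int, max_field_len: int) -> Optional[int]:
    """Return the index just past the first unquoted newline at or after `offset`.

    Assumes no quoted field (quotes included) is longer than `max_field_len`
    bytes. Returns None when a quote in the look-back window makes it
    impossible to tell whether `offset` is inside a quoted field.
    """
    window_start = max(0, offset - max_field_len)
    if QUOTE in data[window_start:offset]:
        return None  # that quote could be opening or closing a field: ambiguous
    # No quote within reach, so `offset` is not inside a quoted field.
    in_quotes = False
    for i in range(offset, len(data)):
        byte = data[i]
        if byte == QUOTE:
            in_quotes = not in_quotes  # an escaped "" toggles twice: no net change
        elif byte == NEWLINE and not in_quotes:
            return i + 1
    return len(data)  # no further boundary before the end of the data
```

For `data = b'a,b\n"x\ny",z\nc,d\n'`, a naive newline search from offset 4 lands at index 7, inside the quoted field, while `next_record_start(data, 4, 4)` returns 12, the start of `c,d`.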

Beam Cookbook?

2018-06-07 Thread Austin Bennett
I'm looking at assembling a physical book along the lines of "Apache Beam
Cookbook", though I might take a different approach to the topic (if I
realize there is a better hole to fill, or something that needs more
attention before that).

I believe many could benefit from more substantive write-ups and
explanations of use cases and of specific bits of code (ex: to accomplish X
you might want to use recipe Y; pay special attention to this function,
with associated paragraphs of text noting specific lines in the code,
etc.).  While this can be done on a website and GitHub, I believe the more
concrete nature of a book (esp. with a reputable publisher) signals to
others that the subject is sufficiently mature.  My aim is for this book to
be freely available, at least as an e-book; one example of that is
https://www.confluent.io/resources/kafka-the-definitive-guide/ and surely
you've come across others.

I see that many cookbook code examples already exist, but I know that
associated write-ups could be useful to others, as well as the overall
presentation/bundling to make them even easier to find and use.

I'm wondering what the group thinks, and whether there are others with a
strong interest in collaborating on such an undertaking.


Re: Initial contributor experience

2018-06-05 Thread Austin Bennett
I was thinking of making that a portion of the first meetup for:
https://www.meetup.com/San-Francisco-Apache-Beam/
so even if it's not as detailed, we can go through this with many more
people.

I was thinking that explaining open source contribution generally is
something that would be great for lots of communities (bootcamps,
universities), and we could give such an educational talk, with Apache Beam
as the applied case.  I've got some other venues in mind for this, and
anyone else could do something similar.





On Tue, Jun 5, 2018 at 2:12 PM, Pablo Estrada  wrote:

> Thanks Austin for taking the time to go through this! We came out with a
> few JIRAs to improve the documentation (see doc), and hopefully we'll keep
> iterating on this.
>
> Hopefully we can get more experiences from other people that start to
> approach Beam.
>
> Best
> -P.
>
> On Tue, Jun 5, 2018 at 1:49 PM Griselda Cuevas  wrote:
>
>> +user@ in case someone has had similar experiences.
>>
>> Thanks for documenting this Austin & Pablo!
>>
>> If any other folks would like to participate in improving the "First
>> contribution experience" for Beam let us know in this thread.
>>
>> On Tue, 5 Jun 2018 at 13:40, Austin Bennett 
>> wrote:
>>
>>> I had been meaning to get set up to contribute for a while, so I walked
>>> through it with Pablo, to try to point out assumptions, lacking docs, etc.
>>>
>>> Write-up found here:
>>>
>>> https://docs.google.com/document/d/1hq-s3L676LkMTftvhv0eCkdwrRnZmCRiLdaQBWLHWWA/edit
>>>
>> --
> Got feedback? go/pabloem-feedback
>


Re: Regarding Beam Slack Channel

2018-01-08 Thread Austin Bennett
It'd be easier to follow along there

On Jan 8, 2018 9:32 PM, "Shashank Prabhakara"  wrote:

> I'd also like to be added, please.
>
> Thanks.
>
> On 2018-01-04 11:58, Jean-Baptiste Onofré wrote:
> > Hi,
> >
> > you should have received the invite.
> >
> > Welcome aboard !
> >
> > Regards
> > JB
> >
> > On 01/04/2018 01:38 AM, David Sabater Dinter wrote:
> > > Hi,
> > > Can I join also, please?
> > >
> > > Thanks!
> > >
> > > On Thu, Jan 4, 2018 at 12:54 AM Maria Tjahjadi
> > > <maria.tjahj...@tokopedia.com> wrote:
> > >
> > > Hi,
> > >
> > > Could I join also?
> > >
> > > Thanks,
> > > Maria Tjahjadi
> > >
> > > On 4 Jan 2018, at 06.15, Lukasz Cwik wrote:
> > >
> > >> Invite sent, welcome.
> > >>
> > >> On Wed, Jan 3, 2018 at 3:01 PM, Carlos Alonso <car...@mrcalonso.com>
> > >> wrote:
> > >>
> > >> Hi, I'd like to be added too, please!
> > >>
> > >> Thanks!
> > >>
> > >> On Tue, Dec 19, 2017 at 3:41 PM Jean-Baptiste Onofré <j...@nanthrax.net>
> > >> wrote:
> > >>
> > >> Done,
> > >>
> > >> you should have received an invite.
> > >>
> > >> Regards
> > >> JB
> > >>
> > >> On 12/19/2017 03:20 PM, Unais T wrote:
> > >> > Hello
> > >> >
> > >> > Can someone please add me to the Beam slack channel?
> > >> >
> > >> > Thanks.
> > >> >
> > >> >
> > >>
> > >> --
> > >> Jean-Baptiste Onofré
> > >> jbono...@apache.org
> > >> http://blog.nanthrax.net
> > >> Talend - http://www.talend.com
> > >>
> > >>
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>