Re: Python Beam pipelines on Flink on Kubernetes

2019-07-24 Thread Pablo Estrada
I am very happy to see this. I'll take a look, and leave my comments. I think this is something we'd been needing, and it's great that you guys are putting thought into it. Thanks! <3 On Wed, Jul 24, 2019 at 9:01 PM Thomas Weise wrote: > Hi, > > Recently Lyft open sourced *FlinkK8sOperator,* a K

Python Beam pipelines on Flink on Kubernetes

2019-07-24 Thread Thomas Weise
Hi, Recently Lyft open sourced *FlinkK8sOperator,* a Kubernetes operator to manage Flink deployments on Kubernetes: https://github.com/lyft/flinkk8soperator/ We are now discussing how to extend this operator to also support deployment of Python Beam pipelines with the Flink runner. I would like

Re: Enhancement for Joining Unbounded PCollections of different WindowFns

2019-07-24 Thread Kenneth Knowles
I think the best way to approach this is probably to have an example SQL statement and to discuss what the relational semantics should be. Windowing is not really part of SQL (yet) and in a way it just needs very minimal extensions. See https://arxiv.org/abs/1905.12133. In this proposal for SQL, w

Re: [DISCUSS] Turn `WindowedValue` into `T` in the FnDataService and BeamFnDataClient interface definition

2019-07-24 Thread Thomas Weise
Hi Jincheng, It is very exciting to see this follow-up, that you have done your research on the current state and that there is the intention to join forces on the portability effort! I have added a few pointers inline. Several of the issues you identified affect our usage of Beam as well. These

Re: contributor permission for Beam Jira tickets

2019-07-24 Thread Ahmet Altay
Welcome. I added ningk@ as a JIRA contributor. On Wed, Jul 24, 2019 at 4:19 PM Ning Kang wrote: > Bump this thread as a friendly ping! > > On Tue, Jul 16, 2019 at 5:08 PM Ning Kang wrote: > >> Hi, >> >> This is Ning Kang from Google. I'm working on the interactive beam >>

Re: contributor permission for Beam Jira tickets

2019-07-24 Thread Ning Kang
Bump this thread as a friendly ping! On Tue, Jul 16, 2019 at 5:08 PM Ning Kang wrote: > Hi, > > This is Ning Kang from Google. I'm working on the interactive beam. > Could someone please add me as a contri

Re: Sort Merge Bucket - Action Items

2019-07-24 Thread Kenneth Knowles
From the peanut gallery, keeping a separate implementation for SMB seems fine. Dependencies are serious liabilities for both upstream and downstream. It seems like the reuse angle is generating extra work, and potentially making already-complex implementations more complex, instead of helping thin

Re: [PROPOSAL] Revised streaming extensions for Beam SQL

2019-07-24 Thread Mingmin Xu
+1 to removing those magic words in Calcite streaming SQL, just because they're not SQL standard. The idea to replace HOP/TUMBLE with table-view-functions makes it concise; my only question is, is it (or will it be) part of the SQL standard? I'm a big fan of aligning with standards :lol PS, although the c

Re: [Discuss] Retractions in Beam

2019-07-24 Thread Rui Wang
Hello! In case you are not aware, I have added a modified streaming wordcount example at the end of the doc to illustrate retractions. -Rui On Wed, Jul 10, 2019 at 10:58 AM Rui Wang wrote: > Hi Community, > > Retractions are a part of the core Beam model [1]. I came up with a doc to > discuss r

Re: How to expose/use the External transform on Java SDK

2019-07-24 Thread Heejong Lee
I think it depends on how we define "the core" part of the SDK. If we define the core as only the (abstract) data types which describe the Beam pipeline model, then it would be more sensible to put the external transform into a separate extension module (option 4). Otherwise, option 1 makes sense. On Wed, Jul

Re: Sort Merge Bucket - Action Items

2019-07-24 Thread Neville Li
I spoke too soon. Turns out for unsharded writes, numShards can't be determined until the last finalize transform, which is again different from the current SMB proposal (static number of buckets & shards). I'll end up with more code specialized for SMB in order to generalize existing sink code, wh

Re: How to expose/use the External transform on Java SDK

2019-07-24 Thread Chamikara Jayalath
The idea of 'ExternalTransform' is to allow users to use transforms in SDK X from SDK Y. I think this should be a core part of each SDK and corresponding external transforms ([a] for Java, [b] for Python) should be released with each SDK. This will also allow us to add core external transforms to s

Talk Beam Go at GopherCon?

2019-07-24 Thread Robert Burke
If anyone wants to talk about the Apache Beam Go SDK, I'm at GopherCon in San Diego this week. Please say hi to the moustached, blue haired gentleman (me). There's no official Beam content on the program, but why should that stop us? Robert Burke (@lostluck on Twitter)

Re: How to expose/use the External transform on Java SDK

2019-07-24 Thread Robert Burke
Ideas inline. On Wed, Jul 24, 2019, 9:56 AM Ismaël Mejía wrote: > After Beam Summit EU I was curious about the External transform. I was > interested in the scenario of using it to call Python code in the > middle of a Java pipeline. This is a potentially useful scenario for > example to evaluat

Re: [BEAM-7755] fixing how repeated records are handled

2019-07-24 Thread Rui Wang
Thanks for your contribution. Left a comment on the PR. -Rui On Wed, Jul 24, 2019 at 8:56 AM wrote: > Hello, > > I opened the ticket BEAM-7755, and also suggested a fix at > https://github.com/apache/beam/pull/9089. > I would like feedback or thoughts on this issue as we are looking to use > Be

How to expose/use the External transform on Java SDK

2019-07-24 Thread Ismaël Mejía
After Beam Summit EU I was curious about the External transform. I was interested in the scenario of using it to call Python code in the middle of a Java pipeline. This is a potentially useful scenario, for example to evaluate models from Python ML frameworks on Java pipelines. In my example I did a

[BEAM-7755] fixing how repeated records are handled

2019-07-24 Thread sahith . reddy
Hello, I opened the ticket BEAM-7755, and also suggested a fix at https://github.com/apache/beam/pull/9089. I would like feedback or thoughts on this issue as we are looking to use BeamSQL and are testing it out on some use cases we have. This issue has been one of the bigger we have encounter

Re: Choosing a coder for a class that contains a Row?

2019-07-24 Thread Ryan Skraba
I'm also really interested in the question of evolving schemas... It's something I've also put off figuring out :D With all its warts, the LazyAvroCoder technique (a coder backed by some sort of schema registry) _could_ work with "homogeneish" data (i.e. if the number of schemas in play for a sing

Re: [DISCUSS] Turn `WindowedValue` into `T` in the FnDataService and BeamFnDataClient interface definition

2019-07-24 Thread jincheng sun
Hi all, Thanks, Max and all, for your kind words. :) Sorry for the late reply, as I'm busy working on the Flink 1.9 release. For the next major release of Flink, we plan to add Python user-defined function (UDF, UDTF, UDAF) support in Flink, and I have gone over the Beam portability framework and think

Re: Jenkins nodes disconnected?

2019-07-24 Thread Łukasz Gajowy
Thanks Yifan! Best, Łukasz On Tue, Jul 23, 2019 at 18:50 Yifan Zou wrote: > That was a known issue, BEAM-7650. Basically, the disk > was full. We should either fix this problem in the python precommit, or as > Udi suggested, having a cron job