Re: Brief of interactive Beam

2019-08-13 Thread Ahmet Altay
Ning, I believe Robert's questions from his email have not been answered yet. On Tue, Aug 13, 2019 at 5:00 PM Ning Kang wrote: > Hi all, I'll leave another 3 days for design > > review. > Then we

Re: Brief of interactive Beam

2019-08-13 Thread Ning Kang
Hi all, I'll leave another 3 days for design review. Then we can have a vote session if there is no objection. Thanks! On Fri, Aug 9, 2019 at 12:14 PM Ning Kang wrote: > Thanks Ahmet for the

Re: Hello :)

2019-08-13 Thread Pablo Estrada
Welcome! New examples are great! :D On Mon, Aug 12, 2019 at 12:13 PM Ahmet Altay wrote: > Done. Added your username as a contributor. You should be able to self > assign issues now. > > On Mon, Aug 12, 2019 at 12:12 PM Johan Hansson < > johan.eric.hans...@gmail.com> wrote: > >> Hi again, >> >>

Re: [Update] Beam 2.15 Release Progress

2019-08-13 Thread Yifan Zou
All blockers resolved and cherry-picks are done. Will start building the release candidates. On Wed, Aug 7, 2019 at 2:58 PM Yifan Zou wrote: > Thanks Udi. > > On Wed, Aug 7, 2019 at 2:58 PM Udi Meiri wrote: > >> https://github.com/apache/beam/pull/9240 has been merged >> >> On Wed, Aug 7, 2019

Re: Write-through-cache in State logic

2019-08-13 Thread Thomas Weise
The token would be needed in general to invalidate the cache when bundles are processed by different workers. In the case of the Flink runner we don't have a scenario of an SDK worker surviving the runner in the case of a failure, so there is no possibility of inconsistent state as a result of a

Re: Java serialization for coders and compatibility

2019-08-13 Thread Lukasz Cwik
Coders such as AvroCoder are translated to an intermediate JSON form called a CloudObject[1]. Dataflow only uses the serialized Java representation (embedded as bytes in "base64" within the CloudObject) for coders which extend SerializableCoder[2]. Dataflow only cares that these CloudObject

Re: Write-through-cache in State logic

2019-08-13 Thread Maximilian Michels
Thanks for clarifying. Cache-invalidation for side inputs makes sense. In case the Runner fails to checkpoint, could it not re-attempt the checkpoint? At least in the case of Flink, the cache would still be valid until another checkpoint is attempted. For other Runners that may not be the case.

Java serialization for coders and compatibility

2019-08-13 Thread Gleb Kanterov
I'm looking into the code of AvroCoder, and I was wondering what happens when users upgrade Beam for streaming pipelines? As I understand it, we should be able to deserialize a coder from a previous Beam version. Looking into Guava vendoring, it's going to break serialization when we are going to

Re: Python Beam pipelines on Flink on Kubernetes

2019-08-13 Thread Chad Dombrova
Hi Thomas, Nice work! It's really clearly presented. What's the current favored approach for pipeline submission? I'm also interested to know how this plan overlaps (if at all) with the work on Fine-Grained Resource Scheduling [1][2] that's being done for Flink 1.9+, which has implications for

Re: [VOTE] Support ZetaSQL as another SQL dialect for BeamSQL in Beam repo

2019-08-13 Thread Rui Wang
+1 Although it would be a long way to go, I also hope it can go into Calcite. -Rui On Tue, Aug 13, 2019 at 9:11 AM Lukasz Cwik wrote: > +1 > > On Tue, Aug 13, 2019 at 9:09 AM Andrew Pilloud > wrote: > >> +1 >> I also hope this can move to Calcite. >> >> On Tue, Aug 13, 2019 at 2:40 AM Gleb

Re: [VOTE] Support ZetaSQL as another SQL dialect for BeamSQL in Beam repo

2019-08-13 Thread Lukasz Cwik
+1 On Tue, Aug 13, 2019 at 9:09 AM Andrew Pilloud wrote: > +1 > I also hope this can move to Calcite. > > On Tue, Aug 13, 2019 at 2:40 AM Gleb Kanterov wrote: > >> +1 >> >> On Tue, Aug 13, 2019 at 10:47 AM Ismaël Mejía wrote: >> >>> +1 >>> Wishing that this goes to calcite too someday (hoping

Re: Write-through-cache in State logic

2019-08-13 Thread Lukasz Cwik
On Tue, Aug 13, 2019 at 4:36 AM Maximilian Michels wrote: > Agree that we have to be able to flush before a checkpoint to avoid > caching too many elements. Also good point about checkpoint costs > increasing with flushing the cache on checkpoints. An LRU cache policy in > the SDK seems

Re: Write-through-cache in State logic

2019-08-13 Thread Maximilian Michels
Agree that we have to be able to flush before a checkpoint to avoid caching too many elements. Also good point about checkpoint costs increasing with flushing the cache on checkpoints. An LRU cache policy in the SDK seems desirable. What is the role of the cache token in the design document[1]? It
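
For readers following this thread, here is a minimal sketch of how an LRU policy and a cache token could work together in an SDK-side state cache. It is an illustration only, not the design from the referenced document: the class name `TokenLruCache`, its methods, and the rule "a changed token clears the cache" are all assumptions made for this example.

```python
from collections import OrderedDict


class TokenLruCache:
    """Hypothetical state cache: LRU eviction plus token-based invalidation."""

    def __init__(self, max_entries):
        self._max_entries = max_entries
        self._token = None
        self._entries = OrderedDict()  # oldest entry first

    def sync_token(self, token):
        # Assumption: a token that differs from the last one seen means the
        # cached state can no longer be trusted, so everything is dropped.
        if token != self._token:
            self._entries.clear()
            self._token = token

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        # Evict least recently used entries once over capacity.
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)
```

In this sketch the token is checked once per bundle via `sync_token`, so a bundle processed after the token changed starts with an empty cache, while the size bound keeps the number of cached elements limited regardless of checkpoint timing.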

Re: [FLINK-12653] and system state

2019-08-13 Thread Maximilian Michels
Sounds good. Might be worth commenting on the JIRA to get this prioritized in case it has not been fixed. -Max On 13.08.19 12:18, Jan Lukavský wrote: > Hi Max, > > comments inline. > > On 8/13/19 12:01 PM, Maximilian Michels wrote: > > Hi Jan, > > > > Just checking, do you see the same

Re: [FLINK-12653] and system state

2019-08-13 Thread Jan Lukavský
Hi Max, comments inline. On 8/13/19 12:01 PM, Maximilian Michels wrote: Hi Jan, Just checking, do you see the same rescaling problem as described in https://jira.apache.org/jira/browse/FLINK-12653 ? Yes. If so, you are most likely correct that this is due to the system state that you added

Re: [VOTE] Support ZetaSQL as another SQL dialect for BeamSQL in Beam repo

2019-08-13 Thread Gleb Kanterov
+1 On Tue, Aug 13, 2019 at 10:47 AM Ismaël Mejía wrote: > +1 > Wishing that this goes to calcite too someday (hoping that it makes > Beam side maintenance simpler) > > On Tue, Aug 13, 2019 at 6:18 AM Manu Zhang > wrote: > > > > +1 > > > > On Tue, Aug 13, 2019 at 11:55 AM Mingmin Xu wrote: >

Re: [DISCUSS] Backwards compatibility of @Experimental features

2019-08-13 Thread Ismaël Mejía
I stumbled recently into this specification for changelogs, maybe we can follow it, or at least use some of their sections for further blog posts about releases. https://keepachangelog.com/en/1.0.0/ On Mon, Aug 12, 2019 at 6:08 PM Anton Kedin wrote: > Concrete user feedback: >

Re: [VOTE] Support ZetaSQL as another SQL dialect for BeamSQL in Beam repo

2019-08-13 Thread Ismaël Mejía
+1 Wishing that this goes to calcite too someday (hoping that it makes Beam side maintenance simpler) On Tue, Aug 13, 2019 at 6:18 AM Manu Zhang wrote: > > +1 > > On Tue, Aug 13, 2019 at 11:55 AM Mingmin Xu wrote: >> >> +1 >> >> On Mon, Aug 12, 2019 at 8:53 PM Ryan McDowell >> wrote: >>> >>>

[DISCUSS] Multiple-triggering SQL Join with retractions support

2019-08-13 Thread Rui Wang
Hi Community, BeamSQL currently does not support unbounded-unbounded joins with a non-default trigger. This is because: - Discarding mode does not work for outer joins because there is no way to retract pre-emitted values. You can think about an example in which a tuple of (left_row, null)
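
The (left_row, null) case can be illustrated with a small sketch of a left outer join that emits retractions. This is not BeamSQL code and not the proposal from this thread; the function name, the event tuples, and the "+"/"-" markers are hypothetical and chosen only to show why a pre-emitted (left_row, null) has to be withdrawn once a matching right row arrives.

```python
def left_outer_join_with_retractions(events):
    """Join a stream of ("L" | "R", key, row) events.

    Returns a list of (op, left_row, right_row) where op is "+" for an
    emitted result and "-" for a retraction of an earlier result.
    """
    lefts, rights = {}, {}
    out = []
    for side, key, row in events:
        if side == "L":
            matches = rights.get(key, [])
            if matches:
                out.extend(("+", row, r) for r in matches)
            else:
                # No right row yet: emit the outer-join placeholder early.
                out.append(("+", row, None))
            lefts.setdefault(key, []).append(row)
        else:
            first_match = not rights.get(key)
            for left_row in lefts.get(key, []):
                if first_match:
                    # The placeholder emitted earlier is now wrong.
                    out.append(("-", left_row, None))
                out.append(("+", left_row, row))
            rights.setdefault(key, []).append(row)
    return out
```

Without the "-" record, a downstream consumer would keep both (left_row, null) and the later joined row, which is the inconsistency described above for discarding mode.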