[VOTE] Release 2.13.0, release candidate #2

2019-05-30 Thread Ankur Goenka
Hi everyone,

Please review and vote on the release candidate #2 for the version 2.13.0,
as follows:

[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release to be deployed to dist.apache.org [2],
which is signed with the key with fingerprint
6356C1A9F089B0FA3DE8753688934A6699985948 [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "v2.13.0-RC2" [5],
* website pull request listing the release [6] and publishing the API
reference manual [7].
* Python artifacts are deployed along with the source release to the
dist.apache.org [2].
* Validation sheet with a tab for 2.13.0 release to help with validation
[8].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Ankur

[1]
https://jira.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12345166
[2] https://dist.apache.org/repos/dist/dev/beam/2.13.0/
[3] https://dist.apache.org/repos/dist/release/beam/KEYS
[4] https://repository.apache.org/content/repositories/orgapachebeam-1070/
[5] https://github.com/apache/beam/tree/v2.13.0-RC2
[6] https://github.com/apache/beam/pull/8645
[7] https://github.com/apache/beam-site/pull/589
[8]
https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1031196952


Re: DISCUSS: Sorted MapState API

2019-05-30 Thread Kenneth Knowles
On Tue, May 28, 2019 at 2:59 AM Robert Bradshaw  wrote:

> On Fri, May 24, 2019 at 6:57 PM Kenneth Knowles  wrote:
> >
> > On Fri, May 24, 2019 at 9:51 AM Kenneth Knowles  wrote:
> >>
> >> On Fri, May 24, 2019 at 8:14 AM Reuven Lax  wrote:
> >>>
> >>> Some great comments!
> >>>
> >>> Aljoscha: absolutely this would have to be implemented by runners to
> be efficient. We can of course provide a default (inefficient)
> implementation, but ideally runners would provide better ones.
> >>>
> >>> Jan Exactly. I think MapState can be dropped or backed by this. E.g.
> >>>
> >>> Robert Great point about standard coders not satisfying this. That's
> why I suggested that we provide a way to tag the coders that do preserve
> order, and only accept those as key coders Alternatively we could present a
> more limited API - e.g. only allowing a hard-coded set of types to be used
> as keys - but that seems counter to the direction Beam usually goes.
> >>
> >>
> >> I think we got it right with GroupByKey: the encoded form of a key is
> authoritative/portable.
> >>
> >> Instead of seeing the in-language type as the "real" value and the
> coder a way to serialize it, the portable encoded bytestring is the "real"
> value and the representation in a particular SDK is the responsibility of
> the SDK.
> >>
> >> This is very important, because many in-language representations,
> especially low-level representations, do not have the desired equality. For
> example, Java arrays. Coder.structuralValue(...) is required to have
> equality that matches the equality of the encoded form. It can be a noop if
> the in-language equality already matches. Or it can be a full encoding if
> there is not a more efficient option. I think we could add another method
> "lexicalValue" or add the requirement that structuralValue also sort
> equivalently to the wire format.
> >>
> >> Now, since many of our wire formats do not sort in the natural
> mathematical order, SDKs should help users avoid the pitfall of using
> these, as we do for GBK by checking "determinism" of the coder. Note that I
> am *not* referring to the order they would sort in any particular
> programming language implementation.
> >
> >
> > Another related features request. Today we do this:
> >
> > (1) infer any coder for a type
> > (2) crash if it is not suitable for GBK
> >
> > I have proposed in the past that instead we:
> >
> > (1) notice that a PCollection is input to a GBK
> > (2) infer an appropriate coder for a type that will work with GBK
> >
> > This generalizes to the idea of registering multiple coders for
> different purposes, particularly inferring a coder that has good lexical
> sorting.
> >
> > I don't recall that there was any objection, but neither I nor anyone
> else has gotten around to working on this. It does have the problem that it
> could change the coders and impact pipeline update. I suggest that update
> is fragile enough that we should develop pipeline options that allow opt-in
> to improvements without a major version bump.
>
> It also has the difficulty that coders cannot be inferred until one
> knows all their consumers (which breaks a transform being able to
> inspect or act on the coder(s) of its input(s).
>
> Possibly, if we move to something more generic, like specifying
> element types/schemas on collections and then doing
> whole-pipeline-analysis to infer all the coders, this would be
> solvable (but potentially backwards incompatible, and fraught with
> difficulties if users (aka transform authors) specify their own coders
> (e.g. when inference fails, or as part of inference code)).
>
> Another way to look at this is that there are certain transformations
> ("upgrades") one can do to coders when they need certain properties,
> which change the encoded form but preserve the types. Upgrading a
> coder C to its length-prefixed one is one such operation. Upgrading a
> coder to a deterministic version, or a natural-order-preserving one,
> are two other possible transformations (which should be idempotent,
> may be the identity, and may be an error).
>

You pose a good problem, and a reasonable solution. Since on most (all?)
runners this coder conversion will occur in the middle of a fused stage, it
should have near-zero cost. And if it has a standard URN + payload it could
even be optimized away entirely and/or inform optimization decisions.

Ideally, there'd be less need for transforms to act on the coders of their
input. Today I think it is mostly to push them around when coder inference fails.
In that case, I've hypothesized that we could do elementary constraint
solving like any type inferencer does, keeping unspecified coders as
"variables" until after construction. But that's a sizable change now that
we've gone down the current path, and there's not enough clear benefit, I suppose.
If we were starting from scratch, I'd probably start with that.

Kenn
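
To make the ordering pitfall discussed above concrete, here is a minimal,
illustrative Java sketch (the class and method names are mine, and this is not
Beam's Coder API) of a long encoding whose byte-wise order matches numeric
order, which is the property a lexically sorted MapState key coder would need.
Standard big-endian two's-complement bytes do not have it, because negative
values start with the sign bit set.

import java.nio.ByteBuffer;

/**
 * Illustrative only: encodes a signed long so that unsigned lexicographic
 * comparison of the encoded bytes matches the numeric order of the values.
 */
public class OrderPreservingLongEncoding {

  /** Flip the sign bit, then write big-endian: byte order == numeric order. */
  public static byte[] encode(long value) {
    return ByteBuffer.allocate(8).putLong(value ^ Long.MIN_VALUE).array();
  }

  public static long decode(byte[] bytes) {
    return ByteBuffer.wrap(bytes).getLong() ^ Long.MIN_VALUE;
  }

  /** Unsigned byte-wise comparison, i.e. how a sorted-by-bytes store compares keys. */
  public static int compareEncoded(byte[] a, byte[] b) {
    for (int i = 0; i < Math.min(a.length, b.length); i++) {
      int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
      if (cmp != 0) {
        return cmp;
      }
    }
    return Integer.compare(a.length, b.length);
  }

  public static void main(String[] args) {
    // -5 < 3 numerically, and the encoded forms compare the same way.
    System.out.println(compareEncoded(encode(-5L), encode(3L)) < 0); // true
  }
}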


Re: Support for PaneInfo in Python SDK

2019-05-30 Thread Pablo Estrada
Hi Tanay,
thanks for bringing this to the mailing list. I believe this is certainly
useful, and necessary. As an example, the fileio.WriteToFiles transform
does not work well without PaneInfo data (since we can't know how many
firings there are for each window, and we can't give names to files based
on this).

Best
-P.
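
For comparison, the Java SDK already exposes pane information through the DoFn
ProcessContext; below is a minimal sketch of the access pattern the Python SDK
would gain (the DoFn itself is illustrative, not code from this thread).

import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.windowing.PaneInfo;

// Java SDK: a DoFn can ask the ProcessContext for the current firing's PaneInfo.
class LogPaneFn extends DoFn<String, String> {
  @ProcessElement
  public void processElement(ProcessContext c) {
    PaneInfo pane = c.pane();
    // Timing (EARLY/ON_TIME/LATE) and index identify which firing of the window
    // this is, which is what a file-writing transform needs to name its outputs.
    c.output(c.element() + " pane=" + pane.getTiming() + " index=" + pane.getIndex());
  }
}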

On Thu, May 30, 2019 at 1:00 PM Tanay Tummalapalli 
wrote:

> Hi everyone,
>
> The PR linked in [BEAM-3759] - "Add support for PaneInfo descriptor in
> Python SDK"[1] was merged, but, the issue is still open.
> There might be some work left on this for full support for PaneInfo. Eg:
> Although the PaneInfo class exists, it is not accessible in a DoFn via a
> kwarg(PaneInfoParam) like TimestampParam or WindowParam.
>
> Please let me know the remaining work to be done on this issue as this may
> be needed in the near future.
>
> Regards
> Tanay Tummalapalli
>
> [1] https://issues.apache.org/jira/browse/BEAM-3759
>


Support for PaneInfo in Python SDK

2019-05-30 Thread Tanay Tummalapalli
Hi everyone,

The PR linked in [BEAM-3759] - "Add support for PaneInfo descriptor in
Python SDK" [1] was merged, but the issue is still open.
There might be some work left for full PaneInfo support. For example,
although the PaneInfo class exists, it is not accessible in a DoFn via a
kwarg (PaneInfoParam) the way TimestampParam or WindowParam are.

Please let me know the remaining work to be done on this issue, as it may
be needed in the near future.

Regards
Tanay Tummalapalli

[1] https://issues.apache.org/jira/browse/BEAM-3759


Beam Summit volunteering team

2019-05-30 Thread Matthias Baetens
Hi everyone,

As you might know, the Beam Summit is currently organised by a small team
of people committing their free time to make this happen. We are looking to
make this group larger in the future, and we could particularly use some
help with website maintenance, development and design.

This is a call for all interested volunteers: please ping over your details
and what you would like to do - this goes for meetups as well.

Many thanks,
Matthias on behalf of the Beam Summit organising team


Re: Shuffling on apache beam

2019-05-30 Thread pasquale . bonito
Ideally my pipeline requires no shuffling; I just saw that introducing a
windowing operation improves the performance of the BigTable insert.
I don't know how to measure the time spent in Pub/Sub. I take the time when a
message is published, put it in the timestamp metadata, and then compare that
value with the timestamp taken in the first DoFn after the message is read.
I can't find any way to measure the time spent in Pub/Sub or in
PubsubIO.readMessagesWithAttributes().
If you can suggest a way, I'm happy to run my test measuring that time.
I understand the point about 100-150 ms, but 1.5 s seems too much. Do you have
an idea of the minimum latency we can achieve?
I also saw that performance improves over time. Over 3,000 messages, average
latency improved from 1,988 ms (first 1,000) to 1,465 ms (last 1,000), with a
minimum of 300 ms.

I understand that this tool is probably built for large volumes, but I would
still like to verify whether we can use a serverless architecture with low
load and low latency.
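
For what it's worth, a minimal sketch of the measurement described above,
assuming the publish time is carried as a message attribute; the attribute and
metric names are illustrative, not taken from the actual pipeline:

import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage;
import org.apache.beam.sdk.metrics.Distribution;
import org.apache.beam.sdk.metrics.Metrics;
import org.apache.beam.sdk.transforms.DoFn;

// Sketch: first DoFn after PubsubIO.readMessagesWithAttributes(); it compares a
// publish-time attribute set by the producer with the current wall clock.
class MeasureDeliveryLatencyFn extends DoFn<PubsubMessage, PubsubMessage> {
  private final Distribution deliveryLatencyMs =
      Metrics.distribution(MeasureDeliveryLatencyFn.class, "pubsub_to_dofn_ms");

  @ProcessElement
  public void processElement(ProcessContext c) {
    // "publishTsMillis" is a hypothetical attribute name set at publish time.
    String publishedAt = c.element().getAttribute("publishTsMillis");
    if (publishedAt != null) {
      deliveryLatencyMs.update(System.currentTimeMillis() - Long.parseLong(publishedAt));
    }
    c.output(c.element());
  }
}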

On 2019/05/30 15:32:11, Reuven Lax  wrote: 
> Do you have any way of knowing how much of this time is being spent in
> Pub/Sub and how much in the Beam pipeline?
> 
> If you are using the Dataflow runner and doing any shuffling, 100-150ms is
> currently not attainable. Writes to shuffle are batched for up to 100ms at
> a time to keep operational costs down, and this parameter is not tunable.
> 
> Reuven
> 
> On Thu, May 30, 2019 at 5:54 AM pasquale.bon...@gmail.com <
> pasquale.bon...@gmail.com> wrote:
> 
> > I'm measuring latency as the difference  between the timestamp of the
> > column on BigTable and the one I associate to the message when I publish it
> > to the topic.
> > I also do intermediate measurement after the message is read from PubSub
> > topic and before inserting into BigTable.
> > All timestamps are written into BigTable and I have a procedure to read
> > and analyse  data.
> >
> > What surprise me more is than with such a smaller load and 4 worker it
> > requires 1s for the message to be processed (timestamp taken after the
> > message is read from the topic).
> >
> > We need to lower latency to 100-150 ms.
> >
> >
> >
> > On 2019/05/30 11:41:45, Reuven Lax  wrote:
> > > How are you measuring latency?
> > >
> > > On Thu, May 30, 2019 at 3:08 AM pasquale.bon...@gmail.com <
> > > pasquale.bon...@gmail.com> wrote:
> > >
> > > > This was my first option but I'm using google dataflow as runner and
> > it's
> > > > not clear if it supports stateful DoFn.
> > > > However my problem is latency, I've been trying different solution but
> > it
> > > > seems difficult to bring latency under 1s when consuming message (150/s
> > > > )from PubSub with beam/dataflow.
> > > > Is some benchmark or example available to understand if we can
> > effectively
> > > > achieve low latency pr we should look at different solutions?
> > > >
> > > >
> > > >
> > > > On 2019/05/29 17:12:37, Pablo Estrada  wrote:
> > > > > If you add a stateful DoFn to your pipeline, you'll force Beam to
> > shuffle
> > > > > data to their corresponding worker per key. I am not sure what is the
> > > > > latency cost of doing this (as the messages still need to be
> > shuffled).
> > > > But
> > > > > it may help you accomplish this without adding windowing+triggering.
> > > > >
> > > > > -P.
> > > > >
> > > > > On Wed, May 29, 2019 at 5:16 AM pasquale.bon...@gmail.com <
> > > > > pasquale.bon...@gmail.com> wrote:
> > > > >
> > > > > > Hi Reza,
> > > > > > with GlobalWindow with triggering I was able to reduce hotspot
> > issues
> > > > > > gaining satisfying performance for BigTable update. Unfortunately
> > > > latency
> > > > > > when getting messages from PubSub remains around 1.5s that it's too
> > > > much
> > > > > > considering our NFR.
> > > > > >
> > > > > > This is the code I use to get the messages:
> > > > > > PCollectionTuple rawTransactions = p //
> > > > > > .apply("GetMessages",
> > > > > >
> > > > > >
> > > >
> > PubsubIO.readMessagesWithAttributes().withIdAttribute(TRANSACTION_MESSAGE_ID_FIELD_NAME)
> > > > > >
> > > > > >
> > > >
> > .withTimestampAttribute(TRANSACTION_MESSAGE_TIMESTAMP_FIELD_NAME).fromTopic(topic))
> > > > > >
> > > >  .apply(Window.configure()
> > > > > > .triggering(Repeatedly
> > > > > >
> > > > > > .forever(AfterWatermark.pastEndOfWindow()
> > > > > > .withEarlyFirings(
> > > > > >
> >  AfterProcessingTime
> > > > > >
> > > > > > .pastFirstElementInPane()
> > > > > >
> > > > > > .plusDelayOf(Duration.millis(1)))
> > > > > > //
> > Fire on
> > > > any
> > > > > > late data
> > > > > >
> > > > > > .withLateFirings(AfterPane.elementCountAtLeast(1
> > > > > > .discardingFiredPanes())
> > > > > >
> > > > > > Messages are produced with a different dataflow:
> > > > > >  Pipeline p = Pipeline.create(options);
> > > > > > 
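
The consumer-side snippet quoted above is hard to read after line wrapping.
Untangled, it is roughly the following fragment (reconstructed from the quoted
code; it assumes the default GlobalWindows upstream of the Window transform,
and the subsequent ParDo that produces the PCollectionTuple is truncated in the
quote, so the sketch stops at the windowing step):

// Fragment only: p, topic and the *_FIELD_NAME constants come from the quoted pipeline.
PCollection<PubsubMessage> messages =
    p.apply("GetMessages",
        PubsubIO.readMessagesWithAttributes()
            .withIdAttribute(TRANSACTION_MESSAGE_ID_FIELD_NAME)
            .withTimestampAttribute(TRANSACTION_MESSAGE_TIMESTAMP_FIELD_NAME)
            .fromTopic(topic))
     .apply(Window.<PubsubMessage>configure()
        .triggering(Repeatedly.forever(
            AfterWatermark.pastEndOfWindow()
                .withEarlyFirings(
                    AfterProcessingTime.pastFirstElementInPane()
                        .plusDelayOf(Duration.millis(1)))
                // Fire on any late data
                .withLateFirings(AfterPane.elementCountAtLeast(1))))
        .discardingFiredPanes());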

Re: Shuffling on apache beam

2019-05-30 Thread Reuven Lax
Do you have any way of knowing how much of this time is being spent in
Pub/Sub and how much in the Beam pipeline?

If you are using the Dataflow runner and doing any shuffling, 100-150ms is
currently not attainable. Writes to shuffle are batched for up to 100ms at
a time to keep operational costs down, and this parameter is not tunable.

Reuven

On Thu, May 30, 2019 at 5:54 AM pasquale.bon...@gmail.com <
pasquale.bon...@gmail.com> wrote:

> I'm measuring latency as the difference  between the timestamp of the
> column on BigTable and the one I associate to the message when I publish it
> to the topic.
> I also do intermediate measurement after the message is read from PubSub
> topic and before inserting into BigTable.
> All timestamps are written into BigTable and I have a procedure to read
> and analyse  data.
>
> What surprise me more is than with such a smaller load and 4 worker it
> requires 1s for the message to be processed (timestamp taken after the
> message is read from the topic).
>
> We need to lower latency to 100-150 ms.
>
>
>
> On 2019/05/30 11:41:45, Reuven Lax  wrote:
> > How are you measuring latency?
> >
> > On Thu, May 30, 2019 at 3:08 AM pasquale.bon...@gmail.com <
> > pasquale.bon...@gmail.com> wrote:
> >
> > > This was my first option but I'm using google dataflow as runner and
> it's
> > > not clear if it supports stateful DoFn.
> > > However my problem is latency, I've been trying different solution but
> it
> > > seems difficult to bring latency under 1s when consuming message (150/s
> > > )from PubSub with beam/dataflow.
> > > Is some benchmark or example available to understand if we can
> effectively
> > > achieve low latency pr we should look at different solutions?
> > >
> > >
> > >
> > > On 2019/05/29 17:12:37, Pablo Estrada  wrote:
> > > > If you add a stateful DoFn to your pipeline, you'll force Beam to
> shuffle
> > > > data to their corresponding worker per key. I am not sure what is the
> > > > latency cost of doing this (as the messages still need to be
> shuffled).
> > > But
> > > > it may help you accomplish this without adding windowing+triggering.
> > > >
> > > > -P.
> > > >
> > > > On Wed, May 29, 2019 at 5:16 AM pasquale.bon...@gmail.com <
> > > > pasquale.bon...@gmail.com> wrote:
> > > >
> > > > > Hi Reza,
> > > > > with GlobalWindow with triggering I was able to reduce hotspot
> issues
> > > > > gaining satisfying performance for BigTable update. Unfortunately
> > > latency
> > > > > when getting messages from PubSub remains around 1.5s that it's too
> > > much
> > > > > considering our NFR.
> > > > >
> > > > > This is the code I use to get the messages:
> > > > > PCollectionTuple rawTransactions = p //
> > > > > .apply("GetMessages",
> > > > >
> > > > >
> > >
> PubsubIO.readMessagesWithAttributes().withIdAttribute(TRANSACTION_MESSAGE_ID_FIELD_NAME)
> > > > >
> > > > >
> > >
> .withTimestampAttribute(TRANSACTION_MESSAGE_TIMESTAMP_FIELD_NAME).fromTopic(topic))
> > > > >
> > >  .apply(Window.configure()
> > > > > .triggering(Repeatedly
> > > > >
> > > > > .forever(AfterWatermark.pastEndOfWindow()
> > > > > .withEarlyFirings(
> > > > >
>  AfterProcessingTime
> > > > >
> > > > > .pastFirstElementInPane()
> > > > >
> > > > > .plusDelayOf(Duration.millis(1)))
> > > > > //
> Fire on
> > > any
> > > > > late data
> > > > >
> > > > > .withLateFirings(AfterPane.elementCountAtLeast(1
> > > > > .discardingFiredPanes())
> > > > >
> > > > > Messages are produced with a different dataflow:
> > > > >  Pipeline p = Pipeline.create(options);
> > > > > p.apply(
> > > > > "ReadFile",
> > > > > TextIO.read()
> > > > > .from(options.getInputLocation() + "/*.csv")
> > > > > .watchForNewFiles(
> > > > > // Check for new files every 1 seconds
> > > > > Duration.millis(600),
> > > > > // Never stop checking for new files
> > > > > Watch.Growth.never()))
> > > > > .apply(
> > > > > "create message",
> > > > > ParDo.of(
> > > > > new DoFn() {
> > > > >   @ProcessElement
> > > > >   public void processElement(ProcessContext
> context) {
> > > > > String line = context.element();
> > > > >
> > > > > String payload = convertRow(line);
> > > > > long now =
> > > > >
> > >
> LocalDateTime.now().atZone(ZoneId.systemDefault()).toInstant().toEpochMilli();
> > > > > context.output(
> > > > > new PubsubMessage(
> > > > > payload.getBytes(),
> > > > > ImmutableMap.of(TRANSACTION_MESSAGE_ID_FIELD_NAME,
> > > > > 

Re: **Request to add me as a contributor.**

2019-05-30 Thread Lukasz Cwik
Welcome! I have added you as a contributor and assigned BEAM-7442 to you.

On Wed, May 29, 2019 at 9:09 PM Akshay Iyangar  wrote:

> Hello everyone,
>
>
>
> My name is Akshay Iyangar, using beam repo extensively. There is a small
> patch that I would like to push through upstream.
> https://issues.apache.org/jira/browse/BEAM-7442 . I’m working on this
> issue and hope to become a contributor for Beam's JIRA issue tracker so
> that I can assign it to myself.
>
>
>
> My JIRA id is - aiyangar
>
>
>
> Thanks,
>
> Akshay Iyangar
>
>
>


Re: Definition of Unified model

2019-05-30 Thread Jan Lukavský
That's right, but is there a filesystem that allows files of unbounded
size? If there will always be an upper size limit, does that mean that you
cannot use the order of elements in the file as is? You might need to
transfer the offset from one file to another (that's how Kafka does it),
but that implies that you don't use what the batch storage natively gives
you; you store the offset yourself (as metadata).


Either way, maybe the discussion is not that important, because the
invariant requirement persists: there has to be a sequential observer of
the data that creates a sequence of updates in the order the data was
observed and persists this order. If you have two observers of the data,
each storing its own (even unbounded) file, then (if partitioning by key is
not enforced) I'd say the ordering cannot be used.


This mechanism seems to me to be related to what limits parallelism in
streaming sources, and to why batch sources generally parallelise better.


Jan
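
As one concrete form of the "sequential observer" above, here is a minimal
Beam Java sketch (my own illustration, not code from this thread) of a
stateful DoFn that assigns an explicit per-key sequence number, i.e. the kind
of sequence metadata being argued for instead of relying on FIFO delivery:

import org.apache.beam.sdk.coders.VarLongCoder;
import org.apache.beam.sdk.state.StateSpec;
import org.apache.beam.sdk.state.StateSpecs;
import org.apache.beam.sdk.state.ValueState;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;

// State is partitioned per key (and window), so each key's elements are observed
// sequentially and that observation order can be persisted as explicit metadata.
class AssignSequenceNumbersFn extends DoFn<KV<String, String>, KV<String, KV<Long, String>>> {

  @StateId("nextSeq")
  private final StateSpec<ValueState<Long>> nextSeqSpec = StateSpecs.value(VarLongCoder.of());

  @ProcessElement
  public void processElement(ProcessContext c, @StateId("nextSeq") ValueState<Long> nextSeq) {
    Long current = nextSeq.read();
    long seq = current == null ? 0L : current;
    nextSeq.write(seq + 1);
    // The sequence number travels with the element, so downstream consumers (or a
    // batch re-run) can reconstruct the observed order without relying on FIFO.
    c.output(KV.of(c.element().getKey(), KV.of(seq, c.element().getValue())));
  }
}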

On 5/30/19 1:35 PM, Reuven Lax wrote:
Files can grow (depending on the filesystem), and tailing growing 
files is a valid use case.


On Wed, May 29, 2019 at 3:23 PM Jan Lukavský  wrote:


 > Offsets within a file, unordered between files seems exactly analogous
 > with offsets within a partition, unordered between partitions, right?

Not exactly. The key difference is that partitions in streaming stores are
defined (on purpose, and with key impact on this discussion) as an
unbounded sequence of appends. Files, on the other hand, are always of
finite size. This difference makes the semantics of offsets in a
partitioned stream useful, because they are guaranteed to only increase.
On batch stores such as files, these offsets would have to start from zero
after some (finite) time, which makes them useless for comparison.

On 5/29/19 2:44 PM, Robert Bradshaw wrote:
> On Tue, May 28, 2019 at 12:18 PM Jan Lukavský  wrote:
>> As I understood it, Kenn was supporting the idea that sequence metadata
>> is preferable over FIFO. I was trying to point out, that it even should
>> provide the same functionally as FIFO, plus one important more -
>> reproducibility and ability to being persisted and reused the same way
>> in batch and streaming.
>>
>> There is no doubt, that sequence metadata can be stored in every
>> storage. But, regarding some implicit ordering that sources might have -
>> yes, of course, data written into HDFS or Cloud Storage has ordering,
>> but only partial - inside some bulk (e.g. file) and the ordering is not
>> defined correctly on boundaries of these bulks (between files). That is
>> why I'd say, that ordering of sources is relevant only for
>> (partitioned!) streaming sources and generally always reduces to
>> sequence metadata (e.g. offsets).
> Offsets within a file, unordered between files seems exactly analogous
> with offsets within a partition, unordered between partitions, right?
>
>> On 5/28/19 11:43 AM, Robert Bradshaw wrote:
>>> Huge +1 to all Kenn said.
>>>
>>> Jan, batch sources can have orderings too, just like Kafka. I think
>>> it's reasonable (for both batch and streaming) that if a source has an
>>> ordering that is an important part of the data, it should preserve
>>> this ordering into the data itself (e.g. as sequence numbers, offsets,
>>> etc.)
>>>
>>> On Fri, May 24, 2019 at 10:35 PM Kenneth Knowles  wrote:
>>>> I strongly prefer explicit sequence metadata over FIFO requirements,
>>>> because:
>>>>
>>>>    - FIFO is complex to specify: for example Dataflow has "per stage
>>>> key-to-key" FIFO today, but it is not guaranteed to remain so (plus
>>>> "stage" is not a portable concept, nor even guaranteed to remain a
>>>> Dataflow concept)
>>>>    - complex specifications are by definition poor usability (if
>>>> necessary, then it is what it is)
>>>>    - overly restricts the runner, reduces parallelism, for example any
>>>> non-stateful ParDo has per-element parallelism, not per "key"
>>>>    - another perspective on that: FIFO makes everyone pay rather than
>>>> just the transform that requires exactly sequencing
>>>>    - previous implementation details like reshuffles become part of
>>>> the model
>>>>    - I'm not even convinced the use cases involved are addressed by
>>>> some careful FIFO restrictions; many sinks re-key and they would all
>>>> have to become aware of how keying of a sequence of "stages" affects
>>>> the end-to-end FIFO
>>>>
>>>> A noop becoming a non-noop is essentially the mathematical definition
>>>> of moving from higher-level to lower-level abstraction.
>>>>
>>>> So this strikes at the core question of what level of abstraction Beam
>>>> aims to represent.

Re: Definition of Unified model

2019-05-30 Thread Reuven Lax
Files can grow (depending on the filesystem), and tailing growing files is
a valid use case.

On Wed, May 29, 2019 at 3:23 PM Jan Lukavský  wrote:

>  > Offsets within a file, unordered between files seems exactly
> analogous with offsets within a partition, unordered between partitions,
> right?
>
> Not exactly. The key difference is in that partitions in streaming
> stores are defined (on purpose, and with key impact on this discussion)
> as unbounded sequence of appends. Files, on the other hand are always of
> finite size. This difference makes the semantics of offsets in
> partitioned stream useful, because the are guaranteed to only increase.
> On batch stores as files, these offsets would have to start from zero
> after some (finite) time, which makes them useless for comparison.
>
> On 5/29/19 2:44 PM, Robert Bradshaw wrote:
> > On Tue, May 28, 2019 at 12:18 PM Jan Lukavský  wrote:
> >> As I understood it, Kenn was supporting the idea that sequence metadata
> >> is preferable over FIFO. I was trying to point out, that it even should
> >> provide the same functionally as FIFO, plus one important more -
> >> reproducibility and ability to being persisted and reused the same way
> >> in batch and streaming.
> >>
> >> There is no doubt, that sequence metadata can be stored in every
> >> storage. But, regarding some implicit ordering that sources might have -
> >> yes, of course, data written into HDFS or Cloud Storage has ordering,
> >> but only partial - inside some bulk (e.g. file) and the ordering is not
> >> defined correctly on boundaries of these bulks (between files). That is
> >> why I'd say, that ordering of sources is relevant only for
> >> (partitioned!) streaming sources and generally always reduces to
> >> sequence metadata (e.g. offsets).
> > Offsets within a file, unordered between files seems exactly analogous
> > with offsets within a partition, unordered between partitions, right?
> >
> >> On 5/28/19 11:43 AM, Robert Bradshaw wrote:
> >>> Huge +1 to all Kenn said.
> >>>
> >>> Jan, batch sources can have orderings too, just like Kafka. I think
> >>> it's reasonable (for both batch and streaming) that if a source has an
> >>> ordering that is an important part of the data, it should preserve
> >>> this ordering into the data itself (e.g. as sequence numbers, offsets,
> >>> etc.)
> >>>
> >>> On Fri, May 24, 2019 at 10:35 PM Kenneth Knowles 
> wrote:
>  I strongly prefer explicit sequence metadata over FIFO requirements,
> because:
> 
> - FIFO is complex to specify: for example Dataflow has "per stage
> key-to-key" FIFO today, but it is not guaranteed to remain so (plus "stage"
> is not a portable concept, nor even guaranteed to remain a Dataflow concept)
> - complex specifications are by definition poor usability (if
> necessary, then it is what it is)
> - overly restricts the runner, reduces parallelism, for example
> any non-stateful ParDo has per-element parallelism, not per "key"
> - another perspective on that: FIFO makes everyone pay rather than
> just the transform that requires exactly sequencing
> - previous implementation details like reshuffles become part of
> the model
> - I'm not even convinced the use cases involved are addressed by
> some careful FIFO restrictions; many sinks re-key and they would all have
> to become aware of how keying of a sequence of "stages" affects the
> end-to-end FIFO
> 
>  A noop becoming a non-noop is essentially the mathematical definition
> of moving from higher-level to lower-level abstraction.
> 
>  So this strikes at the core question of what level of abstraction
> Beam aims to represent. Lower-level means there are fewer possible
> implementations and it is more tied to the underlying architecture, and
> anything not near-exact match pays a huge penalty. Higher-level means there
> are more implementations possible with different tradeoffs, though they may
> all pay a minor penalty.
> 
>  I could be convinced to change my mind, but it needs some extensive
> design, examples, etc. I think it is probably about the most consequential
> design decision in the whole Beam model, around the same level as the
> decision to use ParDo and GBK as the primitives IMO.
> 
>  Kenn
>


Re: Timer support in Flink

2019-05-30 Thread Reza Rokni
:-)

https://issues.apache.org/jira/browse/BEAM-7456

On Thu, 30 May 2019 at 18:41, Alex Van Boxel  wrote:

> Oh... you can expand the matrix. Never saw that, this could indeed be
> better. So it isn't you.
>
>  _/
> _/ Alex Van Boxel
>
>
> On Thu, May 30, 2019 at 12:24 PM Reza Rokni  wrote:
>
>> PS, until it was just pointed out to me by Max, I had missed the (expand
>> details) clickable link in the capability matrix.
>>
>> Probably just me, but do others think it's also easy to miss? If yes I
>> will raise a Jira for it
>>
>> On Wed, 29 May 2019 at 19:52, Reza Rokni  wrote:
>>
>>> Thanx Max!
>>>
>>> Reza
>>>
>>> On Wed, 29 May 2019, 16:38 Maximilian Michels,  wrote:
>>>
 Hi Reza,

 The detailed view of the capability matrix states: "The Flink Runner
 supports timers in non-merging windows."

 That is still the case. Other than that, timers should be working fine.

 > It makes very heavy use of Event.Time timers and has to do some
 manual DoFn cache work to get around some O(heavy) issues.

 If you are running on Flink 1.5, timer deletion suffers from O(n)
 complexity which has been fixed in newer versions.

 Cheers,
 Max

 On 29.05.19 03:27, Reza Rokni wrote:
 > Hi Flink experts,
 >
 > I am getting ready to push a PR around a utility class for
 timeseries join
 >
 > left.timestamp match to closest right.timestamp where right.timestamp
 <=
 > left.timestamp.
 >
 > It makes very heavy use of Event.Time timers and has to do some
 manual
 > DoFn cache work to get around some O(heavy) issues. Wanted to test
 > things against Flink: In the capability matrix we have "~" for Timer
 > support in Flink:
 >
 > https://beam.apache.org/documentation/runners/capability-matrix/
 >
 > Is that page outdated, if not what are the areas that still need to
 be
 > addressed please?
 >
 > Cheers
 >
 > Reza
 >
 >
 > --
 >
 > This email may be confidential and privileged. If you received this
 > communication by mistake, please don't forward it to anyone else,
 please
 > erase all copies and attachments, and please let me know that it has
 > gone to the wrong person.
 >
 > The above terms reflect a potential business arrangement, are
 provided
 > solely as a basis for further discussion, and are not intended to be
 and
 > do not constitute a legally binding obligation. No legally binding
 > obligations will be created, implied, or inferred until an agreement
 in
 > final form is executed in writing by all parties involved.
 >

>>>
>>
>> --
>>
>> This email may be confidential and privileged. If you received this
>> communication by mistake, please don't forward it to anyone else, please
>> erase all copies and attachments, and please let me know that it has gone
>> to the wrong person.
>>
>> The above terms reflect a potential business arrangement, are provided
>> solely as a basis for further discussion, and are not intended to be and do
>> not constitute a legally binding obligation. No legally binding obligations
>> will be created, implied, or inferred until an agreement in final form is
>> executed in writing by all parties involved.
>>
>

-- 

This email may be confidential and privileged. If you received this
communication by mistake, please don't forward it to anyone else, please
erase all copies and attachments, and please let me know that it has gone
to the wrong person.

The above terms reflect a potential business arrangement, are provided
solely as a basis for further discussion, and are not intended to be and do
not constitute a legally binding obligation. No legally binding obligations
will be created, implied, or inferred until an agreement in final form is
executed in writing by all parties involved.
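
For readers unfamiliar with the timer API discussed in this thread, a minimal
sketch of an event-time timer in a stateful DoFn (a plain buffer-and-flush
pattern; the timeseries-join utility mentioned above is considerably more
involved, and all names here are illustrative):

import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.state.BagState;
import org.apache.beam.sdk.state.StateSpec;
import org.apache.beam.sdk.state.StateSpecs;
import org.apache.beam.sdk.state.TimeDomain;
import org.apache.beam.sdk.state.Timer;
import org.apache.beam.sdk.state.TimerSpec;
import org.apache.beam.sdk.state.TimerSpecs;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;
import org.joda.time.Duration;

// Buffer elements per key and flush once the watermark passes one minute after
// the latest element's timestamp. Timers and state require keyed input.
class BufferUntilWatermarkFn extends DoFn<KV<String, String>, String> {

  @StateId("buffer")
  private final StateSpec<BagState<String>> bufferSpec = StateSpecs.bag(StringUtf8Coder.of());

  @TimerId("flush")
  private final TimerSpec flushSpec = TimerSpecs.timer(TimeDomain.EVENT_TIME);

  @ProcessElement
  public void processElement(
      ProcessContext c,
      @StateId("buffer") BagState<String> buffer,
      @TimerId("flush") Timer flush) {
    buffer.add(c.element().getValue());
    // (Re)arm an event-time timer; it fires when the input watermark passes the set time.
    flush.set(c.timestamp().plus(Duration.standardMinutes(1)));
  }

  @OnTimer("flush")
  public void onFlush(OnTimerContext c, @StateId("buffer") BagState<String> buffer) {
    for (String value : buffer.read()) {
      c.output(value);
    }
    buffer.clear();
  }
}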


Re: Timer support in Flink

2019-05-30 Thread Alex Van Boxel
Oh... you can expand the matrix. I never saw that; this could indeed be
better. So it isn't just you.

 _/
_/ Alex Van Boxel


On Thu, May 30, 2019 at 12:24 PM Reza Rokni  wrote:

> PS, until it was just pointed out to me by Max, I had missed the (expand
> details) clickable link in the capability matrix.
>
> Probably just me, but do others think it's also easy to miss? If yes I
> will raise a Jira for it
>
> On Wed, 29 May 2019 at 19:52, Reza Rokni  wrote:
>
>> Thanx Max!
>>
>> Reza
>>
>> On Wed, 29 May 2019, 16:38 Maximilian Michels,  wrote:
>>
>>> Hi Reza,
>>>
>>> The detailed view of the capability matrix states: "The Flink Runner
>>> supports timers in non-merging windows."
>>>
>>> That is still the case. Other than that, timers should be working fine.
>>>
>>> > It makes very heavy use of Event.Time timers and has to do some manual
>>> DoFn cache work to get around some O(heavy) issues.
>>>
>>> If you are running on Flink 1.5, timer deletion suffers from O(n)
>>> complexity which has been fixed in newer versions.
>>>
>>> Cheers,
>>> Max
>>>
>>> On 29.05.19 03:27, Reza Rokni wrote:
>>> > Hi Flink experts,
>>> >
>>> > I am getting ready to push a PR around a utility class for
>>> timeseries join
>>> >
>>> > left.timestamp match to closest right.timestamp where right.timestamp
>>> <=
>>> > left.timestamp.
>>> >
>>> > It makes very heavy use of Event.Time timers and has to do some manual
>>> > DoFn cache work to get around some O(heavy) issues. Wanted to test
>>> > things against Flink: In the capability matrix we have "~" for Timer
>>> > support in Flink:
>>> >
>>> > https://beam.apache.org/documentation/runners/capability-matrix/
>>> >
>>> > Is that page outdated, if not what are the areas that still need to be
>>> > addressed please?
>>> >
>>> > Cheers
>>> >
>>> > Reza
>>> >
>>> >
>>> > --
>>> >
>>> > This email may be confidential and privileged. If you received this
>>> > communication by mistake, please don't forward it to anyone else,
>>> please
>>> > erase all copies and attachments, and please let me know that it has
>>> > gone to the wrong person.
>>> >
>>> > The above terms reflect a potential business arrangement, are provided
>>> > solely as a basis for further discussion, and are not intended to be
>>> and
>>> > do not constitute a legally binding obligation. No legally binding
>>> > obligations will be created, implied, or inferred until an agreement
>>> in
>>> > final form is executed in writing by all parties involved.
>>> >
>>>
>>
>
> --
>
> This email may be confidential and privileged. If you received this
> communication by mistake, please don't forward it to anyone else, please
> erase all copies and attachments, and please let me know that it has gone
> to the wrong person.
>
> The above terms reflect a potential business arrangement, are provided
> solely as a basis for further discussion, and are not intended to be and do
> not constitute a legally binding obligation. No legally binding obligations
> will be created, implied, or inferred until an agreement in final form is
> executed in writing by all parties involved.
>


Re: Shuffling on apache beam

2019-05-30 Thread Reza Rokni
Hi,

Would you mind sharing your latency requirements? For example, is it < 1 sec
at the XXth percentile?

With regard to stateful DoFn: with a few exceptions, it is supported:
https://beam.apache.org/documentation/runners/capability-matrix/#cap-full-what

Cheers

Reza




On Thu, 30 May 2019 at 18:08, pasquale.bon...@gmail.com <
pasquale.bon...@gmail.com> wrote:

> This was my first option but I'm using google dataflow as runner and it's
> not clear if it supports stateful DoFn.
> However my problem is latency, I've been trying different solution but it
> seems difficult to bring latency under 1s when consuming message (150/s
> )from PubSub with beam/dataflow.
> Is some benchmark or example available to understand if we can effectively
> achieve low latency pr we should look at different solutions?
>
>
>
> On 2019/05/29 17:12:37, Pablo Estrada  wrote:
> > If you add a stateful DoFn to your pipeline, you'll force Beam to shuffle
> > data to their corresponding worker per key. I am not sure what is the
> > latency cost of doing this (as the messages still need to be shuffled).
> But
> > it may help you accomplish this without adding windowing+triggering.
> >
> > -P.
> >
> > On Wed, May 29, 2019 at 5:16 AM pasquale.bon...@gmail.com <
> > pasquale.bon...@gmail.com> wrote:
> >
> > > Hi Reza,
> > > with GlobalWindow with triggering I was able to reduce hotspot issues
> > > gaining satisfying performance for BigTable update. Unfortunately
> latency
> > > when getting messages from PubSub remains around 1.5s that it's too
> much
> > > considering our NFR.
> > >
> > > This is the code I use to get the messages:
> > > PCollectionTuple rawTransactions = p //
> > > .apply("GetMessages",
> > >
> > >
> PubsubIO.readMessagesWithAttributes().withIdAttribute(TRANSACTION_MESSAGE_ID_FIELD_NAME)
> > >
> > >
> .withTimestampAttribute(TRANSACTION_MESSAGE_TIMESTAMP_FIELD_NAME).fromTopic(topic))
> > >
>  .apply(Window.configure()
> > > .triggering(Repeatedly
> > >
> > > .forever(AfterWatermark.pastEndOfWindow()
> > > .withEarlyFirings(
> > > AfterProcessingTime
> > >
> > > .pastFirstElementInPane()
> > >
> > > .plusDelayOf(Duration.millis(1)))
> > > // Fire on
> any
> > > late data
> > >
> > > .withLateFirings(AfterPane.elementCountAtLeast(1
> > > .discardingFiredPanes())
> > >
> > > Messages are produced with a different dataflow:
> > >  Pipeline p = Pipeline.create(options);
> > > p.apply(
> > > "ReadFile",
> > > TextIO.read()
> > > .from(options.getInputLocation() + "/*.csv")
> > > .watchForNewFiles(
> > > // Check for new files every 1 seconds
> > > Duration.millis(600),
> > > // Never stop checking for new files
> > > Watch.Growth.never()))
> > > .apply(
> > > "create message",
> > > ParDo.of(
> > > new DoFn() {
> > >   @ProcessElement
> > >   public void processElement(ProcessContext context) {
> > > String line = context.element();
> > >
> > > String payload = convertRow(line);
> > > long now =
> > >
> LocalDateTime.now().atZone(ZoneId.systemDefault()).toInstant().toEpochMilli();
> > > context.output(
> > > new PubsubMessage(
> > > payload.getBytes(),
> > > ImmutableMap.of(TRANSACTION_MESSAGE_ID_FIELD_NAME,
> > > payload.split(",")[6],TRANSACTION_MESSAGE_TIMESTAMP_FIELD_NAME,
> > > Long.toString(now;
> > >   }
> > > }))
> > > .apply("publish message", PubsubIO.writeMessages().to(topic));
> > >
> > > I'm uploading a file containing 100 rows every 600 ms.
> > >
> > > I found different threads on satckoverflow around this latency issue,
> but
> > > none has a solution.
> > >
> > >
> > >
> > >
> > > On 2019/05/24 07:19:02, Reza Rokni  wrote:
> > > > PS You can also make use of the GlobalWindow with a stateful DoFn.
> > > >
> > > > On Fri, 24 May 2019 at 15:13, Reza Rokni  wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Have you explored the use of triggers with your use case?
> > > > >
> > > > >
> > > > >
> > >
> https://beam.apache.org/releases/javadoc/2.12.0/org/apache/beam/sdk/transforms/windowing/Trigger.html
> > > > >
> > > > > Cheers
> > > > >
> > > > > Reza
> > > > >
> > > > > On Fri, 24 May 2019 at 14:14, pasquale.bon...@gmail.com <
> > > > > pasquale.bon...@gmail.com> wrote:
> > > > >
> > > > >> Hi Reuven,
> > > > >> I would like to know if is possible to guarantee that record are
> > > > >> processed by the same thread/task based on a key, as probably
> happens
> > 

Re: Timer support in Flink

2019-05-30 Thread Reza Rokni
PS: until it was just pointed out to me by Max, I had missed the (expand
details) clickable link in the capability matrix.

Probably it's just me, but do others think it's also easy to miss? If yes, I
will raise a Jira for it.

On Wed, 29 May 2019 at 19:52, Reza Rokni  wrote:

> Thanx Max!
>
> Reza
>
> On Wed, 29 May 2019, 16:38 Maximilian Michels,  wrote:
>
>> Hi Reza,
>>
>> The detailed view of the capability matrix states: "The Flink Runner
>> supports timers in non-merging windows."
>>
>> That is still the case. Other than that, timers should be working fine.
>>
>> > It makes very heavy use of Event.Time timers and has to do some manual
>> DoFn cache work to get around some O(heavy) issues.
>>
>> If you are running on Flink 1.5, timer deletion suffers from O(n)
>> complexity which has been fixed in newer versions.
>>
>> Cheers,
>> Max
>>
>> On 29.05.19 03:27, Reza Rokni wrote:
>> > Hi Flink experts,
>> >
>> > I am getting ready to push a PR around a utility class for
>> timeseries join
>> >
>> > left.timestamp match to closest right.timestamp where right.timestamp
>> <=
>> > left.timestamp.
>> >
>> > It makes very heavy use of Event.Time timers and has to do some manual
>> > DoFn cache work to get around some O(heavy) issues. Wanted to test
>> > things against Flink: In the capability matrix we have "~" for Timer
>> > support in Flink:
>> >
>> > https://beam.apache.org/documentation/runners/capability-matrix/
>> >
>> > Is that page outdated, if not what are the areas that still need to be
>> > addressed please?
>> >
>> > Cheers
>> >
>> > Reza
>> >
>> >
>> > --
>> >
>> > This email may be confidential and privileged. If you received this
>> > communication by mistake, please don't forward it to anyone else,
>> please
>> > erase all copies and attachments, and please let me know that it has
>> > gone to the wrong person.
>> >
>> > The above terms reflect a potential business arrangement, are provided
>> > solely as a basis for further discussion, and are not intended to be
>> and
>> > do not constitute a legally binding obligation. No legally binding
>> > obligations will be created, implied, or inferred until an agreement in
>> > final form is executed in writing by all parties involved.
>> >
>>
>

-- 

This email may be confidential and privileged. If you received this
communication by mistake, please don't forward it to anyone else, please
erase all copies and attachments, and please let me know that it has gone
to the wrong person.

The above terms reflect a potential business arrangement, are provided
solely as a basis for further discussion, and are not intended to be and do
not constitute a legally binding obligation. No legally binding obligations
will be created, implied, or inferred until an agreement in final form is
executed in writing by all parties involved.


Re: Shuffling on apache beam

2019-05-30 Thread pasquale . bonito
This was my first option, but I'm using Google Dataflow as the runner and it's
not clear if it supports stateful DoFn.
However, my problem is latency: I've been trying different solutions, but it
seems difficult to bring latency under 1 s when consuming messages (150/s)
from Pub/Sub with Beam/Dataflow.
Is some benchmark or example available to understand whether we can
effectively achieve low latency, or should we look at different solutions?



On 2019/05/29 17:12:37, Pablo Estrada  wrote: 
> If you add a stateful DoFn to your pipeline, you'll force Beam to shuffle
> data to their corresponding worker per key. I am not sure what is the
> latency cost of doing this (as the messages still need to be shuffled). But
> it may help you accomplish this without adding windowing+triggering.
> 
> -P.
> 
> On Wed, May 29, 2019 at 5:16 AM pasquale.bon...@gmail.com <
> pasquale.bon...@gmail.com> wrote:
> 
> > Hi Reza,
> > with GlobalWindow with triggering I was able to reduce hotspot issues
> > gaining satisfying performance for BigTable update. Unfortunately latency
> > when getting messages from PubSub remains around 1.5s that it's too much
> > considering our NFR.
> >
> > This is the code I use to get the messages:
> > PCollectionTuple rawTransactions = p //
> > .apply("GetMessages",
> >
> > PubsubIO.readMessagesWithAttributes().withIdAttribute(TRANSACTION_MESSAGE_ID_FIELD_NAME)
> >
> > .withTimestampAttribute(TRANSACTION_MESSAGE_TIMESTAMP_FIELD_NAME).fromTopic(topic))
> > .apply(Window.configure()
> > .triggering(Repeatedly
> >
> > .forever(AfterWatermark.pastEndOfWindow()
> > .withEarlyFirings(
> > AfterProcessingTime
> >
> > .pastFirstElementInPane()
> >
> > .plusDelayOf(Duration.millis(1)))
> > // Fire on any
> > late data
> >
> > .withLateFirings(AfterPane.elementCountAtLeast(1
> > .discardingFiredPanes())
> >
> > Messages are produced with a different dataflow:
> >  Pipeline p = Pipeline.create(options);
> > p.apply(
> > "ReadFile",
> > TextIO.read()
> > .from(options.getInputLocation() + "/*.csv")
> > .watchForNewFiles(
> > // Check for new files every 1 seconds
> > Duration.millis(600),
> > // Never stop checking for new files
> > Watch.Growth.never()))
> > .apply(
> > "create message",
> > ParDo.of(
> > new DoFn() {
> >   @ProcessElement
> >   public void processElement(ProcessContext context) {
> > String line = context.element();
> >
> > String payload = convertRow(line);
> > long now =
> > LocalDateTime.now().atZone(ZoneId.systemDefault()).toInstant().toEpochMilli();
> > context.output(
> > new PubsubMessage(
> > payload.getBytes(),
> > ImmutableMap.of(TRANSACTION_MESSAGE_ID_FIELD_NAME,
> > payload.split(",")[6],TRANSACTION_MESSAGE_TIMESTAMP_FIELD_NAME,
> > Long.toString(now;
> >   }
> > }))
> > .apply("publish message", PubsubIO.writeMessages().to(topic));
> >
> > I'm uploading a file containing 100 rows every 600 ms.
> >
> > I found different threads on satckoverflow around this latency issue, but
> > none has a solution.
> >
> >
> >
> >
> > On 2019/05/24 07:19:02, Reza Rokni  wrote:
> > > PS You can also make use of the GlobalWindow with a stateful DoFn.
> > >
> > > On Fri, 24 May 2019 at 15:13, Reza Rokni  wrote:
> > >
> > > > Hi,
> > > >
> > > > Have you explored the use of triggers with your use case?
> > > >
> > > >
> > > >
> > https://beam.apache.org/releases/javadoc/2.12.0/org/apache/beam/sdk/transforms/windowing/Trigger.html
> > > >
> > > > Cheers
> > > >
> > > > Reza
> > > >
> > > > On Fri, 24 May 2019 at 14:14, pasquale.bon...@gmail.com <
> > > > pasquale.bon...@gmail.com> wrote:
> > > >
> > > >> Hi Reuven,
> > > >> I would like to know if is possible to guarantee that record are
> > > >> processed by the same thread/task based on a key, as probably happens
> > in a
> > > >> combine/stateful operation, without adding the delay of a windows.
> > > >> This could increase efficiency of caching and reduce same racing
> > > >> condition when writing data.
> > > >> I understand that workers are not part of programming model so I would
> > > >> like to know if it's possible to achieve this behaviour reducing at
> > minimum
> > > >> the delay of windowing. We don't need any combine or state we just
> > want the
> > > >> all record with a given key are sent to same thread,
> > > >>
> > > >> Thanks
> > > >>
> > > >>
> > > >> 

Re: [DISCUSS] Autoformat python code with Black

2019-05-30 Thread Łukasz Gajowy
+1 for any autoformatter for the Python SDK that does the job. My experience
since spotless was introduced in the Java SDK is that I would never start a
new Java project without it. So many great benefits, not only for the person
coding but for the whole community.

It is a GitHub UI issue that you cannot easily browse past the reformat. It
is not actually that hard, but does take a couple extra clicks to get
GitHub to display blame before a reformat. It is easier with the command
line. I do a lot of code history digging and the global Java reformat is
not really a problem.

It's actually one more click on GitHub, but I agree it's not the best way to
search the history. The most convenient and clear one I've found so far is
in JetBrains IDEs (IntelliJ), where you can:

right click on line number -> "annotate" -> click again -> "annotate
previous revision" -> ...

You can also use "compare with" to see the diff between two revisions.

Łukasz





On Thu, May 30, 2019 at 06:15 Kenneth Knowles  wrote:

> +1 pending good enough tooling (I can't quite tell - seems there are some
> issues?)
>
> On Wed, May 29, 2019 at 2:40 PM Katarzyna Kucharczyk <
> ka.kucharc...@gmail.com> wrote:
>
>> What else actually we gain? My guess is faster PR review iteration. We
>> will skip some of conversations about code style.
>>
> ...
>
>> Last but not least, new contributor may be less discouraged. When I
>> started contribute I didn’t know how to format my code and I lost a lot of
>> time to add pylint and adjust IntelliJ. I eventually failed. Currently I
>> write code intuitively and when I don’t forget I rerun tox.
>>
>
> This is a huge benefit. This is why I supported it so much for Java. It is
> a community benefit. You do not have to be a contributor to the Python SDK
> to support this. That is why I am writing here. Just eliminate all
> discussion of formatting. It doesn't really matter what the resulting
> format is, if it is not crazy to read. I strongly oppose maintaining a
> non-default format.
>
> Reformating 20k lines or 200k is not hard. The Java global reformat
> touched 50k lines. It does not really matter how big it is. Definitely do
> it all at once if you think the tool is good enough. And you should pin a
> version, so churn is not a problem. You can upgrade the version and
> reformat in a PR later and that is also easy.
>
> It is a GitHub UI issue that you cannot easily browse past the reformat.
> It is not actually that hard, but does take a couple extra clicks to get
> GitHub to display blame before a reformat. It is easier with the command
> line. I do a lot of code history digging and the global Java reformat is
> not really a problem.
>
> Kenn
>
>
>
>> Also everything will be formatted in a same way, so eventually it would
>> be easier to read.
>>
>> Moreover, as it was mentioned in previous emails - a lot of Jenkins
>> failures won’t take place, so we save time and resources.
>>
>>
>> One of disadvantages is that our pipelines has custom syntax and after
>> formatting they looks a little bit weird, but maybe extending the only
>> configurable option in Black - lines, from 88 to 110 would be solution.
>>
>> Second one is that Black requires Python 3 to be run. I don’t know how
>> big obstacle it would be.
>>
>> I believe there are two options how it would be possible to introduce
>> Black. First: just do it, it will hurt but then it would be ok (same as a
>> dentist appointment). Of course it may require some work to adjust linters.
>> On the other hand we can do it gradually and start including sdk parts one
>> by one - maybe it will be less painful?
>>
>> As an example I can share one of projects [2] I know that uses Black
>> (they use also other cool checkers and pre-commit [3]). This is how looks
>> their build with all checks [4].
>>
>> To sum up I believe that if we want improve our coding experience, we
>> should improve our toolset. Black seems be recent and quite popular tool
>> what makes think they won’t stop developing it.
>>
>> [1]
>> https://stackoverflow.com/questions/4112410/git-change-styling-whitespace-without-changing-ownership-blame
>>
>> [2]  https://github.com/GoogleCloudPlatform/oozie-to-airflow
>>
>> [3] https://pre-commit.com
>>
>> [4]
>> https://travis-ci.org/GoogleCloudPlatform/oozie-to-airflow/builds/538725689
>>
>>
>> On Wed, May 29, 2019 at 2:01 PM Robert Bradshaw 
>> wrote:
>>
>>> Reformatting to 4 spaces seems a non-starter to me, as it would change
>>> nearly every single line in the codebase (and the loss of all context as
>>> well as that particular line).
>>>
>>> This is probably why the 2-space fork exists. However, we don't conform
>>> to that either--we use 2 spaces for indentation, but 4 for continuation
>>> indentation. (As for the history of this, this goes back to Google's
>>> internal style guide, probably motivated by consistency with C++, Java, ...
>>> and the fact that with an indent level of 4 one ends up wrapping lines
>>> quite frequently (it's telling that black's default line length 

[RESULT] Release flink-shaded 7.0, release candidate

2019-05-30 Thread jincheng sun
Hi all,

I'm happy to announce that we have unanimously approved this release.

There are 6 approving votes, 3 of which are binding:
* Chesnay
* Timo
* Hequn
* Till
* Nico
* Jincheng

There are no disapproving votes.

Thanks, everyone!

Cheers,
Jincheng