Re: Should WindowFn have a mininal Duration?

2021-04-26 Thread Robert Bradshaw
I do think minimal window duration is a meaningful concept for WindowFns, but from a pragmatic perspective I would ask whether it is useful enough to require all implementers of WindowFn to specify it (given that a default value of 0 would not be very useful). On Mon, Apr 26, 2021 at 10:05 AM Jan
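A minimal plain-Python sketch of the trade-off, assuming a hypothetical `min_duration()` hook (not part of Beam's WindowFn API). Fixed windows and sessions can report a meaningful lower bound. A custom WindowFn that cannot should report "unknown" (None) rather than a misleading 0.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FixedWindowsSketch:
    size: float  # window length in seconds

    def min_duration(self) -> Optional[float]:
        # Every fixed window is exactly `size` long.
        return self.size


@dataclass
class SessionsSketch:
    gap: float  # session gap in seconds

    def min_duration(self) -> Optional[float]:
        # A session holding a single element spans exactly one gap;
        # merging can only make sessions longer.
        return self.gap


class CustomWindowFnSketch:
    def min_duration(self) -> Optional[float]:
        # None means "unknown", instead of an uninformative default of 0.
        return None
```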

Re: [PROPOSAL] Preparing for Beam 2.30.0 release

2021-04-26 Thread Robert Bradshaw
Confirming that the cut date is 4/28/2021 (in two days), right? On Wed, Apr 21, 2021 at 4:41 PM Tomo Suzuki wrote: > > Thank you for the preparation! > > > a few responses that some high priority changes > > Would you be willing to share the items for visibility? There are several PRs in flight

Re: Issues and PR names and descriptions (or should we change the contribution guide)

2021-04-22 Thread Robert Bradshaw
d squash by default > in the upcoming squash bot even for single commit PRs? > Does squash-and-merge in that case preserve the commit as-is if there's only one? In that case, there'd be no issues of history. (I opted to not comment on 1-commit PRs to be less chatty.) > > On Thu, Apr 22, 2

Re: Issues and PR names and descriptions (or should we change the contribution guide)

2021-04-22 Thread Robert Bradshaw
even there there is high variance. I think the issue is "oh, I didn't think to squash vs. merge" rather than "who cares, I always press merge anyway" in which case a timely reminder will go a long way. Kenn > > [1] > https://lists.apache.org/thread.html/4a65fb0b66935c

Re: Issues and PR names and descriptions (or should we change the contribution guide)

2021-04-22 Thread Robert Bradshaw
er paid more attention (if not > yet) on these “non code” things before reviewing/merging a PR. > > [1] https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py > > > On 22 Apr 2021, at 01:28, Robert Bradshaw wrote: > > I am also in the camp that it often makes sense

Re: Issues and PR names and descriptions (or should we change the contribution guide)

2021-04-21 Thread Robert Bradshaw
d and verified PRs. >> We could solve the unwanted commit issue if we have a policy to always >> "Squash and Merge" PRs with rare exceptions. >> >> I agree jira/PR titles could be better, I'm not sure what we can do about >> it aside from reminding comm

Re: Issues and PR names and descriptions (or should we change the contribution guide)

2021-04-21 Thread Robert Bradshaw
+1 to better descriptions for JIRA (and PRs). Thanks for bringing this up. For merging unwanted commits, can we automate a simple check (e.g. with github actions)? On Wed, Apr 21, 2021 at 8:00 AM Tomo Suzuki wrote: > BEAM-12173 is on me. I'm sorry about that. Re-reading committer guide > [1],

Re: [VOTE] Release 2.29.0, release candidate #1

2021-04-20 Thread Robert Bradshaw
herrypicked >>>> to the release branch. >>>> >>>> Kenn >>>> >>>> On Mon, Apr 19, 2021 at 8:34 PM Kenneth Knowles >>>> wrote: >>>> >>>>> OK it sounds like I need to re-roll the artifacts in question. I don'

Re: Naming! Dataflow Worker/SDK "Harness" image flag

2021-04-19 Thread Robert Bradshaw
I commented on the doc, but I'm also in favor of dropping "harness." On Mon, Apr 19, 2021 at 3:10 PM Tyson Hamilton wrote: > I'm in favor of dropping "harness" and going with "sdk_container_image". I > don't feel like the word "harness" adds value or clarity. > > On Mon, Apr 19, 2021 at 11:34

Re: [VOTE] Release 2.29.0, release candidate #1

2021-04-19 Thread Robert Bradshaw
It looks like the wheels are also versioned "2.29.0.dev". Not sure if it's important, but the source tarball also seems to contain some release script changes that are not reflected in the github branch. On Mon, Apr 19, 2021 at 8:41 AM Kenneth Knowles wrote: > Thanks for the details, Valentyn

Runner capabilities

2021-04-09 Thread Robert Bradshaw
While working on BEAM-6597, I ran into the issue that not all runners support the non-deprecated metrics protocol. Additionally, there is no way for the runner to inform the worker whether it supports this protocol. I was thinking about adding a "runner_capabilities" field to ProvisionInfo [1]
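One way to picture the proposal, as a hedged plain-Python sketch: the runner advertises capability URNs in the provisioning response, and the SDK worker switches to the newer metrics protocol only when the matching URN is present. `ProvisionInfoSketch` and the URN string are hypothetical placeholders. They are not the actual proto message or a registered URN.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Hypothetical capability URN, for illustration only.
NEW_METRICS_PROTOCOL = "example:protocol:new_metrics:v1"


@dataclass(frozen=True)
class ProvisionInfoSketch:
    # Stand-in for ProvisionInfo with the proposed "runner_capabilities" field.
    runner_capabilities: FrozenSet[str] = frozenset()


def choose_metrics_protocol(info: ProvisionInfoSketch) -> str:
    # Use the newer protocol only if the runner said it supports it;
    # otherwise fall back to the deprecated protocol for compatibility.
    if NEW_METRICS_PROTOCOL in info.runner_capabilities:
        return "new"
    return "deprecated"
```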

Re: [PROPOSAL] Remove pylint format checks

2021-04-09 Thread Robert Bradshaw
I'd be happy with yapf + docformatter + isort, but I'd like to understand why yapf lets breakable lines go longer than 80 chars. On Fri, Apr 9, 2021 at 4:19 PM Brian Hulette wrote: > Currently we have two different format checks for the Python SDK. Most > format checks are handled by yapf,

Re: Long term support versions of Beam Java

2021-04-08 Thread Robert Bradshaw
Python (again a language) has a slower release cycle, fairly strict backwards compatibility stance (with the ability to opt-in before changes become the default) and clear ownership for maintenance of each minor version until end-of-life (so each could be considered an LTS release).

Re: Long term support versions of Beam Java

2021-04-06 Thread Robert Bradshaw
; with some "coder-version-registry". I suppose there > might have been a discussion about this in the past, does anyone know of > any conclusion? > > Jan > On 4/6/21 10:54 PM, Robert Bradshaw wrote: > > I do think there's value in having an LTS release, if there's suffici

Re: Long term support versions of Beam Java

2021-04-06 Thread Robert Bradshaw
I do think there's value in having an LTS release, if there's sufficient interest to fund it (specifically, figuring out who would be backporting fixes and cutting the new releases). On Mon, Apr 5, 2021 at 1:14 PM Elliotte Rusty Harold wrote: > Hi, > > I'd like to return to the discussion

Re: [DISCUSS] Include inherited members in Python API Docs?

2021-04-06 Thread Robert Bradshaw
Fn, etc. I'm not sure how realistic that is though. It >> would be nice if this argument worked the other way >> >> [1] https://theneuralbit.github.io/beam-site/pydoc/inherited-members >> [2] >> https://theneuralbit.github.io/beam-site/pydoc/inherited-members/apache_beam.transforms.

Re: [ANNOUNCE] New committer: Tomo Suzuki

2021-04-02 Thread Robert Bradshaw
Congratulations! On Fri, Apr 2, 2021 at 10:19 AM Chamikara Jayalath wrote: > Congrats Tomo! > > On Fri, Apr 2, 2021 at 9:54 AM Brian Hulette wrote: > >> Congratulations Tomo! Well deserved :) >> >> On Fri, Apr 2, 2021 at 9:51 AM Yichi Zhang wrote: >> >>> Congratulations! >>> >>> On Fri, Apr

Re: [DISCUSS] Jira isssue type: "Support"

2021-04-01 Thread Robert Bradshaw
I agree with the idea of adding a label to this, which can be attached to real bugs, etc. that are in specific components. I don't think it makes sense to try to offer user support through jira, though: if it's a bug, let's rephrase the question as such; otherwise we could post the question (and

Re: [DISCUSS] Include inherited members in Python API Docs?

2021-03-31 Thread Robert Bradshaw
+1 to an example. In particular, are these inherited members grouped in such a way that it makes it easy to ignore them once they get too "low" in the stack? If it can't be per-module, is there a "nice" set of ancestors to avoid (as it seems this option takes such an argument)? On Wed, Mar 31,

Re: Python Dataframe API issue

2021-03-25 Thread Robert Bradshaw
1929, I >> removed it from the release blockers since there is a workaround (use a >> NamedTuple type), but it's probably worth cherrypicking the fix. >> >> On Thu, Mar 25, 2021 at 4:44 PM Robert Bradshaw >> wrote: >> >>> This could be https://issues.apac

Re: Python Dataframe API issue

2021-03-25 Thread Robert Bradshaw
This could be https://issues.apache.org/jira/browse/BEAM-11929 On Thu, Mar 25, 2021 at 4:26 PM Robert Bradshaw wrote: > This is definitely wrong. Looking into what's going on here, but this > seems severe enough to be a blocker for the next release. > > On Thu, Mar 25, 2021 at 3:39

Re: Python Dataframe API issue

2021-03-25 Thread Robert Bradshaw
This is definitely wrong. Looking into what's going on here, but this seems severe enough to be a blocker for the next release. On Thu, Mar 25, 2021 at 3:39 PM Xinyu Liu wrote: > Hi, folks, > > I am playing around with the Python Dataframe API, and seemly got an > schema issue when converting

Re: Write to multiple IOs in linear fashion

2021-03-25 Thread Robert Bradshaw
lectionTuple is extensible too, as one could add more (or better) outputs in the future without changing the signature. > > > >> Kenn >> >> On Wed, Mar 24, 2021 at 5:36 PM Robert Bradshaw >> wrote: >> >>> Yeah, the entire input is not alw

Re: Write to multiple IOs in linear fashion

2021-03-24 Thread Robert Bradshaw
gt; >> Am I reading this wrong? >> >> Kenn >> >> On Wed, Mar 24, 2021 at 4:35 PM Alex Amato wrote: >> >>> How about a PCollection containing every element which was successfully >>> written? >>> Basically the same things which were passed into it. >

Re: Write to multiple IOs in linear fashion

2021-03-24 Thread Robert Bradshaw
nd just add new > funtionality. Though, we need to follow the same pattern for user API and > maybe even naming for this feature across different IOs (like we have for > "readAll()” methods). > > > > I agree that we have to avoid returning PDone for such cases. > >

Re: Make KafkaIO usable from Dataflow Template?

2021-03-23 Thread Robert Bradshaw
I would encourage flex templates over further proliferation of ValueProviders. On Tue, Mar 23, 2021 at 12:42 PM Alexey Romanenko wrote: > I think you are right - now with SDF support in KafkaIO it should be > possible to determine the number of splits in the runtime and support > ValueProviders

Re: Tutorial - How to run a Beam pipeline with Flink on Kubernetes Natively

2021-03-22 Thread Robert Bradshaw
Thanks, that looks quite useful. A note on POJOs: Serializable has several disadvantages (e.g. non-deterministic coding for key grouping, less efficient serialization). You could look into making them compatible with Beam schemas. On Mon, Mar 22, 2021 at 1:11 PM Cristian Constantinescu wrote: >

Re: [VOTE] New Committer: Tomo Suzuki

2021-03-17 Thread Robert Bradshaw
+1 On Wed, Mar 17, 2021 at 5:48 PM Kenneth Knowles wrote: > Hi all, > > Please vote on the proposal for Tomo Suzuki to become a committer in the > Apache Beam project, as follows: > > [ ] +1, Approve the proposal for the candidate to become a committer > [ ] -1, Disapprove the proposal for the

Re: Do we need synchronized processing time? / What to do about "continuation triggers"?

2021-03-17 Thread Robert Bradshaw
nt hourly output, I cannot trigger >> source with lower frequency. If I trigger source with hourly, but do not >> propagate this as fast as possible, I'm inevitably introducing additional >> latency (that's the definition of "not as fast as possible") in downstream >> p

Re: [VOTE] Vendored Dependencies Release gRPC 1.36.0 v0.1 RC1

2021-03-16 Thread Robert Bradshaw
+1 On Tue, Mar 16, 2021 at 4:00 PM Kenneth Knowles wrote: > Please review the release of the following artifacts that we vendor: > * beam-vendor-grpc-1_36_0 > > Hi everyone, > > Please review and vote on the release candidate #1 for the version 0.1, as > follows: > [ ] +1, Approve the release

Re: [DISCUSS] Drop support for Flink 1.8 and 1.9

2021-03-12 Thread Robert Bradshaw
Do we now support 1.8 through 1.12? Unless there are specific objections, makes sense to me. On Fri, Mar 12, 2021 at 8:29 AM Alexey Romanenko wrote: > +1 too but are there any potential objections for this? > > On 12 Mar 2021, at 11:21, David Morávek wrote: > > +1 > > D. > > On Thu, Mar 11,

Re: User-related questions in dev@ list

2021-03-10 Thread Robert Bradshaw
I am in the same boat: I'm subscribed to both but the dev@ ones are more visible to me. That being said, I do think it's valuable to have segregated discussion (otherwise why have two lists), and "lurkers" on users@ can learn from questions answered there. I would say that it is on us to redirect

Re: Java Tests are failing on Github checks

2021-03-04 Thread Robert Bradshaw
I've noticed this sometimes for Python as well: Jenkins is happy with the exact same tests that Github checks fail on. On Thu, Mar 4, 2021 at 8:40 AM Alexey Romanenko wrote: > Hi, > > Does anyone know why some Java Tests, that run as Github checks, fail? For > example for this PR [1], this [2]

Re: DoFn @Setup with PipelineOptions

2021-03-01 Thread Robert Bradshaw
Any reason not to simply pass these parameters into the DoFn constructor? On Mon, Mar 1, 2021 at 3:42 PM Xinyu Liu wrote: > Hi, all, > > Currently the @Setup method signature in DoFn does not support any > arguments. This is a bit cumbersome to use for use cases such as creating a > db
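A plain-Python illustration of the suggestion, assuming (as for Java DoFns) that the function object is serialized together with its fields. Constructor arguments travel with the object and are already available when `setup()` runs on the worker. `WriteToDbFnSketch` and its placeholder client are hypothetical, and `pickle` stands in for shipping the DoFn to a worker.

```python
import pickle


class WriteToDbFnSketch:
    """Plain-Python stand-in for a DoFn; not the Beam base class."""

    def __init__(self, db_url: str, batch_size: int):
        # Configuration captured at pipeline-construction time.
        self.db_url = db_url
        self.batch_size = batch_size
        self.client = None

    def setup(self):
        # Runs on the worker; constructor arguments are already present
        # because they were serialized along with the object.
        self.client = {"url": self.db_url}  # placeholder for a real client

    def process(self, element):
        yield (self.client["url"], element)


# Simulate shipping the function object to a worker, then setting it up.
fn = pickle.loads(pickle.dumps(WriteToDbFnSketch("db://example", 100)))
fn.setup()
```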

Re: inconsistency found in DirectRunner API (arg should be _UnwindowedValues but is not)

2021-02-26 Thread Robert Bradshaw
Thanks. I've filed https://issues.apache.org/jira/browse/BEAM-11882 . If you want to take a stab at fixing it, you could try replacing the argument passed to merge_accumulators at https://github.com/apache/beam/blob/release-2.28.0/sdks/python/apache_beam/transforms/combiners.py#L963 with a new

Re: Do we need synchronized processing time? / What to do about "continuation triggers"?

2021-02-23 Thread Robert Bradshaw
agate down is > confusing (it's also a bit confusing for Windows, but with Windows the > propagation at least makes sense). The fact that users rarely have access > to the actual GBK operation means that allowing them to specify triggers on > their sinks is the best approach. > > On Mon

Re: Java/Python/Proto mismatch: MergeStatus.ALREADY_MERGED vs InvalidWindows

2021-02-22 Thread Robert Bradshaw
E. We still >>> remove ALREADY_MERGED. This would allow a later GBK to make no sense >>> because there's not likely to be any merging for the same reason. But >>> merging WindowFns don't have to work like sessions so they might merge >>> based on some other interesting cr

Re: Java/Python/Proto mismatch: MergeStatus.ALREADY_MERGED vs InvalidWindows

2021-02-22 Thread Robert Bradshaw
ially if there are very few user-authored > merging WindowFns out there (and I agree that this is probably true). > Choice (2) also has the benefit that it matches Python and that it is > trivial to implement. > > Kenn > > On Thu, Feb 18, 2021 at 3:18 PM Robert Bradshaw > wr

Re: Do we need synchronized processing time? / What to do about "continuation triggers"?

2021-02-22 Thread Robert Bradshaw
d actually mean, that the >> correct place, where to specify triggering is not Window PTransform, but >> the GBK, i.e. >> >> input.apply(GroupByKey.create().triggering(...)) >> >> That would imply we simply have default trigger for all GBKs, unless >> explicitly cha

Re: [Vote] Publishing new website designs

2021-02-22 Thread Robert Bradshaw
>> wrote: >> >>> Are you referring to the list that starts with "dev@ archives and >>> statistics"? >>> >>> How about I add a "process" section at the end with that information? >>> >>> On Tue, Feb 16, 2021 at 5:46 PM

Re: [VOTE] Release 2.28.0, release candidate #1

2021-02-19 Thread Robert Bradshaw
+1 (binding) I validated the signatures and package contents, as well as running a small Python pipeline in a fresh virtual environment. On Fri, Feb 19, 2021 at 1:36 PM Chamikara Jayalath wrote: > +1 > > I ran the release candidate verification script and updated the > spreadsheet [1] > Also,

Re: Java/Python/Proto mismatch: MergeStatus.ALREADY_MERGED vs InvalidWindows

2021-02-18 Thread Robert Bradshaw
I think you're right about Python. I think it's fine for the SDK to prohibit ambiguous things like stacked sessions (or require explicit user action for them). This illegal state wouldn't generally need to be represented in proto (but maybe it'd be nice for quicker errors in cross language). On Thu,

Re: Do we need synchronized processing time? / What to do about "continuation triggers"?

2021-02-18 Thread Robert Bradshaw
On Wed, Feb 17, 2021 at 1:56 PM Kenneth Knowles wrote: > > On Wed, Feb 17, 2021 at 1:06 PM Robert Bradshaw > wrote: > >> I would prefer to leave downstream triggering up to the runner (or, >> better, leave upstream triggering up to the runner, a la sink trigg

Re: Can we solve WindowFn.getOutputTime another way?

2021-02-17 Thread Robert Bradshaw
vior on Java on non-portable runners. At least until we can figure out what we really want and add it to the model. > > Kenn > > On Wed, Feb 17, 2021 at 11:16 AM Robert Bradshaw > wrote: > >> OK, so to move forward, shall we update the default Sessions to not do >> th

Re: Do we need synchronized processing time? / What to do about "continuation triggers"?

2021-02-17 Thread Robert Bradshaw
stages. However continuation triggers silently switching to >> synchronized processing time has defeated that, and it wasn't clear to >> users why. >> >> On Wed, Feb 17, 2021 at 11:12 AM Robert Bradshaw >> wrote: >> >>> On Fri, Feb 12, 2021 at 9:0

Re: Can we solve WindowFn.getOutputTime another way?

2021-02-17 Thread Robert Bradshaw
less there is some special consideration why it doesn't matter. So > I wonder what happens when a pipeline has a few different joins. > > Kenn > > On Fri, Feb 12, 2021 at 12:37 AM Robert Bradshaw > wrote: > >> Yes, unless you manually set the timestamp combiner to e

Re: Do we need synchronized processing time? / What to do about "continuation triggers"?

2021-02-17 Thread Robert Bradshaw
On Fri, Feb 12, 2021 at 9:09 AM Kenneth Knowles wrote: > > On Thu, Feb 11, 2021 at 9:38 PM Robert Bradshaw > wrote: > >> Of course the right answer is to just implement sink triggers and >> sidestep the question altogether :). >> >&

Re: [Vote] Publishing new website designs

2021-02-16 Thread Robert Bradshaw
e-pull-requests.storage.googleapis.com/13871/contribute/become-a-committer/index.html#what-are-the-traits-of-an-apache-beam-committer > (Note I don't have control over the preview staged at test-beam.surge.sh, > so that's now out of date for this page) > > +Kenneth Knowles +Robert Brads

Re: Can we solve WindowFn.getOutputTime another way?

2021-02-12 Thread Robert Bradshaw
to fix? > > On Fri, Feb 12, 2021 at 12:25 AM Robert Bradshaw > wrote: > >> The default timestamp combiner used to be earliest as well. >> >> On Fri, Feb 12, 2021 at 12:10 AM Reuven Lax wrote: >> >>> IIRC, this was introduced because at the time users complain
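For reference, a hedged sketch of what the three timestamp combiners compute for one window's output. The enum names mirror Java's `TimestampCombiner` values, but this is plain Python, not Beam code. `window_max_timestamp` stands in for the window's maximum timestamp.

```python
from enum import Enum
from typing import Iterable


class TimestampCombinerSketch(Enum):
    EARLIEST = "earliest"
    LATEST = "latest"
    END_OF_WINDOW = "end_of_window"


def combine_timestamps(policy: TimestampCombinerSketch,
                       timestamps: Iterable[float],
                       window_max_timestamp: float) -> float:
    ts = list(timestamps)
    if policy is TimestampCombinerSketch.EARLIEST:
        # Output is held back to the earliest input in the window.
        return min(ts)
    if policy is TimestampCombinerSketch.LATEST:
        return max(ts)
    # END_OF_WINDOW ignores the inputs' timestamps entirely.
    return window_max_timestamp
```

The choice matters for watermarks: EARLIEST holds the output watermark back to the earliest input in the window, whereas END_OF_WINDOW lets it advance to the window's end.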

Re: Can we solve WindowFn.getOutputTime another way?

2021-02-12 Thread Robert Bradshaw
e allowed customizing the timestamp combiner, so > maybe this is less of a problem now? > > On Thu, Feb 11, 2021 at 10:53 PM Robert Bradshaw > wrote: > >> On Wed, Feb 10, 2021 at 8:03 PM Kenneth Knowles wrote: >> >>> >>> >>> On Wed, Feb 10, 2021

Re: Can we solve WindowFn.getOutputTime another way?

2021-02-11 Thread Robert Bradshaw
On Wed, Feb 10, 2021 at 8:03 PM Kenneth Knowles wrote: > > > On Wed, Feb 10, 2021 at 2:24 PM Alex Amato wrote: > >> >> >> On Wed, Feb 10, 2021 at 12:14 PM Kenneth Knowles wrote: >> >>> On a PR (https://github.com/apache/beam/pull/13927) we got into a >>> discussion of a very old and strange

Re: Do we need synchronized processing time? / What to do about "continuation triggers"?

2021-02-11 Thread Robert Bradshaw
Of course the right answer is to just implement sink triggers and sidestep the question altogether :). In the meantime, I think leaving AfterSynchronizedProcessingTime in the model makes the most sense, and runners can choose an implementation between firing eagerly and waiting some amount of

Re: [Proposal] Requesting PMC approval to start planning for Beam Summits 2021

2021-02-11 Thread Robert Bradshaw
Are there any substantive changes from what we did last year? On Thu, Feb 11, 2021 at 1:40 PM Brittany Hermann wrote: > Dear Project Management Committee, > > The Beam Summit is a community event funded by a group of sponsors and > organized by a steering committee formed by members of the Beam

Re: [Vote] Publishing new website designs

2021-02-10 Thread Robert Bradshaw
oving the details. Our bullets highlight specific aspects of the > code of conduct. Specifically, we split up one of the items into separate > bullet points, and we did not emphasize "be concise" or "step down > considerately". I am OK going ahead with this change

Re: [Vote] Publishing new website designs

2021-02-10 Thread Robert Bradshaw
+1 based on some basic browsing around. It would have been nice if the old and new site could have been generated from the same content, to keep the old one around for a bit (if needed, in that case we might have more confidence nothing was lost), but probably not worth making and staging a copy.

Re: Not running tests that need DirectRunner in core?

2021-02-05 Thread Robert Bradshaw
When we were using maven rather than gradle, it was impossible to make the direct runner a dependency for core tests without making the direct runner a dependency for core itself (which was cyclic). I would be super happy if this got fixed; having to jump through hoops to run the majority of core

Re: Environment options for external transforms

2021-02-04 Thread Robert Bradshaw
ble to stage files to SDK > containers, so it's something we should consider making into a general > feature, perhaps based on the artifact API. > +1 > > On Thu, Feb 4, 2021 at 3:52 PM Robert Bradshaw > wrote: > >> On Thu, Feb 4, 2021 at 3:33 PM Kyle Weaver wrote: >

Re: Environment options for external transforms

2021-02-04 Thread Robert Bradshaw
should have led with this. Someone wanted to mount credentials into the > SDK harness [1]. So in this particular case the user just wants to mount > files into their SDK harness, which is a pretty common use case, so > resource hints are probably a more appropriate solution. > > [1] > https://lists.apache.org/thre

Re: Environment options for external transforms

2021-02-04 Thread Robert Bradshaw
On Thu, Feb 4, 2021 at 12:38 PM Kyle Weaver wrote: > So, an external transform is uniquely identified by its URN. An external >> transform identified by a URN may refer to an arbitrary composite which may >> have sub-transforms that refer to different environments. I think with the >> above

Re: How to create an emty file using apche beam

2021-02-03 Thread Robert Bradshaw
You could write a DoFn that "consumes" the output of write as a side input and touches the file manually. E.g. write_result = ... | beam.io.WriteToText(...) p | beam.Create([None]) | beam.Map(lambda unused_none, unused_side: create_file(), unused_side=write_result) where create_file()

Re: Multiple architectures support on Beam (ARM)

2021-01-27 Thread Robert Bradshaw
n ARM64 > > > On Tue, Jan 26, 2021 at 8:48 PM Robert Burke wrote: > > > > I believe so. > > > > The Go SDK requires in most instances for a user to Register their DoFns > at package init time, linked to the type/functions fully qualified path as > detemined

Re: Multiple architectures support on Beam (ARM)

2021-01-26 Thread Robert Bradshaw
> set to amd64 linux at present, but that's a separate issue. > > [1] https://golangcookbook.com/chapters/running/cross-compiling/ > [2] https://golang.org/cmd/cgo/ > > On Tue, Jan 26, 2021, 10:25 AM Robert Bradshaw > wrote: > >> +1 >> >> I don't think it wou

Re: Multiple architectures support on Beam (ARM)

2021-01-26 Thread Robert Bradshaw
+1 I don't think it would be that hard to build and release arm-based docker images. (Perhaps just a matter of changing the docker file to depend on a different base, and doing some cross-compile. That would suss out whether we're inadvertently taking on any incompatible dependencies.)

Re: [ANNOUNCE] New committer: Piotr Szuberski

2021-01-22 Thread Robert Bradshaw
Thanks, Piotr. Well deserved. On Fri, Jan 22, 2021 at 9:22 AM Tyson Hamilton wrote: > Congrats Piotr! Well deserved. > > On Fri, Jan 22, 2021 at 9:18 AM Ismaël Mejía wrote: > >> Congratulations Piotr ! Thanks for all your work ! >> >> On Fri, Jan 22, 2021 at 5:33 PM Alexey Romanenko >> wrote:

Re: Making preview (sample) time consistent on Direct runner

2021-01-21 Thread Robert Bradshaw
is is it provides the runner the ability to say to these (potentially long-running usefns) "please stop gracefully as soon as you can." > > On Thu, Jan 21, 2021 at 8:34 PM Robert Bradshaw > wrote: > > > > I don't know that SDF vs. BoundedSources changes things

Re: [ANNOUNCE] New PMC Member: Chamikara Jayalath

2021-01-21 Thread Robert Bradshaw
Congratulations, Cham! On Thu, Jan 21, 2021 at 3:13 PM Brian Hulette wrote: > Great news, congratulations Cham! > > On Thu, Jan 21, 2021 at 3:08 PM Robin Qiu wrote: > >> Congratulations, Cham! >> >> On Thu, Jan 21, 2021 at 3:05 PM Tyson Hamilton >> wrote: >> >>> Woo! Congrats Cham! >>> >>> On

Re: Making preview (sample) time consistent on Direct runner

2021-01-21 Thread Robert Bradshaw
I don't know that SDF vs. BoundedSources changes things here--for both one can implement take(n) by running until one has N elements and then canceling the pipeline. One could have a more sophisticated First(n) operator that has a "back-edge" to checkpoint/split the upstream operators once a
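A plain-Python analogy of the simple take(n) approach: pull elements until N have arrived, then cancel the upstream. Here a generator's `close()` stands in for cancelling the pipeline. The more sophisticated First(n) operator with a back-edge is not shown, and `take` is a hypothetical helper.

```python
import itertools
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def take(source: Iterable[T], n: int) -> List[T]:
    it = iter(source)
    try:
        # islice stops after n elements without pulling an extra one.
        return list(itertools.islice(it, n))
    finally:
        # "Cancel" the upstream once n elements have been collected.
        close = getattr(it, "close", None)
        if close is not None:
            close()


produced: List[int] = []


def unbounded_source() -> Iterator[int]:
    # Records every element it emits, to show nothing extra is produced.
    i = 0
    while True:
        produced.append(i)
        yield i
        i += 1


sample = take(unbounded_source(), 3)
```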

Re: Null checking in Beam

2021-01-20 Thread Robert Bradshaw
On Wed, Jan 20, 2021 at 1:48 PM Kenneth Knowles wrote: > Yes, completely sound nullability checking has been added to the project > via checkerframework, based on a large number of NPE bugs (1-5% depending > on how you search, but many other bugs likely attributable to > nullness-based design

Re: PTransform Annotations Proposal

2021-01-14 Thread Robert Bradshaw
bert Burke wrote: > > > Hmmm. Fair. I'm mostly concerned about the pathological case where we > end > > > up with a distinct Environment per transform, but there are likely > > > practical cases where that's reasonable (High mem to GPU to TPU, to > ARM) > &g

Re: Planning a freeze on website changes to merge new designs

2021-01-13 Thread Robert Bradshaw
; > On Wed, Jan 13, 2021 at 3:59 AM Brian Hulette wrote: > >> >> >> On Mon, Jan 11, 2021 at 11:00 AM Robert Bradshaw >> wrote: >> >>> On Mon, Jan 11, 2021 at 10:38 AM Brian Hulette >>> wrote: >>> >>>> I spoke with Gri

Re: Can composite transforms have zero subtransforms?

2021-01-12 Thread Robert Bradshaw
Yes, a PTransform can have no sub-transforms, as long as it only returns its inputs. Updating the docs would be a good idea. On Tue, Jan 12, 2021 at 1:04 PM Brian Hulette wrote: > A recent bug with SqlTransform on Dataflow Runner V2 [1] revealed an > interesting ambiguity in the Beam model: it's

Re: Planning a freeze on website changes to merge new designs

2021-01-11 Thread Robert Bradshaw
cognizing and resolving conflicts on the owners of the website-revamp branch. It might be worth highlighting an example of a content change that makes any of these workflows difficult. > [1] https://github.com/apache/beam/tree/website-revamp > > On Mon, Jan 11, 2021 at 10:03 AM R

Re: Planning a freeze on website changes to merge new designs

2021-01-11 Thread Robert Bradshaw
A site-wide freeze during which there was a huge, rushed code dump was not the most effective way to manage or review the large website changes last time, and I don't think it would be a good idea to attempt that again. Instead, can we create a parallel directory/site in our repo, incrementally

Re: [RESULT][VOTE] Release 2.27.0, release candidate #4

2021-01-08 Thread Robert Bradshaw
Somewhat belated, but another +1 (binding); the release artifacts and signatures all look good to me. Thanks for doing this release! On Fri, Jan 8, 2021 at 11:01 AM Pablo Estrada wrote: > Hi all, > I'm happy to announce that we have unanimously approved this release. > > There are 8 approving

Re: Standarizing the "Runner" concept across website content

2021-01-06 Thread Robert Bradshaw
+1 to keeping the distinction between Runner and Engine as Kenn described, and cleaning up the site with these in mind (I don't think the term engine is widely used yet). On Wed, Jan 6, 2021 at 11:15 AM Yichi Zhang wrote: > I agree with what kenn said, in most cases I would refer to the term >

Re: Usability regression using SDF Unbounded Source wrapper + DirectRunner

2020-12-21 Thread Robert Bradshaw
If readers are expensive to create, this seems like an important (and not too difficult) optimization. On Mon, Dec 21, 2020 at 11:04 AM Jan Lukavský wrote: > Hi Boyuan, > > I think your analysis is correct - with one exception. It should be > possible to reuse the reader if and only if the

Re: Combine with multiple outputs case Sample and the rest

2020-12-21 Thread Robert Bradshaw
There are two ways to emit multiple outputs: either to multiple distinct PCollections (e.g. withOutputTags) or multiple (including 0) outputs to a single PCollection (the difference between Map and FlatMap). In full generality, one can always have a CombineFn that outputs lists (say *) followed by
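A plain-Python sketch of the "CombineFn that outputs lists, followed by a FlatMap" pattern for the sample-and-the-rest case. The class only mirrors the CombineFn method names (`create_accumulator`, `add_input`, `merge_accumulators`, `extract_output`) and is not a Beam class. The `tag_outputs` generator stands in for a FlatMap emitting to tagged outputs. Because the accumulator keeps every element, this only suits groups small enough to hold in memory.

```python
import random
from typing import Iterator, List, Tuple


class SampleAndRestFnSketch:
    """Plain-Python mirror of the CombineFn method names; not a Beam class."""

    def __init__(self, n: int, seed: int = 0):
        self.n = n
        self.seed = seed

    def create_accumulator(self) -> List:
        return []

    def add_input(self, acc: List, element) -> List:
        acc.append(element)
        return acc

    def merge_accumulators(self, accs) -> List:
        merged: List = []
        for acc in accs:
            merged.extend(acc)
        return merged

    def extract_output(self, acc: List) -> Tuple[List, List]:
        # Pick n random positions as the sample; everything else is "the rest".
        rng = random.Random(self.seed)
        picked = set(rng.sample(range(len(acc)), min(self.n, len(acc))))
        sample = [e for i, e in enumerate(acc) if i in picked]
        rest = [e for i, e in enumerate(acc) if i not in picked]
        return sample, rest


def tag_outputs(result: Tuple[List, List]) -> Iterator[Tuple[str, object]]:
    # FlatMap-style step: one combined value becomes many tagged outputs.
    sample, rest = result
    for e in sample:
        yield ("sample", e)
    for e in rest:
        yield ("rest", e)


fn = SampleAndRestFnSketch(n=2)
acc1 = fn.create_accumulator()
for x in [1, 2, 3]:
    acc1 = fn.add_input(acc1, x)
acc2 = fn.create_accumulator()
for x in [4, 5]:
    acc2 = fn.add_input(acc2, x)
tagged = list(tag_outputs(fn.extract_output(fn.merge_accumulators([acc1, acc2]))))
```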

Re: Help measuring upcoming performance increase in flink runner on production systems

2020-12-21 Thread Robert Bradshaw
I agree. Borrowing the mutation detection from the direct runner as an intermediate point sounds like a good idea. On Mon, Dec 21, 2020 at 8:57 AM Kenneth Knowles wrote: > I really think we should make a plan to make this the default. If you test > with the DirectRunner it will do mutation
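A rough plain-Python sketch of the mutation check being discussed: encode each input before and after calling the user function, and fail if the bytes differ. `pickle` stands in for the element's coder. `run_with_mutation_check` and `MutationDetectedError` are hypothetical names, not a runner API.

```python
import pickle
from typing import Callable, Iterable, List


class MutationDetectedError(RuntimeError):
    pass


def run_with_mutation_check(fn: Callable[[object], Iterable],
                            elements: Iterable) -> List:
    outputs: List = []
    for element in elements:
        # Snapshot the encoded input before the user code sees it.
        before = pickle.dumps(element)
        outputs.extend(fn(element))
        # Re-encode afterwards; any difference means the input was mutated.
        if pickle.dumps(element) != before:
            raise MutationDetectedError(f"input element was mutated: {element!r}")
    return outputs
```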

Re: Usability regression using SDF Unbounded Source wrapper + DirectRunner

2020-12-16 Thread Robert Bradshaw
If all it takes is bumping these numbers up a bit, that seems like a reasonable thing to do ASAP. (I would argue that perhaps they shouldn't be static, e.g. it might be preferable to start emitting results right away, but use larger batches for the steady state if there are performance benefits.)
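A sketch of the non-static alternative, assuming batch size simply doubles from a small initial value up to a cap. The first results are emitted right away, and the steady state uses larger batches. `growing_batches` and its parameters are hypothetical.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def growing_batches(elements: Iterable[T], initial: int = 1,
                    maximum: int = 1000, factor: int = 2) -> Iterator[List[T]]:
    size = initial
    batch: List[T] = []
    for e in elements:
        batch.append(e)
        if len(batch) >= size:
            yield batch
            batch = []
            # Grow the next batch, but never beyond the cap.
            size = min(size * factor, maximum)
    if batch:
        # Flush whatever is left at the end of the input.
        yield batch
```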

Re: [VOTE] Release 2.26.0, release candidate #1

2020-12-10 Thread Robert Bradshaw
+1 (binding). I've verified the release artifacts and signatures, and validated some simple pipelines with a freshly installed wheel. On Thu, Dec 10, 2020 at 10:00 AM Brian Hulette wrote: > +1 (non-binding). Ran a Python pipeline using DataframeTransform on > DirectRunner and DataflowRunner

Re: Dynamic timers in python sdk.

2020-12-10 Thread Robert Bradshaw
ew is that there aren't two types of timers--"static" timers are just dynamic timers with a fixed (aka static) tag (that we provide for you as a convenience). On Thu, Dec 10, 2020 at 10:43 AM Robert Bradshaw > wrote: > >> Yep. >> >> A slight variant on this is to
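A sketch of the "one kind of timer" view: a timer family keyed by a dynamic tag, where the "static" timer is just the entry under a fixed default tag. The `set`/`clear` signatures mirror the `dynamic_timer_tag=` shape quoted in the thread. `TimerFamilySketch` is hypothetical, ignores per-key scoping, and may not match the released Python SDK.

```python
from typing import Dict, Optional


class TimerFamilySketch:
    """Hypothetical: one family of timers keyed by a dynamic tag."""

    DEFAULT_TAG = ""  # the "static" timer is just this fixed tag

    def __init__(self) -> None:
        self._timers: Dict[str, float] = {}

    def set(self, timestamp: float, dynamic_timer_tag: str = DEFAULT_TAG) -> None:
        self._timers[dynamic_timer_tag] = timestamp

    def clear(self, dynamic_timer_tag: str = DEFAULT_TAG) -> None:
        self._timers.pop(dynamic_timer_tag, None)

    def get(self, dynamic_timer_tag: str = DEFAULT_TAG) -> Optional[float]:
        return self._timers.get(dynamic_timer_tag)


timers = TimerFamilySketch()
timers.set(10.0)                           # "static" usage: default tag
timers.set(20.0, dynamic_timer_tag="a")    # dynamic usage: explicit tag
timers.clear(dynamic_timer_tag="a")
```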

Re: Dynamic timers in python sdk.

2020-12-10 Thread Robert Bradshaw
et(timestamp) -> timer.set(timestamp, dynamic_timer_tag=a_tag) > timer.clear() -> timer.clear(dynamic_timer_tag=a_tag) > > and have the default value of dynamic_timer_tag to be empty (the special > case) > > On Wed, Dec 9, 2020 at 5:12 PM Robert Bradshaw > wrote: > >> O

Re: Dynamic timers in python sdk.

2020-12-09 Thread Robert Bradshaw
On Wed, Dec 9, 2020 at 3:48 PM Kyle Weaver wrote: > Possibly a dumb question, but: if "the static timer is just a special case > of the dynamic timer," why do we need to use different classes at all? > I agree, I would argue that there is little if any value to the user to distinguish between

Re: [Proposal] Remove @Experimental from Splittable DoFn APIs

2020-12-03 Thread Robert Bradshaw
Makes sense to me. On Thu, Dec 3, 2020 at 12:29 PM Boyuan Zhang wrote: > > Hi folks, > > As we are reaching a stable state on Splittable DoFn APIs both in Java and > Python SDK, I'm proposing to remove Experimental annotations from these APIs. > > I have opened one PR[1] to do so. Please feel

Re: PTransform Annotations Proposal

2020-11-25 Thread Robert Bradshaw
good enough. We have a proposal, we > have clear boundaries for it. It's fine if the discussion continues, but I > see no evidence of concerns that should prevent starting an implementation, > because it seems we'll need both anyway. > > On Wed, Nov 25, 2020, 10:25 AM Robert Bra

Re: PTransform Annotations Proposal

2020-11-25 Thread Robert Bradshaw
I for one am ready to see a PR. +1 > On Mon, Nov 23, 2020, 6:02 PM Kenneth Knowles wrote: >> >> >> >> On Mon, Nov 23, 2020 at 3:04 PM Robert Bradshaw wrote: >>> >>> On Fri, Nov 20, 2020 at 11:08 AM Mirac Vuslat Basaran >>> wrote: >&

Re: PTransform Annotations Proposal

2020-11-23 Thread Robert Bradshaw
ds to be detailed. > > >> > > >> I am curious about how the end user APIs of this will look maybe in > > >> Java or Python, just an extra method to inject a Map or via Java > > >> annotations/Python decorators? > > >> > > >> We

Re: beam flink-runner distribution implementation

2020-11-19 Thread Robert Bradshaw
Gauge certainly seems wrong for DistributionResult. Yes, using a Histogram would be a welcome PR. On Thu, Nov 19, 2020 at 12:58 PM Kyle Weaver wrote: > > What are the advantages of using a Histogram instead of a Gauge? > > Also, check out this design doc for adding histogram metrics to Beam if

Re: PTransform Annotations Proposal

2020-11-17 Thread Robert Bradshaw
cation via >> Environments but it could also just delegate this to a resource >> manager which is what we could do for example for GPU (or data >> locality) cases on the Spark/Flink classic runners. If we tie this to >> environments we will leave classic runners out of the

Re: PTransform Annotations Proposal

2020-11-16 Thread Robert Bradshaw
I agree things like GPU, high-mem, etc. belong to the environment. If annotations are truly advisory, one can imagine merging environments by taking the union of annotations and still producing a correct pipeline. (This would mean that annotations would have to be a multi-map...) On the other
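The "union of annotations" merge described above can be sketched as follows, assuming annotations are modeled as a multi-map from string keys to sets of byte values. The function name `merge_annotations` and the keys `accelerator` and `memory` are hypothetical illustrations, not part of any Beam API or proposal text.

```python
from typing import Dict, Iterable, Set

# Assumed representation: each annotation key may carry several values.
Annotations = Dict[str, Set[bytes]]


def merge_annotations(envs: Iterable[Annotations]) -> Annotations:
    """Hypothetical helper: merge environments by taking the union of their
    annotations. Values for a shared key are combined rather than one
    overwriting the other, which is why a multi-map is needed."""
    merged: Annotations = {}
    for annotations in envs:
        for key, values in annotations.items():
            # Fresh sets are created, so the input mappings are not mutated.
            merged.setdefault(key, set()).update(values)
    return merged


gpu_env = {'accelerator': {b'gpu'}}
highmem_env = {'memory': {b'high'}, 'accelerator': {b'tpu'}}
merged = merge_annotations([gpu_env, highmem_env])
```

This union is only safe if every annotation is purely advisory. A hard requirement such as "must run on a GPU" could not simply be unioned with a conflicting one.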

Re: Proposal: ToStringFn

2020-10-29 Thread Robert Bradshaw
so thanks for the clarification > because > it makes more sense now, my initial understanding was that it was more to > 'debug' SDK Harness processed elements (that's why I mentioned Instructions) > but > it is clearly beyond that. Yeah, I think this will be useful. > On Thu, Oct 29, 2

Re: Proposal: ToStringFn

2020-10-29 Thread Robert Bradshaw
On Thu, Oct 29, 2020 at 3:18 AM Ismaël Mejía wrote: > > Thanks for sharing, > > I was initially confused by the title/terminology, I thought it was > about an end-user transform but this is a 'protocol' for a runner to > get the string representation of an element encoded by a SDK Harness >

Re: [VOTE] Release 2.25.0, release candidate #2

2020-10-23 Thread Robert Bradshaw
+1 (binding). I verified the release artifacts and signatures, and tried a couple of Python pipelines from an install of a wheel in a fresh virtual environment. All looks good to me. On Thu, Oct 22, 2020 at 4:54 PM Tyson Hamilton wrote: > > +1 > > I went through the Nexmark queries and

Re: [DISCUSS] Sensible dependency upgrades

2020-10-23 Thread Robert Bradshaw
On Fri, Oct 23, 2020 at 10:16 AM Luke Cwik wrote: > > An additional thing I forgot to mention was that if we only had portable > runners our BOM story would be simplified since we wouldn't have the runner > on the classpath and users would have a consistent experience across runners > with

Re: [Proposal] Website Revamp project

2020-10-19 Thread Robert Bradshaw
Welcome. Looking forward to the website improvements. I would like to call out that it'd be a good idea to have a plan for how this can be developed/reviewed incrementally. We were able to get the last major website change in, but huge monolithic PRs that change the world are difficult to review

Re: Support for Flink 1.11

2020-10-16 Thread Robert Bradshaw
Support for Flink 1.11 is https://issues.apache.org/jira/browse/BEAM-10612 . It has been implemented and will be included in the next release (Beam 2.25). In the meantime, you could try building yourself from head. On Fri, Oct 16, 2020 at 4:39 AM Kishor Joshi wrote: > > Hi Team, > > Since the

Re: [VOTE] JupyterLab Sidepanel extension release v1.0.0 for BEAM-10545 RC #1

2020-10-16 Thread Robert Bradshaw
Thanks, Ning and Ahmet. +1 (binding) Approve the release. On Fri, Oct 16, 2020 at 1:34 PM Ning Kang wrote: > Sorry, if you cannot see the missing thread history in the previous > thread, here is another copy: > > On Fri, Oct 16, 2020 at 9:55 AM Robert Bradshaw > wrote: > &

Re: [DISCUSS] Clearing timers (https://github.com/apache/beam/pull/12836)

2020-09-18 Thread Robert Bradshaw
Personally, I think that it makes sense to not see an old timer after one has cleared (or moved) it. This is consistent with the idea that the behavior should be the same if every element was processed in its own bundle. The fact that a user can (sometimes, but not always) see an old timer after
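The following is a minimal sketch of the semantics argued for above: once a timer has been cleared or moved, it should never fire at its old timestamp, just as if each element were processed in its own bundle. It is an illustrative model, not how any Beam runner or SDK harness implements timers. `TimerQueue` and `fire_until` are hypothetical names. The approach tracks the latest timestamp per (key, tag) and drops stale entries when firing.

```python
import heapq
from typing import Dict, List, Tuple


class TimerQueue:
    """Hypothetical sketch: remembers the latest timestamp per (key, tag)
    and discards superseded heap entries at firing time, so a timer that was
    cleared or moved is never observed at its old timestamp."""

    def __init__(self) -> None:
        self._latest: Dict[Tuple[str, str], int] = {}
        self._heap: List[Tuple[int, str, str]] = []

    def set(self, key: str, tag: str, ts: int) -> None:
        # The newest set() wins. Older heap entries become stale.
        self._latest[(key, tag)] = ts
        heapq.heappush(self._heap, (ts, key, tag))

    def clear(self, key: str, tag: str) -> None:
        # Removing the latest entry makes every queued copy stale.
        self._latest.pop((key, tag), None)

    def fire_until(self, watermark: int) -> List[Tuple[str, str, int]]:
        fired = []
        while self._heap and self._heap[0][0] <= watermark:
            ts, key, tag = heapq.heappop(self._heap)
            # Fire only if this entry is still the current setting.
            if self._latest.get((key, tag)) == ts:
                del self._latest[(key, tag)]
                fired.append((key, tag, ts))
        return fired


q = TimerQueue()
q.set('k', '', 10)
q.set('k', '', 50)        # move the timer later; the old 10 must not fire
q.set('k', 'other', 20)
q.clear('k', 'other')     # cleared; the old 20 must not fire
fired_early = q.fire_until(40)
fired_late = q.fire_until(60)
```

Here `fired_early` is empty and `fired_late` contains only the moved timer at 50. That matches what a user would see if every element were processed in its own bundle.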

Re: [DISCUSS] Deprecation of AWS SDK v2 IO connectors

2020-09-15 Thread Robert Bradshaw
should > deprecate a v1 IO ONLY when we have full feature parity in the v2 version. > I think we don't have a replacement for AWSv1 S3 IO so that one should not > be > deprecated. > > On Tue, Sep 15, 2020 at 6:07 PM Robert Bradshaw > wrote: > > > > The 10x-100x

Re: [DISCUSS] Deprecation of AWS SDK v2 IO connectors

2020-09-15 Thread Robert Bradshaw
The 10x-100x ratio looks like an answer right there about (non-)suitability for deprecation. The new question would be *why* people are using the v1 APIs. Is it because it was the original, or that it's been around longer, or it has more features?
