Re: [QUESTION] Why no auto labels?

2023-10-20 Thread Robert Bradshaw via user
I'll take another look at the PR. My inclination is still to use uuids to uniquify. I think that's worth the cost to the readability hit (I'm OK reducing this down to 6-8 hex digits which will still give very low chances of collisions, though it doesn't solve the first one). If someone cares about

Re: [QUESTION] Why no auto labels?

2023-10-20 Thread Joey Tran
Just want to bump this discussion again. I'm introducing Beam to other developers at my Schrodinger now and the first (of hopefully many!) developer has started migrating our internal workflows to Beam. As I suspected though, he's complained about the iteration cycles spent from using the same

Re: [QUESTION] Why no auto labels?

2023-10-13 Thread Joey Tran
On Fri, Oct 13, 2023 at 1:18 PM Robert Bradshaw wrote: > On Fri, Oct 13, 2023 at 10:08 AM Joey Tran > wrote: > Are there places on the SDK side that expect unique labels? Or in >> non-updateable runners? >> > > That's a good question. The label eventually ends up here: >

Re: [QUESTION] Why no auto labels?

2023-10-13 Thread Robert Bradshaw via user
On Fri, Oct 13, 2023 at 10:08 AM Joey Tran wrote: > That makes sense. Would you suggest the new option simply suppress the > RuntimeError and use the non-unique label? > Yes. (Or, rather, not raise it.) > Are there places on the SDK side that expect unique labels? Or in > non-updateable

Re: [QUESTION] Why no auto labels?

2023-10-13 Thread Joey Tran
That makes sense. Would you suggest the new option simply suppress the RuntimeError and use the non-unique label? Are there places on the SDK side that expect unique labels? Or in non-updateable runners? Would making `--auto_unique_labels` a mutually exclusive option with streaming be a

Re: [QUESTION] Why no auto labels?

2023-10-13 Thread Robert Bradshaw via user
Thanks for the PR. I think we should follow Java and allow non-unique labels, but not provide automatic uniquification, In particular, the danger of using a counter is that one can get accidental (and potentially hard to check) off-by-one collisions. As a concrete example, imagine one partitions

Re: [QUESTION] Why no auto labels?

2023-10-13 Thread Joey Tran
For posterity: https://github.com/apache/beam/pull/28984 On Tue, Oct 10, 2023 at 7:29 PM Robert Bradshaw wrote: > I would definitely support a PR making this an option. Changing the > default would be a rather big change that would require more thought. > > On Tue, Oct 10, 2023 at 4:24 PM Joey

Re: [QUESTION] Why no auto labels?

2023-10-10 Thread Joey Tran
Bump on this. Sorry to pester - I'm trying to get a few teams to adopt Apache Beam at my company and I'm trying to foresee parts of the API they might find inconvenient. If there's a conclusion to make the behavior similar to java, I'm happy to put up a PR On Thu, Oct 5, 2023, 12:49 PM Joey Tran

Re: [QUESTION] Why no auto labels?

2023-10-05 Thread Joey Tran
Is it really toggleable in Java? I imagine that if it's a toggle it'd be a very sticky toggle since it'd be easy for PTransforms to accidentally rely on it. On Thu, Oct 5, 2023 at 12:33 PM Robert Bradshaw wrote: > Huh. This used to be a hard error in Java, but I guess it's togglable > with an

Re: [QUESTION] Why no auto labels?

2023-10-05 Thread Robert Bradshaw via user
Huh. This used to be a hard error in Java, but I guess it's togglable with an option now. We should probably add the option to toggle Python too. (Unclear what the default should be, but this probably ties into re-thinking how pipeline update should work.) On Thu, Oct 5, 2023 at 4:58 AM Joey Tran

Re: [QUESTION] Why no auto labels?

2023-10-05 Thread Joey Tran
Makes sense that the requirement is the same, but is the label auto-generation behavior the same? I modified the BeamJava wordcount example[1] to do the regex filter twice in a row, and unlike the BeamPython example I posted before, it just warns instead of throwing an exception. Tangentially, is

Re: [QUESTION] Why no auto labels?

2023-10-04 Thread Byron Ellis via user
This feels like something that maybe should be more explicit? Overloading the transform name to provide a unique stable id feels like perhaps too much magic... also maybe feels like this is leaking specific runner behavior? I get that it's convenient On Wed, Oct 4, 2023 at 9:16 AM Robert Bradshaw

Re: [QUESTION] Why no auto labels?

2023-10-04 Thread Robert Bradshaw via user
BeamJava and BeamPython have the exact same behavior: transform names within must be distinct [1]. This is because we do not necessarily know at pipeline construction time if the pipeline will be streaming or batch, or if it will be updated in the future, so the decision was made to impose this

Re: [QUESTION] Why no auto labels?

2023-10-03 Thread XQ Hu via user
That suggests the default label is created as that, which indeed causes the duplication error. On Tue, Oct 3, 2023 at 9:15 PM Joey Tran wrote: > Not sure what that suggests > > On Tue, Oct 3, 2023, 6:24 PM XQ Hu via user wrote: > >> Looks like this is the current behaviour. If you have `t = >>

Re: [QUESTION] Why no auto labels?

2023-10-03 Thread Joey Tran
Not sure what that suggests On Tue, Oct 3, 2023, 6:24 PM XQ Hu via user wrote: > Looks like this is the current behaviour. If you have `t = > beam.Filter(identity_filter)`, `t.label` is defined as > `Filter(identity_filter)`. > > On Mon, Oct 2, 2023 at 9:25 AM Joey Tran > wrote: > >> You don't

Re: [QUESTION] Why no auto labels?

2023-10-03 Thread XQ Hu via user
Looks like this is the current behaviour. If you have `t = beam.Filter(identity_filter)`, `t.label` is defined as `Filter(identity_filter)`. On Mon, Oct 2, 2023 at 9:25 AM Joey Tran wrote: > You don't have to specify the names if the callable you pass in is > /different/ for two `beam.Map`s,

Re: [QUESTION] Why no auto labels?

2023-10-02 Thread Joey Tran
You don't have to specify the names if the callable you pass in is /different/ for two `beam.Map`s, but if the callable is the same you must specify a label. For example, the below will raise an exception: ``` | beam.Filter(identity_filter) | beam.Filter(identity_filter) ```

Re: [QUESTION] Why no auto labels?

2023-10-02 Thread Bruno Volpato via user
If I understand the question correctly, you don't have to specify those names. As Reuven pointed out, it is probably a good idea so you have a stable / deterministic graph. But in the Python SDK, you can simply use pcollection | map_fn, instead of pcollection | 'Map' >> map_fn. See an example

Re: [QUESTION] Why no auto labels?

2023-10-01 Thread Joey Tran
Hmm, I'm not sure what you mean by "updating pipelines in place". Can you elaborate? I forgot to mention my question is posed from the context of a python SDK user, and afaict, there doesn't seem to be an obvious way to autogenerate names/labels. Hearing that the java SDK supports it makes me

Re: [QUESTION] Why no auto labels?

2023-10-01 Thread Reuven Lax via user
Are you talking about transform names? The main reason was because for runners that support updating pipelines in place, the only way to do so safely is if the runner can perfectly identify which transforms in the new graph match the ones in the old graph. There's no good way to auto generate

[QUESTION] Why no auto labels?

2023-09-30 Thread Joey Tran
After writing a few pipelines now, I keep getting tripped up from accidentally have duplicate labels from using multiple of the same transforms without labeling them. I figure this must be a common complaint, so I was just curious, what the rationale behind this design was? My naive thought off