I'll take another look at the PR. My inclination is still to use uuids
to uniquify. I think that's worth the cost to the readability hit (I'm
OK reducing this down to 6-8 hex digits which will still give very low
chances of collisions, though it doesn't solve the first one). If
someone cares about
Just want to bump this discussion again. I'm introducing Beam to other
developers at my Schrodinger now and the first (of hopefully many!)
developer has started migrating our internal workflows to Beam. As I
suspected though, he's complained about the iteration cycles spent from
using the same
On Fri, Oct 13, 2023 at 1:18 PM Robert Bradshaw wrote:
> On Fri, Oct 13, 2023 at 10:08 AM Joey Tran
> wrote:
>
Are there places on the SDK side that expect unique labels? Or in
>> non-updateable runners?
>>
>
> That's a good question. The label eventually ends up here:
>
On Fri, Oct 13, 2023 at 10:08 AM Joey Tran
wrote:
> That makes sense. Would you suggest the new option simply suppress the
> RuntimeError and use the non-unique label?
>
Yes. (Or, rather, not raise it.)
> Are there places on the SDK side that expect unique labels? Or in
> non-updateable
That makes sense. Would you suggest the new option simply suppress the
RuntimeError and use the non-unique label? Are there places on the SDK side
that expect unique labels? Or in non-updateable runners?
Would making `--auto_unique_labels` a mutually exclusive option with
streaming be a
Thanks for the PR.
I think we should follow Java and allow non-unique labels, but not provide
automatic uniquification, In particular, the danger of using a counter is
that one can get accidental (and potentially hard to check) off-by-one
collisions. As a concrete example, imagine one partitions
For posterity: https://github.com/apache/beam/pull/28984
On Tue, Oct 10, 2023 at 7:29 PM Robert Bradshaw wrote:
> I would definitely support a PR making this an option. Changing the
> default would be a rather big change that would require more thought.
>
> On Tue, Oct 10, 2023 at 4:24 PM Joey
Bump on this. Sorry to pester - I'm trying to get a few teams to adopt
Apache Beam at my company and I'm trying to foresee parts of the API they
might find inconvenient.
If there's a conclusion to make the behavior similar to java, I'm happy to
put up a PR
On Thu, Oct 5, 2023, 12:49 PM Joey Tran
Is it really toggleable in Java? I imagine that if it's a toggle it'd be a
very sticky toggle since it'd be easy for PTransforms to accidentally rely
on it.
On Thu, Oct 5, 2023 at 12:33 PM Robert Bradshaw wrote:
> Huh. This used to be a hard error in Java, but I guess it's togglable
> with an
Huh. This used to be a hard error in Java, but I guess it's togglable
with an option now. We should probably add the option to toggle Python too.
(Unclear what the default should be, but this probably ties into
re-thinking how pipeline update should work.)
On Thu, Oct 5, 2023 at 4:58 AM Joey Tran
Makes sense that the requirement is the same, but is the label
auto-generation behavior the same? I modified the BeamJava
wordcount example[1] to do the regex filter twice in a row, and unlike the
BeamPython example I posted before, it just warns instead of throwing an
exception.
Tangentially, is
This feels like something that maybe should be more explicit? Overloading
the transform name to provide a unique stable id feels like perhaps too
much magic... also maybe feels like this is leaking specific runner
behavior? I get that it's convenient
On Wed, Oct 4, 2023 at 9:16 AM Robert Bradshaw
BeamJava and BeamPython have the exact same behavior: transform names
within must be distinct [1]. This is because we do not necessarily know at
pipeline construction time if the pipeline will be streaming or batch, or
if it will be updated in the future, so the decision was made to impose
this
That suggests the default label is created as that, which indeed causes the
duplication error.
On Tue, Oct 3, 2023 at 9:15 PM Joey Tran wrote:
> Not sure what that suggests
>
> On Tue, Oct 3, 2023, 6:24 PM XQ Hu via user wrote:
>
>> Looks like this is the current behaviour. If you have `t =
>>
Not sure what that suggests
On Tue, Oct 3, 2023, 6:24 PM XQ Hu via user wrote:
> Looks like this is the current behaviour. If you have `t =
> beam.Filter(identity_filter)`, `t.label` is defined as
> `Filter(identity_filter)`.
>
> On Mon, Oct 2, 2023 at 9:25 AM Joey Tran
> wrote:
>
>> You don't
Looks like this is the current behaviour. If you have `t =
beam.Filter(identity_filter)`, `t.label` is defined as
`Filter(identity_filter)`.
On Mon, Oct 2, 2023 at 9:25 AM Joey Tran wrote:
> You don't have to specify the names if the callable you pass in is
> /different/ for two `beam.Map`s,
You don't have to specify the names if the callable you pass in is
/different/ for two `beam.Map`s, but if the callable is the same you must
specify a label. For example, the below will raise an exception:
```
| beam.Filter(identity_filter)
| beam.Filter(identity_filter)
```
If I understand the question correctly, you don't have to specify those
names.
As Reuven pointed out, it is probably a good idea so you have a stable /
deterministic graph.
But in the Python SDK, you can simply use pcollection | map_fn,
instead of pcollection
| 'Map' >> map_fn.
See an example
Hmm, I'm not sure what you mean by "updating pipelines in place". Can you
elaborate?
I forgot to mention my question is posed from the context of a python SDK
user, and afaict, there doesn't seem to be an obvious way to autogenerate
names/labels. Hearing that the java SDK supports it makes me
Are you talking about transform names? The main reason was because for
runners that support updating pipelines in place, the only way to do so
safely is if the runner can perfectly identify which transforms in the new
graph match the ones in the old graph. There's no good way to auto generate
After writing a few pipelines now, I keep getting tripped up from
accidentally have duplicate labels from using multiple of the same
transforms without labeling them. I figure this must be a common complaint,
so I was just curious, what the rationale behind this design was? My naive
thought off
21 matches
Mail list logo