Re: org.apache.beam.sdk.values.TupleTag#genId and stacktraces?

2018-04-11 Thread Reuven Lax
I think it's best for the users to pick their own identifier for TupleTags. As part of the snapshot/update proposal, we want to be able to update pipelines across refactorings. We tell people to supply a stable name for transforms (in apply) specifically to make this doable (so we can match up transfo

Re: org.apache.beam.sdk.values.TupleTag#genId and stacktraces?

2018-04-10 Thread Romain Manni-Bucau
On Apr 10, 2018, 22:59, "Robert Bradshaw" wrote: On Tue, Apr 10, 2018 at 1:49 PM Romain Manni-Bucau wrote: > > On Apr 10, 2018, 21:25, "Robert Bradshaw" wrote: > > On Tue, Apr 10, 2018 at 12:10 PM Romain Manni-Bucau > wrote: > >> This is interesting because it leads to "why do the workers n

Re: org.apache.beam.sdk.values.TupleTag#genId and stacktraces?

2018-04-10 Thread Robert Bradshaw
On Tue, Apr 10, 2018 at 1:49 PM Romain Manni-Bucau wrote: > > On Apr 10, 2018, 21:25, "Robert Bradshaw" wrote: > > On Tue, Apr 10, 2018 at 12:10 PM Romain Manni-Bucau > wrote: > >> This is interesting because it leads to "why do the workers need to do it >> again instead of reusing the computed

Re: org.apache.beam.sdk.values.TupleTag#genId and stacktraces?

2018-04-10 Thread Romain Manni-Bucau
On Apr 10, 2018, 21:25, "Robert Bradshaw" wrote: On Tue, Apr 10, 2018 at 12:10 PM Romain Manni-Bucau wrote: > This is interesting because it leads to "why do the workers need to do it > again instead of reusing the computed one?". Technically the answer is > trivial but in terms of design I thi

Re: org.apache.beam.sdk.values.TupleTag#genId and stacktraces?

2018-04-10 Thread Robert Bradshaw
On Tue, Apr 10, 2018 at 12:10 PM Romain Manni-Bucau wrote: > This is interesting because it leads to "why do the workers need to do it > again instead of reusing the computed one?". Technically the answer is > trivial but in terms of design I think Beam tends to abuse static init > blocks - even in

Re: org.apache.beam.sdk.values.TupleTag#genId and stacktraces?

2018-04-10 Thread Romain Manni-Bucau
This is interesting because it leads to "why do the workers need to do it again instead of reusing the computed one?". Technically the answer is trivial, but in terms of design I think Beam tends to abuse static init blocks - even in the DoFn API - which easily leads to issues when we want to support m

Re: org.apache.beam.sdk.values.TupleTag#genId and stacktraces?

2018-04-10 Thread Ben Chambers
I believe it doesn't need to be stable across refactoring, only across all workers executing a specific version of the code. Specifically, it is used as follows: 1. Create a pipeline on the user's machine. It walks the stack until the static initializer block, which provides an ID. 2. Send the pip
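The stack-walking step described in this message can be sketched roughly as follows. This is a minimal illustration and not Beam's actual `genId` code: the class `StackIdSketch`, the nested `Tags` holder, the per-class counter, and the `random-` fallback prefix are all hypothetical names chosen for the example. The only assumption taken from the thread is that the walk stops at a static initializer frame, which the JVM reports with the method name `<clinit>`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class StackIdSketch {
    // Hypothetical bookkeeping: how many IDs each class's static initializer has requested.
    private static final Map<String, Integer> STATIC_INIT_COUNTS = new HashMap<>();

    // Walk the current stack until a static initializer frame is found and derive
    // an ID from the initializing class plus a per-class counter.
    static synchronized String genId() {
        for (StackTraceElement frame : new Throwable().getStackTrace()) {
            if ("<clinit>".equals(frame.getMethodName())) {
                int index = STATIC_INIT_COUNTS.merge(frame.getClassName(), 1, Integer::sum) - 1;
                return frame.getClassName() + "#" + index;
            }
        }
        // No static initializer on the stack: fall back to a random, unstable ID.
        return "random-" + UUID.randomUUID();
    }

    // Stand-in for a user class that declares its tags as static final fields.
    static class Tags {
        static final String FIRST = genId();
        static final String SECOND = genId();
    }

    public static void main(String[] args) {
        System.out.println(Tags.FIRST);   // ID derived from the Tags static initializer
        System.out.println(Tags.SECOND);  // same class, next counter value
        System.out.println(genId());      // called outside a static initializer
    }
}
```

Because static fields of a class are initialized in textual order, every JVM that loads the same version of `Tags` assigns the same IDs to `FIRST` and `SECOND`, which is the "stable across workers, not across refactoring" property discussed above.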

Re: org.apache.beam.sdk.values.TupleTag#genId and stacktraces?

2018-04-10 Thread Romain Manni-Bucau
Well, the issue is more about all the existing tests currently. Out of curiosity: how is walking the stack stable, since the stack can change? The stop condition is the static block of a class, which can call methods, so it is subject to refactoring and therefore is not stable. Should it be deprecated? On Apr 10, 2018, 19:17, "

Re: org.apache.beam.sdk.values.TupleTag#genId and stacktraces?

2018-04-10 Thread Robert Bradshaw
If it's too slow perhaps you could use the constructor where you pass an explicit id (though in my experience walking the stack isn't that slow). On Tue, Apr 10, 2018 at 10:09 AM Romain Manni-Bucau wrote: > Oops, cross-post, sorry. > > The issue I hit in this thread is that it is used a lot in tests and it

Re: org.apache.beam.sdk.values.TupleTag#genId and stacktraces?

2018-04-10 Thread Thomas Groh
It may be reasonable to port most of those TupleTags to have an explicit, rather than generated ID, which will remove the need to inspect the stack trace. However, as mentioned, the constructor shouldn't provide an unstable ID, as otherwise most pipelines won't work on production runners. On Tue,

Re: org.apache.beam.sdk.values.TupleTag#genId and stacktraces?

2018-04-10 Thread Romain Manni-Bucau
On Apr 10, 2018, 18:40, "Robert Bradshaw" wrote: These values should be, inasmuch as possible, stable across VMs. How slow is slow? Doesn't this happen only once per VM startup? Once per JVM, and IDEA launches a JVM per test and the daemon does save enough time, you still go through the whole

Re: org.apache.beam.sdk.values.TupleTag#genId and stacktraces?

2018-04-10 Thread Romain Manni-Bucau
Oops, cross-post, sorry. The issue I hit in this thread is that it is used a lot in tests and it slows down tests for nothing, as with the GenerateSequence ones. On Apr 10, 2018, 19:00, "Romain Manni-Bucau" wrote: > > > On Apr 10, 2018, 18:40, "Robert Bradshaw" wrote: > > These values should be, inasmuch

Re: org.apache.beam.sdk.values.TupleTag#genId and stacktraces?

2018-04-10 Thread Thomas Groh
In fact, this is explicitly to work with `static final` TupleTags, and using a non-stable ID isn't feasible. A static final TupleTag won't be serialized in the closure of an object that uses it - it will be instantiated independently in any other ClassLoader, such as on a remote JVM. If you use a con
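To illustrate why an unstable ID breaks independently instantiated tags, here is a small sketch. It does not use Beam: `ExplicitTagSketch` and its nested `Tag` class are hypothetical stand-ins whose equality is based only on the ID string, which is the assumption this example rests on. Two tags built separately, as they would be on the submitting JVM and on a worker, match only when both sides arrive at the same ID.

```java
import java.util.UUID;

public class ExplicitTagSketch {
    // Hypothetical stand-in for a tag whose identity is its ID string.
    static final class Tag {
        final String id;

        Tag(String id) {
            this.id = id;
        }

        // Simulates an unstable ID: a fresh random value on every construction.
        static Tag random() {
            return new Tag(UUID.randomUUID().toString());
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof Tag && ((Tag) other).id.equals(id);
        }

        @Override
        public int hashCode() {
            return id.hashCode();
        }
    }

    public static void main(String[] args) {
        // Each "JVM" constructs its own instance; nothing is shared or serialized.
        Tag onSubmitter = new Tag("errors");
        Tag onWorker = new Tag("errors");
        System.out.println(onSubmitter.equals(onWorker));       // true: explicit IDs agree

        System.out.println(Tag.random().equals(Tag.random()));  // false: random IDs never line up
    }
}
```

An explicit ID also avoids inspecting the stack at all, which is the trade-off raised elsewhere in this thread for test startup time.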

Re: org.apache.beam.sdk.values.TupleTag#genId and stacktraces?

2018-04-10 Thread Robert Bradshaw
These values should be, inasmuch as possible, stable across VMs. How slow is slow? Doesn't this happen only once per VM startup? On Tue, Apr 10, 2018 at 9:33 AM Romain Manni-Bucau wrote: > Hi > > does org.apache.beam.sdk.values.TupleTag#genId need to get the > stacktrace or can we use any id gen