Re: Compile-time type hierarchy information in pattern switch
- Mail original - > De: "Brian Goetz" <brian.go...@oracle.com> > À: "Peter Levart" <peter.lev...@gmail.com>, "Remi Forax" <fo...@univ-mlv.fr> > Cc: "amber-spec-experts" <amber-spec-experts@openjdk.java.net> > Envoyé: Vendredi 6 Avril 2018 14:48:50 > Objet: Re: Compile-time type hierarchy information in pattern switch > This may be O(n), but its not really something I want to do when linking > a call site... I agree :) Anyway, i've implemented the algorithm of Peter, fix a typo (i++ instead of i--) and add the fact that conceptually the supertype of an interface is java.lang.Object https://github.com/forax/exotic/blob/master/src/main/java/com.github.forax.exotic/com/github/forax/exotic/TypeSwitch.java#L96 I've also implemented a strategy that use if instanceof but it doesn't perform well, i suppose it's because with a real instanceof the VM gather a profile while with Class.isInstance(), it does not. It can be fixed by adding a special method handle combiner to java.lang.invoke.MethodHandles. That's said i may be wrong, it's perhaps something else. TypeSwitchBenchMark.big_big_instanceof_cascade avgt 15 367.409 ± 1.844 ns/op TypeSwitchBenchMark.big_big_type_switch avgt 15 52.238 ± 0.455 ns/op TypeSwitchBenchMark.small_big_instanceof_cascadeavgt 15 34.940 ± 0.287 ns/op TypeSwitchBenchMark.small_big_type_switch avgt 15 52.204 ± 0.319 ns/op TypeSwitchBenchMark.small_small_instanceof_cascade avgt 157.320 ± 0.100 ns/op TypeSwitchBenchMark.small_small_type_switch avgt 156.122 ± 0.027 ns/op The first big/small is for the number of cases, the second is for the number of classes seen at runtime, so small_big_type_switch means a small number of cases with a lot of runtime classes. To summarize, if the number of classes seen at runtime is small, the type_switch wins (it uses if getClass), if there are a lot of cases, the type_switch wins (it uses ClassValue.get()) but if there are few cases and a lot of classes, the type_switch is behind :( Rémi > > On 4/6/2018 5:01 AM, Peter Levart wrote: >> >> >> On 04/05/2018 11:20 PM, Peter Levart wrote: >>> >>> >>> On 04/05/18 22:28, Remi Forax wrote: >>>> the way to detect it is to use the DAG of the supertypes (lazily >>>> constructed*), from the last to the first case, the idea is to >>>> propagate the index of down to the super types, if during the >>>> propagation, you find a supertype which is also a case and with an >>>> index lower that the currently propagated, then it's a failure. >>>> >>>> Rémi >>>> >>>> * you do not have to actually create the DAG, just be able to >>>> traverse it from the subtype to the supertypes. >>> >>> Yes, this idea is similar to mine. We just have to find a conflict >>> between subtype relationships and compile time order of cases which >>> could be viewed as forming implicit pair-by-pair relationships of >>> consecutive cases. If there's a cycle, we have a conflict. >>> >> >> And Remi's algorithm is of course the best implementation of this >> search. Here's a variant that does not need an index, just a set of >> types: >> >> start with an empty set S >> for each case type T from the last case up to the first: >> if S contains T: >> bail out with error >> add T and all its supertypes to S >> >> The time complexity of this algorithm is O(n). It takes at most n * k >> lookups into a (hash)set where k is an average number of supertypes of >> a case type. Usually, when case types share common supertypes not >> far-away, the algorithm can prune branches in type hierarchy already >> visited. Implementation-wise, if the algorithm uses a HashMap, mapping >> visited type to case type it was visited from (back to Remi's index of >> case), it can also produce a meaningful diagnostic message, mentioning >> precisely which two cases are in wrong order according to type hierarchy: >> >> Class[] caseTypes = ...; >> TypeVisitor visitor = new TypeVisitor(); >> for (int i = caseTypes.length - 1; i >= 0; i++) { >> visitor.visitType(caseTypes[i]); >> } >> >> class TypeVisitor extends HashMap<Class, Class> { >> >> void visitType(Class caseType) { >> Class conflictingCaseType = putIfAbsent(caseType, >> caseType); >> if (conflictingCaseType != null) { >> throw new IllegalStateException( >> "Case "
Re: Compile-time type hierarchy information in pattern switch
This may be O(n), but its not really something I want to do when linking a call site... On 4/6/2018 5:01 AM, Peter Levart wrote: On 04/05/2018 11:20 PM, Peter Levart wrote: On 04/05/18 22:28, Remi Forax wrote: the way to detect it is to use the DAG of the supertypes (lazily constructed*), from the last to the first case, the idea is to propagate the index of down to the super types, if during the propagation, you find a supertype which is also a case and with an index lower that the currently propagated, then it's a failure. Rémi * you do not have to actually create the DAG, just be able to traverse it from the subtype to the supertypes. Yes, this idea is similar to mine. We just have to find a conflict between subtype relationships and compile time order of cases which could be viewed as forming implicit pair-by-pair relationships of consecutive cases. If there's a cycle, we have a conflict. And Remi's algorithm is of course the best implementation of this search. Here's a variant that does not need an index, just a set of types: start with an empty set S for each case type T from the last case up to the first: if S contains T: bail out with error add T and all its supertypes to S The time complexity of this algorithm is O(n). It takes at most n * k lookups into a (hash)set where k is an average number of supertypes of a case type. Usually, when case types share common supertypes not far-away, the algorithm can prune branches in type hierarchy already visited. Implementation-wise, if the algorithm uses a HashMap, mapping visited type to case type it was visited from (back to Remi's index of case), it can also produce a meaningful diagnostic message, mentioning precisely which two cases are in wrong order according to type hierarchy: Class[] caseTypes = ...; TypeVisitor visitor = new TypeVisitor(); for (int i = caseTypes.length - 1; i >= 0; i++) { visitor.visitType(caseTypes[i]); } class TypeVisitor extends HashMap{ void visitType(Class caseType) { Class conflictingCaseType = putIfAbsent(caseType, caseType); if (conflictingCaseType != null) { throw new IllegalStateException( "Case " + conflictingCaseType.getName() + " matches a subtype of what case " + caseType.getName() + " matches but is located after it"); } visitSupertypes(caseType, caseType); } private void visitSupertypes(Class type, Class caseType) { Class superclass = type.getSuperclass(); if (superclass != null && putIfAbsent(superclass, caseType) == null) { visitSupertypes(superclass, caseType); } for (Class superinterface : type.getInterfaces()) { if (putIfAbsent(superinterface, caseType) == null) { visitSupertypes(superinterface, caseType); } } } } Regards, Peter
Re: Compile-time type hierarchy information in pattern switch
On 04/05/18 21:42, Brian Goetz wrote: That's if you want to "fix" the order of cases at link-time in order to compute optimal dispatch logic. If you only want to verify and bail-out if they are not sorted already (i.e. you only accept changes in type hierarchy that don't change order of cases), you always need just n-1 comparisons. Perhaps I'm dense, but I don't see this. Suppose I have completely unrelated interfaces I, J, K, and L. The user says: case I: case J: case K: case L: which is fine because they're unordered. At runtime, any of the following type relations could have been injected: J <: I, K <: I, L <: I K <: J, L <: J L <: K and these would cause the switch to be misordered (and would have been rejected at compile time.) How am I to detect any of these with just three comparisons? If I pick the obvious n-1 (compare each to their neighbor) I wouldn't detect any of { L <: J, K <: I, L <: I }. You're right. Linear sorting would not help as there's no total order that could be derived from subtyping relationships. But as you say at the end, subtyping relationships form a directed acyclic graph on which you can perform topological sorting in linear time. Let's start with a list of cases that have already been ordered topologically at compile time. Say I, J, K, L (as in your example above). The types could be completely unrelated or there could be type relationships among them. Let's add to them synthetic "subtype" relationships (marked with <. to distinguish them from real subtype relationships <:) according to compile-time order of cases): I <. J J <. K K <. L Together with real direct subtype relationships, those form a graph. We just have to find out if this graph is acyclic or not. If it does not have a cycle, the order of case(s) is still OK and the switch is still valid. Otherwise the subtype relationships have changed in a way that makes the compile-time order of cases invalid. Finding cycle can be performed in linear time. Have I missed something this time too? Regards, Peter
Re: Compile-time type hierarchy information in pattern switch
- Mail original - > De: "Brian Goetz" <brian.go...@oracle.com> > À: "Tagir Valeev" <amae...@gmail.com>, "amber-spec-experts" > <amber-spec-experts@openjdk.java.net> > Envoyé: Jeudi 5 Avril 2018 21:44:30 > Objet: Re: Compile-time type hierarchy information in pattern switch >> Is it too harsh to reject the whole class if the assumptions on class >> hierarchy which were necessary to compile the switch statements used >> in the class are not valid at runtime? > > That is one of the questions! And the other question is: is this too > expensive to do this check at runtime, given that it will fail so > infrequently. > > If we can detect it cheaply enough, though, we can also repair the > situation and fall back to linear testing of patterns. This seems > better (we can execute the statement the user wrote) than failing. My > real question is can I punt on trying to detect it, and still optimize > the common cases down to O(1) dispatch the way to detect it is to use the DAG of the supertypes (lazily constructed*), from the last to the first case, the idea is to propagate the index of down to the super types, if during the propagation, you find a supertype which is also a case and with an index lower that the currently propagated, then it's a failure. Rémi * you do not have to actually create the DAG, just be able to traverse it from the subtype to the supertypes.
Re: Compile-time type hierarchy information in pattern switch
Is it too harsh to reject the whole class if the assumptions on class hierarchy which were necessary to compile the switch statements used in the class are not valid at runtime? That is one of the questions! And the other question is: is this too expensive to do this check at runtime, given that it will fail so infrequently. If we can detect it cheaply enough, though, we can also repair the situation and fall back to linear testing of patterns. This seems better (we can execute the statement the user wrote) than failing. My real question is can I punt on trying to detect it, and still optimize the common cases down to O(1) dispatch
Re: Compile-time type hierarchy information in pattern switch
That's if you want to "fix" the order of cases at link-time in order to compute optimal dispatch logic. If you only want to verify and bail-out if they are not sorted already (i.e. you only accept changes in type hierarchy that don't change order of cases), you always need just n-1 comparisons. Perhaps I'm dense, but I don't see this. Suppose I have completely unrelated interfaces I, J, K, and L. The user says: case I: case J: case K: case L: which is fine because they're unordered. At runtime, any of the following type relations could have been injected: J <: I, K <: I, L <: I K <: J, L <: J L <: K and these would cause the switch to be misordered (and would have been rejected at compile time.) How am I to detect any of these with just three comparisons? If I pick the obvious n-1 (compare each to their neighbor) I wouldn't detect any of { L <: J, K <: I, L <: I }. Skipping ahead, yes, guards do play part in the ordering, and (a) we can't detect changes to data in at runtime and (b) we can't even necessarily order the guards. But we can detect changes to type tests at runtime. The question is whether we should. Another problem with re-ordering cases at link time is when you support fall-through. What are fall-through(s) in a switch with re-ordered cases? Our story here is straightforward; we lower a switch whose labels are patterns to a switch whose labels are ints, and encode the patterns (or parts of them) as the static bootstrap arguments of the classifier bootstrap (just a more sophisticated version of what we do for longs, strings, and enums, as discussed previously.) The classifier spits out a number, and int switch mechanics does the rest. The question is to what degree we can rely on the compile-time assertion that the inputs are topologically sorted.
Re: Compile-time type hierarchy information in pattern switch
Hello! Is it too harsh to reject the whole class if the assumptions on class hierarchy which were necessary to compile the switch statements used in the class are not valid at runtime? E.g. compiler may gather all the assumptions across all the pattern-matching switches within the class and add some instructions to the which check these assumptions at once (probably calling some validation method which receives the expected hierarchy in some packed way)? This way the fail-fast behavior will be guaranteed (class refuses to initialize) and while some expensive runtime checks are to be made during class initialization, in case of several pattern switches in the same class, the number of checks will be reduced (although they still will be performed even if no such switch is actually executed). With best regards, Tagir Valeev. On Thu, Apr 5, 2018 at 6:40 PM, <fo...@univ-mlv.fr> wrote: > - Mail original - > > De: "Brian Goetz" <brian.go...@oracle.com> > > À: "Remi Forax" <fo...@univ-mlv.fr> > > Cc: "mark" <m...@io7m.com>, "amber-spec-experts" < > amber-spec-experts@openjdk.java.net> > > Envoyé: Jeudi 5 Avril 2018 17:25:36 > > Objet: Re: Compile-time type hierarchy information in pattern switch > > > Yes, this is surely an option. > > > > But it doesn't answer the underlying question -- if the hierarchy > > changes in various ways between compile and runtime, what behavior can > > the user count on, and what changes yield "undefined" behavior? > > no, it's not undefined, at least not an "undefined behavior" as in C. > At runtime, the code executed will be the one compiled. A hierarchy > changes is not a backward compatible changes, so one can expect surprise > and not something undefined. > > > > > While its easy to say "you should do what the code says", taking that > > too far ties tie our hands behind our back, and makes switches that > > should be O(1) into O(n). > > ???, not sure to understand. > If we record which case was executed for a given class in a hashmap and > use it as a cache, it will be always O(1) for all subsequent calls with the > same class. > > Rémi > > > > > On 4/5/2018 11:21 AM, Remi Forax wrote: > >> Or we can not try to do any check at runtime that validate the view of > the world > >> at compile time. > >> Currently, there is no check that verifies that the catch are in the > right order > >> or that a cascade of if-instanceofs means the same thing at compile > time and at > >> runtime. > >> > >> My opinion, we should just run the code that was compiled, even if the > world as > > > changed between the compilation and the execution. >
Re: Compile-time type hierarchy information in pattern switch
- Mail original - > De: "Brian Goetz" <brian.go...@oracle.com> > À: "Remi Forax" <fo...@univ-mlv.fr> > Cc: "mark" <m...@io7m.com>, "amber-spec-experts" > <amber-spec-experts@openjdk.java.net> > Envoyé: Jeudi 5 Avril 2018 17:25:36 > Objet: Re: Compile-time type hierarchy information in pattern switch > Yes, this is surely an option. > > But it doesn't answer the underlying question -- if the hierarchy > changes in various ways between compile and runtime, what behavior can > the user count on, and what changes yield "undefined" behavior? no, it's not undefined, at least not an "undefined behavior" as in C. At runtime, the code executed will be the one compiled. A hierarchy changes is not a backward compatible changes, so one can expect surprise and not something undefined. > > While its easy to say "you should do what the code says", taking that > too far ties tie our hands behind our back, and makes switches that > should be O(1) into O(n). ???, not sure to understand. If we record which case was executed for a given class in a hashmap and use it as a cache, it will be always O(1) for all subsequent calls with the same class. Rémi > > On 4/5/2018 11:21 AM, Remi Forax wrote: >> Or we can not try to do any check at runtime that validate the view of the >> world >> at compile time. >> Currently, there is no check that verifies that the catch are in the right >> order >> or that a cascade of if-instanceofs means the same thing at compile time and >> at >> runtime. >> >> My opinion, we should just run the code that was compiled, even if the world >> as > > changed between the compilation and the execution.
Re: Compile-time type hierarchy information in pattern switch
Yes, this is surely an option. But it doesn't answer the underlying question -- if the hierarchy changes in various ways between compile and runtime, what behavior can the user count on, and what changes yield "undefined" behavior? While its easy to say "you should do what the code says", taking that too far ties tie our hands behind our back, and makes switches that should be O(1) into O(n). On 4/5/2018 11:21 AM, Remi Forax wrote: Or we can not try to do any check at runtime that validate the view of the world at compile time. Currently, there is no check that verifies that the catch are in the right order or that a cascade of if-instanceofs means the same thing at compile time and at runtime. My opinion, we should just run the code that was compiled, even if the world as changed between the compilation and the execution.
Re: Compile-time type hierarchy information in pattern switch
Hi, On 04/04/2018 07:07 PM, Brian Goetz wrote: The intended implementation strategy is to lower complex switches to densely-numbered `int` switches, and then invoke a classifier function that takes a target and returns the int corresponding to the lowered case number. The classifier function will be an `invokedynamic`, whose static bootstrap will contain a summary of the patterns. (We've already done this for switches on strings, enums, longs, non-dense ints, etc.) To deliver an early error, that means that (a) the compiler must encode through the static argument list all the assumptions it needs verified at runtime (e.g., `String <: Object`), and (b) at linkage time (the first time the switch is executed), those have to be tested. Doing so is plenty easy, but there's a startup cost, which could be as bad as _O(n^2)_, if I have to validate that no two case labels are ordered inconsistently with subtyping. Not necessarily. O(n log n) at worst for stable-sorting n cases which, if already sorted in compile time (i.e. no subtype changes between compile and link time), are resorted using just n-1 comparisons. That's if you want to "fix" the order of cases at link-time in order to compute optimal dispatch logic. If you only want to verify and bail-out if they are not sorted already (i.e. you only accept changes in type hierarchy that don't change order of cases), you always need just n-1 comparisons. The question is whether you only want to re-order / check-order according to type hierarchy or also according to other aspects of "dominance", for example: case Point p where (p.x >= 0 && p.y >= 0): ... case Point p where (p.x >= 0): ... Other aspects of dominance usually don't change between compile and link time, so stable-sorting cases could take just type hierarchy into account, unless you also allow type-hierarchy based conditions in where patterns, for example: case Holder h where (h.value instanceof TypeA): ... case Holder h where (h.value instanceof TypeB): ... Another problem with re-ordering cases at link time is when you support fall-through. What are fall-through(s) in a switch with re-ordered cases? For example: interface A {} interface B extends A {} switch (x) { case B b: ... // fall-through... case A a: A ab = ... ? a : b; ... What happens when you remove A from supertypes of B in a separately compiled code: interface A {} interface B {} Perhaps there's no need to worry about this as verifier would already catch such invalid code during runtime. So fall-through(s) could just stay the same even if cases are virtually reordered for the purpose of computing dispatch logic. The fall-through logic could sometimes survive changes in type hierarchy unnoticed by verifier but would give questionable results when executed. But that could be said for any logic, not necessarily concerned with switch statements. Here's some experiment I played with that clearly separates compile-time, link-time and run-time parts of logic and is just API. You can even simulate the effects of adding subtype relationship(s) between compile-time of switch and link-time: http://cr.openjdk.java.net/~plevart/misc/TypeSwitch/TypeSwitch.java Regards, Peter A possible mitigation is to do the check as a system assertion, which only gets run if we are run with `-esa`; we then might still have some static code bloat (depending on how we encode the assumptions), but at least skip the dynamic check most of the time. On 4/4/2018 1:01 PM, Mark Raynsford wrote: I'm still giving thought to everything you've written, but I am wondering: How feasible is it to get the above to fail early with an informative exception/Error?
Re: Compile-time type hierarchy information in pattern switch
The intended implementation strategy is to lower complex switches to densely-numbered `int` switches, and then invoke a classifier function that takes a target and returns the int corresponding to the lowered case number. The classifier function will be an `invokedynamic`, whose static bootstrap will contain a summary of the patterns. (We've already done this for switches on strings, enums, longs, non-dense ints, etc.) To deliver an early error, that means that (a) the compiler must encode through the static argument list all the assumptions it needs verified at runtime (e.g., `String <: Object`), and (b) at linkage time (the first time the switch is executed), those have to be tested. Doing so is plenty easy, but there's a startup cost, which could be as bad as _O(n^2)_, if I have to validate that no two case labels are ordered inconsistently with subtyping. A possible mitigation is to do the check as a system assertion, which only gets run if we are run with `-esa`; we then might still have some static code bloat (depending on how we encode the assumptions), but at least skip the dynamic check most of the time. On 4/4/2018 1:01 PM, Mark Raynsford wrote: I'm still giving thought to everything you've written, but I am wondering: How feasible is it to get the above to fail early with an informative exception/Error?
Re: Compile-time type hierarchy information in pattern switch
On 2018-04-03T12:36:43 -0400 Brian Goetzwrote: > > Here's one that I suspect we're not expecting to recover terribly well > from: hierarchy inversion. Suppose at compile time A <: B. So the > following is a sensible switch body: > > case String: println("String"); break; > case Object: println("Object"); break; > > Now, imagine that by runtime, String no longer extends Object, but > instead Object absurdly extends String. Do we still expect the above to > print String for all Strings, and Object for everything else? Or is the > latter arm now dead at runtime, even though it wouldn't compile after > the change? Or is this now UB, because it would no longer compile? I'm still giving thought to everything you've written, but I am wondering: How feasible is it to get the above to fail early with an informative exception/Error? -- Mark Raynsford | http://www.io7m.com
Compile-time type hierarchy information in pattern switch
Along the lines of the previous discussion about separate compilation skew with enums ... I'm trying to find the right place to draw the line with respect to post-compilation class hierarchy changes. Recall that we can impose a _dominance ordering_ on patterns; pattern P dominates Q if everything that is matched by Q also is matched by P. We already use this today, in catch blocks, to reject programs with dead code; you can't say `catch Exception` before `catch IOException`, because the latter block would be dead. We want to do the same with patterns, so: case String x: ... case Object x: ... is OK but case Object x: ... case String x: ... is rejected at compile time. Separately, we'd like for pattern matching to be efficient; the definition of "inefficient" would be for pattern matching to be inherently O(n), when we can frequently do much better. There's plenty of literature on compiling patterns to decision trees, but none of them address the problem we have to: separate compilation. So any decision tree computed at compile time might be wrong in undesirable ways by runtime. We could also compute a decision tree at runtime using indy; while this is our intent, the devil is in the details. We don't want computing the tree to be too expensive, nor do we want to have to capture O(n^2) compile-time constraints to be validated at runtime. So I'd like to focus on what changes we're willing to accept between compilation and runtime, what our expectations would be for those changes. We've already discussed one of these: novel values in enum / sealed type switches, and for them, the answer is throwing some sort of exception. Another that we dealt with long ago is changing enum ordinals; we decided at the time that we're willing for this to be a BC change, so we generate extra code that uses the as-runtime ordinals rather than the as-compile-time ordinals when lowering the switch into an integer switch. (If we weren't willing to tolerate such changes, we'd have a simpler translation: just lower an enum switch to a switch on its ordinal.) Here's one that I suspect we're not expecting to recover terribly well from: hierarchy inversion. Suppose at compile time A <: B. So the following is a sensible switch body: case String: println("String"); break; case Object: println("Object"); break; Now, imagine that by runtime, String no longer extends Object, but instead Object absurdly extends String. Do we still expect the above to print String for all Strings, and Object for everything else? Or is the latter arm now dead at runtime, even though it wouldn't compile after the change? Or is this now UB, because it would no longer compile? A more realistic example of a hierarchy change is introducing an interface. If we have: interface I { } class C { } and a switch case I: ... case C: ... and later, we make C implement I, we have a similar situation; the switch would no longer compile. Are we allowed to make optimizations based on the compile-time knowledge that C As an example, suppose A, B, C, ... Z are final classes, and I is an interface implemented by none of them. Then I can dispatch: case A: ... case B: ... ... case I: ... ... case Z: ... case Object: ... in two type operations; hash the class of the target and look it up in a table containing A...Z, and then do a test against I. However, if I'm required to deal with the case where some of A..Z are retrofitted to implement I after compile time, and I'm expected to process the switch in order based on how it is written, then I have to fall back to O(1) type operations at runtime, or, I have to do as many as O(n^2) type comparisons at link time. These are steep cliffs to fall off of. (Mandating throwing an exception at link time is also expensive.) Today, all switch cases are totally unordered, so we're free to execute them in O(1) time. I'd like for that to continue to be the case, even as we add more complex switches. So, let's have a conversation about expectations for what we should do for a switch at runtime that would no longer compile due to post-compilation hierarchy changes (new supertypes, hierarchy inversions, removed supertypes, final <--> nonfinal, etc.)