[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154893#comment-17154893 ] Julian Hyde commented on CALCITE-3786: -- I was mistaken. It actually affected multiple {{RelNode}} methods: {{isDistinct()}}, {{isKey(ImmutableBitSet)}}, {{getQuery()}}, {{getRows()}}, {{getVariablesStopped()}}, {{computeSelfCost()}}, {{isValid(boolean)}}, {{getCollationList()}}, {{getChildExps()}}. I'll change all of these methods to 'deprecated - to be removed before 1.25'. > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Fix For: 1.24.0 > > Attachments: image-2020-06-23-12-47-25-599.png, screenshot-1.png, > screenshot-2.png, screenshot-3.png, screenshot-4.png, screenshot-5.png, > screenshot-6.png > > Time Spent: 9h 10m > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154135#comment-17154135 ] Danny Chen commented on CALCITE-3786: - I'm afraid this is a mistake, i meant to change them all to be deprecated in 1.25, the {{RelNode.isDistinct()}} is missed. > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Fix For: 1.24.0 > > Attachments: image-2020-06-23-12-47-25-599.png, screenshot-1.png, > screenshot-2.png, screenshot-3.png, screenshot-4.png, screenshot-5.png, > screenshot-6.png > > Time Spent: 9h 10m > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154096#comment-17154096 ] Julian Hyde commented on CALCITE-3786: -- [~danny0405], In this change, you marked {{AbstractRelNode.isDistinct()}} 'deprecated to be removed before 1.25', whereas {{RelNode.isDistinct()}} was and still is marked 'deprecated to be removed before 2.0'. We can't remove one without the other. Which release should it be? > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Fix For: 1.24.0 > > Attachments: image-2020-06-23-12-47-25-599.png, screenshot-1.png, > screenshot-2.png, screenshot-3.png, screenshot-4.png, screenshot-5.png, > screenshot-6.png > > Time Spent: 9h 10m > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17143173#comment-17143173 ] Haisheng Yuan commented on CALCITE-3786: OK, since you desire it so much, let me show you some graphs. The benchmark test hardly tell anything useful and match the real use case, let's run tpch query to see how it goes. Let's use {{TpchTest.testQuery05}}, which is slow enough to sample enough data for these methods. 1. Master before your change ([978bb7|https://github.com/apache/calcite/commit/978bb7ea44969351468d1b5e240e8f57af7e5770]) !screenshot-1.png|width=895,height=299! !screenshot-2.png|width=850,height=364! 2. Your patch with part 2 ([b18470|https://github.com/apache/calcite/commit/b18470797eaec8c4e64382c0b742ae83f758275f]) !screenshot-3.png|width=869,height=456! !screenshot-4.png|width=851,height=407! 3. My patch committed to master that is [reverted|https://github.com/apache/calcite/commit/a551d4bfa3b97c42e74fe4e30ca66ba493719c96] by you ([b00c1f|https://github.com/apache/calcite/commit/b00c1fd6e26b5e3cc1a810c9f862b184649cd1a6]) !screenshot-5.png|width=861,height=299! 4. My patch that make digestEquals and digestHash private per your request ([173624|https://github.com/apache/calcite/commit/1736242612634359164be483334884cb9113f804]) !image-2020-06-23-12-47-25-599.png|width=869,height=307! Note that recomputeDigest in my patches costs nothing, it just clears the hash to 0. It turned out my initial patch that got +1 from 2 committers, yet you said which is a mess, far from +1, then reverted, yields the best performance and use the least memory. Even my last patch with private digestEquals after your review, still perform better than yours and master with string digest. Now do you think it is worth allowing digestEquals and digestHash to be protected instead of private, so that if downstream projects think the default digestEquals and digestHash are performance bottleneck (it doesn't seem to be for this TPCH query), they can override it like [this|https://github.com/apache/calcite/commit/b00c1fd6e26b5e3cc1a810c9f862b184649cd1a6#diff-16c976c118188d143e34105a8c79f235], or they can choose to override the default behavior in case the downstream data type doesn't have fields in CALCITE-3793 that the default methods can't handle appropriately? > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Fix For: 1.24.0 > > Attachments: image-2020-06-23-12-47-25-599.png, screenshot-1.png, > screenshot-2.png, screenshot-3.png, screenshot-4.png, screenshot-5.png, > screenshot-6.png > > Time Spent: 8h 50m > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17142508#comment-17142508 ] Danny Chen commented on CALCITE-3786: - > Wait your fix to be released with 1.24 I have fixed it in CALCITE-3793 which is in release 1.23. > I just move your code from one place to another place I don't care where the code was moved, i just want to know if the new patch is better(collect the items each time #equals instead of cache the object references). > RexCall toString() still stores string digest The RexCall string may still be problematic, but the new patch still does not solve that, so let's put it aside and not struggle on it. > Do you really want to do benchmark test competition? I did, and this is *a must*. Especially we have spent so much time on this issue and you modify based on my code, i need to see if the modification deserves that. > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Fix For: 1.24.0 > > Time Spent: 8h 50m > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17142199#comment-17142199 ] Haisheng Yuan commented on CALCITE-3786: {quote}My example code just illustrates that row type for downstream may not have fields, but they must also obey the rule that the type should equal(in the version before 1.22, we compare with just row type, but later we switch to field list which may cause problem, the code fix that). {quote} So the only thing that downstream can do is wait. Wait your fix to be released with 1.24, because they have no choice to override the default behavior in their own operator, do they? Anyway, the digestEquals and digestHash has been changed to private completely per your request, should not be a problem that concerns you. {quote}you have to cache the hashcode for RexNodes, it sacrifice performance for memory. {quote} I can change it to only cache hashcode to RexCall if you think it is wasting memory. Do you know that RexNode can be shared between operators, especially when logical operator is transformed to physical operator. Do you really want to compute hashCode for each RexNode again even they are shared? {quote}i don't think #getDigest should be a public API, the digest is a planner internal abstraction {quote} I 100% agree with you. I don't think it should be a public API either. But it is already a public API, no matter you like it or not, since the first day. Dude, don't be pissed off. You have reviewed my code multiple times, helped improve my PR and remove bunch of ugly codes and unnecessary subclasses. It is also a part of your work, isn't it? Look, I didn't revert your code at all, I just move your code from one place to another place, e.g. from Digest to AbstractRelNode, don't you agree? It is just a tiny refactoring on top of your contribution. Some of the change is just like I collect the money that you have left on the table. For example: - The RexCall toString() still stores string digest. When the toString is called, we still can't avoid OOM because of the digest there. - The Janino metadata code generator still calls RexNode#toString, it will use the large RexCall string as the key, until the metadata is cleared. Do you really want to do benchmark test competition? Especially for memory consumption? It is not fair to you. You have done tons of work, made the money, but forgot to take the money. I am just helping you take the money that is left on the table by you. > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Fix For: 1.24.0 > > Time Spent: 8h 50m > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17142172#comment-17142172 ] Chunwei Lei commented on CALCITE-3786: -- > Your patch collects the digest items every time we do #equals, and you have >to cache the hashcode for RexNodes, it sacrifice performance for memory, so i >would be hard to say it is better, we need a benchmark to prove that. Also >note that the #hashCode and #equals have different items, that means 2 equals >Digest can have different hashCode which is error-prone. You have a point. > BTW, i don't think #getDigest should be a public API, the digest is a planner >internal abstraction, the only interface we should impose is #explainTerms, so >i don't think there is compatibility problem for master. I don't think so. We should be more serious about the public API just like others. Because we don't know how much the downstream projects depend on it. > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Fix For: 1.24.0 > > Time Spent: 8h 50m > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17142132#comment-17142132 ] Chunwei Lei commented on CALCITE-3786: -- IMHO, the latest change proposed by [~hyuan] is considerable and reasonable. Is it possible to provide a benchmark to show the improvement? > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Fix For: 1.24.0 > > Time Spent: 8h 50m > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17142117#comment-17142117 ] Danny Chen commented on CALCITE-3786: - > each operator has to getFieldList from rowtype and compare, but some > operators don't want to do so, but they have no control I would say that the row type equivalence is *a must* for current transformation, you can not recognize that 2 nodes with different fields number or types are equivalent. The planner also have a strict check inside actually. Thus make the row type comparison transient to downstream project nodes reduce the redundant work. My example code just illustrates that row type for downstream may not have fields, but they must also obey the rule that the type should equal(in the version before 1.22, we compare with just row type, but later we switch to field list which may cause problem, the code fix that). > Do you think it is better to leave some space for operators to define their > own logic? I did, that is why we have different SqlExplainLevel for customization of different cases, for 90% cases the digest items and toString items are almost the same, the Enum keeps the customization and also saves the redundant interfaces. As for other things, i.e. row type. hints, traits, i don't think we should/need to open customization for them, they are common and required. Your patch collects the digest items every time we do #equals, and you have to cache the hashcode for RexNodes, it sacrifice performance for memory, so i would be hard to say it is better, we need a benchmark to prove that. BTW, i don't think #getDigest should be a public API, the digest is a planner internal abstraction, the only interface we should impose is #explainTerms, so i don't think there is compatibility problem for master. > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Fix For: 1.24.0 > > Time Spent: 8h 50m > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17142038#comment-17142038 ] Haisheng Yuan commented on CALCITE-3786: [~danny0405] Thank you for replying and reviewing. You are correct that using DIGEST_ATTRIBUTE can avoid plan diffs. However, do you think it is better that each operator can control how they define their equivalence. Like I said in previous example, each operator has to getFieldList from rowtype and compare, but some operators don't want to do so, but they have no control... You already have an example in [your code|https://github.com/apache/calcite/commit/69f25863f5f4197c17927a39a82cbf1cffd12b80#diff-cb90b503c85180c9be40a4984f8a1a54R128]. We can't know all the different requirements before-hands, right? Do you think it is better to leave some space for operators to define their own logic? Anyway, to ease your concern, now digestEquals and digestHash are private, and there is only 1 RelDigest subclass. There are so many inner class in Calcite, we can't remove all the inner classes, right? And we'd better keep backward compatibility of {{getDigest}} if we can keep it easily, do you agree? > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Fix For: 1.24.0 > > Time Spent: 8h 50m > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17141764#comment-17141764 ] Danny Chen commented on CALCITE-3786: - > You haven't clarify how can I customize it in current master without > introducing plan changes I did, but you seems to have some confusion, maybe you should check how the customization works before my patch, you would find the answer, or you may take [1] as an example. [1] https://github.com/apache/calcite/blob/797b48754d074d53e853a67bf249c155f51b71dc/core/src/main/java/org/apache/calcite/rel/core/Project.java#L262 > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Fix For: 1.24.0 > > Time Spent: 8h 40m > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17141718#comment-17141718 ] Haisheng Yuan commented on CALCITE-3786: {quote} Make the Digest an inner class {quote} Is anything wrong with inner classes? It is everywhere in Calcite. If it is evil, should we request Oracle to remove Java inner classes instead? {quote} Add #getRelDigest #digestEquals #digestHash (plus the original #getDigest #explainTerms) {quote} getDigest (which returns a String) is a public API, do you really want to break it even you can choose not to break it? If you think backward compatibility is not important, I am more than happy to remove getRelDigest, just using getDigest instead. #digestEquals #digestHash can be used to customize the rel equivalence comparison, decouple it from explainTerms. Do you think no downstream need it? You haven't clarify how can I customize it in current master without introducing plan changes > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Fix For: 1.24.0 > > Time Spent: 8h 40m > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17141682#comment-17141682 ] Danny Chen commented on CALCITE-3786: - I don't think that reduce the redundant object references can significantly reduce memory consumption. In order to remove the object references, you have to: - Make the Digest an inner class - Add #getRelDigest #digestEquals #digestHash (plus the original #getDigest #explainTerms) I don't think these interfaces are necessary and have clear semantics, in order to reduce memory usage which is not a bottleneck. Expose a single interface #explainTerms with different SqlExplainLevel is well to use and compatible with the old behaviors. Maybe we really have disagreement on the design, i have to say i will not accept it. If possible, move to the RexNode memo, which can indeed reduce memory consumption significantly, and i would give a definite +1 for that. > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Fix For: 1.24.0 > > Time Spent: 8h 40m > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17141648#comment-17141648 ] Haisheng Yuan commented on CALCITE-3786: {quote} I don't think having an interface there with 3 sub-classes makes any sense. {quote} Thank you for your feedback. I know that you will question on the 3 sub-classes, so here is another version with only 1 sub-class. Does it make more sense to you? https://github.com/apache/calcite/pull/2039/commits/897163219d981565a524121d47ec7845c971c473 Under what circumstances will you accept the change? Even when this patch can significantly reduce memory consumption comparing with current master branch, you won't accept it? > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Fix For: 1.24.0 > > Time Spent: 8h 40m > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17141639#comment-17141639 ] Danny Chen commented on CALCITE-3786: - I don't think having an interface there with 3 sub-classes makes any sense. I think i have represented my points as clear, so, i would not repeat it again and again. And stop talking with emotions, i will choose to ignore that. > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Fix For: 1.24.0 > > Time Spent: 8h 40m > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17141516#comment-17141516 ] Haisheng Yuan commented on CALCITE-3786: {quote} Just implement the #explainTerms correctly with the items you want to involve in. {quote} Really? Could you please kindly show me the code snippet? Won't it introduce tons of "meaningless test plan changes"? {quote} Use different SqlExplainLevel to distinguish just like before. {quote} How? By setting explain level to NO_ATTRIBUTES? Won't all the plan nodes be affected? {quote} especially when we are talking about "semantic equivalent" not strict #equals. {quote} There is no "semantic equivalent" or "strict #equals", which are terms coined by you. If one rel instance can be used to represent the other instance, then they are equal. The digestEquals was introduced (which I have changed to private, because you said it is a mess), because we can't override #equals and #hashCode, which are final. If we change them directly, then it will be a large breaking change, when downstream projects use RelNode as HashMap key, in which case they have to change the HashMap to IdentityHashMap. {quote} Just like Julian said, Cons is a basic data structure, it can be used in many codes, but digest is a planner internal thing {quote} Well, are you aware of that, in non-federation and non-MV project that uses Calcite, it may generate thousands or tens of thousands RelNodes during query planning, but how many ConsList will be generated? Zero. The use case is not uncommon, it is even dominant. Do you think we should care more about a basic data structure that may not be used at all, than a "planner internal thing" that is ubiquitous in Calcite, and being used no matter what case? {quote} it is not necessary from my side if we can make the code more concise. {quote} Do you think a 256 lines Digest.java is conciser than a 52 lines RelDigest.java, which is essentially a 4 line interface: {code:java} public interface RelDigest { void clear(); RelNode getRel(); } {code} just like you think 9 lines of code are simpler that one line of code? Could you please kindly point out which part of [PR#2039|https://github.com/apache/calcite/pull/2039] is not concise so that I can make it conciser? > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Fix For: 1.24.0 > > Time Spent: 8h 40m > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17141268#comment-17141268 ] Danny Chen commented on CALCITE-3786: - > Could you please enlighten me on how to properly customize it? Just implement the #explainTerms correctly with the items you want to involve in. > Or My operator has some additional field, that participate in the operator > identity, aka digest comparison, but I don't want to output it in the explain > terms, is there anyway I can achieve this? Use different SqlExplainLevel to distinguish just like before. > Object#hashCode and #equals final so that we have a bug-free world. I think things are much different when there are 3 methods there, especially when we are talking about "semantic equivalent" not strict #equals. And i'm confused why we reference the Flink code, to prove what ? > 4 bytes to the Cons data structure to serve his purpose Just like Julian said, Cons is a basic data structure, it can be used in many codes, but digest is a planner internal thing, it is better if we can avoid theses duplicate object references, but it is not necessary from my side if we can make the code more concise. > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Fix For: 1.24.0 > > Time Spent: 8h 40m > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17141240#comment-17141240 ] Haisheng Yuan commented on CALCITE-3786: {quote} We should be cautious for #equals #hashcode #explainTerms to have similar variables, it is error-prone for downstream projects to implement for their custom nodes {quote} I astonishingly found that you are correct. #equals #hashCode #toString have similar variables likewise, e.g. [StreamShardMetadata.java|https://github.com/apache/flink/blob/8674b69964eae50cad024f2c5caf92a71bf21a09/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/model/StreamShardMetadata.java#L112] in Flink, it is exactly what you said. Should I open a pull request to Flink to always use "toString().hashCode" and "toString().equals" or use Digest instead? What is worse, 90% of Java classes have exactly same variables for these methods. The worst thing is, they are default Java methods for every Java object. Damn it! We should propose to JCP members to make Object#hashCode and #equals final so that we have a bug-free world. {quote} The additional objects references should not be a bottleneck, removing it does not gain much. {quote} After deep meditation, I finally got your reasoning. Maybe because I still live in an era when the RAM is 128 MB. I can't help reflecting myself, am I thinking wrong in CALCITE-4049? Initially I agree with Xiening and Julian's criticize about our buddy adding 4 bytes to the Cons data structure to serve his purpose. I pondered whether we are too nitpicking to him. 1 ConsList only add 40KB, which is also acceptable. Isn't it? Anyway, I have addressed your kind comments, removed the digestEquals/digestHash from operators as you suggested, hope that can ease your concern. I have opened a PR for review: https://github.com/apache/calcite/pull/2039, which is open for discussion, comment and -1. Thanks. > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Fix For: 1.24.0 > > Time Spent: 8h 40m > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140958#comment-17140958 ] Haisheng Yuan commented on CALCITE-3786: OK, let me put it in another way. What if my Join and Project operator want to customize the digest equal logic? e.g. In the default implementation digest equal comparison, we always ignore column names, only compare the field types. It is OK for my Project operator. But my Join and Filter operator doesn't want to do this, these 2 operators want the rowtype to be fully equal, even for column names. Could you please enlighten me on how to properly customize it? Or My operator has some additional field, that participate in the operator identity, aka digest comparison, but I don't want to output it in the explain terms, is there anyway I can achieve this? > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Fix For: 1.24.0 > > Time Spent: 8h 40m > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140226#comment-17140226 ] Danny Chen commented on CALCITE-3786: - Removing digestEquals and digestHash does not really solve the concerns because the logic is still there, without a explicit interface, it's even harder for downstream projects to customize the hashCode/Digest format. > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Fix For: 1.24.0 > > Time Spent: 8h 40m > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140202#comment-17140202 ] Haisheng Yuan commented on CALCITE-3786: [~danny0405] Thank you for your feedback. Definitely will respect your opinion. What if I remove digestEquals and digestHash from all the default base operators, Logical and Enumerable operators? They are completely optional, not required to implement. > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Fix For: 1.24.0 > > Time Spent: 8h 40m > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140152#comment-17140152 ] Danny Chen commented on CALCITE-3786: - [~vladimirsitnikov] I didn't intend to do modify the RexNode normalization at first, but when i fix to the RexNode#equals instead of the pure string, i found that *i have to* because it seems not right that 2 RexCalls do not equals but have same digest. I can think of 2 more benefits to switch from pure string normalization to nodes: - We can add more normalizations logic more easily(put complex rexnodes behind), i.e. i plan to add a new component RexNormalizaiton - We should make the configuration with more flexibility, user/downstream can config through FrameworkConfig [~hyuan] Current patch already solved 2, 3 of your questions, for 1, i would fix if [~laurent] and i make agreements, but remove the Digest and add 2 more interfaces to RelNodes(#hashCode and #equals) goes too far away for these reasons: - We should be cautious for #equals #hashcode #explainTerms to have similar variables, it is error-prone for downstream projects to implement for their custom nodes - And it is hard to clarify the semantics between strict #equals1 and equivalent #equals1, what if i want a normal #equals1 there - The additional objects references should not be a bottleneck, removing it does not gain much. Based on that, i would definitely give a -1 for the commit f970df. As for 4, i'm looking forward to the RexNode memo(maybe), but it has no relationship with this issue. > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Fix For: 1.24.0 > > Time Spent: 6h 40m > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140010#comment-17140010 ] Haisheng Yuan commented on CALCITE-3786: I think my objection on this proposal might be mis-interpreted, maybe because I was not clear on myself. I am objecting it, not because it won't reduce memory consumption, or it will make things worse than current master. Quite the opposite, I believe with the proposal, memory footprint will be reduced and improved, which I already stated in above comments. As long as we stop using string digest, things will become much better, without any doubt. However, is this the best approach? I don't think so. So no matter what the performance result is, my stance on the proposal doesn't change. I am still against it. Because this is not the most appropriate way we would adopt. [~danny0405] has worked a lot on this issue, and we all appreciate the amount of work he has done. He is very smart and diligent, I couldn't do it as good as he can on this matter. Even [~laurent] and I casted -1 on it, and people requested Danny to revert the change, but since he refused to revert it, let's just keep it in master. I believe that Danny believes his change is beneficial for all Calcite users, all with good intention. [~danny0405], Laurent made some code review comments, to try to help improve your code, even some are minor optimization, can't reflect on the benchmark, but definitely won't make your code run slower. What if we all take a step back, suppose this JIRA is about to improve the memory consumption, there are several steps here. We have finished the first step, thanks [~danny0405] for taking the initiative to complete the most important 1st step. Now let's think the next step. Is there anything that we can further improve based the first step? Some viewpoints (or questions) that I share with [~laurent] are: 1. Can the field of Digest be optimized to use less memory? 2. Can operators customize their own identity? 3. Can operators customize the string representation by omitting some fields? 4. Can RexNode be further improved? Here is the the next step draft change I have in my mind based the above questions and Laurent's comments. Can someone take a look? [~danny0405] Feel free to take from it if you think there are some improvements: https://github.com/apache/calcite/commit/f970df50eb69389bbc81bd94e0c8f9da033e5e01 The main change is RelDigest.java, AbstractRelNode.java and RelSubset.java. > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Fix For: 1.24.0 > > Time Spent: 6h 20m > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140008#comment-17140008 ] Julian Hyde commented on CALCITE-3786: -- [~danny0405] and [~vlsi] and others. For the record, I don't think that it is essential that we add *performance tests* to the code base. A benchmark - perhaps only run once, to help us make a design decision empirically - would be sufficient. > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Fix For: 1.24.0 > > Time Spent: 6h 20m > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140006#comment-17140006 ] Julian Hyde commented on CALCITE-3786: -- [~vlsi] I am asking that each person - the contributor and the reviewers - is respectful of the other's time. If you had been able to say "I think I can fix those plan changes, but it will take considerable effort on my part. If I do it, will you accept it?" then it would have saved your time. Danny does not have to agree with you, but he must not waste your time. The same goes for performance benchmarks. If Danny is never going to be able to fulfill your criteria, let's all realize that as soon as possible. > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Fix For: 1.24.0 > > Time Spent: 6h 20m > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17139390#comment-17139390 ] Vladimir Sitnikov commented on CALCITE-3786: [~julianhyde], I agree that we should not nitpick the fine details within benchmarks. I agree we should not raise the bar again and again. However, all the comments are not something advanced. They are very basic requests. I just ask to make the benchmark to measure the thing the patch touches. I don't even ask to implement multiple different benchmarks. Performance in this PR is very important (even in the case being able to customize digest is important as well), so I would expect that there should be reasonable tests before the PR is merged. -- The current benchmark measures the wrong thing: it uses the same object instances to lookup in the cache. In typical rule execution, one creates new (!) rels with a relbuilder, then they register the result in the planner. --- I could expect the following benchmarks: 1) Overall execution of the test suite (e.g. capture timings from slow_tests in CI). Note: the current PR seems to perform early normalization of RexNodes, which alone might shift the response time significantly. 2) Planner#registerRel execution time. For instance: create a planner, register same rels there. Then the benchmark would create new (!) rels and call planner.register (or whatever it is named). That benchmark code would be the same in both Calcite versions (original and patched). The measurement values should be "response time", and "allocation rate". 3) Digest computation time. This might be something like "create rel and compute its digest". The measurement values should be "response time", and "allocation rate". This might be interesting for fine-tuning digset implementation (e.g. to eliminate excessive array allocations), however, digest is cached, so "digest computation" benchmark alone provides very little information on the impact of the fix on the overall performance. 4) Memory footprint. It is probably to be measured with Java Object Layout. For instance, we create a relnode instance, and weight the object graph. The benchmark in https://github.com/apache/calcite/pull/2034 looks like #3, however, there are major flaws. {quote} I think we now need to stop asking him to jump over higher and higher hurdles{quote} It is sad to receive "-1 because it results in lots of irrelevant plan changes" when I suggest "early RexCall digest normalization". That is exactly "asking to jump over higher and higher hurdles". I had to invent and implement a more complicated approach. It is really sad to see that the very same contributor commits the patch that performs that early normalization (with all the "irrelevant plan changes") with not a single blink. Note: I do not complain that the PR removes "my" code. I complain that there's no acknowledgment that my original proposal (normalize RexCall in the constructor) was viable from the very beginning. --- I have no time to work on Calcite for the following 14 days, and I have no time to write benchmarks or even analyze the results. > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Fix For: 1.24.0 > > Time Spent: 6h 20m > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17139268#comment-17139268 ] Danny Chen commented on CALCITE-3786: - [~vladimirsitnikov] I used the GC profiler and here is the test data: {code:xml} Benchmark (digestType) (disjunctions) (joins) Mode Cnt ScoreError Units DigestBenchmark.getRelOBJECT 1 1 avgt5 0.082 ± 0.010 us/op DigestBenchmark.getRelOBJECT 1 10 avgt5 0.380 ± 0.025 us/op DigestBenchmark.getRelOBJECT 1 20 avgt5 0.732 ± 0.077 us/op DigestBenchmark.getRelOBJECT 10 1 avgt5 0.081 ± 0.010 us/op DigestBenchmark.getRelOBJECT 10 10 avgt5 0.364 ± 0.022 us/op DigestBenchmark.getRelOBJECT 10 20 avgt5 0.697 ± 0.046 us/op DigestBenchmark.getRelOBJECT 100 1 avgt5 0.081 ± 0.008 us/op DigestBenchmark.getRelOBJECT 100 10 avgt5 0.359 ± 0.025 us/op DigestBenchmark.getRelOBJECT 100 20 avgt5 0.726 ± 0.090 us/op DigestBenchmark.getRelSTRING 1 1 avgt5 1.269 ± 0.035 us/op DigestBenchmark.getRelSTRING 1 10 avgt5 10.609 ± 0.146 us/op DigestBenchmark.getRelSTRING 1 20 avgt5 28.708 ± 0.810 us/op DigestBenchmark.getRelSTRING 10 1 avgt5 1.365 ± 0.073 us/op DigestBenchmark.getRelSTRING 10 10 avgt5 10.640 ± 0.107 us/op DigestBenchmark.getRelSTRING 10 20 avgt5 28.171 ± 0.612 us/op DigestBenchmark.getRelSTRING 100 1 avgt5 1.354 ± 0.083 us/op DigestBenchmark.getRelSTRING 100 10 avgt5 11.583 ± 5.685 us/op DigestBenchmark.getRelSTRING 100 20 avgt5 27.828 ± 0.343 us/op Benchmark (digestType) (disjunctions) (joins) Mode Cnt Score Error Units DigestBenchmark.getRel:·gc.alloc.rate.normOBJECT 1 1 avgt5 ≈ 10⁻⁴ B/op DigestBenchmark.getRel:·gc.alloc.rate.normOBJECT 1 10 avgt5 0.005 ± 0.001B/op DigestBenchmark.getRel:·gc.alloc.rate.normOBJECT 1 20 avgt5 0.020 ± 0.002B/op DigestBenchmark.getRel:·gc.alloc.rate.normOBJECT 10 1 avgt5 0.001 ± 0.001B/op DigestBenchmark.getRel:·gc.alloc.rate.normOBJECT 10 10 avgt5 0.006 ± 0.001B/op DigestBenchmark.getRel:·gc.alloc.rate.normOBJECT 10 20 avgt5 0.022 ± 0.001B/op DigestBenchmark.getRel:·gc.alloc.rate.normOBJECT 100 1 avgt5 0.003 ± 0.001B/op DigestBenchmark.getRel:·gc.alloc.rate.normOBJECT 100 10 avgt5 0.018 ± 0.001B/op DigestBenchmark.getRel:·gc.alloc.rate.normOBJECT 100 20 avgt5 0.047 ± 0.006B/op DigestBenchmark.getRel:·gc.alloc.rate.normSTRING 1 1 avgt5 1840.004 ± 0.001B/op DigestBenchmark.getRel:·gc.alloc.rate.normSTRING 1 10 avgt5 8568.145 ± 0.002B/op DigestBenchmark.getRel:·gc.alloc.rate.normSTRING 1 20 avgt5 16008.839 ± 0.022B/op DigestBenchmark.getRel:·gc.alloc.rate.normSTRING 10 1 avgt5 1960.009 ± 0.001B/op DigestBenchmark.getRel:·gc.alloc.rate.normSTRING 10 10 avgt5 8568.180 ± 0.003B/op DigestBenchmark.getRel:·gc.alloc.rate.normSTRING 10 20 avgt5 16008.913 ± 0.018B/op DigestBenchmark.getRel:·gc.alloc.rate.normSTRING 100 1 avgt5 1960.054 ± 0.003B/op DigestBenchmark.getRel:·gc.alloc.rate.normSTRING 100 10 avgt5 8568.577 ± 0.284B/op DigestBenchmark.getRel:·gc.alloc.rate.normSTRING 100 20 avgt5 16009.819 ± 0.024B/op {code} > Add Digest interface to enable efficient hashCode(equals) for RexNode and
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17139099#comment-17139099 ] Chunwei Lei commented on CALCITE-3786: -- [~julianhyde], I don't agree with you. The owner should provide a convincing benchmark, not the reviewers. > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Fix For: 1.24.0 > > Time Spent: 5h 20m > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138896#comment-17138896 ] Julian Hyde commented on CALCITE-3786: -- [~vlsi], Danny did as we asked, and created a benchmark. I think we now need to stop asking him to jump over higher and higher hurdles. If his benchmark is unsatisfactory, I think you should modify it and run it so that it can answer the questions that you have on your mind. > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Fix For: 1.24.0 > > Time Spent: 5h 20m > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138446#comment-17138446 ] Vladimir Sitnikov commented on CALCITE-3786: {quote} because it is the most straight-forward metric to illustrate the memory usage{quote} It is not. "getRuntime().totalMemory" has very little in common with the benchmark itself, and it depends on the GC heuristics. > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Fix For: 1.24.0 > > Time Spent: 5h 20m > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138427#comment-17138427 ] Danny Chen commented on CALCITE-3786: - Thanks, [~vladimirsitnikov] and [~zabetak] ~ I have addressed the comments ~ [1] > Can you please clarify what is the number of objects in DigestToRelMap? They are the rel and all it's input's digest to node mapping, in order to simulate the planner node #register(I know there is also RelSubSet node there, but the affect should be the same). I have changed the level from Level.Invocation to Level.Iteration. Here is the latest data: The diff of performance: {code:java} Benchmark (digestType) (disjunctions) (joins) Mode Cnt Score Error Units DigestBenchmark.getRelOBJECT 11 avgt5 0.123 ± 0.004 us/op DigestBenchmark.getRelOBJECT 1 10 avgt5 0.447 ± 0.023 us/op DigestBenchmark.getRelOBJECT 1 20 avgt5 0.868 ± 0.085 us/op DigestBenchmark.getRelOBJECT 101 avgt5 0.126 ± 0.014 us/op DigestBenchmark.getRelOBJECT 10 10 avgt5 0.459 ± 0.029 us/op DigestBenchmark.getRelOBJECT 10 20 avgt5 0.920 ± 0.147 us/op DigestBenchmark.getRelOBJECT 1001 avgt5 0.119 ± 0.008 us/op DigestBenchmark.getRelOBJECT 100 10 avgt5 0.452 ± 0.030 us/op DigestBenchmark.getRelOBJECT 100 20 avgt5 0.857 ± 0.109 us/op DigestBenchmark.getRelSTRING 11 avgt5 1.320 ± 0.049 us/op DigestBenchmark.getRelSTRING 1 10 avgt5 10.588 ± 0.088 us/op DigestBenchmark.getRelSTRING 1 20 avgt5 27.863 ± 0.320 us/op DigestBenchmark.getRelSTRING 101 avgt5 1.352 ± 0.028 us/op DigestBenchmark.getRelSTRING 10 10 avgt5 10.612 ± 0.286 us/op DigestBenchmark.getRelSTRING 10 20 avgt5 27.865 ± 1.627 us/op DigestBenchmark.getRelSTRING 1001 avgt5 1.467 ± 0.683 us/op DigestBenchmark.getRelSTRING 100 10 avgt5 10.738 ± 0.075 us/op DigestBenchmark.getRelSTRING 100 20 avgt5 28.211 ± 0.449 us/op {code} The diff of memory usage: {code:java} Benchmark (digestType) (disjunctions) (joins) Mode Cnt Score Error Units DigestBenchmark.getRel:Max memory heapOBJECT 11 avgt5 228065280.000 bytes DigestBenchmark.getRel:Max memory heapOBJECT 1 10 avgt5 211812352.000 bytes DigestBenchmark.getRel:Max memory heapOBJECT 1 20 avgt5 215482368.000 bytes DigestBenchmark.getRel:Max memory heapOBJECT 101 avgt5 239599616.000 bytes DigestBenchmark.getRel:Max memory heapOBJECT 10 10 avgt5 218628096.000 bytes DigestBenchmark.getRel:Max memory heapOBJECT 10 20 avgt5 257949696.000 bytes DigestBenchmark.getRel:Max memory heapOBJECT 1001 avgt5 258998272.000 bytes DigestBenchmark.getRel:Max memory heapOBJECT 100 10 avgt5 211812352.000 bytes DigestBenchmark.getRel:Max memory heapOBJECT 100 20 avgt5 213385216.000 bytes DigestBenchmark.getRel:Max memory heapSTRING 11 avgt5 300417024.000 bytes DigestBenchmark.getRel:Max memory heapSTRING 1 10 avgt5 262144000.000 bytes DigestBenchmark.getRel:Max memory heapSTRING 1 20 avgt5 242745344.000 bytes DigestBenchmark.getRel:Max memory heapSTRING 101 avgt5 317194240.000 bytes DigestBenchmark.getRel:Max memory heapSTRING 10 10 avgt5 273154048.000 bytes DigestBenchmark.getRel:Max memory heapSTRING 10 20 avgt5 258473984.000 bytes DigestBenchma
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138340#comment-17138340 ] Stamatis Zampetakis commented on CALCITE-3786: -- Thanks for the benchmark [~danny0405]. One thing that caught my eye is the use of {{Level.INVOCATION}} for setting up the state. As indicated in the [javadoc|http://javadox.com/org.openjdk.jmh/jmh-core/1.7/org/openjdk/jmh/annotations/Level.html#Invocation], it is better to avoid using this option. > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Fix For: 1.24.0 > > Time Spent: 5h 20m > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138302#comment-17138302 ] Vladimir Sitnikov commented on CALCITE-3786: {quote}I used the max used heap mem as the metric, is there better way to compare the memory there ?{quote} 1) gc.alloc.rate.norm -- that is the metric for "allocation rate per benchmark execution" 2) The memory consumption (e.g. the number of bytes a rel consumes in heap) is to be measured with Eclipse Memory Analyzer or with https://openjdk.java.net/projects/code-tools/jol/ {quote} is an impressive improvement(20x) for performance{quote} Can you please clarify what is the number of objects in {{DigestToRelMap}}? What is the purpose of StringDigest? I guess it should not impact much, but it looks excessive, and it was not there in the old implementation. The benchmark might suffer from dead code elimination, please follow http://hg.openjdk.java.net/code-tools/jmh/file/tip/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_08_DeadCode.java Please add a test case when the rel is built dynamically. With calcite, we often build a relation, and then it is queried in the cache. It looks like the current benchmark measures use the same instances for the initial data population and for the measurements. Please prefer shorter benchmark names and benchmark parameter names. The current result table is very wide, and it is hard to follow. For instance: isStringDigest could be refactored into {code:java} enum DigestType { STRING, OBJECT } @Param DigestType digest; {code} > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Fix For: 1.24.0 > > Time Spent: 5h 20m > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138289#comment-17138289 ] Danny Chen commented on CALCITE-3786: - Hi, [~vladimirsitnikov] ~ I write a benchmark there [1] to compare the performance and memory usage diff between the pure string digest and the new Digest structure. {code:xml} Benchmark (isStringDigest) (joins) (whereClauseDisjunctions) Mode Cnt Score Error Units DigestBenchmark.getRelFromDigestToRelMap false 1 1 avgt5 0.113 ± 0.009 us/op DigestBenchmark.getRelFromDigestToRelMap:Max memory heap false 1 1 avgt5 376963072.000 bytes DigestBenchmark.getRelFromDigestToRelMap false 1 10 avgt5 0.146 ± 0.029 us/op DigestBenchmark.getRelFromDigestToRelMap:Max memory heap false 1 10 avgt5 346554368.000 bytes DigestBenchmark.getRelFromDigestToRelMap false 1100 avgt5 0.138 ± 0.014 us/op DigestBenchmark.getRelFromDigestToRelMap:Max memory heap false 1100 avgt5 348127232.000 bytes DigestBenchmark.getRelFromDigestToRelMap false 10 1 avgt5 0.452 ± 0.041 us/op DigestBenchmark.getRelFromDigestToRelMap:Max memory heap false 10 1 avgt5 397934592.000 bytes DigestBenchmark.getRelFromDigestToRelMap false 10 10 avgt5 0.450 ± 0.050 us/op DigestBenchmark.getRelFromDigestToRelMap:Max memory heap false 10 10 avgt5 383254528.000 bytes DigestBenchmark.getRelFromDigestToRelMap false 10100 avgt5 0.452 ± 0.085 us/op DigestBenchmark.getRelFromDigestToRelMap:Max memory heap false 10100 avgt5 353894400.000 bytes DigestBenchmark.getRelFromDigestToRelMap false 20 1 avgt5 0.819 ± 0.239 us/op DigestBenchmark.getRelFromDigestToRelMap:Max memory heap false 20 1 avgt5 327155712.000 bytes DigestBenchmark.getRelFromDigestToRelMap false 20 10 avgt5 0.814 ± 0.123 us/op DigestBenchmark.getRelFromDigestToRelMap:Max memory heap false 20 10 avgt5 427819008.000 bytes DigestBenchmark.getRelFromDigestToRelMap false 20100 avgt5 0.844 ± 0.218 us/op DigestBenchmark.getRelFromDigestToRelMap:Max memory heap false 20100 avgt5 366477312.000 bytes {code} {code:xml} Benchmark (isStringDigest) (joins) (whereClauseDisjunctions) Mode Cnt Score Error Units DigestBenchmark.getRelFromDigestToRelMap true 1 1 avgt5 1.797 ± 0.218 us/op DigestBenchmark.getRelFromDigestToRelMap:Max memory heap true 1 1 avgt5 412090368.000 bytes DigestBenchmark.getRelFromDigestToRelMap true 1 10 avgt5 1.824 ± 0.147 us/op DigestBenchmark.getRelFromDigestToRelMap:Max memory heap true 1 10 avgt5 405274624.000 bytes DigestBenchmark.getRelFromDigestToRelMap true 1100 avgt5 2.109 ± 0.453 us/op DigestBenchmark.getRelFromDigestToRelMap:Max memory heap true 1100 avgt5 402653184.000 bytes DigestBenchmark.getRelFromDigestToRelMap true 10 1 avgt5 12.118 ± 0.113 us/op DigestBenchmark.getRelFromDigestToRelMap:Max memory heap true 10 1 avgt5 346030080.000 bytes DigestBenchmark.getRelFromDigestToRelMap true 10 10 avgt5 12.231 ± 0.807 us/op DigestBenchmark.getRelFromDigestToRelMap:Max memory heap true 10
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17137805#comment-17137805 ] Laurent Goujon commented on CALCITE-3786: - {quote} Laurent Goujon It seems that you have some promotion ideas for the Digest code, feel free to do that, i tried to answer each message you left in the PR, hope it helps. But i would not revert the PR for no other strong reason, sorry. {quote} The thing is that you left no time to answer back (sorry, I'm not working over the weekend), or did you send a courtesy check on either the pull request and the jira issue to see people got their questions answers/concerns alleviated. We usually try to reach a consensus on issues and code changes, but it doesn't seem to be the case here. And it's a bit disappointed you are not ready to revert temporarily your change unless there are strong reason whereas (ignoring my comments/concerns on github) the lack of concrete evaluation regarding memory usage and performance impact seems quite a strong reason to do so... > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Fix For: 1.24.0 > > Time Spent: 5h 20m > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136557#comment-17136557 ] Vladimir Sitnikov commented on CALCITE-3786: {quote}I would try to give a mem benchmark then ~{quote} Would you please compare the performance as well? It would be sad if the new approach is significantly slower. > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Fix For: 1.24.0 > > Time Spent: 4h 40m > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136547#comment-17136547 ] Stamatis Zampetakis commented on CALCITE-3786: -- Hey [~danny0405] thanks for working on this. For the moment I am not going to comment on the approach will do that later if there are still disagreements. In the meantime, multiple people have requested to wait a bit and not commit the PR yet so I find it fair to revert the commit for the time being. I know that you have spent quite some time on it but reviewers are also investing their own. We are not in a rush to get a release out so I don't see a big problem keeping this off-master a bit more. > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Fix For: 1.24.0 > > Time Spent: 4h 40m > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136473#comment-17136473 ] Danny Chen commented on CALCITE-3786: - Thanks [~julianhyde] and [~laurent] for the feedback ~ Compared to the old code this patch does these things: - Use #equals to compare 2 RexCalls so that we can avoid the OOM like CALCITE-3784 - Use cached hashCode for quick #equals of the digest, it also replace the pure string comparison with object equals - Include the hints in the digest - Makes the RexCall normalization more general, more cases are normalized So in general it definitely do not make the memory consumption bigger. [~laurent] It seems that you have some promotion ideas for the Digest code, feel free to do that, i tried to answer each message you left in the PR, hope it helps. But i would not revert the PR for no other strong reason, sorry. > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Fix For: 1.24.0 > > Time Spent: 4h 40m > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136290#comment-17136290 ] Julian Hyde commented on CALCITE-3786: -- I agree with [~laurent]. It doesn't look as if we've reached consensus yet. In particular, [~hyuan] gave a "-1", which I interpret not as a veto but as a "you've not convinced me yet". I have been quiet on this issue because [~danny0405], [~vlsi] and [~hyuan] seem to be representing the sides of the argument very well. But I would like to see the discussion arrive at consensus. If there is disagreement, let's find an empirical way to settle the agreement -- e.g. a benchmark. [~danny0405], I think you should revert the commit, re-open this case, and continue the discussion. > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Fix For: 1.24.0 > > Time Spent: 4h > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136282#comment-17136282 ] Laurent Goujon commented on CALCITE-3786: - I'm not sure I understand why the change has been merged already: there has been questions/concerns on both JIRA and Github about the change. Is it possible to revert it and address those? Personally, considering the impact, I would like to see a more concrete benchmark on the memory impact. I have for example some reservation about the use of a list of pair of string/objects in digest for things which are most likely fields of the {{RelNode}}. And there's also the digest String field, initialized lazily (but not always?), but which is not sensible to memory pressure for example: could we for example let this string be gc'ed if there's not enough memory at the time even it means recomputing it? > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Fix For: 1.24.0 > > Time Spent: 4h > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133914#comment-17133914 ] Danny Chen commented on CALCITE-3786: - [~vladimirsitnikov] Thanks for the background sharing, if we do the normalization during planning already, it would be better that we made the thing more explicit, not an implicit contract there. That is to say, i prefer to keep sync the plan structure same with the real RexTree, a normalization on creation seems better. We can have a control flag three just like the Rex simplification, but the default value should be true. Personally i still question about the gains of Rex normalization because it makes the RexNode code complex, and there is an implicit contract there, we have already made decision that Rex normalization is the way to go, so let's make the impl better if we can. [~hyuan] I can thought of another benefit to use Digest, downstream projects need only implement the #explainTerms instead of additional #hashCode and #equals, which are both error-prone. As for the additional mem consumption for object references, assume 100 bytes for a Digest there, and 1 rel nodes, the total memory should be 1Mb, which i think is acceptable, so i would choose a more concise interface, a Digest behind the #explainTerms. > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133799#comment-17133799 ] Haisheng Yuan commented on CALCITE-3786: {quote} Even we are with these additional object references, then how much join nodes during the planning ? They do not expect to be the cause of the OOM. What causes the OOM is the digest strings, especially for complex RexCalls, before this patch, their string format digest were always patched up in the digest of the RelNode, which would be big memory comsuption. {quote} I am not saying this proposal causes the OOM, instead it reduces memory usage comparing with the string digest, as long as we stop using string digest for RexNode, OOM issue like 3784 will ease a lot. However, the so-called Digest still waste a lot of memory, unnecessarily. If you think Join node is not that many, how about Project? Try to compute the Digest size of Project, it may consume much more memory. {quote} Why do you think we should use #hashCode and #equals to decide that two relational expression are semantically equivalent ? Or i asked it in different way, if two relational expression are equivalent, should they be equals to each other ? (For example, 2 projects differs only with field aliases.) {quote} Why not? Although this is not mandatory, it is so intuitive and natural. Two Project with different aliases can be equal with each other, as long as they produce the same hashcode, we treat them as equal object, in MEMO we can use 1 instance to represent the other one. {quote} BTW, even if RexCall are new object, use the equals to compare would solve the problem. {quote} Exactly, that is the problem. It is deep comparison, which may cause performance issue for large complex queries. Note that different rules can generate lots of intermediate, equal RelNode and RexNode but different instances, especially when Project/Filter/Calc merge happens on physical operator in VolcanoPlanner. BTW, does anyone know that [VoltDB|https://www.voltdb.com/company/why-voltdb/], an in-memory OLTP RDBMS, is [experimenting|https://github.com/VoltDB/voltdb/tree/master/src/frontend/org/voltdb/plannerv2] on integrating Calcite into its system? If Calcite wants to be versatile for OLTP, OLAP, Batch, Stream, every bit of memory counts, and every millisecond counts. I am casting "-1" on this proposal, the justification has been provided in this and previous comments. I am just expressing objection on this idea, I don't have rights to prevent anyone from doing anything, though. > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133205#comment-17133205 ] Vladimir Sitnikov commented on CALCITE-3786: >so that we can finally compare 2 RexCall using member equals, and it works >well. 1) There are code parts that still use #toString() for the comparison (e.g. RexSimplify) 2) I had exactly that suggestion (normalize RexNodes on creation), however, you have cast -1 multiple times because that would alter string representation of the plans with little gain. [https://lists.apache.org/thread.html/d89103671efdd813fac768fbc2336bd125e925f0790e9137a2a16375%40%3Cdev.calcite.apache.org%3E] My suggestion was Vladimir: It turned out "b" (sort operands in computeDigest) is easier to implement. Vladimir: I've filed a PR: [https://github.com/apache/calcite/pull/1703 ] and your response was "I’m strongly -1 for this way, because it beaks the plan test where almost all of the change are meaningless." Note: PR1703 was updated since that discussion, and the actual implementation normalizes nodes at the planning time only, while the user-visible plans are displayed without normalization. > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133190#comment-17133190 ] Danny Chen commented on CALCITE-3786: - [~vladimirsitnikov] I tried to normalize the RexCall in its constructor, so that we can finally compare 2 RexCall using member equals, and it works well. But i also got 54 plan changes which are all operands order change [1], it this as expected ? [1] https://github.com/apache/calcite/pull/2016/commits/99f55001923eb1618268cfb9d0d645a86eb2ceff > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17132944#comment-17132944 ] Danny Chen commented on CALCITE-3786: - bq. Let's use Join operator as an example Even we are with these additional object references, then how much join nodes during the planning ? They do not expect to be the cause of the OOM. What causes the OOM is the digest strings, especially for complex RexCalls, before this patch, their string format digest were always patched up in the digest of the RelNode, which would be big memory comsuption. bq. Do we really need a tool to unify the logic? Why do you think we should use #hashCode and #equals to decide that two relational expression are semantically equivalent ? Or i asked it in different way, if two relational expression are equivalent, should they be equals to each other ? (For example, 2 projects differs only with field aliases.) bq. Did you try to debug it? It is true that when you copy the RexCall it just pass the reference, but after column pruning or project transpose, the RexCall might be a complete new object, No, if you have an example, can you share so we can see how much memory the planning would use. BTW, even if RexCall are new object, use the equals to compare would solve the problem. > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17132926#comment-17132926 ] Jin Xing commented on CALCITE-3786: --- In my environment, some users use program to generate big queries (about 10k lines) and suffers memory issue a lot. The reuse of instance is the pain point and I agree with [~hyuan] that a thorough fix is to have the ability of deduplication inside planner. Really hope we can move forward on this direction. The approach proposed in the patch ease the memory issue and provide a lighter way to do shallow comparison by cached hashcode. It's very good and I can feel the benefit but I have the same concern – – does the cached hashcode has to be bound with Digest ? > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17132916#comment-17132916 ] Haisheng Yuan commented on CALCITE-3786: {quote} That's not true, for RelNode we only bookeep its class name and id, for rex node only an additional object reference. {quote} Let's use Join operator as an example. It has 11 members (without the original string digest): left, right, rowType, cluster, id, traitset, condition, variablesSet, hints, joinType, joinInfo. Suppose all by reference they will use 44 bytes shallow heap size. The digest of Join has: hashCode, rel, digest, items. The reference use 16 bytes. And items is a list with 10 elements (the default size is 10), uses 40 bytes. Inside the items array list, it has 4 pairs \{"left", leftinput\}, \{"right", rightinput\}, \{"condition", condition\}, \{"joinType", jointype\}, which use 32 bytes. If the relnode is registered inside VolcanoPlanner, the left and right inputs are subsets, which will be converted to string with set_ID+traitset, which can take more than 10 bytes for a single input. The Digest itself can use more than 100 bytes. {quote} The digest behaves like a tool to unify the logic. We can also let each rel node to handle its "digest" but the code would be a mess. {quote} Do we really need a tool to unify the logic? It sounds like forced marriage. The reality is that with digest it is a mess. We can't even use an object's hashcode and equals to determine it is equals with another object or not, but rather need to override explainTerms(), isn't it counter-intuitive? {quote} They are different instances, but all the operators expect to be a singleton, and each time we copy the RexCalls in the rule, we just pass around the object reference. {quote} Did you try to debug it? It is true that when you copy the RexCall it just pass the reference, but after column pruning or project transpose, the RexCall might be a complete new object, even there are equivalent rexcall already exist. > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17132839#comment-17132839 ] Danny Chen commented on CALCITE-3786: - > However, the digest itself is just duplicate each field of the RelNode and > RexNode again, which still double memory usage of RelNode and RexNode. That's not true, for RelNode we only bookeep its class name and id, for rex node only an additional object reference. > But what is the proposal's advantage over just removing the digest from > RelNode and RexNode completely and caching the hashcode only? The digest behaves like a tool to unify the logic. We can also let each rel node to handle its "digest" but the code would be a mess. > After query parser, the two RexCalls a > 10 are different instances They are different instances, but all the operators expect to be a singleton, and each time we copy the RexCalls in the rule, we just pass around the object reference. I believe the rules that split the RexCalls would be few, with the fact that we just copy new references. > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17132573#comment-17132573 ] Haisheng Yuan commented on CALCITE-3786: The digest of RelNode in the patch does reuse RexNode parts, but the RexNode doesn't, which can still result in OOM like CALCITE-3784. The reuse of RexNode is the true pain point. If the RexNode also changes its Digest as in the patch, it can solve the OOM, because the digest (unlike digest string) size won't grow exponentially. However, the digest itself is just duplicate each field of the RelNode and RexNode again, which still double memory usage of RelNode and RexNode. I guess everyone knows that the digest is mainly used to check whether there is equivalent node in the planner quickly, without deep comparison. If we agree to discard the shallow comparison, I am totally fine with that. But what is the proposal's advantage over just removing the digest from RelNode and RexNode completely and caching the hashcode only? In addition, for queries with complex and duplicate expressions, e.g. (a > 10 and b < 1) or (a > 10 and b > 5) ... After query parser, the two RexCalls {{a > 10}} are different instances, in real case that can be arbitrary complex expressions, and repeat many times. And after rule transformations, like column pruning, all the RexNodes will become distinct instances and many of them duplicate again. This causes a lot of memory waste for large queries. With the so-called Digest in the proposal, it can consume less memory than the original string digest, still doubles the actual memory of RelNode and RexNode, but at the same time loses the ability of shallow comparison. > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17130580#comment-17130580 ] Danny Chen commented on CALCITE-3786: - [~hyuan] ~ 1. The new implementation already reuse the expression parts 2. the Digest#toString is only used for debugging Of course you can give a -1, but i'm sorry i would ignore that if there was no specific design, i'm not sure if the new design solves the OOM but at least this patch should not be a blocker for that, we can switch to the new design at any time if we are ready for that. > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17130310#comment-17130310 ] Vladimir Sitnikov commented on CALCITE-3786: [~hyuan], 1) The digest should be able to reuse expression parts, so it should not result in OOM like in 3784 2) toString implementation might track a set of already printed expressions, so the overall result would be shorter, thus it would avoid OOM as well. Well, we can add de-duplication in the current toString as well, however, it might have unexpected performance issues. > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17130096#comment-17130096 ] Haisheng Yuan commented on CALCITE-3786: I am sorry for disappointing you, I should have explicitly -1 on the proposal before you started working on this. It has nothing to do RexNode memo stuff, I am just against the idea in this proposal, which sounds like someone feels headache after banging his head into the wall, the doctor wrapped the patient's head with thick bandage, and gave the advice: bang your head slightly next time. The proposal can ease the issue to some extent by reusing digest, but doesn't solve the underlying problem fundamentally that RexNode can't be deduplicated inside planner. You think the digest in current code base works well, then how do you think CALCITE-3784? Maybe no one would run a single 20k lines analytical query or 60 table joins on Flink, because it is real time streaming platform. > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17129928#comment-17129928 ] Danny Chen commented on CALCITE-3786: - Thanks [~hyuan] for the share, the digest was always there, and it works well for current code base, this patch is only a promotion to the original pure string format digest, because we need to consider many properties including in the "digest" (e.g. the row type sans field names, the hints), a new interface makes the thing easier. As for the rex node memo, i expect a design doc there, before we moving forward to that, a digest interface is a good choice. > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3786) Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17129415#comment-17129415 ] Haisheng Yuan commented on CALCITE-3786: I noticed Danny opened a pull request, thanks for taking this, Danny! But frankly speaking, I am still against this idea, digest should not be used at all. The reason remains unchanged. > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > - > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core >Affects Versions: 1.21.0 >Reporter: Vladimir Sitnikov >Assignee: Danny Chen >Priority: Major > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)