[jira] [Commented] (SPARK-25084) "distribute by" on multiple columns may lead to codegen issue
[ https://issues.apache.org/jira/browse/SPARK-25084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577211#comment-16577211 ]

Apache Spark commented on SPARK-25084:
--------------------------------------

User 'LantaoJin' has created a pull request for this issue:
https://github.com/apache/spark/pull/22077

> "distribute by" on multiple columns may lead to codegen issue
> --------------------------------------------------------------
>
>                 Key: SPARK-25084
>                 URL: https://issues.apache.org/jira/browse/SPARK-25084
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.1
>            Reporter: yucai
>            Assignee: yucai
>            Priority: Blocker
>             Fix For: 2.4.0
>
>
> Test Query:
> {code:java}
> select * from store_sales distribute by (ss_sold_time_sk, ss_item_sk,
> ss_customer_sk, ss_cdemo_sk, ss_addr_sk, ss_promo_sk) limit 1;{code}
> Exception:
> {code:java}
> Caused by: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 131, Column 67: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 131, Column 67: One of ', )' expected instead of '['
>   at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1435)
>   at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1497)
>   at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1494)
>   at org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
>   at org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
>   at org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342){code}
> Wrong Codegen:
> {code:java}
> /* 131 */ private int computeHashForStruct_1(InternalRow mutableStateArray[0], int value1) {
> /* 132 */
> /* 133 */
> /* 134 */   if (!mutableStateArray[0].isNullAt(5)) {
> /* 135 */
> /* 136 */     final int element5 = mutableStateArray[0].getInt(5);
> /* 137 */     value1 = org.apache.spark.unsafe.hash.Murmur3_x86_32.hashInt(element5, value1);
> /* 138 */
> /* 139 */   }{code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
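For comparison with the "Wrong Codegen" snippet quoted in the issue, here is a minimal hand-written sketch of the same helper with a plain identifier as the row parameter, which is the shape a Java compiler accepts. Everything outside the quoted snippet is an assumption made for illustration: `ValidHelperSketch`, the `Row` interface (a stand-in for the two `InternalRow` methods used), `rowOf`, and `mix` (a placeholder, not `Murmur3_x86_32.hashInt`) are hypothetical, and the trailing `return value1;` is assumed because the quoted snippet is cut off before the end of the method.

```java
public class ValidHelperSketch {

    // Hypothetical stand-in for the two InternalRow methods used in the quoted snippet.
    interface Row {
        boolean isNullAt(int ordinal);
        int getInt(int ordinal);
    }

    // Placeholder mixing step for illustration only; this is NOT Murmur3_x86_32.hashInt.
    static int mix(int input, int seed) {
        return 31 * seed + input;
    }

    // Same shape as the quoted helper, but the parameter is a plain identifier
    // ("row") rather than the expression text "mutableStateArray[0]".
    static int computeHashForStruct_1(Row row, int value1) {
        if (!row.isNullAt(5)) {
            final int element5 = row.getInt(5);
            value1 = mix(element5, value1);
        }
        return value1; // assumed: the quoted snippet ends before the return statement
    }

    // Array-backed Row used only to exercise the helper; null marks a SQL NULL.
    static Row rowOf(final Integer... fields) {
        return new Row() {
            public boolean isNullAt(int ordinal) { return fields[ordinal] == null; }
            public int getInt(int ordinal) { return fields[ordinal]; }
        };
    }

    public static void main(String[] args) {
        // Field 5 is present, so it is folded into the running hash value.
        System.out.println(computeHashForStruct_1(rowOf(0, 0, 0, 0, 0, 7), 42));
        // Field 5 is NULL, so the running hash value is returned unchanged.
        System.out.println(computeHashForStruct_1(rowOf(0, 0, 0, 0, 0, null), 42));
    }
}
```

At the call site, the caller would pass the expression itself, for example `computeHashForStruct_1(mutableStateArray[0], value1)`, since an array access is valid as an argument even though it is not valid as a parameter name.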
[jira] [Commented] (SPARK-25084) "distribute by" on multiple columns may lead to codegen issue
[ https://issues.apache.org/jira/browse/SPARK-25084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576417#comment-16576417 ]

yucai commented on SPARK-25084:
-------------------------------

[~smilegator][~jerryshao] Thanks a lot for marking it as a blocker. Many of eBay's tables use "distribute by" or "cluster by", so this issue matters for our move to Spark 2.3.
[jira] [Commented] (SPARK-25084) "distribute by" on multiple columns may lead to codegen issue
[ https://issues.apache.org/jira/browse/SPARK-25084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575860#comment-16575860 ]

Lantao Jin commented on SPARK-25084:
------------------------------------

I have proposed an alternative fix:
https://github.com/apache/spark/pull/22067
It does not need "input" as a global variable (in the case of distribute by random).
[jira] [Commented] (SPARK-25084) "distribute by" on multiple columns may lead to codegen issue
[ https://issues.apache.org/jira/browse/SPARK-25084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575856#comment-16575856 ]

Apache Spark commented on SPARK-25084:
--------------------------------------

User 'LantaoJin' has created a pull request for this issue:
https://github.com/apache/spark/pull/22067
[jira] [Commented] (SPARK-25084) "distribute by" on multiple columns may lead to codegen issue
[ https://issues.apache.org/jira/browse/SPARK-25084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575808#comment-16575808 ]

yucai commented on SPARK-25084:
-------------------------------

This is a regression. When the generated code size exceeds 1024, newer Spark versions split the code into multiple functions, but the generated function definition is wrong, as shown below:
{code:java}
private int computeHashForStruct_0(InternalRow mutableStateArray[0], int value1) {
{code}
Older versions, such as 2.1.0, do not split the function, so they do not have this issue.

> "distribute by" on multiple columns may lead to codegen issue
> --------------------------------------------------------------
>
>                 Key: SPARK-25084
>                 URL: https://issues.apache.org/jira/browse/SPARK-25084
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.1
>            Reporter: yucai
>            Priority: Blocker
>
> Test Query:
> {code:java}
> select * from store_sales distribute by (ss_sold_time_sk, ss_item_sk,
> ss_customer_sk, ss_cdemo_sk, ss_addr_sk, ss_promo_sk, ss_ext_list_price,
> ss_net_profit) limit 1000;{code}
> Wrong Codegen:
> {code:java}
> /* 146 */ private int computeHashForStruct_0(InternalRow mutableStateArray[0], int value1) {
> /* 147 */
> /* 148 */
> /* 149 */   if (!mutableStateArray[0].isNullAt(0)) {
> /* 150 */
> /* 151 */     final int element = mutableStateArray[0].getInt(0);
> /* 152 */     value1 = org.apache.spark.unsafe.hash.Murmur3_x86_32.hashInt(element, value1);
> /* 153 */
> /* 154 */   }{code}
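The failure mode described in this comment can be illustrated with a toy model. This is a sketch under stated assumptions, not Spark's actual code generator: the class `SplitSignatureSketch` and its methods `declareHelper` and `isSimpleIdentifier` are hypothetical. It assumes the splitter pastes the text of the input-row expression into the helper's parameter list, which is what the quoted signature suggests, and contrasts that with declaring a fresh identifier and passing the expression at the call site.

```java
public class SplitSignatureSketch {

    // Builds the declaration line of a split-out helper, using paramName verbatim
    // as the name of the InternalRow parameter (hypothetical model of the splitter).
    static String declareHelper(String helperName, String paramName) {
        return "private int " + helperName + "(InternalRow " + paramName + ", int value1) {";
    }

    // True if s consists only of Java identifier characters. Keywords are not
    // excluded here; the check only needs to reject text such as an array access.
    static boolean isSimpleIdentifier(String s) {
        if (s.isEmpty() || !Character.isJavaIdentifierStart(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!Character.isJavaIdentifierPart(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String rowExpr = "mutableStateArray[0]";

        // Problem case: the expression text becomes the parameter name, which
        // reproduces the declaration quoted in the issue.
        System.out.println(declareHelper("computeHashForStruct_0", rowExpr));
        System.out.println("usable as parameter name: " + isSimpleIdentifier(rowExpr));

        // One possible shape of a valid split: a fresh identifier in the
        // declaration, with the expression passed as an argument by the caller.
        String param = "row";
        System.out.println(declareHelper("computeHashForStruct_0", param));
        System.out.println("value1 = computeHashForStruct_0(" + rowExpr + ", value1);");
    }
}
```

The compiler error quoted in the issue ("One of ', )' expected instead of '['") is consistent with this reading: after the parameter name `mutableStateArray`, the parser expects either a comma or the closing parenthesis, and finds `[` instead.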
[jira] [Commented] (SPARK-25084) "distribute by" on multiple columns may lead to codegen issue
[ https://issues.apache.org/jira/browse/SPARK-25084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575791#comment-16575791 ]

Xiao Li commented on SPARK-25084:
---------------------------------

Could you investigate which PR introduced this bug? What is the error message?
[jira] [Commented] (SPARK-25084) "distribute by" on multiple columns may lead to codegen issue
[ https://issues.apache.org/jira/browse/SPARK-25084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575790#comment-16575790 ]

Xiao Li commented on SPARK-25084:
---------------------------------

Let us mark it as a blocker. How about the master branch? Does it work?
[jira] [Commented] (SPARK-25084) "distribute by" on multiple columns may lead to codegen issue
[ https://issues.apache.org/jira/browse/SPARK-25084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575778#comment-16575778 ]

Saisai Shao commented on SPARK-25084:
-------------------------------------

I see. Unfortunately, I've already cut RC4. If this is worth including in 2.3.2, I will cut a new RC.
[jira] [Commented] (SPARK-25084) "distribute by" on multiple columns may lead to codegen issue
[ https://issues.apache.org/jira/browse/SPARK-25084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575777#comment-16575777 ]

Yuming Wang commented on SPARK-25084:
-------------------------------------

It's a regression.
[jira] [Commented] (SPARK-25084) "distribute by" on multiple columns may lead to codegen issue
[ https://issues.apache.org/jira/browse/SPARK-25084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575774#comment-16575774 ]

Saisai Shao commented on SPARK-25084:
-------------------------------------

Is this a regression, or a bug that already existed in older versions?
[jira] [Commented] (SPARK-25084) "distribute by" on multiple columns may lead to codegen issue
[ https://issues.apache.org/jira/browse/SPARK-25084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575772#comment-16575772 ]

Saisai Shao commented on SPARK-25084:
-------------------------------------

I'm already preparing the new RC4. If this is not a severe issue, I would rather not block the RC4 release.
[jira] [Commented] (SPARK-25084) "distribute by" on multiple columns may lead to codegen issue
[ https://issues.apache.org/jira/browse/SPARK-25084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575771#comment-16575771 ]

Yuming Wang commented on SPARK-25084:
-------------------------------------

[~smilegator], [~jerryshao] I think this should target 2.3.2.
[jira] [Commented] (SPARK-25084) "distribute by" on multiple columns may lead to codegen issue
[ https://issues.apache.org/jira/browse/SPARK-25084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575767#comment-16575767 ]

Apache Spark commented on SPARK-25084:
--------------------------------------

User 'yucai' has created a pull request for this issue:
https://github.com/apache/spark/pull/22066