[jira] [Commented] (SPARK-25084) "distribute by" on multiple columns may lead to codegen issue
[ https://issues.apache.org/jira/browse/SPARK-25084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575778#comment-16575778 ] Saisai Shao commented on SPARK-25084: - I see. Unfortunately I've already cut RC4; if it's worth including in 2.3.2, I will cut a new RC. > "distribute by" on multiple columns may lead to codegen issue > - > > Key: SPARK-25084 > URL: https://issues.apache.org/jira/browse/SPARK-25084 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 >Reporter: yucai >Priority: Major > > Test Query: > {code:java} > select * from store_sales distribute by (ss_sold_time_sk, ss_item_sk, > ss_customer_sk, ss_cdemo_sk, ss_addr_sk, ss_promo_sk, ss_ext_list_price, > ss_net_profit) limit 1000;{code} > Wrong Codegen: > {code:java} > /* 146 */ private int computeHashForStruct_0(InternalRow > mutableStateArray[0], int value1) { > /* 147 */ > /* 148 */ > /* 149 */ if (!mutableStateArray[0].isNullAt(0)) { > /* 150 */ > /* 151 */ final int element = mutableStateArray[0].getInt(0); > /* 152 */ value1 = > org.apache.spark.unsafe.hash.Murmur3_x86_32.hashInt(element, value1); > /* 153 */ > /* 154 */ }{code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
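The generated helper above is not valid Java: the global array element `mutableStateArray[0]` is emitted as a method parameter name, so the hash code for the struct of distribute-by keys cannot compile. A minimal stand-alone sketch of the reported scenario (assuming a local SparkSession; `store_sales` is stubbed from a range rather than the real TPC-DS table, and the column names simply mirror the query in the report):

{code:scala}
// Hypothetical reproduction sketch for the query quoted above.
import org.apache.spark.sql.SparkSession

object DistributeByCodegenRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("SPARK-25084-repro")
      .getOrCreate()

    // Stand-in for store_sales: the values are irrelevant, only the column
    // names and the number of distribute-by keys matter.
    spark.range(0, 1000).selectExpr(
      "id as ss_sold_time_sk", "id as ss_item_sk", "id as ss_customer_sk",
      "id as ss_cdemo_sk", "id as ss_addr_sk", "id as ss_promo_sk",
      "cast(id as double) as ss_ext_list_price", "cast(id as double) as ss_net_profit")
      .createOrReplaceTempView("store_sales")

    // Distributing by a parenthesized column list hashes a struct of all the
    // keys; on affected versions the split hash helper is generated with the
    // illegal parameter name shown in the report.
    spark.sql(
      """SELECT * FROM store_sales
        |DISTRIBUTE BY (ss_sold_time_sk, ss_item_sk, ss_customer_sk, ss_cdemo_sk,
        |               ss_addr_sk, ss_promo_sk, ss_ext_list_price, ss_net_profit)
        |LIMIT 1000""".stripMargin).collect()

    spark.stop()
  }
}
{code}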
[jira] [Commented] (SPARK-25084) "distribute by" on multiple columns may lead to codegen issue
[ https://issues.apache.org/jira/browse/SPARK-25084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575777#comment-16575777 ] Yuming Wang commented on SPARK-25084: - It's a regression. > "distribute by" on multiple columns may lead to codegen issue > - > > Key: SPARK-25084 > URL: https://issues.apache.org/jira/browse/SPARK-25084 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 >Reporter: yucai >Priority: Major > > Test Query: > {code:java} > select * from store_sales distribute by (ss_sold_time_sk, ss_item_sk, > ss_customer_sk, ss_cdemo_sk, ss_addr_sk, ss_promo_sk, ss_ext_list_price, > ss_net_profit) limit 1000;{code} > Wrong Codegen: > {code:java} > /* 146 */ private int computeHashForStruct_0(InternalRow > mutableStateArray[0], int value1) { > /* 147 */ > /* 148 */ > /* 149 */ if (!mutableStateArray[0].isNullAt(0)) { > /* 150 */ > /* 151 */ final int element = mutableStateArray[0].getInt(0); > /* 152 */ value1 = > org.apache.spark.unsafe.hash.Murmur3_x86_32.hashInt(element, value1); > /* 153 */ > /* 154 */ }{code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25084) "distribute by" on multiple columns may lead to codegen issue
[ https://issues.apache.org/jira/browse/SPARK-25084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575774#comment-16575774 ] Saisai Shao commented on SPARK-25084: - Is this a regression, or just a bug that existed in older versions? > "distribute by" on multiple columns may lead to codegen issue > - > > Key: SPARK-25084 > URL: https://issues.apache.org/jira/browse/SPARK-25084 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 >Reporter: yucai >Priority: Major > > Test Query: > {code:java} > select * from store_sales distribute by (ss_sold_time_sk, ss_item_sk, > ss_customer_sk, ss_cdemo_sk, ss_addr_sk, ss_promo_sk, ss_ext_list_price, > ss_net_profit) limit 1000;{code} > Wrong Codegen: > {code:java} > /* 146 */ private int computeHashForStruct_0(InternalRow > mutableStateArray[0], int value1) { > /* 147 */ > /* 148 */ > /* 149 */ if (!mutableStateArray[0].isNullAt(0)) { > /* 150 */ > /* 151 */ final int element = mutableStateArray[0].getInt(0); > /* 152 */ value1 = > org.apache.spark.unsafe.hash.Murmur3_x86_32.hashInt(element, value1); > /* 153 */ > /* 154 */ }{code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25067) Active tasks exceed total cores of an executor in WebUI
[ https://issues.apache.org/jira/browse/SPARK-25067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] StanZhai updated SPARK-25067: - Attachment: (was: 1533128203469_2.png) > Active tasks exceed total cores of an executor in WebUI > --- > > Key: SPARK-25067 > URL: https://issues.apache.org/jira/browse/SPARK-25067 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.2.2, 2.3.0, 2.3.1 >Reporter: StanZhai >Priority: Major > Attachments: WechatIMG1.jpeg > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25084) "distribute by" on multiple columns may lead to codegen issue
[ https://issues.apache.org/jira/browse/SPARK-25084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575772#comment-16575772 ] Saisai Shao commented on SPARK-25084: - I'm already preparing new RC4. If this is not a severe issue, I would not block the RC4 release. > "distribute by" on multiple columns may lead to codegen issue > - > > Key: SPARK-25084 > URL: https://issues.apache.org/jira/browse/SPARK-25084 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 >Reporter: yucai >Priority: Major > > Test Query: > {code:java} > select * from store_sales distribute by (ss_sold_time_sk, ss_item_sk, > ss_customer_sk, ss_cdemo_sk, ss_addr_sk, ss_promo_sk, ss_ext_list_price, > ss_net_profit) limit 1000;{code} > Wrong Codegen: > {code:java} > /* 146 */ private int computeHashForStruct_0(InternalRow > mutableStateArray[0], int value1) { > /* 147 */ > /* 148 */ > /* 149 */ if (!mutableStateArray[0].isNullAt(0)) { > /* 150 */ > /* 151 */ final int element = mutableStateArray[0].getInt(0); > /* 152 */ value1 = > org.apache.spark.unsafe.hash.Murmur3_x86_32.hashInt(element, value1); > /* 153 */ > /* 154 */ }{code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25084) "distribute by" on multiple columns may lead to codegen issue
[ https://issues.apache.org/jira/browse/SPARK-25084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575771#comment-16575771 ] Yuming Wang commented on SPARK-25084: - [~smilegator], [~jerryshao] I think it should target 2.3.2. > "distribute by" on multiple columns may lead to codegen issue > - > > Key: SPARK-25084 > URL: https://issues.apache.org/jira/browse/SPARK-25084 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 >Reporter: yucai >Priority: Major > > Test Query: > {code:java} > select * from store_sales distribute by (ss_sold_time_sk, ss_item_sk, > ss_customer_sk, ss_cdemo_sk, ss_addr_sk, ss_promo_sk, ss_ext_list_price, > ss_net_profit) limit 1000;{code} > Wrong Codegen: > {code:java} > /* 146 */ private int computeHashForStruct_0(InternalRow > mutableStateArray[0], int value1) { > /* 147 */ > /* 148 */ > /* 149 */ if (!mutableStateArray[0].isNullAt(0)) { > /* 150 */ > /* 151 */ final int element = mutableStateArray[0].getInt(0); > /* 152 */ value1 = > org.apache.spark.unsafe.hash.Murmur3_x86_32.hashInt(element, value1); > /* 153 */ > /* 154 */ }{code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25084) "distribute by" on multiple columns may lead to codegen issue
[ https://issues.apache.org/jira/browse/SPARK-25084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25084: Assignee: Apache Spark > "distribute by" on multiple columns may lead to codegen issue > - > > Key: SPARK-25084 > URL: https://issues.apache.org/jira/browse/SPARK-25084 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 >Reporter: yucai >Assignee: Apache Spark >Priority: Major > > Test Query: > {code:java} > select * from store_sales distribute by (ss_sold_time_sk, ss_item_sk, > ss_customer_sk, ss_cdemo_sk, ss_addr_sk, ss_promo_sk, ss_ext_list_price, > ss_net_profit) limit 1000;{code} > Wrong Codegen: > {code:java} > /* 146 */ private int computeHashForStruct_0(InternalRow > mutableStateArray[0], int value1) { > /* 147 */ > /* 148 */ > /* 149 */ if (!mutableStateArray[0].isNullAt(0)) { > /* 150 */ > /* 151 */ final int element = mutableStateArray[0].getInt(0); > /* 152 */ value1 = > org.apache.spark.unsafe.hash.Murmur3_x86_32.hashInt(element, value1); > /* 153 */ > /* 154 */ }{code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25084) "distribute by" on multiple columns may lead to codegen issue
[ https://issues.apache.org/jira/browse/SPARK-25084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575767#comment-16575767 ] Apache Spark commented on SPARK-25084: -- User 'yucai' has created a pull request for this issue: https://github.com/apache/spark/pull/22066 > "distribute by" on multiple columns may lead to codegen issue > - > > Key: SPARK-25084 > URL: https://issues.apache.org/jira/browse/SPARK-25084 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 >Reporter: yucai >Priority: Major > > Test Query: > {code:java} > select * from store_sales distribute by (ss_sold_time_sk, ss_item_sk, > ss_customer_sk, ss_cdemo_sk, ss_addr_sk, ss_promo_sk, ss_ext_list_price, > ss_net_profit) limit 1000;{code} > Wrong Codegen: > {code:java} > /* 146 */ private int computeHashForStruct_0(InternalRow > mutableStateArray[0], int value1) { > /* 147 */ > /* 148 */ > /* 149 */ if (!mutableStateArray[0].isNullAt(0)) { > /* 150 */ > /* 151 */ final int element = mutableStateArray[0].getInt(0); > /* 152 */ value1 = > org.apache.spark.unsafe.hash.Murmur3_x86_32.hashInt(element, value1); > /* 153 */ > /* 154 */ }{code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25084) "distribute by" on multiple columns may lead to codegen issue
[ https://issues.apache.org/jira/browse/SPARK-25084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25084: Assignee: (was: Apache Spark) > "distribute by" on multiple columns may lead to codegen issue > - > > Key: SPARK-25084 > URL: https://issues.apache.org/jira/browse/SPARK-25084 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 >Reporter: yucai >Priority: Major > > Test Query: > {code:java} > select * from store_sales distribute by (ss_sold_time_sk, ss_item_sk, > ss_customer_sk, ss_cdemo_sk, ss_addr_sk, ss_promo_sk, ss_ext_list_price, > ss_net_profit) limit 1000;{code} > Wrong Codegen: > {code:java} > /* 146 */ private int computeHashForStruct_0(InternalRow > mutableStateArray[0], int value1) { > /* 147 */ > /* 148 */ > /* 149 */ if (!mutableStateArray[0].isNullAt(0)) { > /* 150 */ > /* 151 */ final int element = mutableStateArray[0].getInt(0); > /* 152 */ value1 = > org.apache.spark.unsafe.hash.Murmur3_x86_32.hashInt(element, value1); > /* 153 */ > /* 154 */ }{code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25084) "distribute by" on multiple columns may lead to codegen issue
yucai created SPARK-25084: - Summary: "distribute by" on multiple columns may lead to codegen issue Key: SPARK-25084 URL: https://issues.apache.org/jira/browse/SPARK-25084 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.3.1 Reporter: yucai Test Query: {code:java} select * from store_sales distribute by (ss_sold_time_sk, ss_item_sk, ss_customer_sk, ss_cdemo_sk, ss_addr_sk, ss_promo_sk, ss_ext_list_price, ss_net_profit) limit 1000;{code} Wrong Codegen: {code:java} /* 146 */ private int computeHashForStruct_0(InternalRow mutableStateArray[0], int value1) { /* 147 */ /* 148 */ /* 149 */ if (!mutableStateArray[0].isNullAt(0)) { /* 150 */ /* 151 */ final int element = mutableStateArray[0].getInt(0); /* 152 */ value1 = org.apache.spark.unsafe.hash.Murmur3_x86_32.hashInt(element, value1); /* 153 */ /* 154 */ }{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23992) ShuffleDependency does not need to be deserialized every time
[ https://issues.apache.org/jira/browse/SPARK-23992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575747#comment-16575747 ] Apache Spark commented on SPARK-23992: -- User '10110346' has created a pull request for this issue: https://github.com/apache/spark/pull/22065 > ShuffleDependency does not need to be deserialized every time > - > > Key: SPARK-23992 > URL: https://issues.apache.org/jira/browse/SPARK-23992 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.0 >Reporter: liuxian >Priority: Minor > > In the same stage, 'ShuffleDependency' is not necessary to be deserialized > each time. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
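The improvement above is only sketched in the description; purely as an illustration of the idea (not Spark's actual executor internals, and every name below is hypothetical), deserializing once per stage amounts to memoizing on the stage id:

{code:scala}
// Illustrative sketch only: cache one deserialized object per stage so tasks
// of the same stage on the same executor skip repeated deserialization.
import java.util.concurrent.ConcurrentHashMap

final class StageBinaryCache[T <: AnyRef] {
  private val cache = new ConcurrentHashMap[Int, T]()

  def getOrDeserialize(stageId: Int)(deserialize: => T): T = {
    val existing = cache.get(stageId)
    if (existing != null) {
      existing
    } else {
      val created = deserialize               // pay the cost at most once per stage
      val raced = cache.putIfAbsent(stageId, created)
      if (raced != null) raced else created
    }
  }

  def evict(stageId: Int): Unit = {
    cache.remove(stageId)                     // drop the entry when the stage completes
  }
}
{code}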
[jira] [Closed] (SPARK-25052) Is there any possibility that spark structured streaming generate duplicates in the output?
[ https://issues.apache.org/jira/browse/SPARK-25052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] bharath kumar avusherla closed SPARK-25052. --- > Is there any possibility that spark structured streaming generate duplicates > in the output? > --- > > Key: SPARK-25052 > URL: https://issues.apache.org/jira/browse/SPARK-25052 > Project: Spark > Issue Type: Question > Components: Spark Core >Affects Versions: 2.3.0 >Reporter: bharath kumar avusherla >Priority: Minor > > We recently observed that Spark Structured Streaming generated duplicates > in the output when reading from a Kafka topic and storing the output to S3 > (and checkpointing in S3). We ran into this issue twice. This is not > reproducible. Has anyone ever faced this kind of issue before? Is > this because of S3 eventual consistency? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-23243) Shuffle+Repartition on an RDD could lead to incorrect answers
[ https://issues.apache.org/jira/browse/SPARK-23243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-23243: Labels: correctness (was: ) > Shuffle+Repartition on an RDD could lead to incorrect answers > - > > Key: SPARK-23243 > URL: https://issues.apache.org/jira/browse/SPARK-23243 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.0, 2.0.0, 2.1.0, 2.2.0, 2.3.0 >Reporter: Jiang Xingbo >Priority: Blocker > Labels: correctness > > The RDD repartition also uses the round-robin way to distribute data, this > can also cause incorrect answers on RDD workload the similar way as in > https://issues.apache.org/jira/browse/SPARK-23207 > The approach that fixes DataFrame.repartition() doesn't apply on the RDD > repartition issue, as discussed in > https://github.com/apache/spark/pull/20393#issuecomment-360912451 > We track for alternative solutions for this issue in this task. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
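For context on the failure mode: round-robin placement depends on the order in which a partition's records arrive, so when an upstream shuffle is recomputed after a fetch failure and yields records in a different order, a retried task routes rows to different output partitions than the original attempt, which can drop or duplicate rows. A hedged user-side illustration (a workaround sketch, not the fix tracked by this ticket) is to impose a deterministic order before repartitioning:

{code:scala}
// Sketch of a user-side mitigation: pin a deterministic per-partition order
// before the round-robin repartition so a retried task reproduces the same
// placement as the original attempt.
import org.apache.spark.sql.SparkSession

object DeterministicRepartition {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[4]").appName("repartition-demo").getOrCreate()
    val sc = spark.sparkContext

    // Upstream shuffle: the order in which records are fetched is not deterministic.
    val shuffled = sc.parallelize(1 to 100000, 8)
      .map(x => (x % 997, x))
      .groupByKey()
      .flatMap(_._2)

    // Sorting first makes the input order (and hence the round-robin
    // assignment inside repartition) deterministic across retries.
    val stable = shuffled.sortBy(identity).repartition(10)

    println(stable.count())
    spark.stop()
  }
}
{code}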
[jira] [Commented] (SPARK-25044) Address translation of LMF closure primitive args to Object in Scala 2.12
[ https://issues.apache.org/jira/browse/SPARK-25044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575672#comment-16575672 ] Apache Spark commented on SPARK-25044: -- User 'srowen' has created a pull request for this issue: https://github.com/apache/spark/pull/22063 > Address translation of LMF closure primitive args to Object in Scala 2.12 > - > > Key: SPARK-25044 > URL: https://issues.apache.org/jira/browse/SPARK-25044 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 2.4.0 >Reporter: Sean Owen >Priority: Major > > A few SQL-related tests fail in Scala 2.12, such as UDFSuite's "SPARK-24891 > Fix HandleNullInputsForUDF rule": > {code:java} > - SPARK-24891 Fix HandleNullInputsForUDF rule *** FAILED *** > Results do not match for query: > ... > == Results == > == Results == > !== Correct Answer - 3 == == Spark Answer - 3 == > !struct<> struct > ![0,10,null] [0,10,0] > ![1,12,null] [1,12,1] > ![2,14,null] [2,14,2] (QueryTest.scala:163){code} > You can kind of get what's going on reading the test: > {code:java} > test("SPARK-24891 Fix HandleNullInputsForUDF rule") { > // assume(!ClosureCleanerSuite2.supportsLMFs) > // This test won't test what it intends to in 2.12, as lambda metafactory > closures > // have arg types that are not primitive, but Object > val udf1 = udf({(x: Int, y: Int) => x + y}) > val df = spark.range(0, 3).toDF("a") > .withColumn("b", udf1($"a", udf1($"a", lit(10 > .withColumn("c", udf1($"a", lit(null))) > val plan = spark.sessionState.executePlan(df.logicalPlan).analyzed > comparePlans(df.logicalPlan, plan) > checkAnswer( > df, > Seq( > Row(0, 10, null), > Row(1, 12, null), > Row(2, 14, null))) > }{code} > > It seems that the closure that is fed in as a UDF changes behavior, in a way > that primitive-type arguments are handled differently. For example an Int > argument, when fed 'null', acts like 0. > I'm sure it's a difference in the LMF closure and how its types are > understood, but not exactly sure of the cause yet. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
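The unboxing behaviour behind the failing test can be seen without Spark at all. A hedged, Spark-free illustration (plain Scala, not the UDF machinery itself): when an `(Int, Int) => Int` function is invoked through its erased `Object`-typed apply, which is effectively how a lambda-metafactory closure is seen, a null argument is silently unboxed to 0.

{code:scala}
// Generic Scala unboxing behaviour, shown outside Spark: a null fed to a
// primitive Int parameter through the erased bridge becomes 0.
object PrimitiveNullDemo {
  def main(args: Array[String]): Unit = {
    val typed: (Int, Int) => Int = (x, y) => x + y

    // The cast only changes the static view; calls now go through the
    // generic apply(Object, Object) bridge, which unboxes each argument.
    val erased = typed.asInstanceOf[(Any, Any) => Any]

    println(erased(1, 2))     // 3
    println(erased(null, 10)) // 10 -- the null Int argument was unboxed to 0
  }
}
{code}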
[jira] [Assigned] (SPARK-25044) Address translation of LMF closure primitive args to Object in Scala 2.12
[ https://issues.apache.org/jira/browse/SPARK-25044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25044: Assignee: (was: Apache Spark) > Address translation of LMF closure primitive args to Object in Scala 2.12 > - > > Key: SPARK-25044 > URL: https://issues.apache.org/jira/browse/SPARK-25044 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 2.4.0 >Reporter: Sean Owen >Priority: Major > > A few SQL-related tests fail in Scala 2.12, such as UDFSuite's "SPARK-24891 > Fix HandleNullInputsForUDF rule": > {code:java} > - SPARK-24891 Fix HandleNullInputsForUDF rule *** FAILED *** > Results do not match for query: > ... > == Results == > == Results == > !== Correct Answer - 3 == == Spark Answer - 3 == > !struct<> struct > ![0,10,null] [0,10,0] > ![1,12,null] [1,12,1] > ![2,14,null] [2,14,2] (QueryTest.scala:163){code} > You can kind of get what's going on reading the test: > {code:java} > test("SPARK-24891 Fix HandleNullInputsForUDF rule") { > // assume(!ClosureCleanerSuite2.supportsLMFs) > // This test won't test what it intends to in 2.12, as lambda metafactory > closures > // have arg types that are not primitive, but Object > val udf1 = udf({(x: Int, y: Int) => x + y}) > val df = spark.range(0, 3).toDF("a") > .withColumn("b", udf1($"a", udf1($"a", lit(10 > .withColumn("c", udf1($"a", lit(null))) > val plan = spark.sessionState.executePlan(df.logicalPlan).analyzed > comparePlans(df.logicalPlan, plan) > checkAnswer( > df, > Seq( > Row(0, 10, null), > Row(1, 12, null), > Row(2, 14, null))) > }{code} > > It seems that the closure that is fed in as a UDF changes behavior, in a way > that primitive-type arguments are handled differently. For example an Int > argument, when fed 'null', acts like 0. > I'm sure it's a difference in the LMF closure and how its types are > understood, but not exactly sure of the cause yet. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25044) Address translation of LMF closure primitive args to Object in Scala 2.12
[ https://issues.apache.org/jira/browse/SPARK-25044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25044: Assignee: Apache Spark > Address translation of LMF closure primitive args to Object in Scala 2.12 > - > > Key: SPARK-25044 > URL: https://issues.apache.org/jira/browse/SPARK-25044 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 2.4.0 >Reporter: Sean Owen >Assignee: Apache Spark >Priority: Major > > A few SQL-related tests fail in Scala 2.12, such as UDFSuite's "SPARK-24891 > Fix HandleNullInputsForUDF rule": > {code:java} > - SPARK-24891 Fix HandleNullInputsForUDF rule *** FAILED *** > Results do not match for query: > ... > == Results == > == Results == > !== Correct Answer - 3 == == Spark Answer - 3 == > !struct<> struct > ![0,10,null] [0,10,0] > ![1,12,null] [1,12,1] > ![2,14,null] [2,14,2] (QueryTest.scala:163){code} > You can kind of get what's going on reading the test: > {code:java} > test("SPARK-24891 Fix HandleNullInputsForUDF rule") { > // assume(!ClosureCleanerSuite2.supportsLMFs) > // This test won't test what it intends to in 2.12, as lambda metafactory > closures > // have arg types that are not primitive, but Object > val udf1 = udf({(x: Int, y: Int) => x + y}) > val df = spark.range(0, 3).toDF("a") > .withColumn("b", udf1($"a", udf1($"a", lit(10 > .withColumn("c", udf1($"a", lit(null))) > val plan = spark.sessionState.executePlan(df.logicalPlan).analyzed > comparePlans(df.logicalPlan, plan) > checkAnswer( > df, > Seq( > Row(0, 10, null), > Row(1, 12, null), > Row(2, 14, null))) > }{code} > > It seems that the closure that is fed in as a UDF changes behavior, in a way > that primitive-type arguments are handled differently. For example an Int > argument, when fed 'null', acts like 0. > I'm sure it's a difference in the LMF closure and how its types are > understood, but not exactly sure of the cause yet. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25083) remove the type erasure hack in data source scan
Wenchen Fan created SPARK-25083: --- Summary: remove the type erasure hack in data source scan Key: SPARK-25083 URL: https://issues.apache.org/jira/browse/SPARK-25083 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.0 Reporter: Wenchen Fan It's hacky to pretend a `RDD[ColumnarBatch]` to be a `RDD[InternalRow]`. We should make the type explicit. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
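A small sketch of why the cast is called a type-erasure hack (placeholder classes below, not Spark's real ColumnarBatch/InternalRow): because element types are erased at runtime the cast succeeds, and the mismatch only surfaces when an element is actually consumed under the wrong type.

{code:scala}
// Placeholder types standing in for Spark's ColumnarBatch and InternalRow.
final class FakeColumnarBatch
final class FakeInternalRow

object ErasureHackDemo {
  def main(args: Array[String]): Unit = {
    val batches: Seq[FakeColumnarBatch] = Seq(new FakeColumnarBatch)

    // Erasure means this unsound cast succeeds at runtime.
    val pretendRows = batches.asInstanceOf[Seq[FakeInternalRow]]
    println(pretendRows.size) // fine: nothing has forced the element type yet

    try {
      val row: FakeInternalRow = pretendRows.head // forced here
      println(row)
    } catch {
      case e: ClassCastException => println(s"fails only at use time: $e")
    }
  }
}
{code}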
[jira] [Updated] (SPARK-25076) SQLConf should not be retrieved from a stopped SparkSession
[ https://issues.apache.org/jira/browse/SPARK-25076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-25076: Fix Version/s: 2.3.2 > SQLConf should not be retrieved from a stopped SparkSession > --- > > Key: SPARK-25076 > URL: https://issues.apache.org/jira/browse/SPARK-25076 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Fix For: 2.3.2, 2.4.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24502) flaky test: UnsafeRowSerializerSuite
[ https://issues.apache.org/jira/browse/SPARK-24502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575642#comment-16575642 ] Wenchen Fan commented on SPARK-24502: - This is enough. The places you need to pay attention to is where you call `SparkSession.getActiveSession`. Make sure you can handle the case that the session is stopped. > flaky test: UnsafeRowSerializerSuite > > > Key: SPARK-24502 > URL: https://issues.apache.org/jira/browse/SPARK-24502 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 2.4.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: flaky-test > Fix For: 2.3.2, 2.4.0 > > > https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4193/testReport/org.apache.spark.sql.execution/UnsafeRowSerializerSuite/toUnsafeRow___test_helper_method/ > {code} > sbt.ForkMain$ForkError: java.lang.IllegalStateException: LiveListenerBus is > stopped. > at > org.apache.spark.scheduler.LiveListenerBus.addToQueue(LiveListenerBus.scala:97) > at > org.apache.spark.scheduler.LiveListenerBus.addToStatusQueue(LiveListenerBus.scala:80) > at > org.apache.spark.sql.internal.SharedState.(SharedState.scala:93) > at > org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:120) > at > org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:120) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:120) > at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:119) > at > org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:286) > at > org.apache.spark.sql.test.TestSparkSession.sessionState$lzycompute(TestSQLContext.scala:42) > at > org.apache.spark.sql.test.TestSparkSession.sessionState(TestSQLContext.scala:41) > at > org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$1.apply(SparkSession.scala:95) > at > org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$1.apply(SparkSession.scala:95) > at scala.Option.map(Option.scala:146) > at > org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:95) > at > org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:94) > at org.apache.spark.sql.internal.SQLConf$.get(SQLConf.scala:126) > at > org.apache.spark.sql.catalyst.expressions.CodeGeneratorWithInterpretedFallback.createObject(CodeGeneratorWithInterpretedFallback.scala:54) > at > org.apache.spark.sql.catalyst.expressions.UnsafeProjection$.create(Projection.scala:157) > at > org.apache.spark.sql.catalyst.expressions.UnsafeProjection$.create(Projection.scala:150) > at > org.apache.spark.sql.execution.UnsafeRowSerializerSuite.org$apache$spark$sql$execution$UnsafeRowSerializerSuite$$unsafeRowConverter(UnsafeRowSerializerSuite.scala:54) > at > org.apache.spark.sql.execution.UnsafeRowSerializerSuite.org$apache$spark$sql$execution$UnsafeRowSerializerSuite$$toUnsafeRow(UnsafeRowSerializerSuite.scala:49) > at > org.apache.spark.sql.execution.UnsafeRowSerializerSuite$$anonfun$2.apply(UnsafeRowSerializerSuite.scala:63) > at > org.apache.spark.sql.execution.UnsafeRowSerializerSuite$$anonfun$2.apply(UnsafeRowSerializerSuite.scala:60) > ... > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
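A hedged sketch of the defensive pattern the comment describes (not the actual fix in Spark): any code path that consults the active session should tolerate a missing or stopped session and fall back to a default instead of dereferencing its state. The `isUsable` predicate below is a hypothetical stand-in for "this session's context has not been stopped".

{code:scala}
// Defensive-access sketch; `isUsable` is hypothetical and the real fix may
// differ. The point: never assume getActiveSession returns a live session.
import org.apache.spark.sql.SparkSession

object ActiveSessionConf {
  def confValueOrDefault(key: String, default: String)
                        (isUsable: SparkSession => Boolean): String =
    SparkSession.getActiveSession
      .filter(isUsable)               // ignore sessions that are already stopped
      .map(_.conf.get(key, default))  // public RuntimeConfig accessor
      .getOrElse(default)             // no usable session: use the fallback
}
{code}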
[jira] [Resolved] (SPARK-24886) Increase Jenkins build time
[ https://issues.apache.org/jira/browse/SPARK-24886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-24886. -- Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 21845 [https://github.com/apache/spark/pull/21845] > Increase Jenkins build time > --- > > Key: SPARK-24886 > URL: https://issues.apache.org/jira/browse/SPARK-24886 > Project: Spark > Issue Type: Test > Components: Project Infra >Affects Versions: 2.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 2.4.0 > > > Currently, looks we hit the time limit time to time. Looks better increasing > the time a bit. > For instance, please see https://github.com/apache/spark/pull/21822 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-22236) CSV I/O: does not respect RFC 4180
[ https://issues.apache.org/jira/browse/SPARK-22236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575629#comment-16575629 ] Steve Loughran commented on SPARK-22236: I wouldn't recommend changing multiline=true by default precisely because the change is so traumatic: you'd never be able to partition a CSV file across >1 worker > CSV I/O: does not respect RFC 4180 > -- > > Key: SPARK-22236 > URL: https://issues.apache.org/jira/browse/SPARK-22236 > Project: Spark > Issue Type: Improvement > Components: Input/Output >Affects Versions: 2.2.0 >Reporter: Ondrej Kokes >Priority: Minor > > When reading or writing CSV files with Spark, double quotes are escaped with > a backslash by default. However, the appropriate behaviour as set out by RFC > 4180 (and adhered to by many software packages) is to escape using a second > double quote. > This piece of Python code demonstrates the issue > {code} > import csv > with open('testfile.csv', 'w') as f: > cw = csv.writer(f) > cw.writerow(['a 2.5" drive', 'another column']) > cw.writerow(['a "quoted" string', '"quoted"']) > cw.writerow([1,2]) > with open('testfile.csv') as f: > print(f.read()) > # "a 2.5"" drive",another column > # "a ""quoted"" string","""quoted""" > # 1,2 > spark.read.csv('testfile.csv').collect() > # [Row(_c0='"a 2.5"" drive"', _c1='another column'), > # Row(_c0='"a ""quoted"" string"', _c1='"""quoted"""'), > # Row(_c0='1', _c1='2')] > # explicitly stating the escape character fixed the issue > spark.read.option('escape', '"').csv('testfile.csv').collect() > # [Row(_c0='a 2.5" drive', _c1='another column'), > # Row(_c0='a "quoted" string', _c1='"quoted"'), > # Row(_c0='1', _c1='2')] > {code} > The same applies to writes, where reading the file written by Spark may > result in garbage. > {code} > df = spark.read.option('escape', '"').csv('testfile.csv') # reading the file > correctly > df.write.format("csv").save('testout.csv') > with open('testout.csv/part-csv') as f: > cr = csv.reader(f) > print(next(cr)) > print(next(cr)) > # ['a 2.5\\ drive"', 'another column'] > # ['a \\quoted\\" string"', '\\quoted\\""'] > {code} > The culprit is in > [CSVOptions.scala|https://github.com/apache/spark/blob/7d0a3ef4ced9684457ad6c5924c58b95249419e1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala#L91], > where the default escape character is overridden. > While it's possible to work with CSV files in a "compatible" manner, it would > be useful if Spark had sensible defaults that conform to the above-mentioned > RFC (as well as W3C recommendations). I realise this would be a breaking > change and thus if accepted, it would probably need to result in a warning > first, before moving to a new default. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
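For reference, the same workaround on the Scala side (a sketch that assumes the defaults stay as they are today): explicitly overriding the escape character on both read and write makes Spark produce and consume RFC 4180-style "" escapes.

{code:scala}
// Sketch of the user-side workaround discussed above: override the default
// backslash escape so embedded quotes are written and read RFC 4180-style.
import org.apache.spark.sql.SparkSession

object Rfc4180Csv {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("csv-rfc4180").getOrCreate()

    val df = spark.read
      .option("escape", "\"")   // read "" as an escaped quote
      .csv("testfile.csv")

    df.write
      .option("escape", "\"")   // write embedded quotes as ""
      .mode("overwrite")
      .csv("testout_rfc4180.csv")

    spark.stop()
  }
}
{code}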
[jira] [Created] (SPARK-25082) Documentation for Spark Function expm1 is incomplete
Alexander Belov created SPARK-25082: --- Summary: Documentation for Spark Function expm1 is incomplete Key: SPARK-25082 URL: https://issues.apache.org/jira/browse/SPARK-25082 Project: Spark Issue Type: Documentation Components: Spark Core Affects Versions: 2.3.1, 2.0.0 Reporter: Alexander Belov The documentation for the function expm1 that takes in a string public static [Column|https://spark.apache.org/docs/2.3.1/api/java/org/apache/spark/sql/Column.html] expm1(String columnName) ([https://spark.apache.org/docs/2.3.1/api/java/org/apache/spark/sql/functions.html#expm1-java.lang.String-)] States that it "Computes the exponential of the given column." without mentioning that it first subtracts 1 from the value. The documentation for the function expm1 that takes in a column has the correct documentation: https://spark.apache.org/docs/2.3.1/api/java/org/apache/spark/sql/functions.html#expm1-org.apache.spark.sql.Column- "Computes the exponential of the given value minus one." -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
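A quick check of what both overloads actually compute (a small sketch assuming a local session; the column name is arbitrary) confirms that each returns exp(x) - 1, so only the String-overload javadoc quoted above is incomplete:

{code:scala}
// Both expm1 overloads compute exp(x) - 1; the String overload's javadoc
// merely omits the "minus one".
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, expm1}

object Expm1Demo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("expm1-demo").getOrCreate()
    import spark.implicits._

    Seq(0.0, 1.0).toDF("x")
      .select(expm1("x"), expm1(col("x")))
      .show()
    // Both result columns: 0.0 for x = 0.0 and ~1.7182818 for x = 1.0.

    spark.stop()
  }
}
{code}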
[jira] [Updated] (SPARK-25081) Nested spill in ShuffleExternalSorter may access a released memory page
[ https://issues.apache.org/jira/browse/SPARK-25081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-25081: - Labels: correctness (was: ) > Nested spill in ShuffleExternalSorter may access a released memory page > > > Key: SPARK-25081 > URL: https://issues.apache.org/jira/browse/SPARK-25081 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.1 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > Labels: correctness > > This issue is pretty similar to SPARK-21907. > "allocateArray" in > [ShuffleInMemorySorter.reset|https://github.com/apache/spark/blob/9b8521e53e56a53b44c02366a99f8a8ee1307bbf/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java#L99] > may trigger a spill and cause ShuffleInMemorySorter access the released > `array`. Another task may get the same memory page from the pool. This will > cause two tasks access the same memory page. When a task reads memory written > by another task, many types of failures may happen. Here are some examples I > have seen: > - JVM crash. (This is easy to reproduce in a unit test as we fill newly > allocated and deallocated memory with 0xa5 and 0x5a bytes which usually > points to an invalid memory address) > - java.lang.IllegalArgumentException: Comparison method violates its general > contract! > - java.lang.NullPointerException at > org.apache.spark.memory.TaskMemoryManager.getPage(TaskMemoryManager.java:384) > - java.lang.UnsupportedOperationException: Cannot grow BufferHolder by size > -536870912 because the size after growing exceeds size limitation 2147483632 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
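A simplified, hedged model of the hazard described above (not the real ShuffleInMemorySorter code): reset frees the pointer array and then requests a new one, and because that allocation can itself trigger a spill, the spill callback can run while the sorter's array has already been released and possibly handed to another consumer.

{code:scala}
// Simplified model of the reentrancy hazard; these are not Spark's classes.
// allocate() may invoke the pressure callback *before* returning the new
// array, so the callback must not assume the old array is still owned here.
trait PageAllocator {
  def free(page: Array[Long]): Unit
  def allocate(size: Int, onPressure: () => Unit): Array[Long]
}

final class SorterSketch(alloc: PageAllocator, initial: Array[Long]) {
  private var array: Array[Long] = initial

  private def spill(): Unit = {
    // With the buggy ordering below this can run after `array` was freed
    // (and maybe reused by another task), i.e. it scribbles on foreign memory.
    var i = 0
    while (i < array.length) { array(i) = 0L; i += 1 }
  }

  def resetBuggy(): Unit = {
    alloc.free(array)                                    // released here...
    array = alloc.allocate(array.length, () => spill())  // ...yet spill() may still touch it
  }
}
{code}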
[jira] [Assigned] (SPARK-25081) Nested spill in ShuffleExternalSorter may access a released memory page
[ https://issues.apache.org/jira/browse/SPARK-25081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25081: Assignee: Apache Spark (was: Shixiong Zhu) > Nested spill in ShuffleExternalSorter may access a released memory page > > > Key: SPARK-25081 > URL: https://issues.apache.org/jira/browse/SPARK-25081 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.1 >Reporter: Shixiong Zhu >Assignee: Apache Spark >Priority: Major > > This issue is pretty similar to SPARK-21907. > "allocateArray" in > [ShuffleInMemorySorter.reset|https://github.com/apache/spark/blob/9b8521e53e56a53b44c02366a99f8a8ee1307bbf/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java#L99] > may trigger a spill and cause ShuffleInMemorySorter access the released > `array`. Another task may get the same memory page from the pool. This will > cause two tasks access the same memory page. When a task reads memory written > by another task, many types of failures may happen. Here are some examples I > have seen: > - JVM crash. (This is easy to reproduce in a unit test as we fill newly > allocated and deallocated memory with 0xa5 and 0x5a bytes which usually > points to an invalid memory address) > - java.lang.IllegalArgumentException: Comparison method violates its general > contract! > - java.lang.NullPointerException at > org.apache.spark.memory.TaskMemoryManager.getPage(TaskMemoryManager.java:384) > - java.lang.UnsupportedOperationException: Cannot grow BufferHolder by size > -536870912 because the size after growing exceeds size limitation 2147483632 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25081) Nested spill in ShuffleExternalSorter may access a released memory page
[ https://issues.apache.org/jira/browse/SPARK-25081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25081: Assignee: Shixiong Zhu (was: Apache Spark) > Nested spill in ShuffleExternalSorter may access a released memory page > > > Key: SPARK-25081 > URL: https://issues.apache.org/jira/browse/SPARK-25081 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.1 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > > This issue is pretty similar to SPARK-21907. > "allocateArray" in > [ShuffleInMemorySorter.reset|https://github.com/apache/spark/blob/9b8521e53e56a53b44c02366a99f8a8ee1307bbf/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java#L99] > may trigger a spill and cause ShuffleInMemorySorter access the released > `array`. Another task may get the same memory page from the pool. This will > cause two tasks access the same memory page. When a task reads memory written > by another task, many types of failures may happen. Here are some examples I > have seen: > - JVM crash. (This is easy to reproduce in a unit test as we fill newly > allocated and deallocated memory with 0xa5 and 0x5a bytes which usually > points to an invalid memory address) > - java.lang.IllegalArgumentException: Comparison method violates its general > contract! > - java.lang.NullPointerException at > org.apache.spark.memory.TaskMemoryManager.getPage(TaskMemoryManager.java:384) > - java.lang.UnsupportedOperationException: Cannot grow BufferHolder by size > -536870912 because the size after growing exceeds size limitation 2147483632 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25081) Nested spill in ShuffleExternalSorter may access a released memory page
[ https://issues.apache.org/jira/browse/SPARK-25081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575598#comment-16575598 ] Apache Spark commented on SPARK-25081: -- User 'zsxwing' has created a pull request for this issue: https://github.com/apache/spark/pull/22062 > Nested spill in ShuffleExternalSorter may access a released memory page > > > Key: SPARK-25081 > URL: https://issues.apache.org/jira/browse/SPARK-25081 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.1 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > > This issue is pretty similar to SPARK-21907. > "allocateArray" in > [ShuffleInMemorySorter.reset|https://github.com/apache/spark/blob/9b8521e53e56a53b44c02366a99f8a8ee1307bbf/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java#L99] > may trigger a spill and cause ShuffleInMemorySorter access the released > `array`. Another task may get the same memory page from the pool. This will > cause two tasks access the same memory page. When a task reads memory written > by another task, many types of failures may happen. Here are some examples I > have seen: > - JVM crash. (This is easy to reproduce in a unit test as we fill newly > allocated and deallocated memory with 0xa5 and 0x5a bytes which usually > points to an invalid memory address) > - java.lang.IllegalArgumentException: Comparison method violates its general > contract! > - java.lang.NullPointerException at > org.apache.spark.memory.TaskMemoryManager.getPage(TaskMemoryManager.java:384) > - java.lang.UnsupportedOperationException: Cannot grow BufferHolder by size > -536870912 because the size after growing exceeds size limitation 2147483632 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25081) Nested spill in ShuffleExternalSorter may access a released memory page
[ https://issues.apache.org/jira/browse/SPARK-25081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-25081: - Description: This issue is pretty similar to SPARK-21907. "allocateArray" in [ShuffleInMemorySorter.reset|https://github.com/apache/spark/blob/9b8521e53e56a53b44c02366a99f8a8ee1307bbf/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java#L99] may trigger a spill and cause ShuffleInMemorySorter access the released `array`. Another task may get the same memory page from the pool. This will cause two tasks access the same memory page. When a task reads memory written by another task, many types of failures may happen. Here are some examples I have seen: - JVM crash. (This is easy to reproduce in a unit test as we fill newly allocated and deallocated memory with 0xa5 and 0x5a bytes which usually points to an invalid memory address) - java.lang.IllegalArgumentException: Comparison method violates its general contract! - java.lang.NullPointerException at org.apache.spark.memory.TaskMemoryManager.getPage(TaskMemoryManager.java:384) - java.lang.UnsupportedOperationException: Cannot grow BufferHolder by size -536870912 because the size after growing exceeds size limitation 2147483632 was: This issue is pretty similar to SPARK-21907. "allocateArray" in [ShuffleInMemorySorter.reset|https://github.com/apache/spark/blob/9b8521e53e56a53b44c02366a99f8a8ee1307bbf/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java#L99] may trigger a spill and cause ShuffleInMemorySorter access the released `array`. > Nested spill in ShuffleExternalSorter may access a released memory page > > > Key: SPARK-25081 > URL: https://issues.apache.org/jira/browse/SPARK-25081 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.1 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > > This issue is pretty similar to SPARK-21907. > "allocateArray" in > [ShuffleInMemorySorter.reset|https://github.com/apache/spark/blob/9b8521e53e56a53b44c02366a99f8a8ee1307bbf/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java#L99] > may trigger a spill and cause ShuffleInMemorySorter access the released > `array`. Another task may get the same memory page from the pool. This will > cause two tasks access the same memory page. When a task reads memory written > by another task, many types of failures may happen. Here are some examples I > have seen: > - JVM crash. (This is easy to reproduce in a unit test as we fill newly > allocated and deallocated memory with 0xa5 and 0x5a bytes which usually > points to an invalid memory address) > - java.lang.IllegalArgumentException: Comparison method violates its general > contract! > - java.lang.NullPointerException at > org.apache.spark.memory.TaskMemoryManager.getPage(TaskMemoryManager.java:384) > - java.lang.UnsupportedOperationException: Cannot grow BufferHolder by size > -536870912 because the size after growing exceeds size limitation 2147483632 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25081) Nested spill in ShuffleExternalSorter may access a released memory page
[ https://issues.apache.org/jira/browse/SPARK-25081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-25081: - Description: This issue is pretty similar to SPARK-21907. "allocateArray" in [ShuffleInMemorySorter.reset|https://github.com/apache/spark/blob/9b8521e53e56a53b44c02366a99f8a8ee1307bbf/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java#L99] may trigger a spill and cause > Nested spill in ShuffleExternalSorter may access a released memory page > > > Key: SPARK-25081 > URL: https://issues.apache.org/jira/browse/SPARK-25081 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.1 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > > This issue is pretty similar to SPARK-21907. > "allocateArray" in > [ShuffleInMemorySorter.reset|https://github.com/apache/spark/blob/9b8521e53e56a53b44c02366a99f8a8ee1307bbf/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java#L99] > may trigger a spill and cause -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25081) Nested spill in ShuffleExternalSorter may access a released memory page
[ https://issues.apache.org/jira/browse/SPARK-25081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-25081: - Description: This issue is pretty similar to SPARK-21907. "allocateArray" in [ShuffleInMemorySorter.reset|https://github.com/apache/spark/blob/9b8521e53e56a53b44c02366a99f8a8ee1307bbf/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java#L99] may trigger a spill and cause ShuffleInMemorySorter access the released `array`. was: This issue is pretty similar to SPARK-21907. "allocateArray" in [ShuffleInMemorySorter.reset|https://github.com/apache/spark/blob/9b8521e53e56a53b44c02366a99f8a8ee1307bbf/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java#L99] may trigger a spill and cause > Nested spill in ShuffleExternalSorter may access a released memory page > > > Key: SPARK-25081 > URL: https://issues.apache.org/jira/browse/SPARK-25081 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.1 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Major > > This issue is pretty similar to SPARK-21907. > "allocateArray" in > [ShuffleInMemorySorter.reset|https://github.com/apache/spark/blob/9b8521e53e56a53b44c02366a99f8a8ee1307bbf/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java#L99] > may trigger a spill and cause ShuffleInMemorySorter access the released > `array`. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25081) Nested spill in ShuffleExternalSorter may access a released memory page
Shixiong Zhu created SPARK-25081: Summary: Nested spill in ShuffleExternalSorter may access a released memory page Key: SPARK-25081 URL: https://issues.apache.org/jira/browse/SPARK-25081 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.3.1 Reporter: Shixiong Zhu Assignee: Shixiong Zhu -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25079) [PYTHON] upgrade python 3.4 -> 3.5
[ https://issues.apache.org/jira/browse/SPARK-25079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575554#comment-16575554 ] shane knapp commented on SPARK-25079: - i've confirmed, for the master branch at least, that a symlink from 3.4 to 3.5 works. next up: spot-checking older branches. which should be... interesting. > [PYTHON] upgrade python 3.4 -> 3.5 > -- > > Key: SPARK-25079 > URL: https://issues.apache.org/jira/browse/SPARK-25079 > Project: Spark > Issue Type: Improvement > Components: Build, PySpark >Affects Versions: 2.3.1 >Reporter: shane knapp >Assignee: shane knapp >Priority: Major > > for the impending arrow upgrade > (https://issues.apache.org/jira/browse/SPARK-23874) we need to bump python > 3.4 -> 3.5. > i have been testing this here: > [https://amplab.cs.berkeley.edu/jenkins/view/RISELab%20Infra/job/ubuntuSparkPRB/|https://amplab.cs.berkeley.edu/jenkins/view/RISELab%20Infra/job/ubuntuSparkPRB/69] > my methodology: > 1) upgrade python + arrow to 3.5 and 0.10.0 > 2) run python tests > 3) when i'm happy that Things Won't Explode Spectacularly, pause jenkins and > upgrade centos workers to python3.5 > 4) simultaneously do the following: > - create a symlink in /home/anaconda/envs/py3k/bin for python3.4 that > points to python3.5 (this is currently being tested here: > [https://amplab.cs.berkeley.edu/jenkins/view/RISELab%20Infra/job/ubuntuSparkPRB/69)] > - push a change to python/run-tests.py replacing 3.4 with 3.5 > 5) once the python3.5 change to run-tests.py is merged, we will need to > back-port this to all existing branches > 6) then and only then can i remove the python3.4 -> python3.5 symlink -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24950) scala DateTimeUtilsSuite daysToMillis and millisToDays fails w/java 8 181-b13
[ https://issues.apache.org/jira/browse/SPARK-24950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575521#comment-16575521 ] shane knapp commented on SPARK-24950: - do we care about 2.0 and 1.6? > scala DateTimeUtilsSuite daysToMillis and millisToDays fails w/java 8 181-b13 > - > > Key: SPARK-24950 > URL: https://issues.apache.org/jira/browse/SPARK-24950 > Project: Spark > Issue Type: Bug > Components: Build, Tests >Affects Versions: 2.1.3, 2.2.2, 2.3.1, 2.4.0 >Reporter: shane knapp >Assignee: Chris Martin >Priority: Major > Fix For: 2.1.4, 2.2.3, 2.3.2, 2.4.0 > > > during my travails to port the spark builds to run on ubuntu 16.04LTS, i have > encountered a strange and apparently java version-specific failure on *one* > specific unit test. > the failure is here: > [https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.6-ubuntu-test/868/testReport/junit/org.apache.spark.sql.catalyst.util/DateTimeUtilsSuite/daysToMillis_and_millisToDays/] > the java version on this worker is: > sknapp@ubuntu-testing:~$ java -version > java version "1.8.0_181" > Java(TM) SE Runtime Environment (build 1.8.0_181-b13) > Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode) > however, when i run this exact build on the other ubuntu workers, it passes. > they systems are set up (for the most part) identically except for the java > version: > sknapp@amp-jenkins-staging-worker-02:~$ java -version > java version "1.8.0_171" > Java(TM) SE Runtime Environment (build 1.8.0_171-b11) > Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode) > there are some minor kernel and other package differences on these ubuntu > workers, but nothing that (in my opinion) would affect this test. i am > willing to help investigate this, however. > the test also passes on the centos 6.9 workers, which have the following java > version installed: > [sknapp@amp-jenkins-worker-05 ~]$ java -version > java version "1.8.0_60" > Java(TM) SE Runtime Environment (build 1.8.0_60-b27) > Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)my guess is > that either: > sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala > or > sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala > is doing something wrong. i am not a scala expert by any means, so i'd > really like some help in trying to un-block the project to port the builds to > ubuntu. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-24950) scala DateTimeUtilsSuite daysToMillis and millisToDays fails w/java 8 181-b13
[ https://issues.apache.org/jira/browse/SPARK-24950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-24950. --- Resolution: Fixed Fix Version/s: 2.3.2 2.2.3 2.1.4 > scala DateTimeUtilsSuite daysToMillis and millisToDays fails w/java 8 181-b13 > - > > Key: SPARK-24950 > URL: https://issues.apache.org/jira/browse/SPARK-24950 > Project: Spark > Issue Type: Bug > Components: Build, Tests >Affects Versions: 2.1.3, 2.2.2, 2.3.1, 2.4.0 >Reporter: shane knapp >Assignee: Chris Martin >Priority: Major > Fix For: 2.1.4, 2.2.3, 2.3.2, 2.4.0 > > > during my travails to port the spark builds to run on ubuntu 16.04LTS, i have > encountered a strange and apparently java version-specific failure on *one* > specific unit test. > the failure is here: > [https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.6-ubuntu-test/868/testReport/junit/org.apache.spark.sql.catalyst.util/DateTimeUtilsSuite/daysToMillis_and_millisToDays/] > the java version on this worker is: > sknapp@ubuntu-testing:~$ java -version > java version "1.8.0_181" > Java(TM) SE Runtime Environment (build 1.8.0_181-b13) > Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode) > however, when i run this exact build on the other ubuntu workers, it passes. > they systems are set up (for the most part) identically except for the java > version: > sknapp@amp-jenkins-staging-worker-02:~$ java -version > java version "1.8.0_171" > Java(TM) SE Runtime Environment (build 1.8.0_171-b11) > Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode) > there are some minor kernel and other package differences on these ubuntu > workers, but nothing that (in my opinion) would affect this test. i am > willing to help investigate this, however. > the test also passes on the centos 6.9 workers, which have the following java > version installed: > [sknapp@amp-jenkins-worker-05 ~]$ java -version > java version "1.8.0_60" > Java(TM) SE Runtime Environment (build 1.8.0_60-b27) > Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)my guess is > that either: > sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala > or > sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala > is doing something wrong. i am not a scala expert by any means, so i'd > really like some help in trying to un-block the project to port the builds to > ubuntu. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24950) scala DateTimeUtilsSuite daysToMillis and millisToDays fails w/java 8 181-b13
[ https://issues.apache.org/jira/browse/SPARK-24950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575504#comment-16575504 ] shane knapp commented on SPARK-24950: - booyah! i love watching the build queue pile up. :) thanks [~srowen]! > scala DateTimeUtilsSuite daysToMillis and millisToDays fails w/java 8 181-b13 > - > > Key: SPARK-24950 > URL: https://issues.apache.org/jira/browse/SPARK-24950 > Project: Spark > Issue Type: Bug > Components: Build, Tests >Affects Versions: 2.1.3, 2.2.2, 2.3.1, 2.4.0 >Reporter: shane knapp >Assignee: Chris Martin >Priority: Major > Fix For: 2.4.0 > > > during my travails to port the spark builds to run on ubuntu 16.04LTS, i have > encountered a strange and apparently java version-specific failure on *one* > specific unit test. > the failure is here: > [https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.6-ubuntu-test/868/testReport/junit/org.apache.spark.sql.catalyst.util/DateTimeUtilsSuite/daysToMillis_and_millisToDays/] > the java version on this worker is: > sknapp@ubuntu-testing:~$ java -version > java version "1.8.0_181" > Java(TM) SE Runtime Environment (build 1.8.0_181-b13) > Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode) > however, when i run this exact build on the other ubuntu workers, it passes. > they systems are set up (for the most part) identically except for the java > version: > sknapp@amp-jenkins-staging-worker-02:~$ java -version > java version "1.8.0_171" > Java(TM) SE Runtime Environment (build 1.8.0_171-b11) > Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode) > there are some minor kernel and other package differences on these ubuntu > workers, but nothing that (in my opinion) would affect this test. i am > willing to help investigate this, however. > the test also passes on the centos 6.9 workers, which have the following java > version installed: > [sknapp@amp-jenkins-worker-05 ~]$ java -version > java version "1.8.0_60" > Java(TM) SE Runtime Environment (build 1.8.0_60-b27) > Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)my guess is > that either: > sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala > or > sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala > is doing something wrong. i am not a scala expert by any means, so i'd > really like some help in trying to un-block the project to port the builds to > ubuntu. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24950) scala DateTimeUtilsSuite daysToMillis and millisToDays fails w/java 8 181-b13
[ https://issues.apache.org/jira/browse/SPARK-24950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575493#comment-16575493 ] shane knapp commented on SPARK-24950: - word. let's leave this open until for a bit longer as i continue to test. > scala DateTimeUtilsSuite daysToMillis and millisToDays fails w/java 8 181-b13 > - > > Key: SPARK-24950 > URL: https://issues.apache.org/jira/browse/SPARK-24950 > Project: Spark > Issue Type: Bug > Components: Build, Tests >Affects Versions: 2.1.3, 2.2.2, 2.3.1, 2.4.0 >Reporter: shane knapp >Assignee: Chris Martin >Priority: Major > Fix For: 2.4.0 > > > during my travails to port the spark builds to run on ubuntu 16.04LTS, i have > encountered a strange and apparently java version-specific failure on *one* > specific unit test. > the failure is here: > [https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.6-ubuntu-test/868/testReport/junit/org.apache.spark.sql.catalyst.util/DateTimeUtilsSuite/daysToMillis_and_millisToDays/] > the java version on this worker is: > sknapp@ubuntu-testing:~$ java -version > java version "1.8.0_181" > Java(TM) SE Runtime Environment (build 1.8.0_181-b13) > Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode) > however, when i run this exact build on the other ubuntu workers, it passes. > they systems are set up (for the most part) identically except for the java > version: > sknapp@amp-jenkins-staging-worker-02:~$ java -version > java version "1.8.0_171" > Java(TM) SE Runtime Environment (build 1.8.0_171-b11) > Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode) > there are some minor kernel and other package differences on these ubuntu > workers, but nothing that (in my opinion) would affect this test. i am > willing to help investigate this, however. > the test also passes on the centos 6.9 workers, which have the following java > version installed: > [sknapp@amp-jenkins-worker-05 ~]$ java -version > java version "1.8.0_60" > Java(TM) SE Runtime Environment (build 1.8.0_60-b27) > Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)my guess is > that either: > sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala > or > sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala > is doing something wrong. i am not a scala expert by any means, so i'd > really like some help in trying to un-block the project to port the builds to > ubuntu. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24950) scala DateTimeUtilsSuite daysToMillis and millisToDays fails w/java 8 181-b13
[ https://issues.apache.org/jira/browse/SPARK-24950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575488#comment-16575488 ] Sean Owen commented on SPARK-24950: --- Roger that, will back-port back to 2.1 as best I can. > scala DateTimeUtilsSuite daysToMillis and millisToDays fails w/java 8 181-b13 > - > > Key: SPARK-24950 > URL: https://issues.apache.org/jira/browse/SPARK-24950 > Project: Spark > Issue Type: Bug > Components: Build, Tests >Affects Versions: 2.1.3, 2.2.2, 2.3.1, 2.4.0 >Reporter: shane knapp >Assignee: Chris Martin >Priority: Major > Fix For: 2.4.0 > > > during my travails to port the spark builds to run on ubuntu 16.04LTS, i have > encountered a strange and apparently java version-specific failure on *one* > specific unit test. > the failure is here: > [https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.6-ubuntu-test/868/testReport/junit/org.apache.spark.sql.catalyst.util/DateTimeUtilsSuite/daysToMillis_and_millisToDays/] > the java version on this worker is: > sknapp@ubuntu-testing:~$ java -version > java version "1.8.0_181" > Java(TM) SE Runtime Environment (build 1.8.0_181-b13) > Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode) > however, when i run this exact build on the other ubuntu workers, it passes. > they systems are set up (for the most part) identically except for the java > version: > sknapp@amp-jenkins-staging-worker-02:~$ java -version > java version "1.8.0_171" > Java(TM) SE Runtime Environment (build 1.8.0_171-b11) > Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode) > there are some minor kernel and other package differences on these ubuntu > workers, but nothing that (in my opinion) would affect this test. i am > willing to help investigate this, however. > the test also passes on the centos 6.9 workers, which have the following java > version installed: > [sknapp@amp-jenkins-worker-05 ~]$ java -version > java version "1.8.0_60" > Java(TM) SE Runtime Environment (build 1.8.0_60-b27) > Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)my guess is > that either: > sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala > or > sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala > is doing something wrong. i am not a scala expert by any means, so i'd > really like some help in trying to un-block the project to port the builds to > ubuntu. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24950) scala DateTimeUtilsSuite daysToMillis and millisToDays fails w/java 8 181-b13
[ https://issues.apache.org/jira/browse/SPARK-24950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575479#comment-16575479 ] shane knapp commented on SPARK-24950: - [~srowen] [~d80tb7] i'm thinking that we actually need to backport this change to previous branches. :( [https://amplab.cs.berkeley.edu/jenkins/view/RISELab%20Infra/job/spark-branch-2.1-test-maven-hadoop-2.7-ubuntu-testing/2/] {noformat} - daysToMillis and millisToDays *** FAILED *** 9131 did not equal 9130 Round trip of 9130 did not work in tz sun.util.calendar.ZoneInfo[id="Pacific/Enderbury",offset=4680,dstSavings=0,useDaylight=false,transitions=5,lastRule=null] (DateTimeUtilsSuite.scala:554){noformat} > scala DateTimeUtilsSuite daysToMillis and millisToDays fails w/java 8 181-b13 > - > > Key: SPARK-24950 > URL: https://issues.apache.org/jira/browse/SPARK-24950 > Project: Spark > Issue Type: Bug > Components: Build, Tests >Affects Versions: 2.1.3, 2.2.2, 2.3.1, 2.4.0 >Reporter: shane knapp >Assignee: Chris Martin >Priority: Major > Fix For: 2.4.0 > > > during my travails to port the spark builds to run on ubuntu 16.04LTS, i have > encountered a strange and apparently java version-specific failure on *one* > specific unit test. > the failure is here: > [https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.6-ubuntu-test/868/testReport/junit/org.apache.spark.sql.catalyst.util/DateTimeUtilsSuite/daysToMillis_and_millisToDays/] > the java version on this worker is: > sknapp@ubuntu-testing:~$ java -version > java version "1.8.0_181" > Java(TM) SE Runtime Environment (build 1.8.0_181-b13) > Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode) > however, when i run this exact build on the other ubuntu workers, it passes. > they systems are set up (for the most part) identically except for the java > version: > sknapp@amp-jenkins-staging-worker-02:~$ java -version > java version "1.8.0_171" > Java(TM) SE Runtime Environment (build 1.8.0_171-b11) > Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode) > there are some minor kernel and other package differences on these ubuntu > workers, but nothing that (in my opinion) would affect this test. i am > willing to help investigate this, however. > the test also passes on the centos 6.9 workers, which have the following java > version installed: > [sknapp@amp-jenkins-worker-05 ~]$ java -version > java version "1.8.0_60" > Java(TM) SE Runtime Environment (build 1.8.0_60-b27) > Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)my guess is > that either: > sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala > or > sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala > is doing something wrong. i am not a scala expert by any means, so i'd > really like some help in trying to un-block the project to port the builds to > ubuntu. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
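For context on the failure quoted above: day 9130 is 1994-12-31, and Pacific/Enderbury is one of the Kiribati zones whose tzdata skips that local date when the islands moved across the date line; JDK 8u181 bundles newer tzdata than the passing workers, which is presumably why only that build trips the test. Below is a simplified, self-contained sketch of the round trip the test exercises, assuming the usual days-since-epoch and epoch-millis definitions. It is illustrative only; Spark's DateTimeUtils applies a more careful local-offset correction.
{code:scala}
import java.util.TimeZone

object RoundTripSketch {
  private val MillisPerDay = 24L * 60 * 60 * 1000

  // Simplified local-date <-> epoch-millis conversions for a given zone.
  def daysToMillis(days: Int, tz: TimeZone): Long = {
    val localMillis = days * MillisPerDay
    localMillis - tz.getOffset(localMillis - tz.getRawOffset)
  }

  def millisToDays(millis: Long, tz: TimeZone): Int =
    math.floor((millis + tz.getOffset(millis)).toDouble / MillisPerDay).toInt

  def main(args: Array[String]): Unit = {
    val tz = TimeZone.getTimeZone("Pacific/Enderbury")
    // Day 9130 (1994-12-31) never existed as a full local day in this zone, so
    // depending on the JDK's tzdata the round trip can land on 9131 instead.
    println(millisToDays(daysToMillis(9130, tz), tz))
  }
}
{code}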
[jira] [Updated] (SPARK-24950) scala DateTimeUtilsSuite daysToMillis and millisToDays fails w/java 8 181-b13
[ https://issues.apache.org/jira/browse/SPARK-24950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shane knapp updated SPARK-24950: Affects Version/s: 2.1.3 > scala DateTimeUtilsSuite daysToMillis and millisToDays fails w/java 8 181-b13 > - > > Key: SPARK-24950 > URL: https://issues.apache.org/jira/browse/SPARK-24950 > Project: Spark > Issue Type: Bug > Components: Build, Tests >Affects Versions: 2.1.3, 2.2.2, 2.3.1, 2.4.0 >Reporter: shane knapp >Assignee: Chris Martin >Priority: Major > Fix For: 2.4.0 > > > during my travails to port the spark builds to run on ubuntu 16.04LTS, i have > encountered a strange and apparently java version-specific failure on *one* > specific unit test. > the failure is here: > [https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.6-ubuntu-test/868/testReport/junit/org.apache.spark.sql.catalyst.util/DateTimeUtilsSuite/daysToMillis_and_millisToDays/] > the java version on this worker is: > sknapp@ubuntu-testing:~$ java -version > java version "1.8.0_181" > Java(TM) SE Runtime Environment (build 1.8.0_181-b13) > Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode) > however, when i run this exact build on the other ubuntu workers, it passes. > they systems are set up (for the most part) identically except for the java > version: > sknapp@amp-jenkins-staging-worker-02:~$ java -version > java version "1.8.0_171" > Java(TM) SE Runtime Environment (build 1.8.0_171-b11) > Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode) > there are some minor kernel and other package differences on these ubuntu > workers, but nothing that (in my opinion) would affect this test. i am > willing to help investigate this, however. > the test also passes on the centos 6.9 workers, which have the following java > version installed: > [sknapp@amp-jenkins-worker-05 ~]$ java -version > java version "1.8.0_60" > Java(TM) SE Runtime Environment (build 1.8.0_60-b27) > Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)my guess is > that either: > sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala > or > sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala > is doing something wrong. i am not a scala expert by any means, so i'd > really like some help in trying to un-block the project to port the builds to > ubuntu. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-24950) scala DateTimeUtilsSuite daysToMillis and millisToDays fails w/java 8 181-b13
[ https://issues.apache.org/jira/browse/SPARK-24950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shane knapp reopened SPARK-24950: - > scala DateTimeUtilsSuite daysToMillis and millisToDays fails w/java 8 181-b13 > - > > Key: SPARK-24950 > URL: https://issues.apache.org/jira/browse/SPARK-24950 > Project: Spark > Issue Type: Bug > Components: Build, Tests >Affects Versions: 2.1.3, 2.2.2, 2.3.1, 2.4.0 >Reporter: shane knapp >Assignee: Chris Martin >Priority: Major > Fix For: 2.4.0 > > > during my travails to port the spark builds to run on ubuntu 16.04LTS, i have > encountered a strange and apparently java version-specific failure on *one* > specific unit test. > the failure is here: > [https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.6-ubuntu-test/868/testReport/junit/org.apache.spark.sql.catalyst.util/DateTimeUtilsSuite/daysToMillis_and_millisToDays/] > the java version on this worker is: > sknapp@ubuntu-testing:~$ java -version > java version "1.8.0_181" > Java(TM) SE Runtime Environment (build 1.8.0_181-b13) > Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode) > however, when i run this exact build on the other ubuntu workers, it passes. > they systems are set up (for the most part) identically except for the java > version: > sknapp@amp-jenkins-staging-worker-02:~$ java -version > java version "1.8.0_171" > Java(TM) SE Runtime Environment (build 1.8.0_171-b11) > Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode) > there are some minor kernel and other package differences on these ubuntu > workers, but nothing that (in my opinion) would affect this test. i am > willing to help investigate this, however. > the test also passes on the centos 6.9 workers, which have the following java > version installed: > [sknapp@amp-jenkins-worker-05 ~]$ java -version > java version "1.8.0_60" > Java(TM) SE Runtime Environment (build 1.8.0_60-b27) > Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)my guess is > that either: > sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala > or > sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala > is doing something wrong. i am not a scala expert by any means, so i'd > really like some help in trying to un-block the project to port the builds to > ubuntu. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-25068) High-order function: exists(array, function) → boolean
[ https://issues.apache.org/jira/browse/SPARK-25068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-25068. - Resolution: Fixed Fix Version/s: 2.4.0 > High-order function: exists(array, function) → boolean > - > > Key: SPARK-25068 > URL: https://issues.apache.org/jira/browse/SPARK-25068 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > Fix For: 2.4.0 > > > Tests if arrays have those elements for which function returns true. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-25076) SQLConf should not be retrieved from a stopped SparkSession
[ https://issues.apache.org/jira/browse/SPARK-25076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-25076. - Resolution: Fixed Fix Version/s: 2.4.0 > SQLConf should not be retrieved from a stopped SparkSession > --- > > Key: SPARK-25076 > URL: https://issues.apache.org/jira/browse/SPARK-25076 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Fix For: 2.4.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18057) Update structured streaming kafka from 0.10.0.1 to 2.0.0
[ https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575373#comment-16575373 ] Steve Bairos commented on SPARK-18057: -- Hey, long shot but any chance this change could get back ported to branch-2.3? My company is currently on the 2.3 branch and we're dying to get off of kafka-client 0.10 because there are a few issues with 0.10 and TLS. > Update structured streaming kafka from 0.10.0.1 to 2.0.0 > > > Key: SPARK-18057 > URL: https://issues.apache.org/jira/browse/SPARK-18057 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Reporter: Cody Koeninger >Assignee: Ted Yu >Priority: Major > Fix For: 2.4.0 > > > There are a couple of relevant KIPs here, > https://archive.apache.org/dist/kafka/0.10.1.0/RELEASE_NOTES.html -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-25077) Delete unused variable in WindowExec
[ https://issues.apache.org/jira/browse/SPARK-25077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-25077. - Resolution: Fixed Assignee: Li Yuanjian Fix Version/s: 2.4.0 > Delete unused variable in WindowExec > > > Key: SPARK-25077 > URL: https://issues.apache.org/jira/browse/SPARK-25077 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0 >Reporter: Li Yuanjian >Assignee: Li Yuanjian >Priority: Trivial > Fix For: 2.4.0 > > > Delete the unused variable `inputFields` in WindowExec, avoid making others > confused while reading the code. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-23298) distinct.count on Dataset/DataFrame yields non-deterministic results
[ https://issues.apache.org/jira/browse/SPARK-23298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mateusz Jukiewicz updated SPARK-23298: -- Labels: Correctness CorrectnessBug correctness (was: CorrectnessBug correctness) > distinct.count on Dataset/DataFrame yields non-deterministic results > > > Key: SPARK-23298 > URL: https://issues.apache.org/jira/browse/SPARK-23298 > Project: Spark > Issue Type: Bug > Components: Shuffle, SQL, YARN >Affects Versions: 2.1.0, 2.2.0 > Environment: Spark 2.2.0 or 2.1.0 > Java 1.8.0_144 > Yarn version: > {code:java} > Hadoop 2.6.0-cdh5.12.1 > Subversion http://github.com/cloudera/hadoop -r > 520d8b072e666e9f21d645ca6a5219fc37535a52 > Compiled by jenkins on 2017-08-24T16:43Z > Compiled with protoc 2.5.0 > From source with checksum de51bf9693ab9426379a1cd28142cea0 > This command was run using > /usr/lib/hadoop/hadoop-common-2.6.0-cdh5.12.1.jar{code} > > >Reporter: Mateusz Jukiewicz >Priority: Major > Labels: Correctness, CorrectnessBug, correctness > > This is what happens: > {code:java} > /* Exemplary spark-shell starting command > /opt/spark/bin/spark-shell \ > --num-executors 269 \ > --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \ > --conf spark.kryoserializer.buffer.max=512m > */ > val dataset = spark.read.textFile("/text_dataset.out") > dataset.distinct.count > // res0: Long = 24025868 > dataset.distinct.count > // res1: Long = 24014227{code} > The _text_dataset.out_ file is a dataset with one string per line. The string > has alphanumeric characters as well as colons and spaces. The line length > does not exceed 1200. I don't think that's important though, as the issue > appeared on various other datasets, I just tried to narrow it down to the > simplest possible case. > The observations regarding the issue are as follows: > * I managed to reproduce it on both spark 2.2 and spark 2.1. > * The issue occurs in YARN cluster mode (I haven't tested YARN client mode). > * The issue is not reproducible on a single machine (e.g. laptop) in spark > local mode. > * It seems that once the correct count is computed, it is not possible to > reproduce the issue in the same spark session. In other words, I was able to > get 2-3 incorrect distinct.count results consecutively, but once it got > right, it always returned the correct value. I had to re-run spark-shell to > observe the problem again. > * The issue appears on both Dataset and DataFrame (i.e. using read.text or > read.textFile). > * The issue is not reproducible on RDD (i.e. dataset.rdd.distinct.count). > * Not a single container has failed in those multiple invalid executions. > * YARN doesn't show any warnings or errors in those invalid executions. > * The execution plan determined for both valid and invalid executions was > always the same (it's shown in the _SQL_ tab of the UI). > * The number returned in the invalid executions was always greater than the > correct number (24 014 227). > * This occurs even though the input is already completely deduplicated (i.e. > _distinct.count_ shouldn't change anything). > * The input isn't replicated (i.e. there's only one copy of each file block > on the HDFS). > * The problem is probably not related to reading from HDFS. 
Spark was always > able to correctly read all input records (which was shown in the UI), and > that number got malformed after the exchange phase: > ** correct execution: > Input Size / Records: 3.9 GB / 24014227 _(first stage)_ > Shuffle Write: 3.3 GB / 24014227 _(first stage)_ > Shuffle Read: 3.3 GB / 24014227 _(second stage)_ > ** incorrect execution: > Input Size / Records: 3.9 GB / 24014227 _(first stage)_ > Shuffle Write: 3.3 GB / 24014227 _(first stage)_ > Shuffle Read: 3.3 GB / 24020150 _(second stage)_ > * The problem might be related with the internal way of Encoders hashing. > The reason might be: > ** in a simple `distinct.count` invocation, there are in total three > hash-related stages (called `HashAggregate`), > ** excerpt from scaladoc for `distinct` method says: > {code:java} >* @note Equality checking is performed directly on the encoded > representation of the data >* and thus is not affected by a custom `equals` function defined on > `T`.{code} > * One of my suspicions was the number of partitions we're using (2154). This > is greater than 2000, which means that a different data structure (i.e. > _HighlyCompressedMapStatus_instead of _CompressedMapStatus_) will be used for > book-keeping during the shuffle. Unfortunately after decreasing the number > below this threshold the problem still occurs. > * It's easier to reproduce the issue with a large number of partitions. > * One of my another suspicions was
[jira] [Updated] (SPARK-23298) distinct.count on Dataset/DataFrame yields non-deterministic results
[ https://issues.apache.org/jira/browse/SPARK-23298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mateusz Jukiewicz updated SPARK-23298: -- Labels: CorrectnessBug correctness (was: ) > distinct.count on Dataset/DataFrame yields non-deterministic results > > > Key: SPARK-23298 > URL: https://issues.apache.org/jira/browse/SPARK-23298 > Project: Spark > Issue Type: Bug > Components: Shuffle, SQL, YARN >Affects Versions: 2.1.0, 2.2.0 > Environment: Spark 2.2.0 or 2.1.0 > Java 1.8.0_144 > Yarn version: > {code:java} > Hadoop 2.6.0-cdh5.12.1 > Subversion http://github.com/cloudera/hadoop -r > 520d8b072e666e9f21d645ca6a5219fc37535a52 > Compiled by jenkins on 2017-08-24T16:43Z > Compiled with protoc 2.5.0 > From source with checksum de51bf9693ab9426379a1cd28142cea0 > This command was run using > /usr/lib/hadoop/hadoop-common-2.6.0-cdh5.12.1.jar{code} > > >Reporter: Mateusz Jukiewicz >Priority: Major > Labels: CorrectnessBug, correctness > > This is what happens: > {code:java} > /* Exemplary spark-shell starting command > /opt/spark/bin/spark-shell \ > --num-executors 269 \ > --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \ > --conf spark.kryoserializer.buffer.max=512m > */ > val dataset = spark.read.textFile("/text_dataset.out") > dataset.distinct.count > // res0: Long = 24025868 > dataset.distinct.count > // res1: Long = 24014227{code} > The _text_dataset.out_ file is a dataset with one string per line. The string > has alphanumeric characters as well as colons and spaces. The line length > does not exceed 1200. I don't think that's important though, as the issue > appeared on various other datasets, I just tried to narrow it down to the > simplest possible case. > The observations regarding the issue are as follows: > * I managed to reproduce it on both spark 2.2 and spark 2.1. > * The issue occurs in YARN cluster mode (I haven't tested YARN client mode). > * The issue is not reproducible on a single machine (e.g. laptop) in spark > local mode. > * It seems that once the correct count is computed, it is not possible to > reproduce the issue in the same spark session. In other words, I was able to > get 2-3 incorrect distinct.count results consecutively, but once it got > right, it always returned the correct value. I had to re-run spark-shell to > observe the problem again. > * The issue appears on both Dataset and DataFrame (i.e. using read.text or > read.textFile). > * The issue is not reproducible on RDD (i.e. dataset.rdd.distinct.count). > * Not a single container has failed in those multiple invalid executions. > * YARN doesn't show any warnings or errors in those invalid executions. > * The execution plan determined for both valid and invalid executions was > always the same (it's shown in the _SQL_ tab of the UI). > * The number returned in the invalid executions was always greater than the > correct number (24 014 227). > * This occurs even though the input is already completely deduplicated (i.e. > _distinct.count_ shouldn't change anything). > * The input isn't replicated (i.e. there's only one copy of each file block > on the HDFS). > * The problem is probably not related to reading from HDFS. 
Spark was always > able to correctly read all input records (which was shown in the UI), and > that number got malformed after the exchange phase: > ** correct execution: > Input Size / Records: 3.9 GB / 24014227 _(first stage)_ > Shuffle Write: 3.3 GB / 24014227 _(first stage)_ > Shuffle Read: 3.3 GB / 24014227 _(second stage)_ > ** incorrect execution: > Input Size / Records: 3.9 GB / 24014227 _(first stage)_ > Shuffle Write: 3.3 GB / 24014227 _(first stage)_ > Shuffle Read: 3.3 GB / 24020150 _(second stage)_ > * The problem might be related with the internal way of Encoders hashing. > The reason might be: > ** in a simple `distinct.count` invocation, there are in total three > hash-related stages (called `HashAggregate`), > ** excerpt from scaladoc for `distinct` method says: > {code:java} >* @note Equality checking is performed directly on the encoded > representation of the data >* and thus is not affected by a custom `equals` function defined on > `T`.{code} > * One of my suspicions was the number of partitions we're using (2154). This > is greater than 2000, which means that a different data structure (i.e. > _HighlyCompressedMapStatus_instead of _CompressedMapStatus_) will be used for > book-keeping during the shuffle. Unfortunately after decreasing the number > below this threshold the problem still occurs. > * It's easier to reproduce the issue with a large number of partitions. > * One of my another suspicions was that it's somehow related to the number > of blocks
[jira] [Commented] (SPARK-22236) CSV I/O: does not respect RFC 4180
[ https://issues.apache.org/jira/browse/SPARK-22236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575305#comment-16575305 ] Joe Pallas commented on SPARK-22236: If this can't change before 3.0, how about a note in the documentation explaining that compatibility with RFC4180 requires setting {{escape}} to {{"}} and {{multiLine}} to {{true}}? > CSV I/O: does not respect RFC 4180 > -- > > Key: SPARK-22236 > URL: https://issues.apache.org/jira/browse/SPARK-22236 > Project: Spark > Issue Type: Improvement > Components: Input/Output >Affects Versions: 2.2.0 >Reporter: Ondrej Kokes >Priority: Minor > > When reading or writing CSV files with Spark, double quotes are escaped with > a backslash by default. However, the appropriate behaviour as set out by RFC > 4180 (and adhered to by many software packages) is to escape using a second > double quote. > This piece of Python code demonstrates the issue > {code} > import csv > with open('testfile.csv', 'w') as f: > cw = csv.writer(f) > cw.writerow(['a 2.5" drive', 'another column']) > cw.writerow(['a "quoted" string', '"quoted"']) > cw.writerow([1,2]) > with open('testfile.csv') as f: > print(f.read()) > # "a 2.5"" drive",another column > # "a ""quoted"" string","""quoted""" > # 1,2 > spark.read.csv('testfile.csv').collect() > # [Row(_c0='"a 2.5"" drive"', _c1='another column'), > # Row(_c0='"a ""quoted"" string"', _c1='"""quoted"""'), > # Row(_c0='1', _c1='2')] > # explicitly stating the escape character fixed the issue > spark.read.option('escape', '"').csv('testfile.csv').collect() > # [Row(_c0='a 2.5" drive', _c1='another column'), > # Row(_c0='a "quoted" string', _c1='"quoted"'), > # Row(_c0='1', _c1='2')] > {code} > The same applies to writes, where reading the file written by Spark may > result in garbage. > {code} > df = spark.read.option('escape', '"').csv('testfile.csv') # reading the file > correctly > df.write.format("csv").save('testout.csv') > with open('testout.csv/part-csv') as f: > cr = csv.reader(f) > print(next(cr)) > print(next(cr)) > # ['a 2.5\\ drive"', 'another column'] > # ['a \\quoted\\" string"', '\\quoted\\""'] > {code} > The culprit is in > [CSVOptions.scala|https://github.com/apache/spark/blob/7d0a3ef4ced9684457ad6c5924c58b95249419e1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala#L91], > where the default escape character is overridden. > While it's possible to work with CSV files in a "compatible" manner, it would > be useful if Spark had sensible defaults that conform to the above-mentioned > RFC (as well as W3C recommendations). I realise this would be a breaking > change and thus if accepted, it would probably need to result in a warning > first, before moving to a new default. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
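Until the defaults change, the settings mentioned in the comment above can be applied per read and write. A minimal sketch, assuming a live {{spark}} session and the file names from the description:
{code:scala}
// Read: treat a doubled quote ("") inside a quoted field as an escaped quote,
// and let quoted fields span line breaks, per RFC 4180.
val df = spark.read
  .option("escape", "\"")
  .option("multiLine", "true")
  .csv("testfile.csv")

// Write with the same escape so the output round-trips through RFC 4180 readers.
df.write
  .option("escape", "\"")
  .csv("testout.csv")
{code}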
[jira] [Commented] (SPARK-25059) Exception while executing an action on DataFrame that read Json
[ https://issues.apache.org/jira/browse/SPARK-25059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575288#comment-16575288 ] Kunal Goswami commented on SPARK-25059: --- Thank you so much for the prompt response, let me try using spark 2.3 then. > Exception while executing an action on DataFrame that read Json > --- > > Key: SPARK-25059 > URL: https://issues.apache.org/jira/browse/SPARK-25059 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 2.2.0 > Environment: AWS EMR 5.8.0 > Spark 2.2.0 > >Reporter: Kunal Goswami >Priority: Major > Labels: Spark-SQL > > When I try to read ~9600 Json files using > {noformat} > val test = spark.read.option("header", true).option("inferSchema", > true).json(paths: _*) {noformat} > > Any action on the above created data frame results in: > {noformat} > Caused by: org.codehaus.janino.JaninoRuntimeException: Code of method > "apply2_1$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V" > of class "org.apache.spark.sql.catalyst.expressions.Generat[73/1850] > pecificUnsafeProjection" grows beyond 64 KB > at org.codehaus.janino.CodeContext.makeSpace(CodeContext.java:949) > at org.codehaus.janino.CodeContext.write(CodeContext.java:839) > at org.codehaus.janino.UnitCompiler.writeOpcode(UnitCompiler.java:11081) > at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4546) > at org.codehaus.janino.UnitCompiler.access$7500(UnitCompiler.java:206) > at > org.codehaus.janino.UnitCompiler$12.visitMethodInvocation(UnitCompiler.java:3774) > at > org.codehaus.janino.UnitCompiler$12.visitMethodInvocation(UnitCompiler.java:3762) > at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:4328) > at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:3762) > at org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:4933) > at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:3180) > at org.codehaus.janino.UnitCompiler.access$5000(UnitCompiler.java:206) > at > org.codehaus.janino.UnitCompiler$9.visitMethodInvocation(UnitCompiler.java:3151) > at > org.codehaus.janino.UnitCompiler$9.visitMethodInvocation(UnitCompiler.java:3139) > at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:4328) > at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:3139) > at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2112) > at org.codehaus.janino.UnitCompiler.access$1700(UnitCompiler.java:206) > at > org.codehaus.janino.UnitCompiler$6.visitExpressionStatement(UnitCompiler.java:1377) > at > org.codehaus.janino.UnitCompiler$6.visitExpressionStatement(UnitCompiler.java:1370) > at org.codehaus.janino.Java$ExpressionStatement.accept(Java.java:2558) > at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:1370) > at > org.codehaus.janino.UnitCompiler.compileStatements(UnitCompiler.java:1450) > at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:1436) > at org.codehaus.janino.UnitCompiler.access$1600(UnitCompiler.java:206) > at org.codehaus.janino.UnitCompiler$6.visitBlock(UnitCompiler.java:1376) > at org.codehaus.janino.UnitCompiler$6.visitBlock(UnitCompiler.java:1370) > at org.codehaus.janino.Java$Block.accept(Java.java:2471) > at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:1370) > at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2220) > at org.codehaus.janino.UnitCompiler.access$1800(UnitCompiler.java:206) > at > 
org.codehaus.janino.UnitCompiler$6.visitIfStatement(UnitCompiler.java:1378) > at > org.codehaus.janino.UnitCompiler$6.visitIfStatement(UnitCompiler.java:1370) > at org.codehaus.janino.Java$IfStatement.accept(Java.java:2621) > at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:1370) > at > org.codehaus.janino.UnitCompiler.compileStatements(UnitCompiler.java:1450) > at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:1436) > at org.codehaus.janino.UnitCompiler.access$1600(UnitCompiler.java:206) > at org.codehaus.janino.UnitCompiler$6.visitBlock(UnitCompiler.java:1376) > at org.codehaus.janino.UnitCompiler$6.visitBlock(UnitCompiler.java:1370) > at org.codehaus.janino.Java$Block.accept(Java.java:2471) > at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:1370) > at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2220) > at org.codehaus.janino.UnitCompiler.access$1800(UnitCompiler.java:206) > at > org.codehaus.janino.UnitCompiler$6.visitIfStatement(UnitCompiler.java:1378) > at > org.codehaus.janino.UnitCompiler$6.visitIfStatement(UnitCompiler.java:1370) > at
[jira] [Created] (SPARK-25080) NPE in HiveShim$.toCatalystDecimal(HiveShim.scala:110)
Andrew K Long created SPARK-25080: - Summary: NPE in HiveShim$.toCatalystDecimal(HiveShim.scala:110) Key: SPARK-25080 URL: https://issues.apache.org/jira/browse/SPARK-25080 Project: Spark Issue Type: Improvement Components: Input/Output Affects Versions: 2.3.1 Environment: AWS EMR Reporter: Andrew K Long NPE while reading hive table. ``` Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1190 in stage 392.0 failed 4 times, most recent failure: Lost task 1190.3 in stage 392.0 (TID 122055, ip-172-31-32-196.ec2.internal, executor 487): java.lang.NullPointerException at org.apache.spark.sql.hive.HiveShim$.toCatalystDecimal(HiveShim.scala:110) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:414) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:413) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:442) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:433) at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:217) at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$2.apply(ShuffleExchangeExec.scala:294) at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$2.apply(ShuffleExchangeExec.scala:265) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53) at org.apache.spark.scheduler.Task.run(Task.scala:109) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1753) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1741) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1740) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1740) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:871) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:871) at scala.Option.foreach(Option.scala:257) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:871) at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1974) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1923) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1912) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:682) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034) at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:194) ... 67 more Caused by: java.lang.NullPointerException at org.apache.spark.sql.hive.HiveShim$.toCatalystDecimal(HiveShim.scala:110) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:414) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:413) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:442) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:433) at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) at
[jira] [Commented] (SPARK-25024) Update mesos documentation to be clear about security supported
[ https://issues.apache.org/jira/browse/SPARK-25024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575267#comment-16575267 ] Thomas Graves commented on SPARK-25024: --- ok, I'm not familiar with mesos hardly at all so I apologize if some of these seem intuitive, from reading the mesos docs its not clear to me on a few points. Note I'm going to go through the yarn docs and try to clarify very similar things there. I see some updates have been made on master vs I was originally looking at the 2.3.1 docs (https://github.com/apache/spark/blob/master/docs/running-on-mesos.md) * for cluster mode does MesosClusterDispatcher support authentication and can zookeeper be secured? * Does it support accessing secure HDFS? Does it require keytabs be shipped? * does Mesos Shuffle Service support authentication? I assume so since I would expect it to use spark RPC, so assume spark confs when you start it need spark.authenticate=true and specify a secret? So its not really multi-tenant, but perhaps mesos handles the multi-tenancy as does each user start their own shuffle service? * spark.mesos.principal and spark.mesos.secret, assume mesos handles multi-tenancy based on registry? * for the spark.mesos.driver.secret* configs, I assume it would vary by setup if these are actually secure. For instance if I specify an env variable or config can other users see it. Also does that secret need to match shuffle service, might depend on question above if only one per cluster or setup per user. Maybe to many variations to talk about? * > Update mesos documentation to be clear about security supported > --- > > Key: SPARK-25024 > URL: https://issues.apache.org/jira/browse/SPARK-25024 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 2.2.2 >Reporter: Thomas Graves >Priority: Major > > I was reading through our mesos deployment docs and security docs and its not > clear at all what type of security and how to set it up for mesos. I think > we should clarify this and have something about exactly what is supported and > what is not. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25044) Address translation of LMF closure primitive args to Object in Scala 2.12
[ https://issues.apache.org/jira/browse/SPARK-25044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575247#comment-16575247 ] Sean Owen commented on SPARK-25044: --- Next thought: use ScalaUDF's inputTypes field to determine which args are primitive. However, I find this is only set when performing type coercion, and can't be relied on, it seems. We could change the whole code base to always set this, but I wonder if we can force user code that might reference ScalaUDF to do so. Hm. > Address translation of LMF closure primitive args to Object in Scala 2.12 > - > > Key: SPARK-25044 > URL: https://issues.apache.org/jira/browse/SPARK-25044 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 2.4.0 >Reporter: Sean Owen >Priority: Major > > A few SQL-related tests fail in Scala 2.12, such as UDFSuite's "SPARK-24891 > Fix HandleNullInputsForUDF rule": > {code:java} > - SPARK-24891 Fix HandleNullInputsForUDF rule *** FAILED *** > Results do not match for query: > ... > == Results == > == Results == > !== Correct Answer - 3 == == Spark Answer - 3 == > !struct<> struct > ![0,10,null] [0,10,0] > ![1,12,null] [1,12,1] > ![2,14,null] [2,14,2] (QueryTest.scala:163){code} > You can kind of get what's going on reading the test: > {code:java} > test("SPARK-24891 Fix HandleNullInputsForUDF rule") { > // assume(!ClosureCleanerSuite2.supportsLMFs) > // This test won't test what it intends to in 2.12, as lambda metafactory > closures > // have arg types that are not primitive, but Object > val udf1 = udf({(x: Int, y: Int) => x + y}) > val df = spark.range(0, 3).toDF("a") > .withColumn("b", udf1($"a", udf1($"a", lit(10 > .withColumn("c", udf1($"a", lit(null))) > val plan = spark.sessionState.executePlan(df.logicalPlan).analyzed > comparePlans(df.logicalPlan, plan) > checkAnswer( > df, > Seq( > Row(0, 10, null), > Row(1, 12, null), > Row(2, 14, null))) > }{code} > > It seems that the closure that is fed in as a UDF changes behavior, in a way > that primitive-type arguments are handled differently. For example an Int > argument, when fed 'null', acts like 0. > I'm sure it's a difference in the LMF closure and how its types are > understood, but not exactly sure of the cause yet. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
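What changes the test's behavior is how a primitive-typed lambda acts once nulls reach it through the erased, Object-taking apply: with an LMF closure the argument types look like Object rather than Int, the null that the rule used to preserve now reaches the closure, and it unboxes to 0 there. A self-contained sketch of just that unboxing effect, not Spark's ScalaUDF machinery:
{code:scala}
object NullUnboxing {
  def main(args: Array[String]): Unit = {
    val add: (Int, Int) => Int = (x, y) => x + y
    // Invoke through the erased Function2[AnyRef, AnyRef, AnyRef] signature, the
    // way generated code effectively does once the primitive types are lost.
    val erased = add.asInstanceOf[(Any, Any) => Any]
    println(erased(2, 10))     // 12
    println(erased(null, 10))  // 10: null unboxes to 0 instead of propagating
  }
}
{code}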
[jira] [Commented] (SPARK-25079) [PYTHON] upgrade python 3.4 -> 3.5
[ https://issues.apache.org/jira/browse/SPARK-25079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575197#comment-16575197 ] shane knapp commented on SPARK-25079: - SO. MANY. MOVING. PARTS. > [PYTHON] upgrade python 3.4 -> 3.5 > -- > > Key: SPARK-25079 > URL: https://issues.apache.org/jira/browse/SPARK-25079 > Project: Spark > Issue Type: Improvement > Components: Build, PySpark >Affects Versions: 2.3.1 >Reporter: shane knapp >Assignee: shane knapp >Priority: Major > > for the impending arrow upgrade > (https://issues.apache.org/jira/browse/SPARK-23874) we need to bump python > 3.4 -> 3.5. > i have been testing this here: > [https://amplab.cs.berkeley.edu/jenkins/view/RISELab%20Infra/job/ubuntuSparkPRB/|https://amplab.cs.berkeley.edu/jenkins/view/RISELab%20Infra/job/ubuntuSparkPRB/69] > my methodology: > 1) upgrade python + arrow to 3.5 and 0.10.0 > 2) run python tests > 3) when i'm happy that Things Won't Explode Spectacularly, pause jenkins and > upgrade centos workers to python3.5 > 4) simultaneously do the following: > - create a symlink in /home/anaconda/envs/py3k/bin for python3.4 that > points to python3.5 (this is currently being tested here: > [https://amplab.cs.berkeley.edu/jenkins/view/RISELab%20Infra/job/ubuntuSparkPRB/69)] > - push a change to python/run-tests.py replacing 3.4 with 3.5 > 5) once the python3.5 change to run-tests.py is merged, we will need to > back-port this to all existing branches > 6) then and only then can i remove the python3.4 -> python3.5 symlink -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25079) [PYTHON] upgrade python 3.4 -> 3.5
[ https://issues.apache.org/jira/browse/SPARK-25079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25079: Assignee: Apache Spark (was: shane knapp) > [PYTHON] upgrade python 3.4 -> 3.5 > -- > > Key: SPARK-25079 > URL: https://issues.apache.org/jira/browse/SPARK-25079 > Project: Spark > Issue Type: Improvement > Components: Build, PySpark >Affects Versions: 2.3.1 >Reporter: shane knapp >Assignee: Apache Spark >Priority: Major > > for the impending arrow upgrade > (https://issues.apache.org/jira/browse/SPARK-23874) we need to bump python > 3.4 -> 3.5. > i have been testing this here: > [https://amplab.cs.berkeley.edu/jenkins/view/RISELab%20Infra/job/ubuntuSparkPRB/|https://amplab.cs.berkeley.edu/jenkins/view/RISELab%20Infra/job/ubuntuSparkPRB/69] > my methodology: > 1) upgrade python + arrow to 3.5 and 0.10.0 > 2) run python tests > 3) when i'm happy that Things Won't Explode Spectacularly, pause jenkins and > upgrade centos workers to python3.5 > 4) simultaneously do the following: > - create a symlink in /home/anaconda/envs/py3k/bin for python3.4 that > points to python3.5 (this is currently being tested here: > [https://amplab.cs.berkeley.edu/jenkins/view/RISELab%20Infra/job/ubuntuSparkPRB/69)] > - push a change to python/run-tests.py replacing 3.4 with 3.5 > 5) once the python3.5 change to run-tests.py is merged, we will need to > back-port this to all existing branches > 6) then and only then can i remove the python3.4 -> python3.5 symlink -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25079) [PYTHON] upgrade python 3.4 -> 3.5
[ https://issues.apache.org/jira/browse/SPARK-25079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575186#comment-16575186 ] Apache Spark commented on SPARK-25079: -- User 'shaneknapp' has created a pull request for this issue: https://github.com/apache/spark/pull/22061 > [PYTHON] upgrade python 3.4 -> 3.5 > -- > > Key: SPARK-25079 > URL: https://issues.apache.org/jira/browse/SPARK-25079 > Project: Spark > Issue Type: Improvement > Components: Build, PySpark >Affects Versions: 2.3.1 >Reporter: shane knapp >Assignee: shane knapp >Priority: Major > > for the impending arrow upgrade > (https://issues.apache.org/jira/browse/SPARK-23874) we need to bump python > 3.4 -> 3.5. > i have been testing this here: > [https://amplab.cs.berkeley.edu/jenkins/view/RISELab%20Infra/job/ubuntuSparkPRB/|https://amplab.cs.berkeley.edu/jenkins/view/RISELab%20Infra/job/ubuntuSparkPRB/69] > my methodology: > 1) upgrade python + arrow to 3.5 and 0.10.0 > 2) run python tests > 3) when i'm happy that Things Won't Explode Spectacularly, pause jenkins and > upgrade centos workers to python3.5 > 4) simultaneously do the following: > - create a symlink in /home/anaconda/envs/py3k/bin for python3.4 that > points to python3.5 (this is currently being tested here: > [https://amplab.cs.berkeley.edu/jenkins/view/RISELab%20Infra/job/ubuntuSparkPRB/69)] > - push a change to python/run-tests.py replacing 3.4 with 3.5 > 5) once the python3.5 change to run-tests.py is merged, we will need to > back-port this to all existing branches > 6) then and only then can i remove the python3.4 -> python3.5 symlink -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25079) [PYTHON] upgrade python 3.4 -> 3.5
[ https://issues.apache.org/jira/browse/SPARK-25079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25079: Assignee: shane knapp (was: Apache Spark) > [PYTHON] upgrade python 3.4 -> 3.5 > -- > > Key: SPARK-25079 > URL: https://issues.apache.org/jira/browse/SPARK-25079 > Project: Spark > Issue Type: Improvement > Components: Build, PySpark >Affects Versions: 2.3.1 >Reporter: shane knapp >Assignee: shane knapp >Priority: Major > > for the impending arrow upgrade > (https://issues.apache.org/jira/browse/SPARK-23874) we need to bump python > 3.4 -> 3.5. > i have been testing this here: > [https://amplab.cs.berkeley.edu/jenkins/view/RISELab%20Infra/job/ubuntuSparkPRB/|https://amplab.cs.berkeley.edu/jenkins/view/RISELab%20Infra/job/ubuntuSparkPRB/69] > my methodology: > 1) upgrade python + arrow to 3.5 and 0.10.0 > 2) run python tests > 3) when i'm happy that Things Won't Explode Spectacularly, pause jenkins and > upgrade centos workers to python3.5 > 4) simultaneously do the following: > - create a symlink in /home/anaconda/envs/py3k/bin for python3.4 that > points to python3.5 (this is currently being tested here: > [https://amplab.cs.berkeley.edu/jenkins/view/RISELab%20Infra/job/ubuntuSparkPRB/69)] > - push a change to python/run-tests.py replacing 3.4 with 3.5 > 5) once the python3.5 change to run-tests.py is merged, we will need to > back-port this to all existing branches > 6) then and only then can i remove the python3.4 -> python3.5 symlink -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23874) Upgrade apache/arrow to 0.10.0
[ https://issues.apache.org/jira/browse/SPARK-23874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575170#comment-16575170 ] shane knapp commented on SPARK-23874: - this issue depends on this: https://issues.apache.org/jira/browse/SPARK-25079 > Upgrade apache/arrow to 0.10.0 > -- > > Key: SPARK-23874 > URL: https://issues.apache.org/jira/browse/SPARK-23874 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 2.3.0 >Reporter: Xiao Li >Assignee: Bryan Cutler >Priority: Major > > Version 0.10.0 will allow for the following improvements and bug fixes: > * Allow for adding BinaryType support > * Bug fix related to array serialization ARROW-1973 > * Python2 str will be made into an Arrow string instead of bytes ARROW-2101 > * Python bytearrays are supported in as input to pyarrow ARROW-2141 > * Java has common interface for reset to cleanup complex vectors in Spark > ArrowWriter ARROW-1962 > * Cleanup pyarrow type equality checks ARROW-2423 > * ArrowStreamWriter should not hold references to ArrowBlocks ARROW-2632, > ARROW-2645 > * Improved low level handling of messages for RecordBatch ARROW-2704 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25079) [PYTHON] upgrade python 3.4 -> 3.5
shane knapp created SPARK-25079: --- Summary: [PYTHON] upgrade python 3.4 -> 3.5 Key: SPARK-25079 URL: https://issues.apache.org/jira/browse/SPARK-25079 Project: Spark Issue Type: Improvement Components: Build, PySpark Affects Versions: 2.3.1 Reporter: shane knapp Assignee: shane knapp for the impending arrow upgrade (https://issues.apache.org/jira/browse/SPARK-23874) we need to bump python 3.4 -> 3.5. i have been testing this here: [https://amplab.cs.berkeley.edu/jenkins/view/RISELab%20Infra/job/ubuntuSparkPRB/|https://amplab.cs.berkeley.edu/jenkins/view/RISELab%20Infra/job/ubuntuSparkPRB/69] my methodology: 1) upgrade python + arrow to 3.5 and 0.10.0 2) run python tests 3) when i'm happy that Things Won't Explode Spectacularly, pause jenkins and upgrade centos workers to python3.5 4) simultaneously do the following: - create a symlink in /home/anaconda/envs/py3k/bin for python3.4 that points to python3.5 (this is currently being tested here: [https://amplab.cs.berkeley.edu/jenkins/view/RISELab%20Infra/job/ubuntuSparkPRB/69)] - push a change to python/run-tests.py replacing 3.4 with 3.5 5) once the python3.5 change to run-tests.py is merged, we will need to back-port this to all existing branches 6) then and only then can i remove the python3.4 -> python3.5 symlink -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25036) Scala 2.12 issues: Compilation error with sbt
[ https://issues.apache.org/jira/browse/SPARK-25036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25036: Assignee: Apache Spark (was: Kazuaki Ishizaki) > Scala 2.12 issues: Compilation error with sbt > - > > Key: SPARK-25036 > URL: https://issues.apache.org/jira/browse/SPARK-25036 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0, 2.4.0 >Reporter: Kazuaki Ishizaki >Assignee: Apache Spark >Priority: Major > Fix For: 2.4.0 > > > When compiling with sbt, the following errors occur: > There are -two- three types: > 1. {{ExprValue.isNull}} is compared with unexpected type. > 2. {{match may not be exhaustive}} is detected at {{match}} > 3. discarding unmoored doc comment > The first one is more serious since it may also generate incorrect code in > Spark 2.3. > {code:java} > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/ValueInterval.scala:63: > match may not be exhaustive. > [error] It would fail on the following inputs: (NumericValueInterval(_, _), > _), (_, NumericValueInterval(_, _)), (_, _) > [error] [warn] def isIntersected(r1: ValueInterval, r2: ValueInterval): > Boolean = (r1, r2) match { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/ValueInterval.scala:79: > match may not be exhaustive. > [error] It would fail on the following inputs: (NumericValueInterval(_, _), > _), (_, NumericValueInterval(_, _)), (_, _) > [error] [warn] (r1, r2) match { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproxCountDistinctForIntervals.scala:67: > match may not be exhaustive. > [error] It would fail on the following inputs: (ArrayType(_, _), _), (_, > ArrayData()), (_, _) > [error] [warn] (endpointsExpression.dataType, endpointsExpression.eval()) > match { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala:470: > match may not be exhaustive. 
> [error] It would fail on the following inputs: NewFunctionSpec(_, None, > Some(_)), NewFunctionSpec(_, Some(_), None) > [error] [warn] newFunction match { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:94: > org.apache.spark.sql.catalyst.expressions.codegen.ExprValue and String are > unrelated: they will most likely always compare unequal > [error] [warn] if (eval.isNull != "true") { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:126: > org.apache.spark.sql.catalyst.expressions.codegen.ExprValue and String are > unrelated: they will most likely never compare equal > [error] [warn] if (eval.isNull == "true") { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:133: > org.apache.spark.sql.catalyst.expressions.codegen.ExprValue and String are > unrelated: they will most likely never compare equal > [error] [warn] if (eval.isNull == "true") { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala:709: > match may not be exhaustive. > [error] It would fail on the following input: Schema((x: > org.apache.spark.sql.types.DataType forSome x not in > org.apache.spark.sql.types.StructType), _) > [error] [warn] def attributesFor[T: TypeTag]: Seq[Attribute] = schemaFor[T] > match { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala:90: > org.apache.spark.sql.catalyst.expressions.codegen.ExprValue and String are > unrelated: they will most likely never compare equal > [error] [warn] if (inputs.map(_.isNull).forall(_ == "false")) { > [error] [warn] > {code} > {code:java} > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala:410: > discarding unmoored doc comment > [error] [warn] /** > [error] [warn] > [error] [warn] >
[jira] [Assigned] (SPARK-25036) Scala 2.12 issues: Compilation error with sbt
[ https://issues.apache.org/jira/browse/SPARK-25036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25036: Assignee: Kazuaki Ishizaki (was: Apache Spark) > Scala 2.12 issues: Compilation error with sbt > - > > Key: SPARK-25036 > URL: https://issues.apache.org/jira/browse/SPARK-25036 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0, 2.4.0 >Reporter: Kazuaki Ishizaki >Assignee: Kazuaki Ishizaki >Priority: Major > Fix For: 2.4.0 > > > When compiling with sbt, the following errors occur: > There are -two- three types: > 1. {{ExprValue.isNull}} is compared with unexpected type. > 2. {{match may not be exhaustive}} is detected at {{match}} > 3. discarding unmoored doc comment > The first one is more serious since it may also generate incorrect code in > Spark 2.3. > {code:java} > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/ValueInterval.scala:63: > match may not be exhaustive. > [error] It would fail on the following inputs: (NumericValueInterval(_, _), > _), (_, NumericValueInterval(_, _)), (_, _) > [error] [warn] def isIntersected(r1: ValueInterval, r2: ValueInterval): > Boolean = (r1, r2) match { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/ValueInterval.scala:79: > match may not be exhaustive. > [error] It would fail on the following inputs: (NumericValueInterval(_, _), > _), (_, NumericValueInterval(_, _)), (_, _) > [error] [warn] (r1, r2) match { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproxCountDistinctForIntervals.scala:67: > match may not be exhaustive. > [error] It would fail on the following inputs: (ArrayType(_, _), _), (_, > ArrayData()), (_, _) > [error] [warn] (endpointsExpression.dataType, endpointsExpression.eval()) > match { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala:470: > match may not be exhaustive. 
> [error] It would fail on the following inputs: NewFunctionSpec(_, None, > Some(_)), NewFunctionSpec(_, Some(_), None) > [error] [warn] newFunction match { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:94: > org.apache.spark.sql.catalyst.expressions.codegen.ExprValue and String are > unrelated: they will most likely always compare unequal > [error] [warn] if (eval.isNull != "true") { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:126: > org.apache.spark.sql.catalyst.expressions.codegen.ExprValue and String are > unrelated: they will most likely never compare equal > [error] [warn] if (eval.isNull == "true") { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:133: > org.apache.spark.sql.catalyst.expressions.codegen.ExprValue and String are > unrelated: they will most likely never compare equal > [error] [warn] if (eval.isNull == "true") { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala:709: > match may not be exhaustive. > [error] It would fail on the following input: Schema((x: > org.apache.spark.sql.types.DataType forSome x not in > org.apache.spark.sql.types.StructType), _) > [error] [warn] def attributesFor[T: TypeTag]: Seq[Attribute] = schemaFor[T] > match { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala:90: > org.apache.spark.sql.catalyst.expressions.codegen.ExprValue and String are > unrelated: they will most likely never compare equal > [error] [warn] if (inputs.map(_.isNull).forall(_ == "false")) { > [error] [warn] > {code} > {code:java} > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala:410: > discarding unmoored doc comment > [error] [warn] /** > [error] [warn] > [error] [warn] >
[jira] [Commented] (SPARK-25036) Scala 2.12 issues: Compilation error with sbt
[ https://issues.apache.org/jira/browse/SPARK-25036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575158#comment-16575158 ] Apache Spark commented on SPARK-25036: -- User 'kiszk' has created a pull request for this issue: https://github.com/apache/spark/pull/22059 > Scala 2.12 issues: Compilation error with sbt > - > > Key: SPARK-25036 > URL: https://issues.apache.org/jira/browse/SPARK-25036 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0, 2.4.0 >Reporter: Kazuaki Ishizaki >Assignee: Kazuaki Ishizaki >Priority: Major > Fix For: 2.4.0 > > > When compiling with sbt, the following errors occur: > There are -two- three types: > 1. {{ExprValue.isNull}} is compared with unexpected type. > 2. {{match may not be exhaustive}} is detected at {{match}} > 3. discarding unmoored doc comment > The first one is more serious since it may also generate incorrect code in > Spark 2.3. > {code:java} > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/ValueInterval.scala:63: > match may not be exhaustive. > [error] It would fail on the following inputs: (NumericValueInterval(_, _), > _), (_, NumericValueInterval(_, _)), (_, _) > [error] [warn] def isIntersected(r1: ValueInterval, r2: ValueInterval): > Boolean = (r1, r2) match { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/ValueInterval.scala:79: > match may not be exhaustive. > [error] It would fail on the following inputs: (NumericValueInterval(_, _), > _), (_, NumericValueInterval(_, _)), (_, _) > [error] [warn] (r1, r2) match { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproxCountDistinctForIntervals.scala:67: > match may not be exhaustive. > [error] It would fail on the following inputs: (ArrayType(_, _), _), (_, > ArrayData()), (_, _) > [error] [warn] (endpointsExpression.dataType, endpointsExpression.eval()) > match { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala:470: > match may not be exhaustive. 
> [error] It would fail on the following inputs: NewFunctionSpec(_, None, > Some(_)), NewFunctionSpec(_, Some(_), None) > [error] [warn] newFunction match { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:94: > org.apache.spark.sql.catalyst.expressions.codegen.ExprValue and String are > unrelated: they will most likely always compare unequal > [error] [warn] if (eval.isNull != "true") { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:126: > org.apache.spark.sql.catalyst.expressions.codegen.ExprValue and String are > unrelated: they will most likely never compare equal > [error] [warn] if (eval.isNull == "true") { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:133: > org.apache.spark.sql.catalyst.expressions.codegen.ExprValue and String are > unrelated: they will most likely never compare equal > [error] [warn] if (eval.isNull == "true") { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala:709: > match may not be exhaustive. > [error] It would fail on the following input: Schema((x: > org.apache.spark.sql.types.DataType forSome x not in > org.apache.spark.sql.types.StructType), _) > [error] [warn] def attributesFor[T: TypeTag]: Seq[Attribute] = schemaFor[T] > match { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala:90: > org.apache.spark.sql.catalyst.expressions.codegen.ExprValue and String are > unrelated: they will most likely never compare equal > [error] [warn] if (inputs.map(_.isNull).forall(_ == "false")) { > [error] [warn] > {code} > {code:java} > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala:410: > discarding unmoored doc comment > [error] [warn] /** > [error] [warn] > [error] [warn] >
[jira] [Updated] (SPARK-25036) Scala 2.12 issues: Compilation error with sbt
[ https://issues.apache.org/jira/browse/SPARK-25036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kazuaki Ishizaki updated SPARK-25036: - Description: When compiling with sbt, the following errors occur: There are -two- three types: 1. {{ExprValue.isNull}} is compared with unexpected type. 2. {{match may not be exhaustive}} is detected at {{match}} 3. discarding unmoored doc comment The first one is more serious since it may also generate incorrect code in Spark 2.3. {code:java} [error] [warn] /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/ValueInterval.scala:63: match may not be exhaustive. [error] It would fail on the following inputs: (NumericValueInterval(_, _), _), (_, NumericValueInterval(_, _)), (_, _) [error] [warn] def isIntersected(r1: ValueInterval, r2: ValueInterval): Boolean = (r1, r2) match { [error] [warn] [error] [warn] /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/ValueInterval.scala:79: match may not be exhaustive. [error] It would fail on the following inputs: (NumericValueInterval(_, _), _), (_, NumericValueInterval(_, _)), (_, _) [error] [warn] (r1, r2) match { [error] [warn] [error] [warn] /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproxCountDistinctForIntervals.scala:67: match may not be exhaustive. [error] It would fail on the following inputs: (ArrayType(_, _), _), (_, ArrayData()), (_, _) [error] [warn] (endpointsExpression.dataType, endpointsExpression.eval()) match { [error] [warn] [error] [warn] /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala:470: match may not be exhaustive. [error] It would fail on the following inputs: NewFunctionSpec(_, None, Some(_)), NewFunctionSpec(_, Some(_), None) [error] [warn] newFunction match { [error] [warn] [error] [warn] /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:94: org.apache.spark.sql.catalyst.expressions.codegen.ExprValue and String are unrelated: they will most likely always compare unequal [error] [warn] if (eval.isNull != "true") { [error] [warn] [error] [warn] /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:126: org.apache.spark.sql.catalyst.expressions.codegen.ExprValue and String are unrelated: they will most likely never compare equal [error] [warn] if (eval.isNull == "true") { [error] [warn] [error] [warn] /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:133: org.apache.spark.sql.catalyst.expressions.codegen.ExprValue and String are unrelated: they will most likely never compare equal [error] [warn] if (eval.isNull == "true") { [error] [warn] [error] [warn] /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala:709: match may not be exhaustive. 
[error] It would fail on the following input: Schema((x: org.apache.spark.sql.types.DataType forSome x not in org.apache.spark.sql.types.StructType), _) [error] [warn] def attributesFor[T: TypeTag]: Seq[Attribute] = schemaFor[T] match { [error] [warn] [error] [warn] /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala:90: org.apache.spark.sql.catalyst.expressions.codegen.ExprValue and String are unrelated: they will most likely never compare equal [error] [warn] if (inputs.map(_.isNull).forall(_ == "false")) { [error] [warn] {code} {code:java} [error] [warn] /home/ishizaki/Spark/PR/scala212/spark/mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala:410: discarding unmoored doc comment [error] [warn] /** [error] [warn] [error] [warn] /home/ishizaki/Spark/PR/scala212/spark/mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala:441: discarding unmoored doc comment [error] [warn] /** [error] [warn] ... [error] [warn] /home/ishizaki/Spark/PR/scala212/spark/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:440: discarding unmoored doc comment [error] [warn] /** [error] [warn] {code} was: When compiling with sbt, the following errors occur: There are two types: 1. {{ExprValue.isNull}} is compared with unexpected type. 1. {{match may not be exhaustive}} is detected at {{match}} The first one is more serious since it may also generate incorrect code in Spark 2.3. {code} [error] [warn]
[jira] [Comment Edited] (SPARK-25036) Scala 2.12 issues: Compilation error with sbt
[ https://issues.apache.org/jira/browse/SPARK-25036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575137#comment-16575137 ] Kazuaki Ishizaki edited comment on SPARK-25036 at 8/9/18 5:05 PM: -- Another type of compilation error is found. Added the log to the description was (Author: kiszk): Another type of compilation error is found > Scala 2.12 issues: Compilation error with sbt > - > > Key: SPARK-25036 > URL: https://issues.apache.org/jira/browse/SPARK-25036 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0, 2.4.0 >Reporter: Kazuaki Ishizaki >Assignee: Kazuaki Ishizaki >Priority: Major > Fix For: 2.4.0 > > > When compiling with sbt, the following errors occur: > There are two types: > 1. {{ExprValue.isNull}} is compared with unexpected type. > 1. {{match may not be exhaustive}} is detected at {{match}} > The first one is more serious since it may also generate incorrect code in > Spark 2.3. > {code} > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/ValueInterval.scala:63: > match may not be exhaustive. > [error] It would fail on the following inputs: (NumericValueInterval(_, _), > _), (_, NumericValueInterval(_, _)), (_, _) > [error] [warn] def isIntersected(r1: ValueInterval, r2: ValueInterval): > Boolean = (r1, r2) match { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/ValueInterval.scala:79: > match may not be exhaustive. > [error] It would fail on the following inputs: (NumericValueInterval(_, _), > _), (_, NumericValueInterval(_, _)), (_, _) > [error] [warn] (r1, r2) match { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproxCountDistinctForIntervals.scala:67: > match may not be exhaustive. > [error] It would fail on the following inputs: (ArrayType(_, _), _), (_, > ArrayData()), (_, _) > [error] [warn] (endpointsExpression.dataType, endpointsExpression.eval()) > match { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala:470: > match may not be exhaustive. 
> [error] It would fail on the following inputs: NewFunctionSpec(_, None, > Some(_)), NewFunctionSpec(_, Some(_), None) > [error] [warn] newFunction match { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:94: > org.apache.spark.sql.catalyst.expressions.codegen.ExprValue and String are > unrelated: they will most likely always compare unequal > [error] [warn] if (eval.isNull != "true") { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:126: > org.apache.spark.sql.catalyst.expressions.codegen.ExprValue and String are > unrelated: they will most likely never compare equal > [error] [warn] if (eval.isNull == "true") { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:133: > org.apache.spark.sql.catalyst.expressions.codegen.ExprValue and String are > unrelated: they will most likely never compare equal > [error] [warn] if (eval.isNull == "true") { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala:709: > match may not be exhaustive. > [error] It would fail on the following input: Schema((x: > org.apache.spark.sql.types.DataType forSome x not in > org.apache.spark.sql.types.StructType), _) > [error] [warn] def attributesFor[T: TypeTag]: Seq[Attribute] = schemaFor[T] > match { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala:90: > org.apache.spark.sql.catalyst.expressions.codegen.ExprValue and String are > unrelated: they will most likely never compare equal > [error] [warn] if (inputs.map(_.isNull).forall(_ == "false")) { > [error] [warn] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional
[jira] [Reopened] (SPARK-25036) Scala 2.12 issues: Compilation error with sbt
[ https://issues.apache.org/jira/browse/SPARK-25036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kazuaki Ishizaki reopened SPARK-25036: -- Another type of compilation error is found > Scala 2.12 issues: Compilation error with sbt > - > > Key: SPARK-25036 > URL: https://issues.apache.org/jira/browse/SPARK-25036 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0, 2.4.0 >Reporter: Kazuaki Ishizaki >Assignee: Kazuaki Ishizaki >Priority: Major > Fix For: 2.4.0 > > > When compiling with sbt, the following errors occur: > There are two types: > 1. {{ExprValue.isNull}} is compared with unexpected type. > 1. {{match may not be exhaustive}} is detected at {{match}} > The first one is more serious since it may also generate incorrect code in > Spark 2.3. > {code} > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/ValueInterval.scala:63: > match may not be exhaustive. > [error] It would fail on the following inputs: (NumericValueInterval(_, _), > _), (_, NumericValueInterval(_, _)), (_, _) > [error] [warn] def isIntersected(r1: ValueInterval, r2: ValueInterval): > Boolean = (r1, r2) match { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/ValueInterval.scala:79: > match may not be exhaustive. > [error] It would fail on the following inputs: (NumericValueInterval(_, _), > _), (_, NumericValueInterval(_, _)), (_, _) > [error] [warn] (r1, r2) match { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproxCountDistinctForIntervals.scala:67: > match may not be exhaustive. > [error] It would fail on the following inputs: (ArrayType(_, _), _), (_, > ArrayData()), (_, _) > [error] [warn] (endpointsExpression.dataType, endpointsExpression.eval()) > match { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala:470: > match may not be exhaustive. 
> [error] It would fail on the following inputs: NewFunctionSpec(_, None, > Some(_)), NewFunctionSpec(_, Some(_), None) > [error] [warn] newFunction match { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:94: > org.apache.spark.sql.catalyst.expressions.codegen.ExprValue and String are > unrelated: they will most likely always compare unequal > [error] [warn] if (eval.isNull != "true") { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:126: > org.apache.spark.sql.catalyst.expressions.codegen.ExprValue and String are > unrelated: they will most likely never compare equal > [error] [warn] if (eval.isNull == "true") { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:133: > org.apache.spark.sql.catalyst.expressions.codegen.ExprValue and String are > unrelated: they will most likely never compare equal > [error] [warn] if (eval.isNull == "true") { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala:709: > match may not be exhaustive. > [error] It would fail on the following input: Schema((x: > org.apache.spark.sql.types.DataType forSome x not in > org.apache.spark.sql.types.StructType), _) > [error] [warn] def attributesFor[T: TypeTag]: Seq[Attribute] = schemaFor[T] > match { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala:90: > org.apache.spark.sql.catalyst.expressions.codegen.ExprValue and String are > unrelated: they will most likely never compare equal > [error] [warn] if (inputs.map(_.isNull).forall(_ == "false")) { > [error] [warn] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
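The {{match may not be exhaustive}} warnings quoted above come from pattern matches on pairs that do not cover every combination of the matched types. The following is a minimal, self-contained Scala sketch (hypothetical types, not Spark's actual ValueInterval hierarchy) that reproduces the warning under Scala 2.12 and shows one way a catch-all case silences it:
{code:scala}
object ExhaustiveMatchSketch {
  sealed trait Interval
  final case class Numeric(min: Double, max: Double) extends Interval
  case object Empty extends Interval

  // Scala 2.12 warns here: the match would fail on inputs such as
  // (Numeric(_, _), Empty) and (Empty, Numeric(_, _)).
  def isIntersectedIncomplete(r1: Interval, r2: Interval): Boolean = (r1, r2) match {
    case (Numeric(min1, max1), Numeric(min2, max2)) => max1 >= min2 && max2 >= min1
    case (Empty, Empty)                             => false
  }

  // No warning: the catch-all covers every remaining combination.
  def isIntersected(r1: Interval, r2: Interval): Boolean = (r1, r2) match {
    case (Numeric(min1, max1), Numeric(min2, max2)) => max1 >= min2 && max2 >= min1
    case _                                          => false
  }
}
{code}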
[jira] [Commented] (SPARK-25024) Update mesos documentation to be clear about security supported
[ https://issues.apache.org/jira/browse/SPARK-25024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575134#comment-16575134 ] Arthur Rand commented on SPARK-25024: - Just to chime in here. Unless a lot has changed, all of the Spark security features when running on Mesos are available in "vanilla Mesos", as long as you have the required plug-ins. The problem is that the users' suite of security plug-ins is impossible to predict so the spark docs only tell you how to _configure Spark_. Some of the questions you bring up [~tgraves], depend on the specific setup, for example auth when submitting jobs. However, I think it's safe to say that if you have a _secure Mesos_ cluster (meaning you have some form of plug-ins) then it'll work with Spark. > Update mesos documentation to be clear about security supported > --- > > Key: SPARK-25024 > URL: https://issues.apache.org/jira/browse/SPARK-25024 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 2.2.2 >Reporter: Thomas Graves >Priority: Major > > I was reading through our mesos deployment docs and security docs and its not > clear at all what type of security and how to set it up for mesos. I think > we should clarify this and have something about exactly what is supported and > what is not. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25059) Exception while executing an action on DataFrame that read Json
[ https://issues.apache.org/jira/browse/SPARK-25059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575129#comment-16575129 ] Kazuaki Ishizaki commented on SPARK-25059: -- Thank you for reporting the issue. Could you please try this using Spark 2.3? This is because the community extensively investigated and fixed these issues in Spark 2.3 > Exception while executing an action on DataFrame that read Json > --- > > Key: SPARK-25059 > URL: https://issues.apache.org/jira/browse/SPARK-25059 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 2.2.0 > Environment: AWS EMR 5.8.0 > Spark 2.2.0 > >Reporter: Kunal Goswami >Priority: Major > Labels: Spark-SQL > > When I try to read ~9600 Json files using > {noformat} > val test = spark.read.option("header", true).option("inferSchema", > true).json(paths: _*) {noformat} > > Any action on the above created data frame results in: > {noformat} > Caused by: org.codehaus.janino.JaninoRuntimeException: Code of method > "apply2_1$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V" > of class "org.apache.spark.sql.catalyst.expressions.Generat[73/1850] > pecificUnsafeProjection" grows beyond 64 KB > at org.codehaus.janino.CodeContext.makeSpace(CodeContext.java:949) > at org.codehaus.janino.CodeContext.write(CodeContext.java:839) > at org.codehaus.janino.UnitCompiler.writeOpcode(UnitCompiler.java:11081) > at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4546) > at org.codehaus.janino.UnitCompiler.access$7500(UnitCompiler.java:206) > at > org.codehaus.janino.UnitCompiler$12.visitMethodInvocation(UnitCompiler.java:3774) > at > org.codehaus.janino.UnitCompiler$12.visitMethodInvocation(UnitCompiler.java:3762) > at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:4328) > at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:3762) > at org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:4933) > at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:3180) > at org.codehaus.janino.UnitCompiler.access$5000(UnitCompiler.java:206) > at > org.codehaus.janino.UnitCompiler$9.visitMethodInvocation(UnitCompiler.java:3151) > at > org.codehaus.janino.UnitCompiler$9.visitMethodInvocation(UnitCompiler.java:3139) > at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:4328) > at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:3139) > at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2112) > at org.codehaus.janino.UnitCompiler.access$1700(UnitCompiler.java:206) > at > org.codehaus.janino.UnitCompiler$6.visitExpressionStatement(UnitCompiler.java:1377) > at > org.codehaus.janino.UnitCompiler$6.visitExpressionStatement(UnitCompiler.java:1370) > at org.codehaus.janino.Java$ExpressionStatement.accept(Java.java:2558) > at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:1370) > at > org.codehaus.janino.UnitCompiler.compileStatements(UnitCompiler.java:1450) > at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:1436) > at org.codehaus.janino.UnitCompiler.access$1600(UnitCompiler.java:206) > at org.codehaus.janino.UnitCompiler$6.visitBlock(UnitCompiler.java:1376) > at org.codehaus.janino.UnitCompiler$6.visitBlock(UnitCompiler.java:1370) > at org.codehaus.janino.Java$Block.accept(Java.java:2471) > at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:1370) > at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2220) > at 
org.codehaus.janino.UnitCompiler.access$1800(UnitCompiler.java:206) > at > org.codehaus.janino.UnitCompiler$6.visitIfStatement(UnitCompiler.java:1378) > at > org.codehaus.janino.UnitCompiler$6.visitIfStatement(UnitCompiler.java:1370) > at org.codehaus.janino.Java$IfStatement.accept(Java.java:2621) > at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:1370) > at > org.codehaus.janino.UnitCompiler.compileStatements(UnitCompiler.java:1450) > at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:1436) > at org.codehaus.janino.UnitCompiler.access$1600(UnitCompiler.java:206) > at org.codehaus.janino.UnitCompiler$6.visitBlock(UnitCompiler.java:1376) > at org.codehaus.janino.UnitCompiler$6.visitBlock(UnitCompiler.java:1370) > at org.codehaus.janino.Java$Block.accept(Java.java:2471) > at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:1370) > at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2220) > at org.codehaus.janino.UnitCompiler.access$1800(UnitCompiler.java:206) > at > org.codehaus.janino.UnitCompiler$6.visitIfStatement(UnitCompiler.java:1378) > at >
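For anyone hitting the same "grows beyond 64 KB" generated-method limit before they can try the upgrade suggested above, a commonly suggested mitigation is to disable whole-stage code generation for the offending job so a very wide inferred schema does not produce one oversized generated method. This is a hedge against the symptom, not the fix referred to in the comment, and the paths below are placeholders:
{code:scala}
import org.apache.spark.sql.SparkSession

object WideJsonReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("wide-json-read")
      // Mitigation only: avoids collapsing the whole plan into one huge generated method.
      .config("spark.sql.codegen.wholeStage", "false")
      .getOrCreate()

    val paths = Seq("s3://some-bucket/json/part-00000.json") // placeholder paths
    val df = spark.read.json(paths: _*)
    println(df.count())

    spark.stop()
  }
}
{code}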
[jira] [Commented] (SPARK-25036) Scala 2.12 issues: Compilation error with sbt
[ https://issues.apache.org/jira/browse/SPARK-25036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575116#comment-16575116 ] Apache Spark commented on SPARK-25036: -- User 'kiszk' has created a pull request for this issue: https://github.com/apache/spark/pull/22058 > Scala 2.12 issues: Compilation error with sbt > - > > Key: SPARK-25036 > URL: https://issues.apache.org/jira/browse/SPARK-25036 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0, 2.4.0 >Reporter: Kazuaki Ishizaki >Assignee: Kazuaki Ishizaki >Priority: Major > Fix For: 2.4.0 > > > When compiling with sbt, the following errors occur: > There are two types: > 1. {{ExprValue.isNull}} is compared with unexpected type. > 1. {{match may not be exhaustive}} is detected at {{match}} > The first one is more serious since it may also generate incorrect code in > Spark 2.3. > {code} > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/ValueInterval.scala:63: > match may not be exhaustive. > [error] It would fail on the following inputs: (NumericValueInterval(_, _), > _), (_, NumericValueInterval(_, _)), (_, _) > [error] [warn] def isIntersected(r1: ValueInterval, r2: ValueInterval): > Boolean = (r1, r2) match { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/ValueInterval.scala:79: > match may not be exhaustive. > [error] It would fail on the following inputs: (NumericValueInterval(_, _), > _), (_, NumericValueInterval(_, _)), (_, _) > [error] [warn] (r1, r2) match { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproxCountDistinctForIntervals.scala:67: > match may not be exhaustive. > [error] It would fail on the following inputs: (ArrayType(_, _), _), (_, > ArrayData()), (_, _) > [error] [warn] (endpointsExpression.dataType, endpointsExpression.eval()) > match { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala:470: > match may not be exhaustive. 
> [error] It would fail on the following inputs: NewFunctionSpec(_, None, > Some(_)), NewFunctionSpec(_, Some(_), None) > [error] [warn] newFunction match { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:94: > org.apache.spark.sql.catalyst.expressions.codegen.ExprValue and String are > unrelated: they will most likely always compare unequal > [error] [warn] if (eval.isNull != "true") { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:126: > org.apache.spark.sql.catalyst.expressions.codegen.ExprValue and String are > unrelated: they will most likely never compare equal > [error] [warn] if (eval.isNull == "true") { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:133: > org.apache.spark.sql.catalyst.expressions.codegen.ExprValue and String are > unrelated: they will most likely never compare equal > [error] [warn] if (eval.isNull == "true") { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala:709: > match may not be exhaustive. > [error] It would fail on the following input: Schema((x: > org.apache.spark.sql.types.DataType forSome x not in > org.apache.spark.sql.types.StructType), _) > [error] [warn] def attributesFor[T: TypeTag]: Seq[Attribute] = schemaFor[T] > match { > [error] [warn] > [error] [warn] > /home/ishizaki/Spark/PR/scala212/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala:90: > org.apache.spark.sql.catalyst.expressions.codegen.ExprValue and String are > unrelated: they will most likely never compare equal > [error] [warn] if (inputs.map(_.isNull).forall(_ == "false")) { > [error] [warn] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
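The {{ExprValue.isNull}} warnings quoted above come from comparing a non-String value against a string literal, which Scala 2.12 flags because the two sides are unrelated types and can essentially never compare equal; with warnings treated as errors this breaks the sbt build. Below is a minimal, self-contained sketch (a hypothetical stand-in class, not Spark's real codegen ExprValue) of the problematic pattern and a comparison that expresses the intent without the warning:
{code:scala}
object StringComparisonSketch {
  // Hypothetical stand-in; Spark's real ExprValue lives in
  // org.apache.spark.sql.catalyst.expressions.codegen and is not reproduced here.
  final case class FakeExprValue(code: String)

  // Scala 2.12 warns: FakeExprValue and String are unrelated, so this is almost certainly a bug.
  def buggyNullCheck(isNull: FakeExprValue): Boolean =
    isNull == "true"

  // Comparing String to String expresses the intended check without the warning.
  def nullCheck(isNull: FakeExprValue): Boolean =
    isNull.code == "true"

  def main(args: Array[String]): Unit = {
    println(buggyNullCheck(FakeExprValue("true"))) // always false, despite the intent
    println(nullCheck(FakeExprValue("true")))      // true
  }
}
{code}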
[jira] [Commented] (SPARK-23207) Shuffle+Repartition on an DataFrame could lead to incorrect answers
[ https://issues.apache.org/jira/browse/SPARK-23207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575097#comment-16575097 ] Imran Rashid commented on SPARK-23207: -- yeah I agree with Tom, silent data loss is a major bug. I don't actually think the chance to hit this is so small. > Shuffle+Repartition on an DataFrame could lead to incorrect answers > --- > > Key: SPARK-23207 > URL: https://issues.apache.org/jira/browse/SPARK-23207 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Jiang Xingbo >Assignee: Jiang Xingbo >Priority: Blocker > Labels: correctness > Fix For: 2.3.0 > > > Currently shuffle repartition uses RoundRobinPartitioning, the generated > result is nondeterministic since the sequence of input rows are not > determined. > The bug can be triggered when there is a repartition call following a shuffle > (which would lead to non-deterministic row ordering), as the pattern shows > below: > upstream stage -> repartition stage -> result stage > (-> indicate a shuffle) > When one of the executors process goes down, some tasks on the repartition > stage will be retried and generate inconsistent ordering, and some tasks of > the result stage will be retried generating different data. > The following code returns 931532, instead of 100: > {code} > import scala.sys.process._ > import org.apache.spark.TaskContext > val res = spark.range(0, 1000 * 1000, 1).repartition(200).map { x => > x > }.repartition(200).map { x => > if (TaskContext.get.attemptNumber == 0 && TaskContext.get.partitionId < 2) { > throw new Exception("pkill -f java".!!) > } > x > } > res.distinct().count() > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
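As an illustration of the pattern described above, where round-robin repartitioning assigns rows to partitions based on their non-deterministic arrival order so a retried task can route rows differently, the sketch below contrasts it with repartitioning by a column expression, where a row's target partition depends only on its value. This is a hedged illustration of the failure mode, not the fix that went into Spark:
{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object DeterministicRepartitionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("repartition-sketch").getOrCreate()

    val ds = spark.range(0, 1000 * 1000, 1)

    // Round-robin: a row's partition depends on the order rows arrive in, which a
    // retried upstream task can change.
    val roundRobin = ds.repartition(200)

    // Hash on the value: a row's partition is a function of the row itself, so a
    // retry routes it identically.
    val byColumn = ds.repartition(200, col("id"))

    println(roundRobin.distinct().count())
    println(byColumn.distinct().count())

    spark.stop()
  }
}
{code}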
[jira] [Commented] (SPARK-22634) Update Bouncy castle dependency
[ https://issues.apache.org/jira/browse/SPARK-22634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575066#comment-16575066 ] Sean Owen commented on SPARK-22634: --- That text is part of Netty's NOTICE file, which must be reproduced, but it isn't actually pulled in by Netty according to mvn. It says only jets3t uses it, and you say jets3t isn't used directly here. I'd say this is resolved if SPARK-23654 is resolved then. There's another reason to remove this if it's not necessary. Because this is crypto software, I think we need to update an ECCN for Spark if it's distributed. I'm investigating that separately. But all the better to remove it if not needed. > Update Bouncy castle dependency > --- > > Key: SPARK-22634 > URL: https://issues.apache.org/jira/browse/SPARK-22634 > Project: Spark > Issue Type: Task > Components: Spark Core, SQL, Structured Streaming >Affects Versions: 2.2.0 >Reporter: Lior Regev >Assignee: Sean Owen >Priority: Minor > Fix For: 2.3.0 > > > Spark's usage of the jets3t library, as well as Spark's own Flume and Kafka > streaming, uses Bouncy Castle version 1.51 > This is an outdated version, as the latest one is 1.58 > This, in turn, renders packages such as > [spark-hadoopcryptoledger-ds|https://github.com/ZuInnoTe/spark-hadoopcryptoledger-ds] > unusable, since these require 1.58 and Spark's distributions come along with > 1.51 > My own attempt was to run on EMR, and since I automatically get all of > Spark's dependencies (Bouncy Castle 1.51 being one of them) into the > classpath, using the library to parse blockchain data failed due to missing > functionality. > I have also opened an > [issue|https://bitbucket.org/jmurty/jets3t/issues/242/bouncycastle-dependency] > with jets3t to update their dependency as well, but along with that Spark > would have to update its own or at least be packaged with a newer version -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
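For a downstream project affected by the version clash described above, a hypothetical build.sbt fragment that forces the newer Bouncy Castle onto the application classpath might look like the following. The artifact coordinates are the standard Bouncy Castle ones; whether this actually resolves the EMR-provided-classpath problem is an assumption, not something verified in this thread:
{code:scala}
// Hypothetical build.sbt fragment for a downstream project, not a change to Spark itself.
// Declares and pins the newer Bouncy Castle provider ahead of the 1.51 pulled in transitively.
libraryDependencies += "org.bouncycastle" % "bcprov-jdk15on" % "1.58"
dependencyOverrides += "org.bouncycastle" % "bcprov-jdk15on" % "1.58"
{code}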
[jira] [Updated] (SPARK-25078) Standalone does not work with spark.authenticate.secret and deploy-mode=cluster
[ https://issues.apache.org/jira/browse/SPARK-25078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid updated SPARK-25078: - Summary: Standalone does not work with spark.authenticate.secret and deploy-mode=cluster (was: Standalone cluster mode does not work with spark.authenticate.secret) > Standalone does not work with spark.authenticate.secret and > deploy-mode=cluster > --- > > Key: SPARK-25078 > URL: https://issues.apache.org/jira/browse/SPARK-25078 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 2.4.0 >Reporter: Imran Rashid >Priority: Major > > When running a spark standalone cluster with spark.authenticate.secret setup, > you cannot submit a program in cluster mode, even with the right secret. The > driver fails with: > {noformat} > 18/08/09 08:17:21 INFO SecurityManager: SecurityManager: authentication > enabled; ui acls disabled; users with view permissions: Set(systest); groups > with view permissions: Set(); users with modify permissions: Set(systest); > groups with modify permissions: Set() > 18/08/09 08:17:21 ERROR SparkContext: Error initializing SparkContext. > java.lang.IllegalArgumentException: requirement failed: A secret key must be > specified via the spark.authenticate.secret config. > at scala.Predef$.require(Predef.scala:224) > at > org.apache.spark.SecurityManager.initializeAuth(SecurityManager.scala:361) > at org.apache.spark.SparkEnv$.create(SparkEnv.scala:238) > at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:175) > at > org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:257) > at org.apache.spark.SparkContext.(SparkContext.scala:424) > ... > {noformat} > but its actually doing the wrong check in > {{SecurityManager.initializeAuth()}}. The secret is there, its just in an > environment variable {{_SPARK_AUTH_SECRET}} (so its not visible to another > process). > *Workaround*: In your program, you can pass in a dummy secret to your spark > conf. It doesn't matter what it is at all, later it'll be ignored and when > establishing connections, the secret from the env variable will be used. Eg. > {noformat} > val conf = new SparkConf() > conf.setIfMissing("spark.authenticate.secret", "doesn't matter") > val sc = new SparkContext(conf) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25078) Standalone cluster mode does not work with spark.authenticate.secret
Imran Rashid created SPARK-25078: Summary: Standalone cluster mode does not work with spark.authenticate.secret Key: SPARK-25078 URL: https://issues.apache.org/jira/browse/SPARK-25078 Project: Spark Issue Type: Bug Components: Deploy Affects Versions: 2.4.0 Reporter: Imran Rashid When running a spark standalone cluster with spark.authenticate.secret setup, you cannot submit a program in cluster mode, even with the right secret. The driver fails with: {noformat} 18/08/09 08:17:21 INFO SecurityManager: SecurityManager: authentication enabled; ui acls disabled; users with view permissions: Set(systest); groups with view permissions: Set(); users with modify permissions: Set(systest); groups with modify permissions: Set() 18/08/09 08:17:21 ERROR SparkContext: Error initializing SparkContext. java.lang.IllegalArgumentException: requirement failed: A secret key must be specified via the spark.authenticate.secret config. at scala.Predef$.require(Predef.scala:224) at org.apache.spark.SecurityManager.initializeAuth(SecurityManager.scala:361) at org.apache.spark.SparkEnv$.create(SparkEnv.scala:238) at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:175) at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:257) at org.apache.spark.SparkContext.(SparkContext.scala:424) ... {noformat} but its actually doing the wrong check in {{SecurityManager.initializeAuth()}}. The secret is there, its just in an environment variable {{_SPARK_AUTH_SECRET}} (so its not visible to another process). *Workaround*: In your program, you can pass in a dummy secret to your spark conf. It doesn't matter what it is at all, later it'll be ignored and when establishing connections, the secret from the env variable will be used. Eg. {noformat} val conf = new SparkConf() conf.setIfMissing("spark.authenticate.secret", "doesn't matter") val sc = new SparkContext(conf) {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
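A self-contained version of the workaround quoted above, with imports added; the placeholder secret value is arbitrary since, as noted, it is ignored and the real secret still comes from the {{_SPARK_AUTH_SECRET}} environment variable when connections are established:
{code:scala}
import org.apache.spark.{SparkConf, SparkContext}

object AuthSecretWorkaround {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    // The value is never used for authentication; it only satisfies the check in
    // SecurityManager.initializeAuth(). The real secret comes from _SPARK_AUTH_SECRET.
    conf.setIfMissing("spark.authenticate.secret", "placeholder-ignored-at-runtime")
    val sc = new SparkContext(conf)

    println(sc.parallelize(1 to 10).sum())
    sc.stop()
  }
}
{code}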
[jira] [Assigned] (SPARK-25077) Delete unused variable in WindowExec
[ https://issues.apache.org/jira/browse/SPARK-25077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25077: Assignee: Apache Spark > Delete unused variable in WindowExec > > > Key: SPARK-25077 > URL: https://issues.apache.org/jira/browse/SPARK-25077 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0 >Reporter: Li Yuanjian >Assignee: Apache Spark >Priority: Trivial > > Delete the unused variable `inputFields` in WindowExec, avoid making others > confused while reading the code. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25077) Delete unused variable in WindowExec
[ https://issues.apache.org/jira/browse/SPARK-25077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575053#comment-16575053 ] Apache Spark commented on SPARK-25077: -- User 'xuanyuanking' has created a pull request for this issue: https://github.com/apache/spark/pull/22057 > Delete unused variable in WindowExec > > > Key: SPARK-25077 > URL: https://issues.apache.org/jira/browse/SPARK-25077 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0 >Reporter: Li Yuanjian >Priority: Trivial > > Delete the unused variable `inputFields` in WindowExec, avoid making others > confused while reading the code. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25077) Delete unused variable in WindowExec
[ https://issues.apache.org/jira/browse/SPARK-25077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25077: Assignee: (was: Apache Spark) > Delete unused variable in WindowExec > > > Key: SPARK-25077 > URL: https://issues.apache.org/jira/browse/SPARK-25077 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0 >Reporter: Li Yuanjian >Priority: Trivial > > Delete the unused variable `inputFields` in WindowExec, avoid making others > confused while reading the code. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25077) Delete unused variable in WindowExec
Li Yuanjian created SPARK-25077: --- Summary: Delete unused variable in WindowExec Key: SPARK-25077 URL: https://issues.apache.org/jira/browse/SPARK-25077 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.3.0 Reporter: Li Yuanjian Delete the unused variable `inputFields` in WindowExec to avoid confusing readers of the code. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12449) Pushing down arbitrary logical plans to data sources
[ https://issues.apache.org/jira/browse/SPARK-12449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575044#comment-16575044 ] Kyle Prifogle commented on SPARK-12449: --- [~oae] as far as I can tell the issue lives on here: https://issues.apache.org/jira/browse/SPARK-22386 > Pushing down arbitrary logical plans to data sources > > > Key: SPARK-12449 > URL: https://issues.apache.org/jira/browse/SPARK-12449 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Stephan Kessler >Priority: Major > Attachments: pushingDownLogicalPlans.pdf > > > With the help of the DataSource API we can pull data from external sources > for processing. Implementing interfaces such as {{PrunedFilteredScan}} allows > to push down filters and projects pruning unnecessary fields and rows > directly in the data source. > However, data sources such as SQL Engines are capable of doing even more > preprocessing, e.g., evaluating aggregates. This is beneficial because it > would reduce the amount of data transferred from the source to Spark. The > existing interfaces do not allow such kind of processing in the source. > We would propose to add a new interface {{CatalystSource}} that allows to > defer the processing of arbitrary logical plans to the data source. We have > already shown the details at the Spark Summit 2015 Europe > [https://spark-summit.org/eu-2015/events/the-pushdown-of-everything/] > I will add a design document explaining details. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
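For context on the limitation being discussed above, the sketch below shows roughly what the existing DataSource API already exposes via {{PrunedFilteredScan}}: only the required columns and simple pushable filters reach the source, never aggregates or arbitrary logical plans. The relation is hypothetical and the actual pruning/filtering is elided; it is not the proposed {{CatalystSource}} interface:
{code:scala}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, Filter, PrunedFilteredScan}
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

// Hypothetical relation: shows the information PrunedFilteredScan hands to a data source.
class ToyRelation(override val sqlContext: SQLContext)
  extends BaseRelation with PrunedFilteredScan {

  override def schema: StructType =
    StructType(Seq(StructField("id", LongType), StructField("name", StringType)))

  // Only required columns and pushable filters arrive here; aggregates and other
  // logical-plan nodes never do, which is the gap the CatalystSource proposal targets.
  override def buildScan(requiredColumns: Array[String], filters: Array[Filter]): RDD[Row] = {
    val rows = Seq(Row(1L, "a"), Row(2L, "b"))
    sqlContext.sparkContext.parallelize(rows) // pruning/filtering elided in this sketch
  }
}
{code}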
[jira] [Commented] (SPARK-24502) flaky test: UnsafeRowSerializerSuite
[ https://issues.apache.org/jira/browse/SPARK-24502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575026#comment-16575026 ] Paul Praet commented on SPARK-24502: We are only creating SparkSessions with a SparkSessionBuilder.getOrCreate() and then calling sparkSession.close() when we are done. Can you confirm this is not enough then ? > flaky test: UnsafeRowSerializerSuite > > > Key: SPARK-24502 > URL: https://issues.apache.org/jira/browse/SPARK-24502 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 2.4.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: flaky-test > Fix For: 2.3.2, 2.4.0 > > > https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4193/testReport/org.apache.spark.sql.execution/UnsafeRowSerializerSuite/toUnsafeRow___test_helper_method/ > {code} > sbt.ForkMain$ForkError: java.lang.IllegalStateException: LiveListenerBus is > stopped. > at > org.apache.spark.scheduler.LiveListenerBus.addToQueue(LiveListenerBus.scala:97) > at > org.apache.spark.scheduler.LiveListenerBus.addToStatusQueue(LiveListenerBus.scala:80) > at > org.apache.spark.sql.internal.SharedState.(SharedState.scala:93) > at > org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:120) > at > org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:120) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:120) > at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:119) > at > org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:286) > at > org.apache.spark.sql.test.TestSparkSession.sessionState$lzycompute(TestSQLContext.scala:42) > at > org.apache.spark.sql.test.TestSparkSession.sessionState(TestSQLContext.scala:41) > at > org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$1.apply(SparkSession.scala:95) > at > org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$1.apply(SparkSession.scala:95) > at scala.Option.map(Option.scala:146) > at > org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:95) > at > org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:94) > at org.apache.spark.sql.internal.SQLConf$.get(SQLConf.scala:126) > at > org.apache.spark.sql.catalyst.expressions.CodeGeneratorWithInterpretedFallback.createObject(CodeGeneratorWithInterpretedFallback.scala:54) > at > org.apache.spark.sql.catalyst.expressions.UnsafeProjection$.create(Projection.scala:157) > at > org.apache.spark.sql.catalyst.expressions.UnsafeProjection$.create(Projection.scala:150) > at > org.apache.spark.sql.execution.UnsafeRowSerializerSuite.org$apache$spark$sql$execution$UnsafeRowSerializerSuite$$unsafeRowConverter(UnsafeRowSerializerSuite.scala:54) > at > org.apache.spark.sql.execution.UnsafeRowSerializerSuite.org$apache$spark$sql$execution$UnsafeRowSerializerSuite$$toUnsafeRow(UnsafeRowSerializerSuite.scala:49) > at > org.apache.spark.sql.execution.UnsafeRowSerializerSuite$$anonfun$2.apply(UnsafeRowSerializerSuite.scala:63) > at > org.apache.spark.sql.execution.UnsafeRowSerializerSuite$$anonfun$2.apply(UnsafeRowSerializerSuite.scala:60) > ... > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
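Below is a hedged sketch of the kind of explicit session bookkeeping under discussion in this thread when more than one SparkSession is created and stopped. The clear/set calls are real {{SparkSession}} companion-object methods; whether they are needed depends on whether later code resolves the active or default session (for example via {{SQLConf.get}}) after an earlier session has been stopped:
{code:scala}
import org.apache.spark.sql.SparkSession

object SessionLifecycleSketch {
  def main(args: Array[String]): Unit = {
    val first = SparkSession.builder().appName("first").master("local[2]").getOrCreate()
    first.stop()

    // Drop references to the stopped session so nothing resolves the dead session later.
    SparkSession.clearActiveSession()
    SparkSession.clearDefaultSession()

    val second = SparkSession.builder().appName("second").master("local[2]").getOrCreate()
    SparkSession.setActiveSession(second) // make the fresh session the active one explicitly

    println(second.range(10).count())
    second.stop()
  }
}
{code}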
[jira] [Resolved] (SPARK-25063) Rename class KnowNotNull to KnownNotNull
[ https://issues.apache.org/jira/browse/SPARK-25063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-25063. - Resolution: Fixed Assignee: Maryann Xue Fix Version/s: 2.4.0 > Rename class KnowNotNull to KnownNotNull > > > Key: SPARK-25063 > URL: https://issues.apache.org/jira/browse/SPARK-25063 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 2.4.0 >Reporter: Maryann Xue >Assignee: Maryann Xue >Priority: Trivial > Fix For: 2.4.0 > > > It's a class name typo checked in through SPARK-24891 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23207) Shuffle+Repartition on an DataFrame could lead to incorrect answers
[ https://issues.apache.org/jira/browse/SPARK-23207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574956#comment-16574956 ] Thomas Graves commented on SPARK-23207: --- ok, I guess I disagree with that. Any correctness bug is very bad in my opinion; corrupt/lost data is much worse than taking a performance hit, as corrupt/lost data could easily result in lost revenue or errors in business-critical data. > Shuffle+Repartition on an DataFrame could lead to incorrect answers > --- > > Key: SPARK-23207 > URL: https://issues.apache.org/jira/browse/SPARK-23207 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Jiang Xingbo >Assignee: Jiang Xingbo >Priority: Blocker > Labels: correctness > Fix For: 2.3.0 > > > Currently shuffle repartition uses RoundRobinPartitioning, the generated > result is nondeterministic since the sequence of input rows are not > determined. > The bug can be triggered when there is a repartition call following a shuffle > (which would lead to non-deterministic row ordering), as the pattern shows > below: > upstream stage -> repartition stage -> result stage > (-> indicate a shuffle) > When one of the executors process goes down, some tasks on the repartition > stage will be retried and generate inconsistent ordering, and some tasks of > the result stage will be retried generating different data. > The following code returns 931532, instead of 100: > {code} > import scala.sys.process._ > import org.apache.spark.TaskContext > val res = spark.range(0, 1000 * 1000, 1).repartition(200).map { x => > x > }.repartition(200).map { x => > if (TaskContext.get.attemptNumber == 0 && TaskContext.get.partitionId < 2) { > throw new Exception("pkill -f java".!!) > } > x > } > res.distinct().count() > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25076) SQLConf should not be retrieved from a stopped SparkSession
[ https://issues.apache.org/jira/browse/SPARK-25076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574915#comment-16574915 ] Apache Spark commented on SPARK-25076: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/22056 > SQLConf should not be retrieved from a stopped SparkSession > --- > > Key: SPARK-25076 > URL: https://issues.apache.org/jira/browse/SPARK-25076 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25076) SQLConf should not be retrieved from a stopped SparkSession
[ https://issues.apache.org/jira/browse/SPARK-25076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25076: Assignee: Wenchen Fan (was: Apache Spark) > SQLConf should not be retrieved from a stopped SparkSession > --- > > Key: SPARK-25076 > URL: https://issues.apache.org/jira/browse/SPARK-25076 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25076) SQLConf should not be retrieved from a stopped SparkSession
[ https://issues.apache.org/jira/browse/SPARK-25076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25076: Assignee: Apache Spark (was: Wenchen Fan) > SQLConf should not be retrieved from a stopped SparkSession > --- > > Key: SPARK-25076 > URL: https://issues.apache.org/jira/browse/SPARK-25076 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Wenchen Fan >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24502) flaky test: UnsafeRowSerializerSuite
[ https://issues.apache.org/jira/browse/SPARK-24502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574908#comment-16574908 ] Wenchen Fan commented on SPARK-24502: - There is no resource leakage. Users need to manage the active and default SparkSession manually, by calling `get/set/clearActiveSession` and `get/set/clearDefaultSession`. This is not very user-friendly, but it's what it is. Unfortunately, our test framework had a bug: it didn't clear active/default session when a spark session is stopped. This causes a problem because we use `SQLConf.get` a lot in the test code. My PR fixed it. It's totally fine if you create and close multiple spark sessions in the production code, there is no resource leak. But you need to pay attention if you get active/default session. It's the same in Spark 2.2. I'm adding a safeguard for SQLConf.get: https://issues.apache.org/jira/browse/SPARK-25076 . Hopefully this problem can be eased. > flaky test: UnsafeRowSerializerSuite > > > Key: SPARK-24502 > URL: https://issues.apache.org/jira/browse/SPARK-24502 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 2.4.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: flaky-test > Fix For: 2.3.2, 2.4.0 > > > https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4193/testReport/org.apache.spark.sql.execution/UnsafeRowSerializerSuite/toUnsafeRow___test_helper_method/ > {code} > sbt.ForkMain$ForkError: java.lang.IllegalStateException: LiveListenerBus is > stopped. > at > org.apache.spark.scheduler.LiveListenerBus.addToQueue(LiveListenerBus.scala:97) > at > org.apache.spark.scheduler.LiveListenerBus.addToStatusQueue(LiveListenerBus.scala:80) > at > org.apache.spark.sql.internal.SharedState.(SharedState.scala:93) > at > org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:120) > at > org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:120) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:120) > at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:119) > at > org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:286) > at > org.apache.spark.sql.test.TestSparkSession.sessionState$lzycompute(TestSQLContext.scala:42) > at > org.apache.spark.sql.test.TestSparkSession.sessionState(TestSQLContext.scala:41) > at > org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$1.apply(SparkSession.scala:95) > at > org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$1.apply(SparkSession.scala:95) > at scala.Option.map(Option.scala:146) > at > org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:95) > at > org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:94) > at org.apache.spark.sql.internal.SQLConf$.get(SQLConf.scala:126) > at > org.apache.spark.sql.catalyst.expressions.CodeGeneratorWithInterpretedFallback.createObject(CodeGeneratorWithInterpretedFallback.scala:54) > at > org.apache.spark.sql.catalyst.expressions.UnsafeProjection$.create(Projection.scala:157) > at > org.apache.spark.sql.catalyst.expressions.UnsafeProjection$.create(Projection.scala:150) > at > org.apache.spark.sql.execution.UnsafeRowSerializerSuite.org$apache$spark$sql$execution$UnsafeRowSerializerSuite$$unsafeRowConverter(UnsafeRowSerializerSuite.scala:54) > at > 
org.apache.spark.sql.execution.UnsafeRowSerializerSuite.org$apache$spark$sql$execution$UnsafeRowSerializerSuite$$toUnsafeRow(UnsafeRowSerializerSuite.scala:49) > at > org.apache.spark.sql.execution.UnsafeRowSerializerSuite$$anonfun$2.apply(UnsafeRowSerializerSuite.scala:63) > at > org.apache.spark.sql.execution.UnsafeRowSerializerSuite$$anonfun$2.apply(UnsafeRowSerializerSuite.scala:60) > ... > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
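As a concrete illustration of the session bookkeeping described in the comment above, here is a minimal sketch for application code; the app name and local master are made up so the snippet is self-contained, and it is not taken from the Spark test framework.
{code}
import org.apache.spark.sql.SparkSession

object SessionLifecycleExample {
  def main(args: Array[String]): Unit = {
    // getOrCreate() returns a new or existing session and registers it as the
    // default session (and, depending on the Spark version, the active one).
    val spark = SparkSession.builder()
      .appName("session-lifecycle-example") // hypothetical app name
      .master("local[*]")                   // hypothetical master for a local run
      .getOrCreate()

    spark.range(10).count()

    // Stop the session, then clear the active/default references explicitly so
    // that later lookups (such as SQLConf.get) cannot resolve to a stopped session.
    spark.stop()
    SparkSession.clearActiveSession()
    SparkSession.clearDefaultSession()
  }
}
{code}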
[jira] [Commented] (SPARK-23207) Shuffle+Repartition on an DataFrame could lead to incorrect answers
[ https://issues.apache.org/jira/browse/SPARK-23207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574903#comment-16574903 ] Jiang Xingbo commented on SPARK-23207: -- This affects the 2.2 and lower versions, the reason why we didn't backport the patch is that it can cause huge perf regression to `repartition()` operation, and chance to hit this correctness bug is small. cc [~smilegator][~sameerag] > Shuffle+Repartition on an DataFrame could lead to incorrect answers > --- > > Key: SPARK-23207 > URL: https://issues.apache.org/jira/browse/SPARK-23207 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Jiang Xingbo >Assignee: Jiang Xingbo >Priority: Blocker > Labels: correctness > Fix For: 2.3.0 > > > Currently shuffle repartition uses RoundRobinPartitioning, the generated > result is nondeterministic since the sequence of input rows are not > determined. > The bug can be triggered when there is a repartition call following a shuffle > (which would lead to non-deterministic row ordering), as the pattern shows > below: > upstream stage -> repartition stage -> result stage > (-> indicate a shuffle) > When one of the executors process goes down, some tasks on the repartition > stage will be retried and generate inconsistent ordering, and some tasks of > the result stage will be retried generating different data. > The following code returns 931532, instead of 100: > {code} > import scala.sys.process._ > import org.apache.spark.TaskContext > val res = spark.range(0, 1000 * 1000, 1).repartition(200).map { x => > x > }.repartition(200).map { x => > if (TaskContext.get.attemptNumber == 0 && TaskContext.get.partitionId < 2) { > throw new Exception("pkill -f java".!!) > } > x > } > res.distinct().count() > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25076) SQLConf should not be retrieved from a stopped SparkSession
Wenchen Fan created SPARK-25076: --- Summary: SQLConf should not be retrieved from a stopped SparkSession Key: SPARK-25076 URL: https://issues.apache.org/jira/browse/SPARK-25076 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.0 Reporter: Wenchen Fan Assignee: Wenchen Fan -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23207) Shuffle+Repartition on an DataFrame could lead to incorrect answers
[ https://issues.apache.org/jira/browse/SPARK-23207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574886#comment-16574886 ] Thomas Graves commented on SPARK-23207: --- [~jiangxb1987] ^ > Shuffle+Repartition on an DataFrame could lead to incorrect answers > --- > > Key: SPARK-23207 > URL: https://issues.apache.org/jira/browse/SPARK-23207 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Jiang Xingbo >Assignee: Jiang Xingbo >Priority: Blocker > Labels: correctness > Fix For: 2.3.0 > > > Currently shuffle repartition uses RoundRobinPartitioning, the generated > result is nondeterministic since the sequence of input rows are not > determined. > The bug can be triggered when there is a repartition call following a shuffle > (which would lead to non-deterministic row ordering), as the pattern shows > below: > upstream stage -> repartition stage -> result stage > (-> indicate a shuffle) > When one of the executors process goes down, some tasks on the repartition > stage will be retried and generate inconsistent ordering, and some tasks of > the result stage will be retried generating different data. > The following code returns 931532, instead of 100: > {code} > import scala.sys.process._ > import org.apache.spark.TaskContext > val res = spark.range(0, 1000 * 1000, 1).repartition(200).map { x => > x > }.repartition(200).map { x => > if (TaskContext.get.attemptNumber == 0 && TaskContext.get.partitionId < 2) { > throw new Exception("pkill -f java".!!) > } > x > } > res.distinct().count() > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25073) Spark-submit on Yarn Task : When the yarn.nodemanager.resource.memory-mb and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always reports an error request to adjust yarn.scheduler.maximum-allocation-mb
[ https://issues.apache.org/jira/browse/SPARK-25073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574864#comment-16574864 ] Sujith commented on SPARK-25073: It seems you are right; the message is a bit misleading to the user. As per my understanding, there is also a dependency on the yarn.nodemanager.resource.memory-mb parameter. *_yarn.nodemanager.resource.memory-mb:_* Amount of physical memory, in MB, that can be allocated for containers. It means the amount of memory YARN can utilize on this node, and therefore this property should be lower than the total memory of that machine. *_yarn.scheduler.maximum-allocation-mb_* It defines the maximum memory allocation available for a container in MB; it means the RM can only allocate memory to containers in increments of {{"yarn.scheduler.minimum-allocation-mb"}}, not exceeding {{"yarn.scheduler.maximum-allocation-mb"}}, and it should not be more than the total allocated memory of the node. I will try to analyze this more and will raise a PR if it requires a fix. Thanks. > Spark-submit on Yarn Task : When the yarn.nodemanager.resource.memory-mb > and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always > reports an error request to adjust yarn.scheduler.maximum-allocation-mb > -- > > Key: SPARK-25073 > URL: https://issues.apache.org/jira/browse/SPARK-25073 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.3.0, 2.3.1 >Reporter: vivek kumar >Priority: Minor > > When the yarn.nodemanager.resource.memory-mb and/or > yarn.scheduler.maximum-allocation-mb is insufficient, Spark *always* reports > an error request to adjust Yarn.scheduler.maximum-allocation-mb. Expecting > the error request to be more around yarn.scheduler.maximum-allocation-mb' > and/or 'yarn.nodemanager.resource.memory-mb'. > > Scenario 1. yarn.scheduler.maximum-allocation-mb =4g and > yarn.nodemanager.resource.memory-mb =8G > # Launch shell on Yarn with am.memory less than nodemanager.resource memory > but greater than yarn.scheduler.maximum-allocation-mb > eg; spark-shell --master yarn --conf spark.yarn.am.memory 5g > Error: java.lang.IllegalArgumentException: Required AM memory (5120+512 MB) > is above the max threshold (4096 MB) of this cluster! Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'. > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > *Scenario 2*. yarn.scheduler.maximum-allocation-mb =15g and > yarn.nodemanager.resource.memory-mb =8g > a. Launch shell on Yarn with am.memory greater than nodemanager.resource > memory but less than yarn.scheduler.maximum-allocation-mb > eg; *spark-shell --master yarn --conf spark.yarn.am.memory=10g* > Error : > java.lang.IllegalArgumentException: Required AM memory (10240+1024 MB) is > above the max threshold (*8096 MB*) of this cluster! *Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'.* > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > b. Launch shell on Yarn with am.memory greater than nodemanager.resource > memory and yarn.scheduler.maximum-allocation-mb > eg; *spark-shell --master yarn --conf spark.yarn.am.memory=17g* > Error: > java.lang.IllegalArgumentException: Required AM memory (17408+1740 MB) is > above the max threshold (*8096 MB*) of this cluster!
*Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'.* > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > *Expected* : Error request for scenario2 should be more around > yarn.scheduler.maximum-allocation-mb' and/or > 'yarn.nodemanager.resource.memory-mb'. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
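To make the concern concrete, the following is a rough sketch of the kind of check behind the errors quoted above; it is not the actual Client.verifyClusterResources source, and the helper name and parameters are invented. It only shows why the hint always names yarn.scheduler.maximum-allocation-mb even when the effective ceiling comes from yarn.nodemanager.resource.memory-mb.
{code}
// Hedged sketch: only the message text mirrors the errors quoted in this issue.
def verifyAmMemory(amMemoryMb: Long, amOverheadMb: Long, clusterMaxMb: Long): Unit = {
  val requiredMb = amMemoryMb + amOverheadMb
  // clusterMaxMb is whatever maximum container size the cluster reports; it may
  // already be capped by yarn.nodemanager.resource.memory-mb, yet the advice
  // below only ever mentions the scheduler setting.
  if (requiredMb > clusterMaxMb) {
    throw new IllegalArgumentException(
      s"Required AM memory ($amMemoryMb+$amOverheadMb MB) is above the max threshold " +
        s"($clusterMaxMb MB) of this cluster! Please increase the value of " +
        "'yarn.scheduler.maximum-allocation-mb'.")
  }
}

// Scenario 2a from the description: a 10g AM plus overhead against an ~8g ceiling.
verifyAmMemory(10240, 1024, 8096)
{code}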
[jira] [Commented] (SPARK-25044) Address translation of LMF closure primitive args to Object in Scala 2.12
[ https://issues.apache.org/jira/browse/SPARK-25044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574841#comment-16574841 ] Sean Owen commented on SPARK-25044: --- I tried this – it's not hard – but the implementation method signature in this case still uses Object, not ints or longs. Actually, this functionality seems to only be used in the SQL Analyzer, and only to figure out whether the args are primitive, and then too only to decide if it's necessary to handle null values of that argument. I tried simply changing the Analyzer to ignore whether the arg is primitive, and not skip the check if it's primitive. It causes some tests to pass, but not all of them. I might next investigate whether it's feasible to fix this by not analyzing primitive-ness of arguments [~smilegator] {code:java} - SPARK-11725: correctly handle null inputs for ScalaUDF *** FAILED *** == FAIL: Plans do not match === !Project [if (isnull(a#0)) null else UDF(knownotnull(a#0)) AS #0] Project [UDF(a#0) AS #0] +- LocalRelation , [a#0, b#0, c#0, d#0, e#0] +- LocalRelation , [a#0, b#0, c#0, d#0, e#0] (PlanTest.scala:119){code} > Address translation of LMF closure primitive args to Object in Scala 2.12 > - > > Key: SPARK-25044 > URL: https://issues.apache.org/jira/browse/SPARK-25044 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 2.4.0 >Reporter: Sean Owen >Priority: Major > > A few SQL-related tests fail in Scala 2.12, such as UDFSuite's "SPARK-24891 > Fix HandleNullInputsForUDF rule": > {code:java} > - SPARK-24891 Fix HandleNullInputsForUDF rule *** FAILED *** > Results do not match for query: > ... > == Results == > == Results == > !== Correct Answer - 3 == == Spark Answer - 3 == > !struct<> struct > ![0,10,null] [0,10,0] > ![1,12,null] [1,12,1] > ![2,14,null] [2,14,2] (QueryTest.scala:163){code} > You can kind of get what's going on reading the test: > {code:java} > test("SPARK-24891 Fix HandleNullInputsForUDF rule") { > // assume(!ClosureCleanerSuite2.supportsLMFs) > // This test won't test what it intends to in 2.12, as lambda metafactory > closures > // have arg types that are not primitive, but Object > val udf1 = udf({(x: Int, y: Int) => x + y}) > val df = spark.range(0, 3).toDF("a") > .withColumn("b", udf1($"a", udf1($"a", lit(10 > .withColumn("c", udf1($"a", lit(null))) > val plan = spark.sessionState.executePlan(df.logicalPlan).analyzed > comparePlans(df.logicalPlan, plan) > checkAnswer( > df, > Seq( > Row(0, 10, null), > Row(1, 12, null), > Row(2, 14, null))) > }{code} > > It seems that the closure that is fed in as a UDF changes behavior, in a way > that primitive-type arguments are handled differently. For example an Int > argument, when fed 'null', acts like 0. > I'm sure it's a difference in the LMF closure and how its types are > understood, but not exactly sure of the cause yet. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
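The symptom in the failing test (an Int argument silently becoming 0 when fed null) can be reproduced in plain Scala; the snippet below only illustrates the unboxing behaviour, not the LMF-specific reflection that the Analyzer relies on.
{code}
// Once the Int parameters are only visible through the erased Object/Any
// signature, a null argument is unboxed to 0 instead of staying null.
val add: (Int, Int) => Int = (x, y) => x + y

// View the same function through its erased apply(Object, Object) bridge.
val erased = add.asInstanceOf[(Any, Any) => Any]

println(erased(10, null))       // prints 10: the null argument was unboxed to 0
println(null.asInstanceOf[Int]) // prints 0 for the same reason
{code}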
[jira] [Commented] (SPARK-25075) Build and test Spark against Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-25075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574839#comment-16574839 ] Guillaume Massé commented on SPARK-25075: - Scala 2.13 is currently in the milestone phase (we are at 2.13.0-M4 at the time of writing). Since Spark builds with 2.12, we can start the migration to 2.13.0-M4 to find any incompatibilities. It's good timing to add anything to 2.13.x before we finalize the collection API. To ease the migration to 2.13, the Scala Center and the Scala team created automatic migration rules and a compatibility library, available at [https://github.com/scala/scala-collection-compat]. > Build and test Spark against Scala 2.13 > --- > > Key: SPARK-25075 > URL: https://issues.apache.org/jira/browse/SPARK-25075 > Project: Spark > Issue Type: Umbrella > Components: Build, Project Infra >Affects Versions: 2.1.0 >Reporter: Guillaume Massé >Priority: Major > > This umbrella JIRA tracks the requirements for building and testing Spark > against the current Scala 2.13 milestone. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
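For anyone who wants to experiment with the cross-build locally, a minimal sbt sketch follows; the version strings are placeholders rather than recommendations, and only the group/artifact of scala-collection-compat and its import are taken from the project linked above.
{code}
// build.sbt fragment (placeholder versions; check the scala-collection-compat
// releases for a current one).
crossScalaVersions := Seq("2.11.12", "2.12.6", "2.13.0-M4")

libraryDependencies +=
  "org.scala-lang.modules" %% "scala-collection-compat" % "<latest>"
{code}
Source files that must compile against both the 2.12 and 2.13 collection libraries can then use:
{code}
import scala.collection.compat._
{code}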
[jira] [Created] (SPARK-25075) Build and test Spark against Scala 2.13
Guillaume Massé created SPARK-25075: --- Summary: Build and test Spark against Scala 2.13 Key: SPARK-25075 URL: https://issues.apache.org/jira/browse/SPARK-25075 Project: Spark Issue Type: Umbrella Components: Build, Project Infra Affects Versions: 2.1.0 Reporter: Guillaume Massé This umbrella JIRA tracks the requirements for building and testing Spark against the current Scala 2.13 milestone. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-25047) Can't assign SerializedLambda to scala.Function1 in deserialization of BucketedRandomProjectionLSHModel
[ https://issues.apache.org/jira/browse/SPARK-25047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-25047. --- Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 22032 [https://github.com/apache/spark/pull/22032] > Can't assign SerializedLambda to scala.Function1 in deserialization of > BucketedRandomProjectionLSHModel > --- > > Key: SPARK-25047 > URL: https://issues.apache.org/jira/browse/SPARK-25047 > Project: Spark > Issue Type: Sub-task > Components: ML >Affects Versions: 2.4.0 >Reporter: Sean Owen >Assignee: Sean Owen >Priority: Major > Fix For: 2.4.0 > > > Another distinct test failure: > {code:java} > - BucketedRandomProjectionLSH: streaming transform *** FAILED *** > org.apache.spark.sql.streaming.StreamingQueryException: Query [id = > 7f34fb07-a718-4488-b644-d27cfd29ff6c, runId = > 0bbc0ba2-2952-4504-85d6-8aba877ba01b] terminated with exception: Job aborted > due to stage failure: Task 0 in stage 16.0 failed 1 times, most recent > failure: Lost task 0.0 in stage 16.0 (TID 16, localhost, executor driver): > java.lang.ClassCastException: cannot assign instance of > java.lang.invoke.SerializedLambda to field > org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel.hashFunction of > type scala.Function1 in instance of > org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel > ... > Cause: java.lang.ClassCastException: cannot assign instance of > java.lang.invoke.SerializedLambda to field > org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel.hashFunction of > type scala.Function1 in instance of > org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel > at > java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2233) > at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1405) > at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2284) > ...{code} > Here the different nature of a Java 8 LMF closure trips of Java > serialization/deserialization. I think this can be patched by manually > implementing the Java serialization here, and don't see other instances (yet). > Also wondering if this "val" can be a "def". -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
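As a rough illustration of the "can this val be a def" question raised in the description (class and member names below are invented, and this is not claimed to be the change made in pull request 22032):
{code}
// Storing the function in a `val` makes the Function1 instance (under Java 8,
// often a LambdaMetafactory-generated lambda) part of the serialized object
// graph, which is where the SerializedLambda cast problem can surface.
class ModelWithVal extends Serializable {
  val hashFunction: Double => Array[Int] = x => Array(x.toInt)
}

// Exposing the logic as a `def` keeps any closure out of the serialized state.
class ModelWithDef extends Serializable {
  def hashFunction(x: Double): Array[Int] = Array(x.toInt)
}
{code}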
[jira] [Assigned] (SPARK-25047) Can't assign SerializedLambda to scala.Function1 in deserialization of BucketedRandomProjectionLSHModel
[ https://issues.apache.org/jira/browse/SPARK-25047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-25047: - Assignee: Sean Owen > Can't assign SerializedLambda to scala.Function1 in deserialization of > BucketedRandomProjectionLSHModel > --- > > Key: SPARK-25047 > URL: https://issues.apache.org/jira/browse/SPARK-25047 > Project: Spark > Issue Type: Sub-task > Components: ML >Affects Versions: 2.4.0 >Reporter: Sean Owen >Assignee: Sean Owen >Priority: Major > > Another distinct test failure: > {code:java} > - BucketedRandomProjectionLSH: streaming transform *** FAILED *** > org.apache.spark.sql.streaming.StreamingQueryException: Query [id = > 7f34fb07-a718-4488-b644-d27cfd29ff6c, runId = > 0bbc0ba2-2952-4504-85d6-8aba877ba01b] terminated with exception: Job aborted > due to stage failure: Task 0 in stage 16.0 failed 1 times, most recent > failure: Lost task 0.0 in stage 16.0 (TID 16, localhost, executor driver): > java.lang.ClassCastException: cannot assign instance of > java.lang.invoke.SerializedLambda to field > org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel.hashFunction of > type scala.Function1 in instance of > org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel > ... > Cause: java.lang.ClassCastException: cannot assign instance of > java.lang.invoke.SerializedLambda to field > org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel.hashFunction of > type scala.Function1 in instance of > org.apache.spark.ml.feature.BucketedRandomProjectionLSHModel > at > java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2233) > at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1405) > at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2284) > ...{code} > Here the different nature of a Java 8 LMF closure trips of Java > serialization/deserialization. I think this can be patched by manually > implementing the Java serialization here, and don't see other instances (yet). > Also wondering if this "val" can be a "def". -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25073) Spark-submit on Yarn Task : When the yarn.nodemanager.resource.memory-mb and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always reports an error request to adjust yarn.scheduler.maximum-allocation-mb
[ https://issues.apache.org/jira/browse/SPARK-25073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vivek kumar updated SPARK-25073: Description: When the yarn.nodemanager.resource.memory-mb and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark *always* reports an error request to adjust Yarn.scheduler.maximum-allocation-mb. Expecting the error request to be more around yarn.scheduler.maximum-allocation-mb' and/or 'yarn.nodemanager.resource.memory-mb'. Scenario 1. yarn.scheduler.maximum-allocation-mb =4g and yarn.nodemanager.resource.memory-mb =8G # Launch shell on Yarn with am.memory less than nodemanager.resource memory but greater than yarn.scheduler.maximum-allocation-mb eg; spark-shell --master yarn --conf spark.yarn.am.memory 5g Error: java.lang.IllegalArgumentException: Required AM memory (5120+512 MB) is above the max threshold (4096 MB) of this cluster! Please increase the value of 'yarn.scheduler.maximum-allocation-mb'. at org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) *Scenario 2*. yarn.scheduler.maximum-allocation-mb =15g and yarn.nodemanager.resource.memory-mb =8g a. Launch shell on Yarn with am.memory greater than nodemanager.resource memory but less than yarn.scheduler.maximum-allocation-mb eg; *spark-shell --master yarn --conf spark.yarn.am.memory=10g* Error : java.lang.IllegalArgumentException: Required AM memory (10240+1024 MB) is above the max threshold (*8096 MB*) of this cluster! *Please increase the value of 'yarn.scheduler.maximum-allocation-mb'.* at org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) b. Launch shell on Yarn with am.memory greater than nodemanager.resource memory and yarn.scheduler.maximum-allocation-mb eg; *spark-shell --master yarn --conf spark.yarn.am.memory=17g* Error: java.lang.IllegalArgumentException: Required AM memory (17408+1740 MB) is above the max threshold (*8096 MB*) of this cluster! *Please increase the value of 'yarn.scheduler.maximum-allocation-mb'.* at org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) *Expected* : Error request for scenario2 should be more around yarn.scheduler.maximum-allocation-mb' and/or 'yarn.nodemanager.resource.memory-mb'. was: When the yarn.nodemanager.resource.memory-mb and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always reports an error request to adjust Yarn.scheduler.maximum-allocation-mb. Expecting the error request to be more around yarn.scheduler.maximum-allocation-mb' and/or 'yarn.nodemanager.resource.memory-mb'. Scenario 1. yarn.scheduler.maximum-allocation-mb =4g and yarn.nodemanager.resource.memory-mb =8G # Launch shell on Yarn with am.memory less than nodemanager.resource memory but greater than yarn.scheduler.maximum-allocation-mb eg; spark-shell --master yarn --conf spark.yarn.am.memory 5g Error: java.lang.IllegalArgumentException: Required AM memory (5120+512 MB) is above the max threshold (4096 MB) of this cluster! Please increase the value of 'yarn.scheduler.maximum-allocation-mb'. at org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) Scenario 2. 
yarn.scheduler.maximum-allocation-mb =15g and yarn.nodemanager.resource.memory-mb =8g a.Launch shell on Yarn with am.memory greater than nodemanager.resource memory but less than yarn.scheduler.maximum-allocation-mb eg; spark-shell --master yarn --conf spark.yarn.am.memory=10g Error : java.lang.IllegalArgumentException: Required AM memory (10240+1024 MB) is above the max threshold (*8096 MB*) of this cluster! Please increase the value of *'yarn.scheduler.maximum-allocation-mb'*. at org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) b.Launch shell on Yarn with am.memory greater than nodemanager.resource memory and yarn.scheduler.maximum-allocation-mb eg; spark-shell --master yarn --conf spark.yarn.am.memory=17g Error: java.lang.IllegalArgumentException: Required AM memory (17408+1740 MB) is above the max threshold (*8096 MB*) of this cluster! Please increase the value of *'yarn.scheduler.maximum-allocation-mb'*. at org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) Expected : Error request for scenario2 should be more around yarn.scheduler.maximum-allocation-mb' and/or 'yarn.nodemanager.resource.memory-mb'. > Spark-submit on Yarn Task : When the yarn.nodemanager.resource.memory-mb > and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always > reports an error request to adjust yarn.scheduler.maximum-allocation-mb > -- > > Key: SPARK-25073 > URL:
[jira] [Created] (SPARK-25074) Implement maxNumConcurrentTasks() in MesosFineGrainedSchedulerBackend
Jiang Xingbo created SPARK-25074: Summary: Implement maxNumConcurrentTasks() in MesosFineGrainedSchedulerBackend Key: SPARK-25074 URL: https://issues.apache.org/jira/browse/SPARK-25074 Project: Spark Issue Type: Task Components: Spark Core Affects Versions: 2.4.0 Reporter: Jiang Xingbo We added a new method `maxNumConcurrentTasks()` to `SchedulerBackend` to get the max number of tasks that can currently be launched concurrently. However, the method is not implemented in `MesosFineGrainedSchedulerBackend`, so submitting a job containing a barrier stage will always fail fast with the `MesosFineGrainedSchedulerBackend` resource manager. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-23415) BufferHolderSparkSubmitSuite is flaky
[ https://issues.apache.org/jira/browse/SPARK-23415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-23415: --- Assignee: Kazuaki Ishizaki > BufferHolderSparkSubmitSuite is flaky > - > > Key: SPARK-23415 > URL: https://issues.apache.org/jira/browse/SPARK-23415 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Dongjoon Hyun >Assignee: Kazuaki Ishizaki >Priority: Major > Fix For: 2.4.0 > > > The test suite fails due to 60-second timeout sometimes. > {code:java} > Error Message > org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to > failAfter did not complete within 60 seconds. > Stacktrace > sbt.ForkMain$ForkError: > org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to > failAfter did not complete within 60 seconds. > {code} > - [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87380/] > - > [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/4206/] > - > [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7/4759/] > - > [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-maven-hadoop-2.7/412/] > (June 15th) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-23415) BufferHolderSparkSubmitSuite is flaky
[ https://issues.apache.org/jira/browse/SPARK-23415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-23415. - Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 20636 [https://github.com/apache/spark/pull/20636] > BufferHolderSparkSubmitSuite is flaky > - > > Key: SPARK-23415 > URL: https://issues.apache.org/jira/browse/SPARK-23415 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Dongjoon Hyun >Assignee: Kazuaki Ishizaki >Priority: Major > Fix For: 2.4.0 > > > The test suite fails due to 60-second timeout sometimes. > {code:java} > Error Message > org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to > failAfter did not complete within 60 seconds. > Stacktrace > sbt.ForkMain$ForkError: > org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to > failAfter did not complete within 60 seconds. > {code} > - [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87380/] > - > [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/4206/] > - > [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7/4759/] > - > [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-maven-hadoop-2.7/412/] > (June 15th) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12449) Pushing down arbitrary logical plans to data sources
[ https://issues.apache.org/jira/browse/SPARK-12449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574770#comment-16574770 ] Johannes Zillmann commented on SPARK-12449: --- I'm a bit confused. Reading https://www.snowflake.com/snowflake-spark-part-2-pushing-query-processing/ and https://github.com/snowflakedb/spark-snowflake/pull/8/files it looks like what the ticket is describing has already been realized ? Can somebody shed light on this !? > Pushing down arbitrary logical plans to data sources > > > Key: SPARK-12449 > URL: https://issues.apache.org/jira/browse/SPARK-12449 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Stephan Kessler >Priority: Major > Attachments: pushingDownLogicalPlans.pdf > > > With the help of the DataSource API we can pull data from external sources > for processing. Implementing interfaces such as {{PrunedFilteredScan}} allows > to push down filters and projects pruning unnecessary fields and rows > directly in the data source. > However, data sources such as SQL Engines are capable of doing even more > preprocessing, e.g., evaluating aggregates. This is beneficial because it > would reduce the amount of data transferred from the source to Spark. The > existing interfaces do not allow such kind of processing in the source. > We would propose to add a new interface {{CatalystSource}} that allows to > defer the processing of arbitrary logical plans to the data source. We have > already shown the details at the Spark Summit 2015 Europe > [https://spark-summit.org/eu-2015/events/the-pushdown-of-everything/] > I will add a design document explaining details. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
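For readers trying to map the Snowflake connector's pushdown onto this ticket, here is a hypothetical sketch of the kind of interface the description proposes; the trait name comes from the ticket itself, but every method name and signature below is an assumption, not the design in the attached pushingDownLogicalPlans.pdf.
{code}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// Hypothetical shape only; not part of Spark's DataSource API.
trait CatalystSource {
  /** Whether the source can evaluate the given logical plan (filters, aggregates, joins, ...) itself. */
  def supportsLogicalPlan(plan: LogicalPlan): Boolean

  /** Run the plan inside the external engine and hand the result rows back to Spark. */
  def executeLogicalPlan(plan: LogicalPlan): RDD[Row]
}
{code}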
[jira] [Commented] (SPARK-25032) Create table is failing, after dropping the database . It is not falling back to default database
[ https://issues.apache.org/jira/browse/SPARK-25032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574754#comment-16574754 ] sandeep katta commented on SPARK-25032: --- I will be looking into this. Solution: 1)Don't allow to delete the current database 2)Fall back to default once the database is deleted > Create table is failing, after dropping the database . It is not falling back > to default database > - > > Key: SPARK-25032 > URL: https://issues.apache.org/jira/browse/SPARK-25032 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.0, 2.3.0, 2.3.1 > Environment: Spark 2.3.1 > Hadoop 2.7.3 > >Reporter: Ayush Anubhava >Priority: Minor > > *Launch spark-beeline for both the scenarios* > *Scenario 1* > create database cbo1; > use cbo1; > create table test2 ( a int, b string , c int) stored as parquet; > drop database cbo1 cascade; > create table test1 ( a int, b string , c int) stored as parquet; > {color:#ff}Output : Exception is thrown at this point {color} > {color:#ff}Error: > org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database > 'cbo1' not found; (state=,code=0){color} > *Scenario 2:* > create database cbo1; > use cbo1; > create table test2 ( a int, b string , c int) stored as parquet; > drop database cbo1 cascade; > create database cbo1; > create table test1 ( a int, b string , c int) stored as parquet; > {color:#ff}Output : Table is getting created in the database "*cbo1*", > even on not using the database.It should have been created in default > db.{color} > > In beeline session, after dropping the database , it is not falling back to > default db > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25073) Spark-submit on Yarn Task : When the yarn.nodemanager.resource.memory-mb and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always reports an error request to adjust yarn.scheduler.maximum-allocation-mb
vivek kumar created SPARK-25073: --- Summary: Spark-submit on Yarn Task : When the yarn.nodemanager.resource.memory-mb and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always reports an error request to adjust yarn.scheduler.maximum-allocation-mb Key: SPARK-25073 URL: https://issues.apache.org/jira/browse/SPARK-25073 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 2.3.1, 2.3.0 Reporter: vivek kumar When the yarn.nodemanager.resource.memory-mb and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always reports an error request to adjust Yarn.scheduler.maximum-allocation-mb. Expecting the error request to be more around yarn.scheduler.maximum-allocation-mb' and/or 'yarn.nodemanager.resource.memory-mb'. Scenario 1. yarn.scheduler.maximum-allocation-mb =4g and yarn.nodemanager.resource.memory-mb =8G # Launch shell on Yarn with am.memory less than nodemanager.resource memory but greater than yarn.scheduler.maximum-allocation-mb eg; spark-shell --master yarn --conf spark.yarn.am.memory 5g Error: java.lang.IllegalArgumentException: Required AM memory (5120+512 MB) is above the max threshold (4096 MB) of this cluster! Please increase the value of 'yarn.scheduler.maximum-allocation-mb'. at org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) Scenario 2. yarn.scheduler.maximum-allocation-mb =15g and yarn.nodemanager.resource.memory-mb =8g a.Launch shell on Yarn with am.memory greater than nodemanager.resource memory but less than yarn.scheduler.maximum-allocation-mb eg; spark-shell --master yarn --conf spark.yarn.am.memory=10g Error : java.lang.IllegalArgumentException: Required AM memory (10240+1024 MB) is above the max threshold (*8096 MB*) of this cluster! Please increase the value of *'yarn.scheduler.maximum-allocation-mb'*. at org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) b.Launch shell on Yarn with am.memory greater than nodemanager.resource memory and yarn.scheduler.maximum-allocation-mb eg; spark-shell --master yarn --conf spark.yarn.am.memory=17g Error: java.lang.IllegalArgumentException: Required AM memory (17408+1740 MB) is above the max threshold (*8096 MB*) of this cluster! Please increase the value of *'yarn.scheduler.maximum-allocation-mb'*. at org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) Expected : Error request for scenario2 should be more around yarn.scheduler.maximum-allocation-mb' and/or 'yarn.nodemanager.resource.memory-mb'. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-25032) Create table is failing, after dropping the database . It is not falling back to default database
[ https://issues.apache.org/jira/browse/SPARK-25032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574754#comment-16574754 ] sandeep katta edited comment on SPARK-25032 at 8/9/18 12:16 PM: I will be looking into this. Solution: 1)Don't allow to delete the current database or 2)Fall back to default once the database is deleted was (Author: sandeep.katta2007): I will be looking into this. Solution: 1)Don't allow to delete the current database 2)Fall back to default once the database is deleted > Create table is failing, after dropping the database . It is not falling back > to default database > - > > Key: SPARK-25032 > URL: https://issues.apache.org/jira/browse/SPARK-25032 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.0, 2.3.0, 2.3.1 > Environment: Spark 2.3.1 > Hadoop 2.7.3 > >Reporter: Ayush Anubhava >Priority: Minor > > *Launch spark-beeline for both the scenarios* > *Scenario 1* > create database cbo1; > use cbo1; > create table test2 ( a int, b string , c int) stored as parquet; > drop database cbo1 cascade; > create table test1 ( a int, b string , c int) stored as parquet; > {color:#ff}Output : Exception is thrown at this point {color} > {color:#ff}Error: > org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database > 'cbo1' not found; (state=,code=0){color} > *Scenario 2:* > create database cbo1; > use cbo1; > create table test2 ( a int, b string , c int) stored as parquet; > drop database cbo1 cascade; > create database cbo1; > create table test1 ( a int, b string , c int) stored as parquet; > {color:#ff}Output : Table is getting created in the database "*cbo1*", > even on not using the database.It should have been created in default > db.{color} > > In beeline session, after dropping the database , it is not falling back to > default db > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25072) PySpark custom Row class can be given extra parameters
Jan-Willem van der Sijp created SPARK-25072: --- Summary: PySpark custom Row class can be given extra parameters Key: SPARK-25072 URL: https://issues.apache.org/jira/browse/SPARK-25072 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 2.2.0 Environment: {noformat} SPARK_MAJOR_VERSION is set to 2, using Spark2 Python 3.4.5 (default, Dec 11 2017, 16:57:19) Type 'copyright', 'credits' or 'license' for more information IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help. Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel). 18/08/01 04:49:16 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 18/08/01 04:49:17 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041. 18/08/01 04:49:27 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /__ / .__/\_,_/_/ /_/\_\ version 2.2.0 /_/ Using Python version 3.4.5 (default, Dec 11 2017 16:57:19) SparkSession available as 'spark'. {noformat} {{CentOS release 6.9 (Final)}} {{Linux sandbox-hdp.hortonworks.com 4.14.0-1.el7.elrepo.x86_64 #1 SMP Sun Nov 12 20:21:04 EST 2017 x86_64 x86_64 x86_64 GNU/Linux}} {noformat}openjdk version "1.8.0_161" OpenJDK Runtime Environment (build 1.8.0_161-b14) OpenJDK 64-Bit Server VM (build 25.161-b14, mixed mode){noformat} Reporter: Jan-Willem van der Sijp When a custom Row class is made in PySpark, it is possible to provide the constructor of this class with more parameters than there are columns. These extra parameters affect the value of the Row, but are not part of the {{repr}} or {{str}} output, making it hard to debug errors due to these "invisible" values. The hidden values can be accessed through integer-based indexing though. Some examples: {code:python} In [69]: RowClass = Row("column1", "column2") In [70]: RowClass(1, 2) == RowClass(1, 2) Out[70]: True In [71]: RowClass(1, 2) == RowClass(1, 2, 3) Out[71]: False In [75]: RowClass(1, 2, 3) Out[75]: Row(column1=1, column2=2) In [76]: RowClass(1, 2) Out[76]: Row(column1=1, column2=2) In [77]: RowClass(1, 2, 3).asDict() Out[77]: {'column1': 1, 'column2': 2} In [78]: RowClass(1, 2, 3)[2] Out[78]: 3 In [79]: repr(RowClass(1, 2, 3)) Out[79]: 'Row(column1=1, column2=2)' In [80]: str(RowClass(1, 2, 3)) Out[80]: 'Row(column1=1, column2=2)' {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25071) BuildSide is coming not as expected with join queries
[ https://issues.apache.org/jira/browse/SPARK-25071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Anubhava updated SPARK-25071: --- Attachment: (was: SPARK-25071_IMG2.PNG) > BuildSide is coming not as expected with join queries > - > > Key: SPARK-25071 > URL: https://issues.apache.org/jira/browse/SPARK-25071 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 > Environment: Spark 2.3.1 > Hadoop 2.7.3 >Reporter: Ayush Anubhava >Priority: Major > > *BuildSide is not coming as expected.* > Pre-requisites: > *CBO is set as true & spark.sql.cbo.joinReorder.enabled= true.* > *import org.apache.spark.sql.execution.joins.BroadcastHashJoinExec* > *Steps:* > *Scenario 1:* > spark.sql("CREATE TABLE small3 (c1 bigint) TBLPROPERTIES ('numRows'='2', > 'rawDataSize'='600','totalSize'='800')") > spark.sql("CREATE TABLE big3 (c1 bigint) TBLPROPERTIES ('numRows'='2', > 'rawDataSize'='6000', 'totalSize'='800')") > val plan = spark.sql("select * from small3 t1 join big3 t2 on (t1.c1 = > t2.c1)").queryExecution.executedPlan > val buildSide = > plan.children.head.asInstanceOf[BroadcastHashJoinExec].buildSide > println(buildSide) > > *Result 1:* > scala> val plan = spark.sql("select * from small3 t1 join big3 t2 on (t1.c1 = > t2.c1)").queryExecution.executedPlan > plan: org.apache.spark.sql.execution.SparkPlan = > *(2) BroadcastHashJoin [c1#0L|#0L], [c1#1L|#1L], Inner, BuildRight > :- *(2) Filter isnotnull(c1#0L) > : +- HiveTableScan [c1#0L|#0L], HiveTableRelation `default`.`small3`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#0L|#0L] > +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, > false])) > +- *(1) Filter isnotnull(c1#1L) > +- HiveTableScan [c1#1L|#1L], HiveTableRelation `default`.`big3`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#1L|#1L] > scala> val buildSide = > plan.children.head.asInstanceOf[BroadcastHashJoinExec].buildSide > buildSide: org.apache.spark.sql.execution.joins.BuildSide = BuildRight > scala> println(buildSide) > *BuildRight* > > *Scenario 2:* > spark.sql("CREATE TABLE small4 (c1 bigint) TBLPROPERTIES ('numRows'='2', > 'rawDataSize'='600','totalSize'='80')") > spark.sql("CREATE TABLE big4 (c1 bigint) TBLPROPERTIES ('numRows'='2', > 'rawDataSize'='6000', 'totalSize'='800')") > val plan = spark.sql("select * from small4 t1 join big4 t2 on (t1.c1 = > t2.c1)").queryExecution.executedPlan > val buildSide = > plan.children.head.asInstanceOf[BroadcastHashJoinExec].buildSide > println(buildSide) > *Result 2:* > scala> val plan = spark.sql("select * from small4 t1 join big4 t2 on (t1.c1 = > t2.c1)").queryExecution.executedPlan > plan: org.apache.spark.sql.execution.SparkPlan = > *(2) BroadcastHashJoin [c1#4L|#4L], [c1#5L|#5L], Inner, BuildRight > :- *(2) Filter isnotnull(c1#4L) > : +- HiveTableScan [c1#4L|#4L], HiveTableRelation `default`.`small4`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#4L|#4L] > +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, > false])) > +- *(1) Filter isnotnull(c1#5L) > +- HiveTableScan [c1#5L|#5L], HiveTableRelation `default`.`big4`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#5L|#5L] > scala> val buildSide = > plan.children.head.asInstanceOf[BroadcastHashJoinExec].buildSide > buildSide: org.apache.spark.sql.execution.joins.BuildSide = *BuildRight* > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: 
issues-h...@spark.apache.org