[jira] [Assigned] (SPARK-32982) Remove hive-1.2 profiles in PIP installation option
[ https://issues.apache.org/jira/browse/SPARK-32982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-32982: Assignee: Hyukjin Kwon > Remove hive-1.2 profiles in PIP installation option > --- > > Key: SPARK-32982 > URL: https://issues.apache.org/jira/browse/SPARK-32982 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > > Hive 1.2 is a fork that we should remove. It's best not to expose this > distribution via pip. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-32982) Remove hive-1.2 profiles in PIP installation option
[ https://issues.apache.org/jira/browse/SPARK-32982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-32982. -- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 29858 [https://github.com/apache/spark/pull/29858] > Remove hive-1.2 profiles in PIP installation option > --- > > Key: SPARK-32982 > URL: https://issues.apache.org/jira/browse/SPARK-32982 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.1.0 > > > Hive 1.2 is a fork that we should remove. It's best not to expose this > distribution via pip.
[jira] [Resolved] (SPARK-32714) Port pyspark-stubs
[ https://issues.apache.org/jira/browse/SPARK-32714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-32714. -- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 29591 [https://github.com/apache/spark/pull/29591] > Port pyspark-stubs > -- > > Key: SPARK-32714 > URL: https://issues.apache.org/jira/browse/SPARK-32714 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Maciej Szymkiewicz >Priority: Major > Fix For: 3.1.0 > > > Port https://github.com/zero323/pyspark-stubs into PySpark. This was > discussed on the dev mailing list. See also > http://apache-spark-developers-list.1001551.n3.nabble.com/Re-PySpark-Revisiting-PySpark-type-annotations-td26232.html
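The benefit of the port above is that PySpark user code becomes statically checkable. The sketch below is purely illustrative: `word_count` is a hypothetical helper, not a real PySpark API; it only shows the style of inline type hints that pyspark-stubs contributed, which tools such as mypy can then verify at call sites.

```python
from typing import Dict, List

def word_count(lines: List[str]) -> Dict[str, int]:
    """Toy, fully annotated helper in the style the stubs enable.

    Counts word occurrences across lines (Spark's classic example)."""
    counts: Dict[str, int] = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts

print(word_count(["to be or", "not to be"]))  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

With annotations like these in place, passing e.g. a plain string instead of a list of strings is flagged by the type checker before the job ever runs.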
[jira] [Commented] (SPARK-32971) Support dynamic PVC creation/deletion for K8s executors
[ https://issues.apache.org/jira/browse/SPARK-32971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201257#comment-17201257 ] Apache Spark commented on SPARK-32971: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/29861 > Support dynamic PVC creation/deletion for K8s executors > --- > > Key: SPARK-32971 > URL: https://issues.apache.org/jira/browse/SPARK-32971 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.1.0 > >
[jira] [Assigned] (SPARK-32984) Improve showing the differences between approved and actual plans
[ https://issues.apache.org/jira/browse/SPARK-32984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32984: Assignee: (was: Apache Spark) > Improve showing the differences between approved and actual plans > - > > Key: SPARK-32984 > URL: https://issues.apache.org/jira/browse/SPARK-32984 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 3.1.0 >Reporter: wuyi >Priority: Major > > It's hard to find the difference between the approved and actual plan since > the plans of TPC-DS queries are often huge. We could add a hint, e.g., a caret > (^), to help developers locate the differences quickly.
[jira] [Assigned] (SPARK-32984) Improve showing the differences between approved and actual plans
[ https://issues.apache.org/jira/browse/SPARK-32984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32984: Assignee: Apache Spark > Improve showing the differences between approved and actual plans > - > > Key: SPARK-32984 > URL: https://issues.apache.org/jira/browse/SPARK-32984 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 3.1.0 >Reporter: wuyi >Assignee: Apache Spark >Priority: Major > > It's hard to find the difference between the approved and actual plan since > the plans of TPC-DS queries are often huge. We could add a hint, e.g., a caret > (^), to help developers locate the differences quickly.
[jira] [Commented] (SPARK-32984) Improve showing the differences between approved and actual plans
[ https://issues.apache.org/jira/browse/SPARK-32984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201245#comment-17201245 ] Apache Spark commented on SPARK-32984: -- User 'Ngone51' has created a pull request for this issue: https://github.com/apache/spark/pull/29860 > Improve showing the differences between approved and actual plans > - > > Key: SPARK-32984 > URL: https://issues.apache.org/jira/browse/SPARK-32984 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 3.1.0 >Reporter: wuyi >Priority: Major > > It's hard to find the difference between the approved and actual plan since > the plans of TPC-DS queries are often huge. We could add a hint, e.g., a caret > (^), to help developers locate the differences quickly.
[jira] [Created] (SPARK-32984) Improve showing the differences between approved and actual plans
wuyi created SPARK-32984: Summary: Improve showing the differences between approved and actual plans Key: SPARK-32984 URL: https://issues.apache.org/jira/browse/SPARK-32984 Project: Spark Issue Type: Test Components: Tests Affects Versions: 3.1.0 Reporter: wuyi It's hard to find the difference between the approved and actual plan since the plans of TPC-DS queries are often huge. We could add a hint, e.g., a caret (^), to help developers locate the differences quickly.
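The caret idea above can be sketched in a few lines. This is a hypothetical helper, not the actual change in any linked pull request: it underlines the first character where the actual plan text diverges from the approved one.

```python
def caret_diff(approved: str, actual: str) -> str:
    """Return the actual text with a caret line underneath, pointing at the
    first character that differs from the approved text."""
    # Index of the first mismatching character; if one string is a prefix of
    # the other, point just past the end of the shorter one.
    idx = next(
        (i for i, (a, b) in enumerate(zip(approved, actual)) if a != b),
        min(len(approved), len(actual)),
    )
    return actual + "\n" + " " * idx + "^"

print(caret_diff("Project [id#1]", "Project [id#2]"))
```

For the huge multi-line TPC-DS plans the same idea would be applied per line, so a developer can jump straight to the first divergent line instead of eyeballing the whole plan.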
[jira] [Created] (SPARK-32983) Spark SQL INTERSECT ALL does not keep all rows.
Will Du created SPARK-32983: --- Summary: Spark SQL INTERSECT ALL does not keep all rows. Key: SPARK-32983 URL: https://issues.apache.org/jira/browse/SPARK-32983 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.1, 3.0.0, 2.4.6 Reporter: Will Du Spark SQL INTERSECT ALL should keep all rows, but it actually removes duplicates, just like Spark SQL INTERSECT.
with base as (select 1 as id union all select 2 as id),
a as (select 1 as id union all select 3 as id)
select * from a INTERSECT ALL select * from base;
with base as (select 1 as id union all select 2 as id),
a as (select 1 as id union all select 3 as id)
select * from a INTERSECT select * from base;
Both of the above queries return one record, which is 1. I think the 1st query should return 1, 1.
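For reference, standard SQL gives INTERSECT ALL multiset semantics: each row is kept min(m, n) times, where m and n are its counts on the two sides. Under those semantics the first query above would still return a single 1, since each side contains only one copy; duplicates survive only when both sides repeat a row. A plain-Python sketch of the standard semantics (not Spark code):

```python
from collections import Counter

def intersect_all(left, right):
    """Multiset INTERSECT ALL: keep each value min(count_left, count_right) times."""
    # Counter & Counter takes the minimum count per key; elements() expands
    # each key back into that many copies.
    return sorted((Counter(left) & Counter(right)).elements())

print(intersect_all([1, 3], [1, 2]))        # -> [1]: one copy of 1 on each side
print(intersect_all([1, 1, 2], [1, 1, 3]))  # -> [1, 1]: duplicates on both sides survive
```

The second call is where INTERSECT ALL and plain INTERSECT genuinely differ: INTERSECT would deduplicate to a single 1.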
[jira] [Assigned] (SPARK-32977) [SQL] JavaDoc on Default Save mode Incorrect
[ https://issues.apache.org/jira/browse/SPARK-32977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-32977: - Assignee: Russell Spitzer > [SQL] JavaDoc on Default Save mode Incorrect > > > Key: SPARK-32977 > URL: https://issues.apache.org/jira/browse/SPARK-32977 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1 >Reporter: Russell Spitzer >Assignee: Russell Spitzer >Priority: Major > > The JavaDoc says that the default save mode is dependent on the DataSource > version, which is incorrect. It is always ErrorIfExists. > http://apache-spark-developers-list.1001551.n3.nabble.com/DatasourceV2-Default-Mode-for-DataFrameWriter-not-Dependent-on-DataSource-Version-td29434.html
[jira] [Resolved] (SPARK-32977) [SQL] JavaDoc on Default Save mode Incorrect
[ https://issues.apache.org/jira/browse/SPARK-32977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-32977. --- Fix Version/s: 3.0.2 3.1.0 Resolution: Fixed Issue resolved by pull request 29853 [https://github.com/apache/spark/pull/29853] > [SQL] JavaDoc on Default Save mode Incorrect > > > Key: SPARK-32977 > URL: https://issues.apache.org/jira/browse/SPARK-32977 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1 >Reporter: Russell Spitzer >Assignee: Russell Spitzer >Priority: Major > Fix For: 3.1.0, 3.0.2 > > > The JavaDoc says that the default save mode is dependent on the DataSource > version, which is incorrect. It is always ErrorIfExists. > http://apache-spark-developers-list.1001551.n3.nabble.com/DatasourceV2-Default-Mode-for-DataFrameWriter-not-Dependent-on-DataSource-Version-td29434.html
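The corrected behavior, that the default save mode errors out when the target already exists regardless of DataSource version, can be modeled with plain Python file handling. This is an illustrative analogue only, not Spark code; `save` is a toy stand-in for `DataFrameWriter.save`.

```python
import os
import tempfile

def save(path: str, data: str, mode: str = "errorifexists") -> None:
    """Toy analogue of DataFrameWriter.save: the default mode refuses to
    overwrite an existing target instead of silently replacing it."""
    if mode == "errorifexists" and os.path.exists(path):
        raise FileExistsError(f"path {path} already exists")
    with open(path, "w") as f:
        f.write(data)

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "out.txt")
    save(target, "first")       # fresh path: succeeds
    try:
        save(target, "second")  # default mode: raises instead of overwriting
    except FileExistsError:
        print("default mode refused to overwrite")
```

An explicit overwrite-style mode would skip the existence check, which is exactly the behavioral difference the JavaDoc needs to describe accurately.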
[jira] [Commented] (SPARK-32982) Remove hive-1.2 profiles in PIP installation option
[ https://issues.apache.org/jira/browse/SPARK-32982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201230#comment-17201230 ] Apache Spark commented on SPARK-32982: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/29858 > Remove hive-1.2 profiles in PIP installation option > --- > > Key: SPARK-32982 > URL: https://issues.apache.org/jira/browse/SPARK-32982 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > > Hive 1.2 is a fork that we should remove. It's best not to expose this > distribution via pip.
[jira] [Assigned] (SPARK-32982) Remove hive-1.2 profiles in PIP installation option
[ https://issues.apache.org/jira/browse/SPARK-32982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32982: Assignee: (was: Apache Spark) > Remove hive-1.2 profiles in PIP installation option > --- > > Key: SPARK-32982 > URL: https://issues.apache.org/jira/browse/SPARK-32982 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > > Hive 1.2 is a fork that we should remove. It's best not to expose this > distribution via pip.
[jira] [Assigned] (SPARK-32982) Remove hive-1.2 profiles in PIP installation option
[ https://issues.apache.org/jira/browse/SPARK-32982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32982: Assignee: Apache Spark > Remove hive-1.2 profiles in PIP installation option > --- > > Key: SPARK-32982 > URL: https://issues.apache.org/jira/browse/SPARK-32982 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > > Hive 1.2 is a fork that we should remove. It's best not to expose this > distribution via pip.
[jira] [Updated] (SPARK-32982) Remove hive-1.2 profiles in PIP installation option
[ https://issues.apache.org/jira/browse/SPARK-32982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-32982: - Issue Type: Improvement (was: Bug) > Remove hive-1.2 profiles in PIP installation option > --- > > Key: SPARK-32982 > URL: https://issues.apache.org/jira/browse/SPARK-32982 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > > Hive 1.2 is a fork that we should remove. It's best not to expose this > distribution via pip.
[jira] [Created] (SPARK-32982) Remove hive-1.2 profiles in PIP installation option
Hyukjin Kwon created SPARK-32982: Summary: Remove hive-1.2 profiles in PIP installation option Key: SPARK-32982 URL: https://issues.apache.org/jira/browse/SPARK-32982 Project: Spark Issue Type: Bug Components: Build Affects Versions: 3.1.0 Reporter: Hyukjin Kwon Hive 1.2 is a fork that we should remove. It's best not to expose this distribution via pip.
[jira] [Commented] (SPARK-32971) Support dynamic PVC creation/deletion for K8s executors
[ https://issues.apache.org/jira/browse/SPARK-32971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201225#comment-17201225 ] Apache Spark commented on SPARK-32971: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/29859 > Support dynamic PVC creation/deletion for K8s executors > --- > > Key: SPARK-32971 > URL: https://issues.apache.org/jira/browse/SPARK-32971 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.1.0 > >
[jira] [Commented] (SPARK-32981) Remove hive-1.2/hadoop-2.7 from Apache Spark 3.1 distribution
[ https://issues.apache.org/jira/browse/SPARK-32981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201222#comment-17201222 ] Apache Spark commented on SPARK-32981: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/29858 > Remove hive-1.2/hadoop-2.7 from Apache Spark 3.1 distribution > - > > Key: SPARK-32981 > URL: https://issues.apache.org/jira/browse/SPARK-32981 > Project: Spark > Issue Type: Task > Components: Build >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.1.0 > > > Apache Spark 3.0 switched its Hive execution version from 1.2 to 2.3, but we > still provide the unofficial forked Hive 1.2 version in our distribution. > This issue aims to remove it from Apache Spark 3.1.0. > {code} > spark-3.0.1-bin-hadoop2.7-hive1.2.tgz > spark-3.0.1-bin-hadoop2.7-hive1.2.tgz.asc > spark-3.0.1-bin-hadoop2.7-hive1.2.tgz.sha512 > {code}
[jira] [Commented] (SPARK-32972) Pass all `mllib` module UTs in Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-32972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201217#comment-17201217 ] Apache Spark commented on SPARK-32972: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/29857 > Pass all `mllib` module UTs in Scala 2.13 > - > > Key: SPARK-32972 > URL: https://issues.apache.org/jira/browse/SPARK-32972 > Project: Spark > Issue Type: Sub-task > Components: ML, MLlib >Affects Versions: 3.1.0 >Reporter: Yang Jie >Priority: Major > > There are 51 failed Scala tests and 3 failed Java tests in the `mllib` module; the failed > cases are as follows: > *Java:* > * org.apache.spark.mllib.fpm.JavaPrefixSpanSuite ( 1 FAILED) > * org.apache.spark.ml.feature.JavaVectorIndexerSuite ( 1 FAILED) > * org.apache.spark.ml.feature.JavaWord2VecSuite ( 1 FAILED) > *Scala:* > * MatrixFactorizationModelSuite ( 1 FAILED) > * LDASuite ( 1 FAILED) > * MLTestSuite ( 1 FAILED) > * PrefixSpanSuite ( 1 FAILED) > * BucketedRandomProjectionLSHSuite ( 3 FAILED) > * Word2VecSuite ( 3 FAILED) > * Word2VecSuite ( 5 FAILED) > * MinHashLSHSuite ( 3 FAILED) > * DecisionTreeSuite ( 1 FAILED) > * FPGrowthSuite ( 2 FAILED) > * NaiveBayesSuite ( 2 FAILED) > * NGramSuite ( 4 FAILED) > * RFormulaSuite ( 4 FAILED) > * GradientBoostedTreesSuite ( 1 FAILED) > * StopWordsRemoverSuite ( 10 FAILED) > * RandomForestSuite ( 1 FAILED) > * PrefixSpanSuite ( 4 FAILED) > * StringIndexerSuite ( 2 FAILED) > * IDFSuite ( 1 FAILED) > * RandomForestRegressorSuite ( 1 FAILED)
[jira] [Assigned] (SPARK-32972) Pass all `mllib` module UTs in Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-32972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32972: Assignee: Apache Spark > Pass all `mllib` module UTs in Scala 2.13 > - > > Key: SPARK-32972 > URL: https://issues.apache.org/jira/browse/SPARK-32972 > Project: Spark > Issue Type: Sub-task > Components: ML, MLlib >Affects Versions: 3.1.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > > There are 51 failed Scala tests and 3 failed Java tests in the `mllib` module; the failed > cases are as follows: > *Java:* > * org.apache.spark.mllib.fpm.JavaPrefixSpanSuite ( 1 FAILED) > * org.apache.spark.ml.feature.JavaVectorIndexerSuite ( 1 FAILED) > * org.apache.spark.ml.feature.JavaWord2VecSuite ( 1 FAILED) > *Scala:* > * MatrixFactorizationModelSuite ( 1 FAILED) > * LDASuite ( 1 FAILED) > * MLTestSuite ( 1 FAILED) > * PrefixSpanSuite ( 1 FAILED) > * BucketedRandomProjectionLSHSuite ( 3 FAILED) > * Word2VecSuite ( 3 FAILED) > * Word2VecSuite ( 5 FAILED) > * MinHashLSHSuite ( 3 FAILED) > * DecisionTreeSuite ( 1 FAILED) > * FPGrowthSuite ( 2 FAILED) > * NaiveBayesSuite ( 2 FAILED) > * NGramSuite ( 4 FAILED) > * RFormulaSuite ( 4 FAILED) > * GradientBoostedTreesSuite ( 1 FAILED) > * StopWordsRemoverSuite ( 10 FAILED) > * RandomForestSuite ( 1 FAILED) > * PrefixSpanSuite ( 4 FAILED) > * StringIndexerSuite ( 2 FAILED) > * IDFSuite ( 1 FAILED) > * RandomForestRegressorSuite ( 1 FAILED)
[jira] [Assigned] (SPARK-32972) Pass all `mllib` module UTs in Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-32972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32972: Assignee: (was: Apache Spark) > Pass all `mllib` module UTs in Scala 2.13 > - > > Key: SPARK-32972 > URL: https://issues.apache.org/jira/browse/SPARK-32972 > Project: Spark > Issue Type: Sub-task > Components: ML, MLlib >Affects Versions: 3.1.0 >Reporter: Yang Jie >Priority: Major > > There are 51 failed Scala tests and 3 failed Java tests in the `mllib` module; the failed > cases are as follows: > *Java:* > * org.apache.spark.mllib.fpm.JavaPrefixSpanSuite ( 1 FAILED) > * org.apache.spark.ml.feature.JavaVectorIndexerSuite ( 1 FAILED) > * org.apache.spark.ml.feature.JavaWord2VecSuite ( 1 FAILED) > *Scala:* > * MatrixFactorizationModelSuite ( 1 FAILED) > * LDASuite ( 1 FAILED) > * MLTestSuite ( 1 FAILED) > * PrefixSpanSuite ( 1 FAILED) > * BucketedRandomProjectionLSHSuite ( 3 FAILED) > * Word2VecSuite ( 3 FAILED) > * Word2VecSuite ( 5 FAILED) > * MinHashLSHSuite ( 3 FAILED) > * DecisionTreeSuite ( 1 FAILED) > * FPGrowthSuite ( 2 FAILED) > * NaiveBayesSuite ( 2 FAILED) > * NGramSuite ( 4 FAILED) > * RFormulaSuite ( 4 FAILED) > * GradientBoostedTreesSuite ( 1 FAILED) > * StopWordsRemoverSuite ( 10 FAILED) > * RandomForestSuite ( 1 FAILED) > * PrefixSpanSuite ( 4 FAILED) > * StringIndexerSuite ( 2 FAILED) > * IDFSuite ( 1 FAILED) > * RandomForestRegressorSuite ( 1 FAILED)
[jira] [Commented] (SPARK-32972) Pass all `mllib` module UTs in Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-32972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201216#comment-17201216 ] Apache Spark commented on SPARK-32972: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/29857 > Pass all `mllib` module UTs in Scala 2.13 > - > > Key: SPARK-32972 > URL: https://issues.apache.org/jira/browse/SPARK-32972 > Project: Spark > Issue Type: Sub-task > Components: ML, MLlib >Affects Versions: 3.1.0 >Reporter: Yang Jie >Priority: Major > > There are 51 failed Scala tests and 3 failed Java tests in the `mllib` module; the failed > cases are as follows: > *Java:* > * org.apache.spark.mllib.fpm.JavaPrefixSpanSuite ( 1 FAILED) > * org.apache.spark.ml.feature.JavaVectorIndexerSuite ( 1 FAILED) > * org.apache.spark.ml.feature.JavaWord2VecSuite ( 1 FAILED) > *Scala:* > * MatrixFactorizationModelSuite ( 1 FAILED) > * LDASuite ( 1 FAILED) > * MLTestSuite ( 1 FAILED) > * PrefixSpanSuite ( 1 FAILED) > * BucketedRandomProjectionLSHSuite ( 3 FAILED) > * Word2VecSuite ( 3 FAILED) > * Word2VecSuite ( 5 FAILED) > * MinHashLSHSuite ( 3 FAILED) > * DecisionTreeSuite ( 1 FAILED) > * FPGrowthSuite ( 2 FAILED) > * NaiveBayesSuite ( 2 FAILED) > * NGramSuite ( 4 FAILED) > * RFormulaSuite ( 4 FAILED) > * GradientBoostedTreesSuite ( 1 FAILED) > * StopWordsRemoverSuite ( 10 FAILED) > * RandomForestSuite ( 1 FAILED) > * PrefixSpanSuite ( 4 FAILED) > * StringIndexerSuite ( 2 FAILED) > * IDFSuite ( 1 FAILED) > * RandomForestRegressorSuite ( 1 FAILED)
[jira] [Commented] (SPARK-32975) [K8S] - executor fails to be restarted after it goes to ERROR/Failure state
[ https://issues.apache.org/jira/browse/SPARK-32975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201207#comment-17201207 ] Tibor Fasanga commented on SPARK-32975: --- Note that the main problem is that the executor POD quits with an error while the Spark driver and Spark operator think it is still running; therefore the executor is never restarted. This is an intermittent problem. Our testing shows that this happens frequently when the following is true: # the driver POD has a sidecar container, and # it takes longer to initialize and start the sidecar container (this delay is caused by the time required to pull the image of the sidecar container) In other words, this problem manifests itself when there is a delay between starting the driver *container* and the time the driver *POD* is fully started (the POD contains the driver container and the sidecar container). In this case we see the following events in the description of the driver POD (see the "_Pulling image "registry.nspos.nokia.local/fluent/fluent-bit:1.5.5_" event that is present in this case): {code:java} Events: Type Reason Age From Message ---- ------ --- ---- ------- Normal Scheduled default-scheduler Successfully assigned default/act-pipeline-app-driver to node5 Warning FailedMount 20m kubelet, node5 MountVolume.SetUp failed for volume "spark-conf-volume" : configmap "act-pipeline-app-1600699152173-driver-conf-map" not found Normal Pulled 20m kubelet, node5 Container image "registry.nspos.nokia.local/nspos-pki-container:20.9.0-rel.1" already present on machine Normal Created 20m kubelet, node5 Created container nspos-pki Normal Started 20m kubelet, node5 Started container nspos-pki Normal Pulling 20m kubelet, node5 Pulling image "registry.nspos.nokia.local/analytics-rtanalytics-pipeline-app:20.9.0-rel.48" Normal Pulled 19m kubelet, node5 Successfully pulled image "registry.nspos.nokia.local/analytics-rtanalytics-pipeline-app:20.9.0-rel.48" Normal Created 19m kubelet, node5 Created container spark-kubernetes-driver
Normal Started 19m kubelet, node5 Started container spark-kubernetes-driver Normal Pulling 19m kubelet, node5 Pulling image "registry.nspos.nokia.local/fluent/fluent-bit:1.5.5" Normal Pulled 18m kubelet, node5 Successfully pulled image "registry.nspos.nokia.local/fluent/fluent-bit:1.5.5" Normal Created 18m kubelet, node5 Created container log-sidecar Normal Started 18m kubelet, node5 Started container log-sidecar {code} Note: The message "_MountVolume.SetUp failed for volume "spark-conf-volume" : configmap "act-pipeline-app-1600699152173-driver-conf-map" not found_" seems to be unrelated and does not seem to cause any problems. > [K8S] - executor fails to be restarted after it goes to ERROR/Failure state > --- > > Key: SPARK-32975 > URL: https://issues.apache.org/jira/browse/SPARK-32975 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Scheduler >Affects Versions: 2.4.4 >Reporter: Shenson Joseph >Priority: Critical > > We are using the v1beta2-1.1.2-2.4.5 version of the operator with spark-2.4.4. > Spark executors keep getting killed with exit code 1 and we are seeing the > following exception in an executor that goes to the error state. Once this error happens, the driver doesn't restart the executor.
> > Exception in thread "main" java.lang.reflect.UndeclaredThrowableException > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1713) > at > org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:64) > at > org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:188) > at > org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:281) > at > org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala) > Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult: > at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) > at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101) > at > org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:201) > at > org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:65) > at > org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:64) > at
[jira] [Assigned] (SPARK-32971) Support dynamic PVC creation/deletion for K8s executors
[ https://issues.apache.org/jira/browse/SPARK-32971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-32971: - Assignee: Dongjoon Hyun > Support dynamic PVC creation/deletion for K8s executors > --- > > Key: SPARK-32971 > URL: https://issues.apache.org/jira/browse/SPARK-32971 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-32971) Support dynamic PVC creation/deletion for K8s executors
[ https://issues.apache.org/jira/browse/SPARK-32971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-32971. --- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 29846 [https://github.com/apache/spark/pull/29846] > Support dynamic PVC creation/deletion for K8s executors > --- > > Key: SPARK-32971 > URL: https://issues.apache.org/jira/browse/SPARK-32971 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.1.0 > >
[jira] [Commented] (SPARK-27733) Upgrade to Avro 1.10.0
[ https://issues.apache.org/jira/browse/SPARK-27733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201156#comment-17201156 ] Michael Heuer commented on SPARK-27733: --- I can also participate in the Parquet sync meeting. Since this involves coordination across several different projects (Spark, Parquet, Avro, Hive, possibly others), will that be an adequate venue for discussion and decision making? > Upgrade to Avro 1.10.0 > -- > > Key: SPARK-27733 > URL: https://issues.apache.org/jira/browse/SPARK-27733 > Project: Spark > Issue Type: Improvement > Components: Build, SQL >Affects Versions: 3.1.0 >Reporter: Ismaël Mejía >Priority: Minor > > Avro 1.9.2 was released with many nice features, including reduced size (1MB > less), removed dependencies (no paranamer, no shaded guava), and security > updates, so it is probably a worthwhile upgrade. > Avro 1.10.0 has since been released and this is still not done. > There is at the moment (2020/08) still a blocker because Hive-related > transitive dependencies bring in older versions of Avro, so this is > effectively still blocked until HIVE-21737 is solved.
[jira] [Commented] (SPARK-12312) JDBC connection to Kerberos secured databases fails on remote executors
[ https://issues.apache.org/jira/browse/SPARK-12312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201154#comment-17201154 ] John Lonergan commented on SPARK-12312: --- Yes - the driver wrapper I wrote accepted either a keytab (KT) or a ticket cache; see the reference I gave. If I recall correctly, we provided the option of selecting either approach: UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytabFile) OR UserGroupInformation.getUGIFromTicketCache(principal, cache) JL > JDBC connection to Kerberos secured databases fails on remote executors > --- > > Key: SPARK-12312 > URL: https://issues.apache.org/jira/browse/SPARK-12312 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.2, 2.4.2 >Reporter: nabacg >Assignee: Gabor Somogyi >Priority: Minor > > When loading DataFrames from a JDBC datasource with Kerberos authentication, > remote executors (yarn-client/cluster etc. modes) fail to establish a > connection due to the lack of a Kerberos ticket or the ability to generate one. > This is a real issue when trying to ingest data from kerberized data sources > (SQL Server, Oracle) in an enterprise environment where exposing simple > authentication access is not an option due to IT policy issues.
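The two UserGroupInformation entry points mentioned above differ only in where the credentials come from. A minimal sketch of the selection logic, in Python for illustration: the helper name `choose_ugi_login` and the policy of preferring an explicit keytab are assumptions, since the real calls live in Hadoop's Java API.

```python
def choose_ugi_login(keytab_path=None, ticket_cache=None):
    """Pick which Hadoop UGI call a driver wrapper would use.

    Prefers an explicit keytab; falls back to a ticket cache
    (e.g. one refreshed by a sidecar container). Only the method
    name is returned here -- the real calls are Java:
      UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytabFile)
      UserGroupInformation.getUGIFromTicketCache(principal, cache)
    """
    if keytab_path:
        return "loginUserFromKeytabAndReturnUGI"
    if ticket_cache:
        return "getUGIFromTicketCache"
    raise ValueError("neither a keytab nor a ticket cache is available")

# A pod with no keytab but a sidecar-maintained cache would take the
# ticket-cache branch (path is hypothetical).
print(choose_ugi_login(ticket_cache="/tmp/krb5cc_1000"))
```

This mirrors the scenario discussed in the following comments, where the executor only has access to a krb5 cache file maintained by a sidecar.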
[jira] [Resolved] (SPARK-32980) Launcher Client tests flake with minikube
[ https://issues.apache.org/jira/browse/SPARK-32980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-32980. --- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 29854 [https://github.com/apache/spark/pull/29854] > Launcher Client tests flake with minikube > - > > Key: SPARK-32980 > URL: https://issues.apache.org/jira/browse/SPARK-32980 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Tests >Affects Versions: 3.1.0 >Reporter: Holden Karau >Assignee: Holden Karau >Priority: Major > Fix For: 3.1.0 > > > Launcher Client tests flake with minikube
[jira] [Assigned] (SPARK-32937) DecomissionSuite in k8s integration tests is failing.
[ https://issues.apache.org/jira/browse/SPARK-32937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-32937: - Assignee: Holden Karau > DecomissionSuite in k8s integration tests is failing. > - > > Key: SPARK-32937 > URL: https://issues.apache.org/jira/browse/SPARK-32937 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.1.0 >Reporter: Prashant Sharma >Assignee: Holden Karau >Priority: Major > > Logs from the failing test, copied from jenkins. As of now, it is always > failing. > {code} > - Test basic decommissioning *** FAILED *** > The code passed to eventually never returned normally. Attempted 182 times > over 3.00377927275 minutes. Last failure message: "++ id -u > + myuid=185 > ++ id -g > + mygid=0 > + set +e > ++ getent passwd 185 > + uidentry= > + set -e > + '[' -z '' ']' > + '[' -w /etc/passwd ']' > + echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false' > + SPARK_CLASSPATH=':/opt/spark/jars/*' > + env > + grep SPARK_JAVA_OPT_ > + sort -t_ -k4 -n > + sed 's/[^=]*=\(.*\)/\1/g' > + readarray -t SPARK_EXECUTOR_JAVA_OPTS > + '[' -n '' ']' > + '[' 3 == 2 ']' > + '[' 3 == 3 ']' > ++ python3 -V > + pyv3='Python 3.7.3' > + export PYTHON_VERSION=3.7.3 > + PYTHON_VERSION=3.7.3 > + export PYSPARK_PYTHON=python3 > + PYSPARK_PYTHON=python3 > + export PYSPARK_DRIVER_PYTHON=python3 > + PYSPARK_DRIVER_PYTHON=python3 > + '[' -n '' ']' > + '[' -z ']' > + '[' -z x ']' > + SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*' > + case "$1" in > + shift 1 > + CMD=("$SPARK_HOME/bin/spark-submit" --conf > "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client > "$@") > + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf > spark.driver.bindAddress=172.17.0.4 --deploy-mode client --properties-file > /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner > local:///opt/spark/tests/decommissioning.py > 20/09/17 11:06:56 WARN NativeCodeLoader: Unable to 
load native-hadoop > library for your platform... using builtin-java classes where applicable > Starting decom test > Using Spark's default log4j profile: > org/apache/spark/log4j-defaults.properties > 20/09/17 11:06:56 INFO SparkContext: Running Spark version 3.1.0-SNAPSHOT > 20/09/17 11:06:57 INFO ResourceUtils: > == > 20/09/17 11:06:57 INFO ResourceUtils: No custom resources configured for > spark.driver. > 20/09/17 11:06:57 INFO ResourceUtils: > == > 20/09/17 11:06:57 INFO SparkContext: Submitted application: PyMemoryTest > 20/09/17 11:06:57 INFO ResourceProfile: Default ResourceProfile created, > executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , > memory -> name: memory, amount: 1024, script: , vendor: ), task resources: > Map(cpus -> name: cpus, amount: 1.0) > 20/09/17 11:06:57 INFO ResourceProfile: Limiting resource is cpus at 1 > tasks per executor > 20/09/17 11:06:57 INFO ResourceProfileManager: Added ResourceProfile id: 0 > 20/09/17 11:06:57 INFO SecurityManager: Changing view acls to: 185,jenkins > 20/09/17 11:06:57 INFO SecurityManager: Changing modify acls to: 185,jenkins > 20/09/17 11:06:57 INFO SecurityManager: Changing view acls groups to: > 20/09/17 11:06:57 INFO SecurityManager: Changing modify acls groups to: > 20/09/17 11:06:57 INFO SecurityManager: SecurityManager: authentication > enabled; ui acls disabled; users with view permissions: Set(185, jenkins); > groups with view permissions: Set(); users with modify permissions: Set(185, > jenkins); groups with modify permissions: Set() > 20/09/17 11:06:57 INFO Utils: Successfully started service 'sparkDriver' on > port 7078. 
> 20/09/17 11:06:57 INFO SparkEnv: Registering MapOutputTracker > 20/09/17 11:06:57 INFO SparkEnv: Registering BlockManagerMaster > 20/09/17 11:06:57 INFO BlockManagerMasterEndpoint: Using > org.apache.spark.storage.DefaultTopologyMapper for getting topology > information > 20/09/17 11:06:57 INFO BlockManagerMasterEndpoint: > BlockManagerMasterEndpoint up > 20/09/17 11:06:57 INFO SparkEnv: Registering BlockManagerMasterHeartbeat > 20/09/17 11:06:57 INFO DiskBlockManager: Created local directory at > /var/data/spark-7985c075-3b02-42ec-9111-cefba535adf0/blockmgr-3bd403d0-6689-46be-997e-5bc699ecefd3 > 20/09/17 11:06:57 INFO MemoryStore: MemoryStore started with capacity 593.9 > MiB > 20/09/17 11:06:57 INFO SparkEnv: Registering OutputCommitCoordinator > 20/09/17 11:06:58 INFO Utils: Successfully started service 'SparkUI' on > port 4040. > 20/09/17 11:06:58 INFO
[jira] [Assigned] (SPARK-32980) Launcher Client tests flake with minikube
[ https://issues.apache.org/jira/browse/SPARK-32980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-32980: - Assignee: Holden Karau > Launcher Client tests flake with minikube > - > > Key: SPARK-32980 > URL: https://issues.apache.org/jira/browse/SPARK-32980 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Tests >Affects Versions: 3.1.0 >Reporter: Holden Karau >Assignee: Holden Karau >Priority: Major > > Launcher Client tests flake with minikube
[jira] [Resolved] (SPARK-32937) DecomissionSuite in k8s integration tests is failing.
[ https://issues.apache.org/jira/browse/SPARK-32937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-32937. --- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 29854 [https://github.com/apache/spark/pull/29854] > DecomissionSuite in k8s integration tests is failing. > - > > Key: SPARK-32937 > URL: https://issues.apache.org/jira/browse/SPARK-32937 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.1.0 >Reporter: Prashant Sharma >Assignee: Holden Karau >Priority: Major > Fix For: 3.1.0 > > > Logs from the failing test, copied from jenkins. As of now, it is always > failing. > {code} > - Test basic decommissioning *** FAILED *** > The code passed to eventually never returned normally. Attempted 182 times > over 3.00377927275 minutes. Last failure message: "++ id -u > + myuid=185 > ++ id -g > + mygid=0 > + set +e > ++ getent passwd 185 > + uidentry= > + set -e > + '[' -z '' ']' > + '[' -w /etc/passwd ']' > + echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false' > + SPARK_CLASSPATH=':/opt/spark/jars/*' > + env > + grep SPARK_JAVA_OPT_ > + sort -t_ -k4 -n > + sed 's/[^=]*=\(.*\)/\1/g' > + readarray -t SPARK_EXECUTOR_JAVA_OPTS > + '[' -n '' ']' > + '[' 3 == 2 ']' > + '[' 3 == 3 ']' > ++ python3 -V > + pyv3='Python 3.7.3' > + export PYTHON_VERSION=3.7.3 > + PYTHON_VERSION=3.7.3 > + export PYSPARK_PYTHON=python3 > + PYSPARK_PYTHON=python3 > + export PYSPARK_DRIVER_PYTHON=python3 > + PYSPARK_DRIVER_PYTHON=python3 > + '[' -n '' ']' > + '[' -z ']' > + '[' -z x ']' > + SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*' > + case "$1" in > + shift 1 > + CMD=("$SPARK_HOME/bin/spark-submit" --conf > "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client > "$@") > + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf > spark.driver.bindAddress=172.17.0.4 --deploy-mode client --properties-file > /opt/spark/conf/spark.properties --class 
org.apache.spark.deploy.PythonRunner > local:///opt/spark/tests/decommissioning.py > 20/09/17 11:06:56 WARN NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > Starting decom test > Using Spark's default log4j profile: > org/apache/spark/log4j-defaults.properties > 20/09/17 11:06:56 INFO SparkContext: Running Spark version 3.1.0-SNAPSHOT > 20/09/17 11:06:57 INFO ResourceUtils: > == > 20/09/17 11:06:57 INFO ResourceUtils: No custom resources configured for > spark.driver. > 20/09/17 11:06:57 INFO ResourceUtils: > == > 20/09/17 11:06:57 INFO SparkContext: Submitted application: PyMemoryTest > 20/09/17 11:06:57 INFO ResourceProfile: Default ResourceProfile created, > executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , > memory -> name: memory, amount: 1024, script: , vendor: ), task resources: > Map(cpus -> name: cpus, amount: 1.0) > 20/09/17 11:06:57 INFO ResourceProfile: Limiting resource is cpus at 1 > tasks per executor > 20/09/17 11:06:57 INFO ResourceProfileManager: Added ResourceProfile id: 0 > 20/09/17 11:06:57 INFO SecurityManager: Changing view acls to: 185,jenkins > 20/09/17 11:06:57 INFO SecurityManager: Changing modify acls to: 185,jenkins > 20/09/17 11:06:57 INFO SecurityManager: Changing view acls groups to: > 20/09/17 11:06:57 INFO SecurityManager: Changing modify acls groups to: > 20/09/17 11:06:57 INFO SecurityManager: SecurityManager: authentication > enabled; ui acls disabled; users with view permissions: Set(185, jenkins); > groups with view permissions: Set(); users with modify permissions: Set(185, > jenkins); groups with modify permissions: Set() > 20/09/17 11:06:57 INFO Utils: Successfully started service 'sparkDriver' on > port 7078. 
> 20/09/17 11:06:57 INFO SparkEnv: Registering MapOutputTracker > 20/09/17 11:06:57 INFO SparkEnv: Registering BlockManagerMaster > 20/09/17 11:06:57 INFO BlockManagerMasterEndpoint: Using > org.apache.spark.storage.DefaultTopologyMapper for getting topology > information > 20/09/17 11:06:57 INFO BlockManagerMasterEndpoint: > BlockManagerMasterEndpoint up > 20/09/17 11:06:57 INFO SparkEnv: Registering BlockManagerMasterHeartbeat > 20/09/17 11:06:57 INFO DiskBlockManager: Created local directory at > /var/data/spark-7985c075-3b02-42ec-9111-cefba535adf0/blockmgr-3bd403d0-6689-46be-997e-5bc699ecefd3 > 20/09/17 11:06:57 INFO MemoryStore: MemoryStore started with capacity 593.9 > MiB > 20/09/17 11:06:57 INFO SparkEnv: Registering
[jira] [Resolved] (SPARK-32981) Remove hive-1.2/hadoop-2.7 from Apache Spark 3.1 distribution
[ https://issues.apache.org/jira/browse/SPARK-32981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-32981. --- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 29856 [https://github.com/apache/spark/pull/29856] > Remove hive-1.2/hadoop-2.7 from Apache Spark 3.1 distribution > - > > Key: SPARK-32981 > URL: https://issues.apache.org/jira/browse/SPARK-32981 > Project: Spark > Issue Type: Task > Components: Build >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.1.0 > > > Apache Spark 3.0 switches its Hive execution version from 1.2 to 2.3, but we > still provide the unofficial forked Hive 1.2 version from our distribution. > This issue aims to remove it from Apache Spark 3.1.0. > {code} > spark-3.0.1-bin-hadoop2.7-hive1.2.tgz > spark-3.0.1-bin-hadoop2.7-hive1.2.tgz.asc > spark-3.0.1-bin-hadoop2.7-hive1.2.tgz.sha512 > {code}
[jira] [Commented] (SPARK-32981) Remove hive-1.2/hadoop-2.7 from Apache Spark 3.1 distribution
[ https://issues.apache.org/jira/browse/SPARK-32981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201141#comment-17201141 ] Apache Spark commented on SPARK-32981: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/29856 > Remove hive-1.2/hadoop-2.7 from Apache Spark 3.1 distribution > - > > Key: SPARK-32981 > URL: https://issues.apache.org/jira/browse/SPARK-32981 > Project: Spark > Issue Type: Task > Components: Build >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > > Apache Spark 3.0 switches its Hive execution version from 1.2 to 2.3, but we > still provide the unofficial forked Hive 1.2 version from our distribution. > This issue aims to remove it from Apache Spark 3.1.0. > {code} > spark-3.0.1-bin-hadoop2.7-hive1.2.tgz > spark-3.0.1-bin-hadoop2.7-hive1.2.tgz.asc > spark-3.0.1-bin-hadoop2.7-hive1.2.tgz.sha512 > {code}
[jira] [Commented] (SPARK-32981) Remove hive-1.2/hadoop-2.7 from Apache Spark 3.1 distribution
[ https://issues.apache.org/jira/browse/SPARK-32981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201138#comment-17201138 ] Apache Spark commented on SPARK-32981: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/29856 > Remove hive-1.2/hadoop-2.7 from Apache Spark 3.1 distribution > - > > Key: SPARK-32981 > URL: https://issues.apache.org/jira/browse/SPARK-32981 > Project: Spark > Issue Type: Task > Components: Build >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > > Apache Spark 3.0 switches its Hive execution version from 1.2 to 2.3, but we > still provide the unofficial forked Hive 1.2 version from our distribution. > This issue aims to remove it from Apache Spark 3.1.0. > {code} > spark-3.0.1-bin-hadoop2.7-hive1.2.tgz > spark-3.0.1-bin-hadoop2.7-hive1.2.tgz.asc > spark-3.0.1-bin-hadoop2.7-hive1.2.tgz.sha512 > {code}
[jira] [Assigned] (SPARK-32981) Remove hive-1.2/hadoop-2.7 from Apache Spark 3.1 distribution
[ https://issues.apache.org/jira/browse/SPARK-32981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-32981: - Assignee: Dongjoon Hyun > Remove hive-1.2/hadoop-2.7 from Apache Spark 3.1 distribution > - > > Key: SPARK-32981 > URL: https://issues.apache.org/jira/browse/SPARK-32981 > Project: Spark > Issue Type: Task > Components: Build >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > > Apache Spark 3.0 switches its Hive execution version from 1.2 to 2.3, but we > still provide the unofficial forked Hive 1.2 version from our distribution. > This issue aims to remove it from Apache Spark 3.1.0. > {code} > spark-3.0.1-bin-hadoop2.7-hive1.2.tgz > spark-3.0.1-bin-hadoop2.7-hive1.2.tgz.asc > spark-3.0.1-bin-hadoop2.7-hive1.2.tgz.sha512 > {code}
[jira] [Created] (SPARK-32981) Remove hive-1.2/hadoop-2.7 from Apache Spark 3.1 distribution
Dongjoon Hyun created SPARK-32981: - Summary: Remove hive-1.2/hadoop-2.7 from Apache Spark 3.1 distribution Key: SPARK-32981 URL: https://issues.apache.org/jira/browse/SPARK-32981 Project: Spark Issue Type: Task Components: Build Affects Versions: 3.1.0 Reporter: Dongjoon Hyun Apache Spark 3.0 switches its Hive execution version from 1.2 to 2.3, but we still provide the unofficial forked Hive 1.2 version from our distribution. This issue aims to remove it from Apache Spark 3.1.0. {code} spark-3.0.1-bin-hadoop2.7-hive1.2.tgz spark-3.0.1-bin-hadoop2.7-hive1.2.tgz.asc spark-3.0.1-bin-hadoop2.7-hive1.2.tgz.sha512 {code}
[jira] [Updated] (SPARK-32067) [K8S] Executor pod template ConfigMap of ongoing submission got inadvertently altered by subsequent submission
[ https://issues.apache.org/jira/browse/SPARK-32067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Yu updated SPARK-32067: - Affects Version/s: (was: 2.4.6) (was: 3.0.0) 2.4.7 3.0.1 > [K8S] Executor pod template ConfigMap of ongoing submission got inadvertently > altered by subsequent submission > -- > > Key: SPARK-32067 > URL: https://issues.apache.org/jira/browse/SPARK-32067 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.4.7, 3.0.1 >Reporter: James Yu >Priority: Minor > > THE BUG: > The bug is reproducible by spark-submitting two different apps (app1 and app2) > with different executor pod templates (e.g., different labels) to K8s > sequentially, with app2 launching while app1 is still in the middle of > ramping up all its executor pods. The unwanted result is that some launched > executor pods of app1 end up having app2's executor pod template applied to > them. > The root cause appears to be that app1's podspec-configmap got overwritten by > app2 during the overlapping launching periods, because both apps use the same > ConfigMap (name). This causes some of app1's executor pods that ramp up after > app2 is launched to be inadvertently launched with app2's pod template. > The issue can be seen as follows: > First, after submitting app1, you get these configmaps: > {code:java} > NAMESPACE NAME DATA AGE > default app1--driver-conf-map 1 9m46s > default podspec-configmap 1 12m{code} > Then submit app2 while app1 is still ramping up its executors. The > podspec-configmap is modified by app2. 
> {code:java} > NAMESPACE NAME DATA AGE > default app1--driver-conf-map 1 11m43s > default app2--driver-conf-map 1 10s > default podspec-configmap 1 13m57s{code} > > PROPOSED SOLUTION: > Properly prefix the podspec-configmap for each submitted app, ideally the > same way as the driver configmap: > {code:java} > NAMESPACE NAME DATA AGE > default app1--driver-conf-map 1 11m43s > default app1--podspec-configmap 1 13m57s > default app2--driver-conf-map 1 10s > default app2--podspec-configmap 1 3m{code}
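The proposed fix above amounts to deriving the pod-template ConfigMap name per application, the same way the driver ConfigMap already is. A hedged sketch of that naming scheme follows; the helper function is hypothetical, and only the `<app>--driver-conf-map` / `podspec-configmap` names come from the report.

```python
def podspec_configmap_name(app_name):
    """Per-app pod-template ConfigMap name, mirroring the existing
    '<app>--driver-conf-map' convention, so concurrent submissions
    no longer share (and overwrite) a single 'podspec-configmap'."""
    return f"{app_name}--podspec-configmap"

# With the shared name, app2's submission clobbers app1's entry;
# with per-app names, both ConfigMaps coexist.
configmaps = {}
for app in ("app1", "app2"):
    configmaps[podspec_configmap_name(app)] = f"pod template for {app}"

print(sorted(configmaps))
```

The dictionary stands in for the Kubernetes namespace: a shared key would leave only one surviving entry, while per-app keys keep both templates intact.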
[jira] [Commented] (SPARK-27733) Upgrade to Avro 1.10.0
[ https://issues.apache.org/jira/browse/SPARK-27733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201109#comment-17201109 ] Dongjoon Hyun commented on SPARK-27733: --- [~smilegator]. Shall we discuss dropping Hive 1.2.1 then? > Let us wait for one more month. If no one is complaining about the quality of > Hive 2.3.x, we can discuss whether we can drop Hive 1.2.1 as Hive execution > in Spark 3.1. > Upgrade to Avro 1.10.0 > -- > > Key: SPARK-27733 > URL: https://issues.apache.org/jira/browse/SPARK-27733 > Project: Spark > Issue Type: Improvement > Components: Build, SQL >Affects Versions: 3.1.0 >Reporter: Ismaël Mejía >Priority: Minor > > Avro 1.9.2 was released with many nice features, including reduced size (1MB > less), removed dependencies (no paranamer, no shaded guava), and security > updates, so it is probably a worthwhile upgrade. > Avro 1.10.0 has since been released and this is still not done. > There is at the moment (2020/08) still a blocker because Hive-related > transitive dependencies bring in older versions of Avro, so this is > effectively still blocked until HIVE-21737 is solved.
[jira] [Comment Edited] (SPARK-12312) JDBC connection to Kerberos secured databases fails on remote executors
[ https://issues.apache.org/jira/browse/SPARK-12312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201101#comment-17201101 ] Prakash Rajendran edited comment on SPARK-12312 at 9/23/20, 9:16 PM: - [~johnlon] the above references use a keytab and principal to establish Kerberos authentication to Oracle. In my scenario, I do not have control over the keytab file, as there will be a sidecar which takes care of setting up the krb5 cache file in my pod. So the executor has to use only the krb5 cache file. Will this scenario also work? was (Author: prakki79): [~johnlon] the above references use keytab and principal to establish kerberos authentication to Oracle. IN my scenario, I donot have control over keytab file, as there will be a sidecar which takes care of detting up the kerb5cache file to my pod. So the executor has to use only the krb5cache file. Is this scenario also works? > JDBC connection to Kerberos secured databases fails on remote executors > --- > > Key: SPARK-12312 > URL: https://issues.apache.org/jira/browse/SPARK-12312 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.2, 2.4.2 >Reporter: nabacg >Assignee: Gabor Somogyi >Priority: Minor > > When loading DataFrames from a JDBC datasource with Kerberos authentication, > remote executors (yarn-client/cluster etc. modes) fail to establish a > connection due to the lack of a Kerberos ticket or the ability to generate one. > This is a real issue when trying to ingest data from kerberized data sources > (SQL Server, Oracle) in an enterprise environment where exposing simple > authentication access is not an option due to IT policy issues.
[jira] [Commented] (SPARK-12312) JDBC connection to Kerberos secured databases fails on remote executors
[ https://issues.apache.org/jira/browse/SPARK-12312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201101#comment-17201101 ] Prakash Rajendran commented on SPARK-12312: --- [~johnlon] the above references use a keytab and principal to establish Kerberos authentication to Oracle. In my scenario, I do not have control over the keytab file, as there will be a sidecar which takes care of setting up the krb5 cache file in my pod. So the executor has to use only the krb5 cache file. Will this scenario also work? > JDBC connection to Kerberos secured databases fails on remote executors > --- > > Key: SPARK-12312 > URL: https://issues.apache.org/jira/browse/SPARK-12312 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.2, 2.4.2 >Reporter: nabacg >Assignee: Gabor Somogyi >Priority: Minor > > When loading DataFrames from a JDBC datasource with Kerberos authentication, > remote executors (yarn-client/cluster etc. modes) fail to establish a > connection due to the lack of a Kerberos ticket or the ability to generate one. > This is a real issue when trying to ingest data from kerberized data sources > (SQL Server, Oracle) in an enterprise environment where exposing simple > authentication access is not an option due to IT policy issues.
[jira] [Commented] (SPARK-27733) Upgrade to Avro 1.10.0
[ https://issues.apache.org/jira/browse/SPARK-27733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201094#comment-17201094 ] Chao Sun commented on SPARK-27733: -- [~sha...@uber.com] Sure, I can join in the next sync meeting. > Upgrade to Avro 1.10.0 > -- > > Key: SPARK-27733 > URL: https://issues.apache.org/jira/browse/SPARK-27733 > Project: Spark > Issue Type: Improvement > Components: Build, SQL >Affects Versions: 3.1.0 >Reporter: Ismaël Mejía >Priority: Minor > > Avro 1.9.2 was released with many nice features, including reduced size (1MB > less), removed dependencies (no paranamer, no shaded guava), and security > updates, so it is probably a worthwhile upgrade. > Avro 1.10.0 has since been released and this is still not done. > There is at the moment (2020/08) still a blocker because Hive-related > transitive dependencies bring in older versions of Avro, so this is > effectively still blocked until HIVE-21737 is solved.
[jira] [Commented] (SPARK-27733) Upgrade to Avro 1.10.0
[ https://issues.apache.org/jira/browse/SPARK-27733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201091#comment-17201091 ] Dongjoon Hyun commented on SPARK-27733: --- Thank you for pinging me, [~sha...@uber.com]. Sorry, I cannot join there. cc [~dbtsai] > Upgrade to Avro 1.10.0 > -- > > Key: SPARK-27733 > URL: https://issues.apache.org/jira/browse/SPARK-27733 > Project: Spark > Issue Type: Improvement > Components: Build, SQL >Affects Versions: 3.1.0 >Reporter: Ismaël Mejía >Priority: Minor > > Avro 1.9.2 was released with many nice features, including reduced size (1MB > less), removed dependencies (no paranamer, no shaded guava), and security > updates, so it is probably a worthwhile upgrade. > Avro 1.10.0 has since been released and this is still not done. > There is at the moment (2020/08) still a blocker because Hive-related > transitive dependencies bring in older versions of Avro, so this is > effectively still blocked until HIVE-21737 is solved.
[jira] [Commented] (SPARK-32980) Launcher Client tests flake with minikube
[ https://issues.apache.org/jira/browse/SPARK-32980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201076#comment-17201076 ] Apache Spark commented on SPARK-32980: -- User 'holdenk' has created a pull request for this issue: https://github.com/apache/spark/pull/29854 > Launcher Client tests flake with minikube > - > > Key: SPARK-32980 > URL: https://issues.apache.org/jira/browse/SPARK-32980 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Tests >Affects Versions: 3.1.0 >Reporter: Holden Karau >Priority: Major > > Launcher Client tests flake with minikube
[jira] [Assigned] (SPARK-32980) Launcher Client tests flake with minikube
[ https://issues.apache.org/jira/browse/SPARK-32980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32980: Assignee: Apache Spark > Launcher Client tests flake with minikube > - > > Key: SPARK-32980 > URL: https://issues.apache.org/jira/browse/SPARK-32980 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Tests >Affects Versions: 3.1.0 >Reporter: Holden Karau >Assignee: Apache Spark >Priority: Major > > Launcher Client tests flake with minikube
[jira] [Assigned] (SPARK-32980) Launcher Client tests flake with minikube
[ https://issues.apache.org/jira/browse/SPARK-32980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32980: Assignee: (was: Apache Spark) > Launcher Client tests flake with minikube > - > > Key: SPARK-32980 > URL: https://issues.apache.org/jira/browse/SPARK-32980 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Tests >Affects Versions: 3.1.0 >Reporter: Holden Karau >Priority: Major > > Launcher Client tests flake with minikube -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32980) Launcher Client tests flake with minikube
[ https://issues.apache.org/jira/browse/SPARK-32980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201073#comment-17201073 ] Holden Karau commented on SPARK-32980: -- Our method of getting the service assumes the service URL is on the first line of the minikube output, but when a new version of minikube is released, the first few lines are upgrade info instead. > Launcher Client tests flake with minikube > - > > Key: SPARK-32980 > URL: https://issues.apache.org/jira/browse/SPARK-32980 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Tests >Affects Versions: 3.1.0 >Reporter: Holden Karau >Priority: Major > > Launcher Client tests flake with minikube
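Holden's diagnosis above suggests a straightforward hardening: scan the `minikube service` output for the first URL-looking line instead of assuming it is line one. A minimal sketch under that assumption (the helper name and sample output below are hypothetical, not the actual integration-test harness code):

```python
import re

def extract_service_url(minikube_output: str) -> str:
    """Return the first line of `minikube service ... --url` output
    that looks like a URL. Newer minikube releases may prepend
    upgrade notices, so the URL is not guaranteed to be on the
    first line."""
    for line in minikube_output.splitlines():
        line = line.strip()
        if re.match(r"https?://", line):
            return line
    raise ValueError("no service URL found in minikube output")
```

This stays correct whether or not minikube prints an upgrade banner before the URL.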
[jira] [Created] (SPARK-32980) Launcher Client tests flake with minikube
Holden Karau created SPARK-32980: Summary: Launcher Client tests flake with minikube Key: SPARK-32980 URL: https://issues.apache.org/jira/browse/SPARK-32980 Project: Spark Issue Type: Bug Components: Kubernetes, Tests Affects Versions: 3.1.0 Reporter: Holden Karau Launcher Client tests flake with minikube
[jira] [Assigned] (SPARK-32915) RPC implementation to support pushing and merging shuffle blocks
[ https://issues.apache.org/jira/browse/SPARK-32915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32915: Assignee: (was: Apache Spark) > RPC implementation to support pushing and merging shuffle blocks > > > Key: SPARK-32915 > URL: https://issues.apache.org/jira/browse/SPARK-32915 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Affects Versions: 3.1.0 >Reporter: Min Shen >Priority: Major > > RPC implementation for the basic functionality in network-common and > network-shuffle module to enable pushing blocks on the client side and > merging received blocks on the server side. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32915) RPC implementation to support pushing and merging shuffle blocks
[ https://issues.apache.org/jira/browse/SPARK-32915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32915: Assignee: Apache Spark > RPC implementation to support pushing and merging shuffle blocks > > > Key: SPARK-32915 > URL: https://issues.apache.org/jira/browse/SPARK-32915 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Affects Versions: 3.1.0 >Reporter: Min Shen >Assignee: Apache Spark >Priority: Major > > RPC implementation for the basic functionality in network-common and > network-shuffle module to enable pushing blocks on the client side and > merging received blocks on the server side. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32915) RPC implementation to support pushing and merging shuffle blocks
[ https://issues.apache.org/jira/browse/SPARK-32915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201053#comment-17201053 ] Apache Spark commented on SPARK-32915: -- User 'Victsm' has created a pull request for this issue: https://github.com/apache/spark/pull/29855 > RPC implementation to support pushing and merging shuffle blocks > > > Key: SPARK-32915 > URL: https://issues.apache.org/jira/browse/SPARK-32915 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Affects Versions: 3.1.0 >Reporter: Min Shen >Priority: Major > > RPC implementation for the basic functionality in network-common and > network-shuffle module to enable pushing blocks on the client side and > merging received blocks on the server side. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32937) DecomissionSuite in k8s integration tests is failing.
[ https://issues.apache.org/jira/browse/SPARK-32937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32937: Assignee: Apache Spark > DecomissionSuite in k8s integration tests is failing. > - > > Key: SPARK-32937 > URL: https://issues.apache.org/jira/browse/SPARK-32937 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.1.0 >Reporter: Prashant Sharma >Assignee: Apache Spark >Priority: Major > > Logs from the failing test, copied from jenkins. As of now, it is always > failing. > {code} > - Test basic decommissioning *** FAILED *** > The code passed to eventually never returned normally. Attempted 182 times > over 3.00377927275 minutes. Last failure message: "++ id -u > + myuid=185 > ++ id -g > + mygid=0 > + set +e > ++ getent passwd 185 > + uidentry= > + set -e > + '[' -z '' ']' > + '[' -w /etc/passwd ']' > + echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false' > + SPARK_CLASSPATH=':/opt/spark/jars/*' > + env > + grep SPARK_JAVA_OPT_ > + sort -t_ -k4 -n > + sed 's/[^=]*=\(.*\)/\1/g' > + readarray -t SPARK_EXECUTOR_JAVA_OPTS > + '[' -n '' ']' > + '[' 3 == 2 ']' > + '[' 3 == 3 ']' > ++ python3 -V > + pyv3='Python 3.7.3' > + export PYTHON_VERSION=3.7.3 > + PYTHON_VERSION=3.7.3 > + export PYSPARK_PYTHON=python3 > + PYSPARK_PYTHON=python3 > + export PYSPARK_DRIVER_PYTHON=python3 > + PYSPARK_DRIVER_PYTHON=python3 > + '[' -n '' ']' > + '[' -z ']' > + '[' -z x ']' > + SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*' > + case "$1" in > + shift 1 > + CMD=("$SPARK_HOME/bin/spark-submit" --conf > "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client > "$@") > + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf > spark.driver.bindAddress=172.17.0.4 --deploy-mode client --properties-file > /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner > local:///opt/spark/tests/decommissioning.py > 20/09/17 11:06:56 WARN NativeCodeLoader: Unable to 
load native-hadoop > library for your platform... using builtin-java classes where applicable > Starting decom test > Using Spark's default log4j profile: > org/apache/spark/log4j-defaults.properties > 20/09/17 11:06:56 INFO SparkContext: Running Spark version 3.1.0-SNAPSHOT > 20/09/17 11:06:57 INFO ResourceUtils: > == > 20/09/17 11:06:57 INFO ResourceUtils: No custom resources configured for > spark.driver. > 20/09/17 11:06:57 INFO ResourceUtils: > == > 20/09/17 11:06:57 INFO SparkContext: Submitted application: PyMemoryTest > 20/09/17 11:06:57 INFO ResourceProfile: Default ResourceProfile created, > executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , > memory -> name: memory, amount: 1024, script: , vendor: ), task resources: > Map(cpus -> name: cpus, amount: 1.0) > 20/09/17 11:06:57 INFO ResourceProfile: Limiting resource is cpus at 1 > tasks per executor > 20/09/17 11:06:57 INFO ResourceProfileManager: Added ResourceProfile id: 0 > 20/09/17 11:06:57 INFO SecurityManager: Changing view acls to: 185,jenkins > 20/09/17 11:06:57 INFO SecurityManager: Changing modify acls to: 185,jenkins > 20/09/17 11:06:57 INFO SecurityManager: Changing view acls groups to: > 20/09/17 11:06:57 INFO SecurityManager: Changing modify acls groups to: > 20/09/17 11:06:57 INFO SecurityManager: SecurityManager: authentication > enabled; ui acls disabled; users with view permissions: Set(185, jenkins); > groups with view permissions: Set(); users with modify permissions: Set(185, > jenkins); groups with modify permissions: Set() > 20/09/17 11:06:57 INFO Utils: Successfully started service 'sparkDriver' on > port 7078. 
> 20/09/17 11:06:57 INFO SparkEnv: Registering MapOutputTracker > 20/09/17 11:06:57 INFO SparkEnv: Registering BlockManagerMaster > 20/09/17 11:06:57 INFO BlockManagerMasterEndpoint: Using > org.apache.spark.storage.DefaultTopologyMapper for getting topology > information > 20/09/17 11:06:57 INFO BlockManagerMasterEndpoint: > BlockManagerMasterEndpoint up > 20/09/17 11:06:57 INFO SparkEnv: Registering BlockManagerMasterHeartbeat > 20/09/17 11:06:57 INFO DiskBlockManager: Created local directory at > /var/data/spark-7985c075-3b02-42ec-9111-cefba535adf0/blockmgr-3bd403d0-6689-46be-997e-5bc699ecefd3 > 20/09/17 11:06:57 INFO MemoryStore: MemoryStore started with capacity 593.9 > MiB > 20/09/17 11:06:57 INFO SparkEnv: Registering OutputCommitCoordinator > 20/09/17 11:06:58 INFO Utils: Successfully started service 'SparkUI' on > port 4040. > 20/09/17 11:06:58 INFO
[jira] [Assigned] (SPARK-32937) DecomissionSuite in k8s integration tests is failing.
[ https://issues.apache.org/jira/browse/SPARK-32937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32937: Assignee: (was: Apache Spark) > DecomissionSuite in k8s integration tests is failing. > - > > Key: SPARK-32937 > URL: https://issues.apache.org/jira/browse/SPARK-32937 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.1.0 >Reporter: Prashant Sharma >Priority: Major > > Logs from the failing test, copied from jenkins. As of now, it is always > failing. > {code} > - Test basic decommissioning *** FAILED *** > The code passed to eventually never returned normally. Attempted 182 times > over 3.00377927275 minutes. Last failure message: "++ id -u > + myuid=185 > ++ id -g > + mygid=0 > + set +e > ++ getent passwd 185 > + uidentry= > + set -e > + '[' -z '' ']' > + '[' -w /etc/passwd ']' > + echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false' > + SPARK_CLASSPATH=':/opt/spark/jars/*' > + env > + grep SPARK_JAVA_OPT_ > + sort -t_ -k4 -n > + sed 's/[^=]*=\(.*\)/\1/g' > + readarray -t SPARK_EXECUTOR_JAVA_OPTS > + '[' -n '' ']' > + '[' 3 == 2 ']' > + '[' 3 == 3 ']' > ++ python3 -V > + pyv3='Python 3.7.3' > + export PYTHON_VERSION=3.7.3 > + PYTHON_VERSION=3.7.3 > + export PYSPARK_PYTHON=python3 > + PYSPARK_PYTHON=python3 > + export PYSPARK_DRIVER_PYTHON=python3 > + PYSPARK_DRIVER_PYTHON=python3 > + '[' -n '' ']' > + '[' -z ']' > + '[' -z x ']' > + SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*' > + case "$1" in > + shift 1 > + CMD=("$SPARK_HOME/bin/spark-submit" --conf > "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client > "$@") > + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf > spark.driver.bindAddress=172.17.0.4 --deploy-mode client --properties-file > /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner > local:///opt/spark/tests/decommissioning.py > 20/09/17 11:06:56 WARN NativeCodeLoader: Unable to load native-hadoop > 
library for your platform... using builtin-java classes where applicable > Starting decom test > Using Spark's default log4j profile: > org/apache/spark/log4j-defaults.properties > 20/09/17 11:06:56 INFO SparkContext: Running Spark version 3.1.0-SNAPSHOT > 20/09/17 11:06:57 INFO ResourceUtils: > == > 20/09/17 11:06:57 INFO ResourceUtils: No custom resources configured for > spark.driver. > 20/09/17 11:06:57 INFO ResourceUtils: > == > 20/09/17 11:06:57 INFO SparkContext: Submitted application: PyMemoryTest > 20/09/17 11:06:57 INFO ResourceProfile: Default ResourceProfile created, > executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , > memory -> name: memory, amount: 1024, script: , vendor: ), task resources: > Map(cpus -> name: cpus, amount: 1.0) > 20/09/17 11:06:57 INFO ResourceProfile: Limiting resource is cpus at 1 > tasks per executor > 20/09/17 11:06:57 INFO ResourceProfileManager: Added ResourceProfile id: 0 > 20/09/17 11:06:57 INFO SecurityManager: Changing view acls to: 185,jenkins > 20/09/17 11:06:57 INFO SecurityManager: Changing modify acls to: 185,jenkins > 20/09/17 11:06:57 INFO SecurityManager: Changing view acls groups to: > 20/09/17 11:06:57 INFO SecurityManager: Changing modify acls groups to: > 20/09/17 11:06:57 INFO SecurityManager: SecurityManager: authentication > enabled; ui acls disabled; users with view permissions: Set(185, jenkins); > groups with view permissions: Set(); users with modify permissions: Set(185, > jenkins); groups with modify permissions: Set() > 20/09/17 11:06:57 INFO Utils: Successfully started service 'sparkDriver' on > port 7078. 
> 20/09/17 11:06:57 INFO SparkEnv: Registering MapOutputTracker > 20/09/17 11:06:57 INFO SparkEnv: Registering BlockManagerMaster > 20/09/17 11:06:57 INFO BlockManagerMasterEndpoint: Using > org.apache.spark.storage.DefaultTopologyMapper for getting topology > information > 20/09/17 11:06:57 INFO BlockManagerMasterEndpoint: > BlockManagerMasterEndpoint up > 20/09/17 11:06:57 INFO SparkEnv: Registering BlockManagerMasterHeartbeat > 20/09/17 11:06:57 INFO DiskBlockManager: Created local directory at > /var/data/spark-7985c075-3b02-42ec-9111-cefba535adf0/blockmgr-3bd403d0-6689-46be-997e-5bc699ecefd3 > 20/09/17 11:06:57 INFO MemoryStore: MemoryStore started with capacity 593.9 > MiB > 20/09/17 11:06:57 INFO SparkEnv: Registering OutputCommitCoordinator > 20/09/17 11:06:58 INFO Utils: Successfully started service 'SparkUI' on > port 4040. > 20/09/17 11:06:58 INFO SparkUI: Bound SparkUI to
[jira] [Commented] (SPARK-32937) DecomissionSuite in k8s integration tests is failing.
[ https://issues.apache.org/jira/browse/SPARK-32937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201045#comment-17201045 ] Apache Spark commented on SPARK-32937: -- User 'holdenk' has created a pull request for this issue: https://github.com/apache/spark/pull/29854 > DecomissionSuite in k8s integration tests is failing. > - > > Key: SPARK-32937 > URL: https://issues.apache.org/jira/browse/SPARK-32937 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.1.0 >Reporter: Prashant Sharma >Priority: Major > > Logs from the failing test, copied from jenkins. As of now, it is always > failing. > {code} > - Test basic decommissioning *** FAILED *** > The code passed to eventually never returned normally. Attempted 182 times > over 3.00377927275 minutes. Last failure message: "++ id -u > + myuid=185 > ++ id -g > + mygid=0 > + set +e > ++ getent passwd 185 > + uidentry= > + set -e > + '[' -z '' ']' > + '[' -w /etc/passwd ']' > + echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false' > + SPARK_CLASSPATH=':/opt/spark/jars/*' > + env > + grep SPARK_JAVA_OPT_ > + sort -t_ -k4 -n > + sed 's/[^=]*=\(.*\)/\1/g' > + readarray -t SPARK_EXECUTOR_JAVA_OPTS > + '[' -n '' ']' > + '[' 3 == 2 ']' > + '[' 3 == 3 ']' > ++ python3 -V > + pyv3='Python 3.7.3' > + export PYTHON_VERSION=3.7.3 > + PYTHON_VERSION=3.7.3 > + export PYSPARK_PYTHON=python3 > + PYSPARK_PYTHON=python3 > + export PYSPARK_DRIVER_PYTHON=python3 > + PYSPARK_DRIVER_PYTHON=python3 > + '[' -n '' ']' > + '[' -z ']' > + '[' -z x ']' > + SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*' > + case "$1" in > + shift 1 > + CMD=("$SPARK_HOME/bin/spark-submit" --conf > "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client > "$@") > + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf > spark.driver.bindAddress=172.17.0.4 --deploy-mode client --properties-file > /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner > 
local:///opt/spark/tests/decommissioning.py > 20/09/17 11:06:56 WARN NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > Starting decom test > Using Spark's default log4j profile: > org/apache/spark/log4j-defaults.properties > 20/09/17 11:06:56 INFO SparkContext: Running Spark version 3.1.0-SNAPSHOT > 20/09/17 11:06:57 INFO ResourceUtils: > == > 20/09/17 11:06:57 INFO ResourceUtils: No custom resources configured for > spark.driver. > 20/09/17 11:06:57 INFO ResourceUtils: > == > 20/09/17 11:06:57 INFO SparkContext: Submitted application: PyMemoryTest > 20/09/17 11:06:57 INFO ResourceProfile: Default ResourceProfile created, > executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , > memory -> name: memory, amount: 1024, script: , vendor: ), task resources: > Map(cpus -> name: cpus, amount: 1.0) > 20/09/17 11:06:57 INFO ResourceProfile: Limiting resource is cpus at 1 > tasks per executor > 20/09/17 11:06:57 INFO ResourceProfileManager: Added ResourceProfile id: 0 > 20/09/17 11:06:57 INFO SecurityManager: Changing view acls to: 185,jenkins > 20/09/17 11:06:57 INFO SecurityManager: Changing modify acls to: 185,jenkins > 20/09/17 11:06:57 INFO SecurityManager: Changing view acls groups to: > 20/09/17 11:06:57 INFO SecurityManager: Changing modify acls groups to: > 20/09/17 11:06:57 INFO SecurityManager: SecurityManager: authentication > enabled; ui acls disabled; users with view permissions: Set(185, jenkins); > groups with view permissions: Set(); users with modify permissions: Set(185, > jenkins); groups with modify permissions: Set() > 20/09/17 11:06:57 INFO Utils: Successfully started service 'sparkDriver' on > port 7078. 
> 20/09/17 11:06:57 INFO SparkEnv: Registering MapOutputTracker > 20/09/17 11:06:57 INFO SparkEnv: Registering BlockManagerMaster > 20/09/17 11:06:57 INFO BlockManagerMasterEndpoint: Using > org.apache.spark.storage.DefaultTopologyMapper for getting topology > information > 20/09/17 11:06:57 INFO BlockManagerMasterEndpoint: > BlockManagerMasterEndpoint up > 20/09/17 11:06:57 INFO SparkEnv: Registering BlockManagerMasterHeartbeat > 20/09/17 11:06:57 INFO DiskBlockManager: Created local directory at > /var/data/spark-7985c075-3b02-42ec-9111-cefba535adf0/blockmgr-3bd403d0-6689-46be-997e-5bc699ecefd3 > 20/09/17 11:06:57 INFO MemoryStore: MemoryStore started with capacity 593.9 > MiB > 20/09/17 11:06:57 INFO SparkEnv: Registering OutputCommitCoordinator > 20/09/17 11:06:58 INFO Utils: Successfully
[jira] [Resolved] (SPARK-32979) Spark K8s decom test is broken
[ https://issues.apache.org/jira/browse/SPARK-32979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau resolved SPARK-32979. -- Resolution: Duplicate Duplicate of SPARK-32937 > Spark K8s decom test is broken > -- > > Key: SPARK-32979 > URL: https://issues.apache.org/jira/browse/SPARK-32979 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Core, Tests >Affects Versions: 3.1.0 >Reporter: Holden Karau >Assignee: Holden Karau >Priority: Major > > Someone changed the logging messages again. Let's fix the test and add some > comments about the importance of running the K8s test on changes.
[jira] [Commented] (SPARK-32933) Use keyword-only syntax for keyword_only methods
[ https://issues.apache.org/jira/browse/SPARK-32933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201017#comment-17201017 ] Maciej Szymkiewicz commented on SPARK-32933: Not a problem [~hyukjin.kwon]. My only concern is that we still need a viable alternative for capturing arguments. The {{locals}} hack does the job, especially if we add some helper for dropping {{self}}: {code:python} def _drop_self(d): d = copy.copy(d) del d["self"] return d class BucketedRandomProjectionLSH(_LSH, _BucketedRandomProjectionLSHParams, HasSeed, JavaMLReadable, JavaMLWritable): def __init__(self, *, inputCol=None, outputCol=None, seed=None, numHashTables=1, bucketLength=None): kwargs = _drop_self(locals()) ... {code} Alternatively, we could just provide all the args explicitly. I guess we could also leverage `inspect` (optionally combined with a class decorator? Just thinking out loud). > Use keyword-only syntax for keyword_only methods > > > Key: SPARK-32933 > URL: https://issues.apache.org/jira/browse/SPARK-32933 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.1.0 >Reporter: Maciej Szymkiewicz >Assignee: Maciej Szymkiewicz >Priority: Minor > Fix For: 3.1.0 > > > Since Python 3.0, the language provides syntax for indicating keyword-only arguments ([PEP > 3102|https://www.python.org/dev/peps/pep-3102/]). > It is not a full replacement for our current usage of {{keyword_only}}, but > it would allow us to make our expectations explicit: > {code:python} > @keyword_only > def __init__(self, degree=2, inputCol=None, outputCol=None): > {code} > {code:python} > @keyword_only > def __init__(self, *, degree=2, inputCol=None, outputCol=None): > {code}
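Maciej's closing idea of leveraging `inspect` can be sketched as a decorator that records the keyword arguments the caller actually passed, similar in spirit to what {{keyword_only}} stores in {{_input_kwargs}} but without a {{locals()}} snapshot. The decorator name and the example class below are hypothetical, purely to illustrate the mechanism:

```python
import inspect

def capture_kwargs(init):
    """Decorator sketch: record the keyword arguments explicitly
    passed by the caller in self._input_kwargs, derived from the
    function signature via inspect."""
    sig = inspect.signature(init)

    def wrapper(self, **kwargs):
        # bind() validates the names against the signature and keeps
        # only the arguments that were actually supplied.
        bound = sig.bind(self, **kwargs)
        self._input_kwargs = {
            name: value for name, value in bound.arguments.items()
            if name != "self"
        }
        return init(self, **kwargs)

    return wrapper

class Example:
    @capture_kwargs
    def __init__(self, *, degree=2, inputCol=None, outputCol=None):
        pass
```

Because the wrapper only accepts keyword arguments, this also enforces the keyword-only calling convention the ticket proposes.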
[jira] [Assigned] (SPARK-32979) Spark K8s decom test is broken
[ https://issues.apache.org/jira/browse/SPARK-32979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32979: Assignee: Holden Karau (was: Apache Spark) > Spark K8s decom test is broken > -- > > Key: SPARK-32979 > URL: https://issues.apache.org/jira/browse/SPARK-32979 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Core, Tests >Affects Versions: 3.1.0 >Reporter: Holden Karau >Assignee: Holden Karau >Priority: Major > > Someone changed the logging messages again. Let's fix the test and add some > comments about the importance of running the K8s test on changes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32979) Spark K8s decom test is broken
[ https://issues.apache.org/jira/browse/SPARK-32979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32979: Assignee: Holden Karau (was: Apache Spark) > Spark K8s decom test is broken > -- > > Key: SPARK-32979 > URL: https://issues.apache.org/jira/browse/SPARK-32979 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Core, Tests >Affects Versions: 3.1.0 >Reporter: Holden Karau >Assignee: Holden Karau >Priority: Major > > Someone changed the logging messages again. Let's fix the test and add some > comments about the importance of running the K8s test on changes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32979) Spark K8s decom test is broken
[ https://issues.apache.org/jira/browse/SPARK-32979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32979: Assignee: Apache Spark (was: Holden Karau) > Spark K8s decom test is broken > -- > > Key: SPARK-32979 > URL: https://issues.apache.org/jira/browse/SPARK-32979 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Core, Tests >Affects Versions: 3.1.0 >Reporter: Holden Karau >Assignee: Apache Spark >Priority: Major > > Someone changed the logging messages again. Let's fix the test and add some > comments about the importance of running the K8s test on changes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32979) Spark K8s decom test is broken
[ https://issues.apache.org/jira/browse/SPARK-32979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200999#comment-17200999 ] Apache Spark commented on SPARK-32979: -- User 'holdenk' has created a pull request for this issue: https://github.com/apache/spark/pull/29854 > Spark K8s decom test is broken > -- > > Key: SPARK-32979 > URL: https://issues.apache.org/jira/browse/SPARK-32979 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Core, Tests >Affects Versions: 3.1.0 >Reporter: Holden Karau >Assignee: Holden Karau >Priority: Major > > Someone changed the logging messages again. Let's fix the test and add some > comments about the importance of running the K8s test on changes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32979) Spark K8s decom test is broken
[ https://issues.apache.org/jira/browse/SPARK-32979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32979: Assignee: Apache Spark (was: Holden Karau) > Spark K8s decom test is broken > -- > > Key: SPARK-32979 > URL: https://issues.apache.org/jira/browse/SPARK-32979 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Core, Tests >Affects Versions: 3.1.0 >Reporter: Holden Karau >Assignee: Apache Spark >Priority: Major > > Someone changed the logging messages again. Let's fix the test and add some > comments about the importance of running the K8s test on changes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32950) No need for some big-endian specific code paths in {On,Off}HeapColumnVector
[ https://issues.apache.org/jira/browse/SPARK-32950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-32950: Assignee: Michael Munday > No need for some big-endian specific code paths in {On,Off}HeapColumnVector > --- > > Key: SPARK-32950 > URL: https://issues.apache.org/jira/browse/SPARK-32950 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.1 >Reporter: Michael Munday >Assignee: Michael Munday >Priority: Trivial > Labels: big-endian > > There is no need for a separate code path for big-endian platforms in > putFloats and putDoubles in OnHeapColumnVector and OffHeapColumnVector. Since > SPARK-26985 was fixed the values have been copied in native byte order so the > code required to perform this operation can be the same on both little- and > big-endian platforms. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-32950) No need for some big-endian specific code paths in {On,Off}HeapColumnVector
[ https://issues.apache.org/jira/browse/SPARK-32950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-32950. -- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 29815 [https://github.com/apache/spark/pull/29815] > No need for some big-endian specific code paths in {On,Off}HeapColumnVector > --- > > Key: SPARK-32950 > URL: https://issues.apache.org/jira/browse/SPARK-32950 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.1 >Reporter: Michael Munday >Assignee: Michael Munday >Priority: Trivial > Labels: big-endian > Fix For: 3.1.0 > > > There is no need for a separate code path for big-endian platforms in > putFloats and putDoubles in OnHeapColumnVector and OffHeapColumnVector. Since > SPARK-26985 was fixed the values have been copied in native byte order so the > code required to perform this operation can be the same on both little- and > big-endian platforms. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-32892) Murmur3 and xxHash64 implementations do not produce the correct results on big-endian platforms
[ https://issues.apache.org/jira/browse/SPARK-32892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-32892. -- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 29762 [https://github.com/apache/spark/pull/29762] > Murmur3 and xxHash64 implementations do not produce the correct results on > big-endian platforms > --- > > Key: SPARK-32892 > URL: https://issues.apache.org/jira/browse/SPARK-32892 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 3.0.1 >Reporter: Michael Munday >Assignee: Michael Munday >Priority: Minor > Labels: big-endian > Fix For: 3.1.0 > > > The Murmur3 and xxHash64 implementations in Spark do not produce the correct > results on big-endian systems. This causes test failures on my target > platform (s390x). > These hash functions require that multi-byte chunks be interpreted as > integers encoded in *little-endian* byte order. This requires byte reversal > when using multi-byte unsafe operations on big-endian platforms. > I have a PR ready for discussion and review. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
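The fix described above hinges on the fact that Murmur3 and xxHash64 define their input chunks as little-endian integers, so a native-byte-order read only happens to be correct on little-endian hosts. A small Python sketch of the required behavior (illustrative only; the actual fix lives in Spark's Java unsafe-memory code):

```python
import struct

def read_hash_chunks(data: bytes):
    """Interpret each 4-byte chunk of hash input as a little-endian
    integer, as Murmur3 requires on every platform. A big-endian
    native read would need an explicit byte swap to match."""
    assert len(data) % 4 == 0
    return [struct.unpack_from("<i", data, offset)[0]
            for offset in range(0, len(data), 4)]
```

On a little-endian host a native-order read gives the same answer, which is why the bug only shows up on big-endian platforms such as s390x.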
[jira] [Assigned] (SPARK-32892) Murmur3 and xxHash64 implementations do not produce the correct results on big-endian platforms
[ https://issues.apache.org/jira/browse/SPARK-32892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-32892: Assignee: Michael Munday > Murmur3 and xxHash64 implementations do not produce the correct results on > big-endian platforms > --- > > Key: SPARK-32892 > URL: https://issues.apache.org/jira/browse/SPARK-32892 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 3.0.1 >Reporter: Michael Munday >Assignee: Michael Munday >Priority: Minor > Labels: big-endian > > The Murmur3 and xxHash64 implementations in Spark do not produce the correct > results on big-endian systems. This causes test failures on my target > platform (s390x). > These hash functions require that multi-byte chunks be interpreted as > integers encoded in *little-endian* byte order. This requires byte reversal > when using multi-byte unsafe operations on big-endian platforms. > I have a PR ready for discussion and review. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-32979) Spark K8s decom test is broken
Holden Karau created SPARK-32979: Summary: Spark K8s decom test is broken Key: SPARK-32979 URL: https://issues.apache.org/jira/browse/SPARK-32979 Project: Spark Issue Type: Bug Components: Kubernetes, Spark Core, Tests Affects Versions: 3.1.0 Reporter: Holden Karau Assignee: Holden Karau Someone changed the logging messages again. Let's fix the test and add some comments about the importance of running the K8s test on changes.
[jira] [Updated] (SPARK-32972) Pass all `mllib` module UTs in Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-32972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-32972: - Description: There are 51 Scala tests and 3 Java tests failing in the `mllib` module; the failed cases are as follows: *Java:* * org.apache.spark.mllib.fpm.JavaPrefixSpanSuite ( 1 FAILED) * org.apache.spark.ml.feature.JavaVectorIndexerSuite ( 1 FAILED) * org.apache.spark.ml.feature.JavaWord2VecSuite ( 1 FAILED) *Scala:* * MatrixFactorizationModelSuite ( 1 FAILED) * LDASuite ( 1 FAILED) * MLTestSuite ( 1 FAILED) * PrefixSpanSuite ( 1 FAILED) * BucketedRandomProjectionLSHSuite ( 3 FAILED) * Word2VecSuite ( 3 FAILED) * Word2VecSuite ( 5 FAILED) * MinHashLSHSuite ( 3 FAILED) * DecisionTreeSuite ( 1 FAILED) * FPGrowthSuite ( 2 FAILED) * NaiveBayesSuite ( 2 FAILED) * NGramSuite ( 4 FAILED) * RFormulaSuite ( 4 FAILED) * GradientBoostedTreesSuite ( 1 FAILED) * StopWordsRemoverSuite ( 10 FAILED) * RandomForestSuite ( 1 FAILED) * PrefixSpanSuite ( 4 FAILED) * StringIndexerSuite ( 2 FAILED) * IDFSuite ( 1 FAILED) * RandomForestRegressorSuite ( 1 FAILED) was:There are 51 Scala test and 3 java test Failed of `mllib` module, details will be added later. 
> Pass all `mllib` module UTs in Scala 2.13 > - > > Key: SPARK-32972 > URL: https://issues.apache.org/jira/browse/SPARK-32972 > Project: Spark > Issue Type: Sub-task > Components: ML, MLlib >Affects Versions: 3.1.0 >Reporter: Yang Jie >Priority: Major > > There are 51 Scala tests and 3 Java tests failing in the `mllib` module; the > failed cases are as follows: > *Java:* > * org.apache.spark.mllib.fpm.JavaPrefixSpanSuite ( 1 FAILED) > * org.apache.spark.ml.feature.JavaVectorIndexerSuite ( 1 FAILED) > * org.apache.spark.ml.feature.JavaWord2VecSuite ( 1 FAILED) > *Scala:* > * MatrixFactorizationModelSuite ( 1 FAILED) > * LDASuite ( 1 FAILED) > * MLTestSuite ( 1 FAILED) > * PrefixSpanSuite ( 1 FAILED) > * BucketedRandomProjectionLSHSuite ( 3 FAILED) > * Word2VecSuite ( 3 FAILED) > * Word2VecSuite ( 5 FAILED) > * MinHashLSHSuite ( 3 FAILED) > * DecisionTreeSuite ( 1 FAILED) > * FPGrowthSuite ( 2 FAILED) > * NaiveBayesSuite ( 2 FAILED) > * NGramSuite ( 4 FAILED) > * RFormulaSuite ( 4 FAILED) > * GradientBoostedTreesSuite ( 1 FAILED) > * StopWordsRemoverSuite ( 10 FAILED) > * RandomForestSuite ( 1 FAILED) > * PrefixSpanSuite ( 4 FAILED) > * StringIndexerSuite ( 2 FAILED) > * IDFSuite ( 1 FAILED) > * RandomForestRegressorSuite ( 1 FAILED) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32977) [SQL] JavaDoc on Default Save mode Incorrect
[ https://issues.apache.org/jira/browse/SPARK-32977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200935#comment-17200935 ] Apache Spark commented on SPARK-32977: -- User 'RussellSpitzer' has created a pull request for this issue: https://github.com/apache/spark/pull/29853 > [SQL] JavaDoc on Default Save mode Incorrect > > > Key: SPARK-32977 > URL: https://issues.apache.org/jira/browse/SPARK-32977 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1 >Reporter: Russell Spitzer >Priority: Major > > The JavaDoc says that the default save mode is dependent on DataSource > version which is incorrect. It is always ErrorOnExists. > http://apache-spark-developers-list.1001551.n3.nabble.com/DatasourceV2-Default-Mode-for-DataFrameWriter-not-Dependent-on-DataSource-Version-td29434.html -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32977) [SQL] JavaDoc on Default Save mode Incorrect
[ https://issues.apache.org/jira/browse/SPARK-32977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32977: Assignee: (was: Apache Spark) > [SQL] JavaDoc on Default Save mode Incorrect > > > Key: SPARK-32977 > URL: https://issues.apache.org/jira/browse/SPARK-32977 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1 >Reporter: Russell Spitzer >Priority: Major > > The JavaDoc says that the default save mode is dependent on DataSource > version which is incorrect. It is always ErrorOnExists. > http://apache-spark-developers-list.1001551.n3.nabble.com/DatasourceV2-Default-Mode-for-DataFrameWriter-not-Dependent-on-DataSource-Version-td29434.html -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32977) [SQL] JavaDoc on Default Save mode Incorrect
[ https://issues.apache.org/jira/browse/SPARK-32977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32977: Assignee: Apache Spark > [SQL] JavaDoc on Default Save mode Incorrect > > > Key: SPARK-32977 > URL: https://issues.apache.org/jira/browse/SPARK-32977 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1 >Reporter: Russell Spitzer >Assignee: Apache Spark >Priority: Major > > The JavaDoc says that the default save mode is dependent on DataSource > version which is incorrect. It is always ErrorOnExists. > http://apache-spark-developers-list.1001551.n3.nabble.com/DatasourceV2-Default-Mode-for-DataFrameWriter-not-Dependent-on-DataSource-Version-td29434.html -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32977) [SQL] JavaDoc on Default Save mode Incorrect
[ https://issues.apache.org/jira/browse/SPARK-32977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200932#comment-17200932 ] Apache Spark commented on SPARK-32977: -- User 'RussellSpitzer' has created a pull request for this issue: https://github.com/apache/spark/pull/29853 > [SQL] JavaDoc on Default Save mode Incorrect > > > Key: SPARK-32977 > URL: https://issues.apache.org/jira/browse/SPARK-32977 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1 >Reporter: Russell Spitzer >Priority: Major > > The JavaDoc says that the default save mode is dependent on DataSource > version which is incorrect. It is always ErrorOnExists. > http://apache-spark-developers-list.1001551.n3.nabble.com/DatasourceV2-Default-Mode-for-DataFrameWriter-not-Dependent-on-DataSource-Version-td29434.html -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32977) [SQL] JavaDoc on Default Save mode Incorrect
[ https://issues.apache.org/jira/browse/SPARK-32977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200930#comment-17200930 ] Russell Spitzer commented on SPARK-32977: - [~brkyvz] We talked about this a while back, just submitted the PR to fix the doc. Could you please review? > [SQL] JavaDoc on Default Save mode Incorrect > > > Key: SPARK-32977 > URL: https://issues.apache.org/jira/browse/SPARK-32977 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1 >Reporter: Russell Spitzer >Priority: Major > > The JavaDoc says that the default save mode is dependent on DataSource > version which is incorrect. It is always ErrorOnExists. > http://apache-spark-developers-list.1001551.n3.nabble.com/DatasourceV2-Default-Mode-for-DataFrameWriter-not-Dependent-on-DataSource-Version-td29434.html -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32978) Incorrect number of dynamic part metric
[ https://issues.apache.org/jira/browse/SPARK-32978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-32978: Description: How to reproduce this issue: {code:sql} create table dynamic_partition(i bigint, part bigint) using parquet partitioned by (part); insert overwrite table dynamic_partition partition(part) select id, id % 50 as part from range(1); {code} The number of dynamic part should be 50, but it is 800. was: How to reproduce this issue: {code:sql} create table dynamic_partition(i bigint, part bigint) using parquet partitioned by (part); insert overwrite table dynamic_partition partition(part) select id, id % 50 as part from range(1); {code} > Incorrect number of dynamic part metric > --- > > Key: SPARK-32978 > URL: https://issues.apache.org/jira/browse/SPARK-32978 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > Attachments: screenshot-1.png > > > How to reproduce this issue: > {code:sql} > create table dynamic_partition(i bigint, part bigint) using parquet > partitioned by (part); > insert overwrite table dynamic_partition partition(part) select id, id % 50 > as part from range(1); > {code} > The number of dynamic part should be 50, but it is 800. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-32978) Incorrect number of dynamic part metric
Yuming Wang created SPARK-32978: --- Summary: Incorrect number of dynamic part metric Key: SPARK-32978 URL: https://issues.apache.org/jira/browse/SPARK-32978 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Yuming Wang Attachments: screenshot-1.png How to reproduce this issue: {code:sql} create table dynamic_partition(i bigint, part bigint) using parquet partitioned by (part); insert overwrite table dynamic_partition partition(part) select id, id % 50 as part from range(1); {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32978) Incorrect number of dynamic part metric
[ https://issues.apache.org/jira/browse/SPARK-32978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-32978: Attachment: screenshot-1.png > Incorrect number of dynamic part metric > --- > > Key: SPARK-32978 > URL: https://issues.apache.org/jira/browse/SPARK-32978 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > Attachments: screenshot-1.png > > > How to reproduce this issue: > {code:sql} > create table dynamic_partition(i bigint, part bigint) using parquet > partitioned by (part); > insert overwrite table dynamic_partition partition(part) select id, id % 50 > as part from range(1); > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
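A quick sanity check of the expected metric value in SPARK-32978, in plain Python rather than Spark: for any id range covering at least 50 consecutive values, `id % 50` produces exactly 50 distinct partition values, so the dynamic-partition metric should read 50, not 800.

```python
def distinct_partitions(n_rows: int, modulus: int) -> int:
    # Each row's partition value is id % modulus, mirroring the repro's
    # "select id, id % 50 as part" expression over a range of ids.
    return len({i % modulus for i in range(n_rows)})

print(distinct_partitions(1000, 50))  # 50: the value the metric should report
```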
[jira] [Updated] (SPARK-32977) [SQL] JavaDoc on Default Save mode Incorrect
[ https://issues.apache.org/jira/browse/SPARK-32977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russell Spitzer updated SPARK-32977: Description: The JavaDoc says that the default save mode is dependent on DataSource version which is incorrect. It is always ErrorOnExists. http://apache-spark-developers-list.1001551.n3.nabble.com/DatasourceV2-Default-Mode-for-DataFrameWriter-not-Dependent-on-DataSource-Version-td29434.html was:The JavaDoc says that the default save mode is dependent on DataSource version which is incorrect. It is always ErrorOnExists. > [SQL] JavaDoc on Default Save mode Incorrect > > > Key: SPARK-32977 > URL: https://issues.apache.org/jira/browse/SPARK-32977 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1 >Reporter: Russell Spitzer >Priority: Major > > The JavaDoc says that the default save mode is dependent on DataSource > version which is incorrect. It is always ErrorOnExists. > http://apache-spark-developers-list.1001551.n3.nabble.com/DatasourceV2-Default-Mode-for-DataFrameWriter-not-Dependent-on-DataSource-Version-td29434.html -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-32977) [SQL] JavaDoc on Default Save mode Incorrect
Russell Spitzer created SPARK-32977: --- Summary: [SQL] JavaDoc on Default Save mode Incorrect Key: SPARK-32977 URL: https://issues.apache.org/jira/browse/SPARK-32977 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.1 Reporter: Russell Spitzer The JavaDoc says that the default save mode is dependent on DataSource version which is incorrect. It is always ErrorOnExists. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32976) Support column list in INSERT statement
[ https://issues.apache.org/jira/browse/SPARK-32976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200888#comment-17200888 ] Kent Yao commented on SPARK-32976: -- Thanks for pinging me, looking into this > Support column list in INSERT statement > --- > > Key: SPARK-32976 > URL: https://issues.apache.org/jira/browse/SPARK-32976 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Xiao Li >Priority: Major > > INSERT currently does not support named column lists. > {{INSERT INTO (col1, col2,…) VALUES( 'val1', 'val2', … )}} > Note, we assume the column list contains all the column names. Issue an > exception if the list is not complete. The column order could be different > from the column order defined in the table definition. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32976) Support column list in INSERT statement
[ https://issues.apache.org/jira/browse/SPARK-32976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-32976: Description: INSERT currently does not support named column lists. {{INSERT INTO (col1, col2,…) VALUES( 'val1', 'val2', … )}} Note, we assume the column list contains all the column names. The order could be different from the column order defined in the table definition. was: INSERT currently does not support named column lists. {{INSERT INTO (col1, col2,…) VALUES( 'val1', 'val2', … )}} > Support column list in INSERT statement > --- > > Key: SPARK-32976 > URL: https://issues.apache.org/jira/browse/SPARK-32976 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Xiao Li >Priority: Major > > INSERT currently does not support named column lists. > {{INSERT INTO (col1, col2,…) VALUES( 'val1', 'val2', … )}} > Note, we assume the column list contains all the column names. The order > could be different from the column order defined in the table definition. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32976) Support column list in INSERT statement
[ https://issues.apache.org/jira/browse/SPARK-32976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-32976: Description: INSERT currently does not support named column lists. {{INSERT INTO (col1, col2,…) VALUES( 'val1', 'val2', … )}} Note, we assume the column list contains all the column names. Issue an exception if the list is not complete. The column order could be different from the column order defined in the table definition. was: INSERT currently does not support named column lists. {{INSERT INTO (col1, col2,…) VALUES( 'val1', 'val2', … )}} Note, we assume the column list contains all the column names. The order could be different from the column order defined in the table definition. > Support column list in INSERT statement > --- > > Key: SPARK-32976 > URL: https://issues.apache.org/jira/browse/SPARK-32976 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Xiao Li >Priority: Major > > INSERT currently does not support named column lists. > {{INSERT INTO (col1, col2,…) VALUES( 'val1', 'val2', … )}} > Note, we assume the column list contains all the column names. Issue an > exception if the list is not complete. The column order could be different > from the column order defined in the table definition. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32976) Support column list in INSERT statement
[ https://issues.apache.org/jira/browse/SPARK-32976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200877#comment-17200877 ] Xiao Li commented on SPARK-32976: - [~Qin Yao] Are you interested in this? > Support column list in INSERT statement > --- > > Key: SPARK-32976 > URL: https://issues.apache.org/jira/browse/SPARK-32976 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Xiao Li >Priority: Major > > INSERT currently does not support named column lists. > {{INSERT INTO (col1, col2,…) VALUES( 'val1', 'val2', … )}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-32976) Support column list in INSERT statement
Xiao Li created SPARK-32976: --- Summary: Support column list in INSERT statement Key: SPARK-32976 URL: https://issues.apache.org/jira/browse/SPARK-32976 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.0 Reporter: Xiao Li INSERT currently does not support named column lists. {{INSERT INTO (col1, col2,…) VALUES( 'val1', 'val2', … )}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
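The semantics requested in SPARK-32976 can be sketched with SQLite (used here purely for illustration, since standard SQL already supports named column lists): the column list may be in a different order than the table definition, and values are bound to columns by name rather than position.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (col1 TEXT, col2 TEXT)")
# Column list in a different order than the table definition; each value
# is matched to its named column, not to the positional column order.
con.execute("INSERT INTO t (col2, col1) VALUES ('val2', 'val1')")
row = con.execute("SELECT col1, col2 FROM t").fetchone()
print(row)  # ('val1', 'val2')
```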
[jira] [Commented] (SPARK-32973) FeatureHasher does not check categoricalCols in inputCols
[ https://issues.apache.org/jira/browse/SPARK-32973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200846#comment-17200846 ] Sean R. Owen commented on SPARK-32973: -- It looks like "real" is ignored here? I only see two features hashed. That would at least be consistent with the comment if so. We could change it to an error, which seems OK, but, maybe just a warning to avoid a behavior change? > FeatureHasher does not check categoricalCols in inputCols > - > > Key: SPARK-32973 > URL: https://issues.apache.org/jira/browse/SPARK-32973 > Project: Spark > Issue Type: Improvement > Components: Documentation, ML >Affects Versions: 2.3.0, 2.4.0, 3.0.0, 3.1.0 >Reporter: zhengruifeng >Priority: Trivial > > doc related to {{categoricalCols}}: > {code:java} > Numeric columns to treat as categorical features. By default only string and > boolean columns are treated as categorical, so this param can be used to > explicitly specify the numerical columns to treat as categorical. Note, the > relevant columns must also be set in inputCols. {code} > > However, the check to make sure {{categoricalCols}} in {{inputCols}} was > never implemented: > for example, in 2.4.7 and current master(3.1.0): > {code:java} > scala> import org.apache.spark.ml.feature._ > import org.apache.spark.ml.feature._ > scala> import org.apache.spark.ml.linalg.{Vector, Vectors} > import org.apache.spark.ml.linalg.{Vector, Vectors} > scala> val df = Seq((2.0, 1, "foo"),(3.0, 2, "bar")).toDF("real", "int", > "string") > df: org.apache.spark.sql.DataFrame = [real: double, int: int ... 
1 more field] > scala> val n = 100 > n: Int = 100 > scala> val hasher = new FeatureHasher().setInputCols("int", > "string").setCategoricalCols(Array("real")).setOutputCol("features").setNumFeatures(n) > > hasher: org.apache.spark.ml.feature.FeatureHasher = featureHasher_fbe05968b33f > scala> hasher.transform(df).show > ++---+--++ > |real|int|string|features| > ++---+--++ > | 2.0| 1| foo|(100,[2,39],[1.0,...| > | 3.0| 2| bar|(100,[2,42],[2.0,...| > ++---+--++ > {code} > > CategoricalCols "real" is not in inputCols ("int", "string"). > > I think there are two options: > 1, remove this comment "Note, the relevant columns must also be set in > inputCols. ", since this requirement seems unnecessary; > 2, add a check to make sure all CategoricalCols are in inputCols. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
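Option 2 above (validating the param) could look roughly like the following standalone sketch; the helper name and error message are hypothetical, not Spark's API.

```python
def check_categorical_cols(input_cols, categorical_cols):
    # Proposed validation: every categorical column must also appear in
    # inputCols, matching the documented requirement that was never enforced.
    missing = set(categorical_cols) - set(input_cols)
    if missing:
        raise ValueError(
            f"categoricalCols must be a subset of inputCols; missing: {sorted(missing)}"
        )

# With this check, the ticket's example would fail fast instead of silently
# ignoring the "real" column:
try:
    check_categorical_cols(["int", "string"], ["real"])
except ValueError as e:
    print(e)
```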
[jira] [Commented] (SPARK-32852) spark.sql.hive.metastore.jars support HDFS location
[ https://issues.apache.org/jira/browse/SPARK-32852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200798#comment-17200798 ] Yuming Wang commented on SPARK-32852: - Workaround: {code:sh} bin/spark-submit --deploy-mode cluster --conf spark.yarn.dist.archives=/tmp/hive-1.2.1-lib.tgz --conf spark.sql.hive.metastore.jars=./hive-1.2.1-lib.tgz/hive-1.2.1-lib/* --conf "spark.sql.hive.metastore.version=1.2.1" {code} > spark.sql.hive.metastore.jars support HDFS location > --- > > Key: SPARK-32852 > URL: https://issues.apache.org/jira/browse/SPARK-32852 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > > It would be great if {{spark.sql.hive.metastore.jars}} supported HDFS > locations; this would be very convenient in cluster mode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32975) [K8S] - executor fails to be restarted after it goes to ERROR/Failure state
[ https://issues.apache.org/jira/browse/SPARK-32975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200771#comment-17200771 ] Shenson Joseph commented on SPARK-32975: [~anirudh] [~eje] [~liyinan926] > [K8S] - executor fails to be restarted after it goes to ERROR/Failure state > --- > > Key: SPARK-32975 > URL: https://issues.apache.org/jira/browse/SPARK-32975 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Scheduler >Affects Versions: 2.4.4 >Reporter: Shenson Joseph >Priority: Critical > > We are using v1beta2-1.1.2-2.4.5 version of operator with spark-2.4.4 > spark executors keeps getting killed with exit code 1 and we are seeing > following exception in the executor which goes to error state. Once this > error happens, driver doesn't restart executor. > > Exception in thread "main" java.lang.reflect.UndeclaredThrowableException > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1713) > at > org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:64) > at > org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:188) > at > org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:281) > at > org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala) > Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult: > at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) > at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101) > at > org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:201) > at > org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:65) > at > org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:64) > at 
java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698) > ... 4 more > Caused by: java.io.IOException: Failed to connect to > act-pipeline-app-1600187491917-driver-svc.default.svc:7078 > at > org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245) > at > org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187) > at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:198) > at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194) > at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.net.UnknownHostException: > act-pipeline-app-1600187491917-driver-svc.default.svc > at java.net.InetAddress.getAllByName0(InetAddress.java:1281) > at java.net.InetAddress.getAllByName(InetAddress.java:1193) > at java.net.InetAddress.getAllByName(InetAddress.java:1127) > at java.net.InetAddress.getByName(InetAddress.java:1077) > at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:146) > at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:143) > at java.security.AccessController.doPrivileged(Native Method) > at io.netty.util.internal.SocketUtils.addressByName(SocketUtils.java:143) > at > io.netty.resolver.DefaultNameResolver.doResolve(DefaultNameResolver.java:43) > at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:63) > at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:55) > at > 
io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:57) > at > io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:32) > at > io.netty.resolver.AbstractAddressResolver.resolve(AbstractAddressResolver.java:108) > at io.netty.bootstrap.Bootstrap.doResolveAndConnect0(Bootstrap.java:208) > at io.netty.bootstrap.Bootstrap.access$000(Bootstrap.java:49) > at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:188) > at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:174) > at > io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507) > at > io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481) > at > io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420) > at
[jira] [Created] (SPARK-32975) [K8S] - executor fails to be restarted after it goes to ERROR/Failure state
Shenson Joseph created SPARK-32975: -- Summary: [K8S] - executor fails to be restarted after it goes to ERROR/Failure state Key: SPARK-32975 URL: https://issues.apache.org/jira/browse/SPARK-32975 Project: Spark Issue Type: Bug Components: Kubernetes, Scheduler Affects Versions: 2.4.4 Reporter: Shenson Joseph We are using v1beta2-1.1.2-2.4.5 version of operator with spark-2.4.4 spark executors keeps getting killed with exit code 1 and we are seeing following exception in the executor which goes to error state. Once this error happens, driver doesn't restart executor. Exception in thread "main" java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1713) at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:64) at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:188) at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:281) at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala) Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult: at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226) at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101) at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:201) at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:65) at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:64) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698) ... 
4 more Caused by: java.io.IOException: Failed to connect to act-pipeline-app-1600187491917-driver-svc.default.svc:7078 at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245) at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187) at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:198) at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194) at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.net.UnknownHostException: act-pipeline-app-1600187491917-driver-svc.default.svc at java.net.InetAddress.getAllByName0(InetAddress.java:1281) at java.net.InetAddress.getAllByName(InetAddress.java:1193) at java.net.InetAddress.getAllByName(InetAddress.java:1127) at java.net.InetAddress.getByName(InetAddress.java:1077) at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:146) at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:143) at java.security.AccessController.doPrivileged(Native Method) at io.netty.util.internal.SocketUtils.addressByName(SocketUtils.java:143) at io.netty.resolver.DefaultNameResolver.doResolve(DefaultNameResolver.java:43) at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:63) at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:55) at io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:57) at io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:32) at io.netty.resolver.AbstractAddressResolver.resolve(AbstractAddressResolver.java:108) at io.netty.bootstrap.Bootstrap.doResolveAndConnect0(Bootstrap.java:208) at 
io.netty.bootstrap.Bootstrap.access$000(Bootstrap.java:49) at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:188) at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:174) at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507) at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481) at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420) at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104) at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82) at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetSuccess(AbstractChannel.java:978) at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:512) at io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:423) at
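The root cause in the trace above is a java.net.UnknownHostException for the driver's headless-service DNS name. A minimal diagnostic (plain Python, hypothetical helper name) for checking whether that name resolves from inside an executor pod:

```python
import socket

def can_resolve(host: str) -> bool:
    # Returns True if `host` resolves via the pod's configured DNS; the
    # executor above died because the driver service name stopped resolving.
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False

# Inside the pod one would pass the driver service name from the trace;
# "localhost" is just a stand-in here.
print(can_resolve("localhost"))
```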
[jira] [Commented] (SPARK-21481) Add indexOf method in ml.feature.HashingTF similar to mllib.feature.HashingTF
[ https://issues.apache.org/jira/browse/SPARK-21481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200757#comment-17200757 ] Apache Spark commented on SPARK-21481: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/29852 > Add indexOf method in ml.feature.HashingTF similar to mllib.feature.HashingTF > - > > Key: SPARK-21481 > URL: https://issues.apache.org/jira/browse/SPARK-21481 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 2.1.0, 2.2.0 >Reporter: Aseem Bansal >Assignee: Huaxin Gao >Priority: Minor > Fix For: 3.0.0 > > > If we want to find the index of any input based on hashing trick then it is > possible in > https://spark.apache.org/docs/2.2.0/api/scala/index.html#org.apache.spark.mllib.feature.HashingTF > but not in > https://spark.apache.org/docs/2.2.0/api/scala/index.html#org.apache.spark.ml.feature.HashingTF. > Should allow that for feature parity -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
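The `indexOf` method requested above is just the hashing trick: map a term to a nonnegative bucket index by hashing it modulo the feature dimension. A minimal Python sketch follows; note that Spark's `ml.feature.HashingTF` hashes with MurmurHash3, while `zlib.crc32` here is only a stand-in, so the indices produced will NOT match Spark's.

```python
import zlib

def index_of(term, num_features=1 << 18):
    """Sketch of the hashing trick behind HashingTF.indexOf.

    zlib.crc32 is a stand-in for Spark's MurmurHash3, so bucket indices
    will not match Spark's output; the mechanism is the same.
    """
    h = zlib.crc32(str(term).encode("utf-8"))
    return h % num_features  # nonnegative bucket index

idx = index_of("spark")
# Every document containing "spark" accumulates its count in this bucket.
assert 0 <= idx < (1 << 18)
```

Exposing this on the `ml` side gives feature parity with `mllib.feature.HashingTF`: users can ask which vector slot a given input term lands in.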
[jira] [Commented] (SPARK-22674) PySpark breaks serialization of namedtuple subclasses
[ https://issues.apache.org/jira/browse/SPARK-22674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200747#comment-17200747 ] Apache Spark commented on SPARK-22674: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/29851 > PySpark breaks serialization of namedtuple subclasses > - > > Key: SPARK-22674 > URL: https://issues.apache.org/jira/browse/SPARK-22674 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.2.0, 2.3.0 >Reporter: Jonas Amrich >Priority: Major > > PySpark monkey-patches the namedtuple class to make it serializable; however, > this breaks serialization of its subclasses. With the current implementation, any > subclass will be serialized (and deserialized) as its parent namedtuple. > Consider this code, which will fail with {{AttributeError: 'Point' object has > no attribute 'sum'}}: > {code} > from collections import namedtuple > Point = namedtuple("Point", "x y") > class PointSubclass(Point): > def sum(self): > return self.x + self.y > rdd = spark.sparkContext.parallelize([[PointSubclass(1, 1)]]) > rdd.collect()[0][0].sum() > {code} > Moreover, as PySpark hijacks all namedtuples in the main module, importing > pyspark breaks serialization of namedtuple subclasses even in code that is > not related to Spark / distributed execution. I don't see any clean solution > to this; a possible workaround may be to limit the serialization hack to > direct namedtuple subclasses, as in > https://github.com/JonasAmrich/spark/commit/f3efecee28243380ecf6657fe54e1a165c1b7204
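The failure mode described in the ticket can be reproduced without Spark at all. The sketch below shows that plain pickling preserves the subclass, while rebuilding the value from the base namedtuple's fields — which is conceptually what PySpark's monkey-patch does — drops the subclass's methods:

```python
import pickle
from collections import namedtuple

Point = namedtuple("Point", "x y")

class PointSubclass(Point):
    def sum(self):
        return self.x + self.y

# Plain pickle keeps the subclass and its methods:
p = pickle.loads(pickle.dumps(PointSubclass(1, 1)))
assert p.sum() == 2

# PySpark's hijack (conceptually) re-creates values from the *base*
# namedtuple's fields, which is what the ticket describes: the subclass
# round-trips as a bare Point and loses its methods.
q = Point(*PointSubclass(1, 1))
assert (q.x, q.y) == (1, 1)   # field values survive
assert not hasattr(q, "sum")  # the subclass method does not
```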
[jira] [Commented] (SPARK-32965) pyspark reading csv files with utf_16le encoding
[ https://issues.apache.org/jira/browse/SPARK-32965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200743#comment-17200743 ] Punit Shah commented on SPARK-32965: It looks similar. I've attached a utf-16le file to this ticket. The pyspark code is essentially: spark.read.csv("16le.csv", inferSchema=True, header=True, encoding="utf_16le"). The attached picture shows the result. > pyspark reading csv files with utf_16le encoding > > > Key: SPARK-32965 > URL: https://issues.apache.org/jira/browse/SPARK-32965 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.7, 3.0.0, 3.0.1 >Reporter: Punit Shah >Priority: Major > Attachments: 16le.csv, 32965.png > > > If you have a file encoded in utf_16le or utf_16be and try to use > spark.read.csv("", encoding="utf_16le") the dataframe isn't > rendered properly > if you use python decoding like: > prdd = spark_session._sc.binaryFiles(path_url).values().flatMap(lambda x : > x.decode("utf_16le").splitlines()) > and then do spark.read.csv(prdd), then it works. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
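A plausible mechanism for the misrendered dataframe — and one consistent with the reporter's `binaryFiles` workaround — is that UTF-16LE encodes the newline as the two bytes {{0x0A 0x00}}, so any byte-oriented line reader that splits on the single byte {{0x0A}} leaves a stray NUL at the start of each following record. This standalone sketch (no Spark required) illustrates the difference:

```python
# UTF-16LE newline is the byte pair b"\n\x00"; splitting raw bytes on the
# single byte b"\n" (as a byte-oriented line reader would) corrupts records.
data = "name,val\nbar,5\nbar,8\n".encode("utf_16_le")

naive_lines = data.split(b"\n")            # byte-level line splitting
assert naive_lines[1].startswith(b"\x00")  # corrupted: stray NUL byte

# Decoding the whole payload first (the binaryFiles workaround) is clean:
lines = data.decode("utf_16_le").splitlines()
assert lines == ["name,val", "bar,5", "bar,8"]
```

This is an illustration of why decoding the full file before line splitting works, not a claim about exactly where in Spark's CSV path the split happens.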
[jira] [Updated] (SPARK-32965) pyspark reading csv files with utf_16le encoding
[ https://issues.apache.org/jira/browse/SPARK-32965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Punit Shah updated SPARK-32965: --- Attachment: 32965.png > pyspark reading csv files with utf_16le encoding > > > Key: SPARK-32965 > URL: https://issues.apache.org/jira/browse/SPARK-32965 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.7, 3.0.0, 3.0.1 >Reporter: Punit Shah >Priority: Major > Attachments: 16le.csv, 32965.png > > > If you have a file encoded in utf_16le or utf_16be and try to use > spark.read.csv("", encoding="utf_16le") the dataframe isn't > rendered properly > if you use python decoding like: > prdd = spark_session._sc.binaryFiles(path_url).values().flatMap(lambda x : > x.decode("utf_16le").splitlines()) > and then do spark.read.csv(prdd), then it works. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32306) `approx_percentile` in Spark SQL gives incorrect results
[ https://issues.apache.org/jira/browse/SPARK-32306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-32306: - Fix Version/s: 3.0.2 2.4.8 > `approx_percentile` in Spark SQL gives incorrect results > > > Key: SPARK-32306 > URL: https://issues.apache.org/jira/browse/SPARK-32306 > Project: Spark > Issue Type: Documentation > Components: PySpark, SQL >Affects Versions: 2.4.4, 3.0.0, 3.1.0 >Reporter: Sean Malory >Assignee: Maxim Gekk >Priority: Major > Fix For: 2.4.8, 3.0.2, 3.1.0 > > > The `approx_percentile` function in Spark SQL does not give the correct > result. I'm not sure how incorrect it is; it may just be a boundary issue. > From the docs: > {quote}The accuracy parameter (default: 1) is a positive numeric literal > which controls approximation accuracy at the cost of memory. Higher value of > accuracy yields better accuracy, 1.0/accuracy is the relative error of the > approximation. > {quote} > This is not true. Here is a minimum example in `pyspark` where, essentially, > the median of 5 and 8 is being calculated as 5: > {code:python} > import pyspark.sql.functions as psf > df = spark.createDataFrame( > [('bar', 5), ('bar', 8)], ['name', 'val'] > ) > median = psf.expr('percentile_approx(val, 0.5, 2147483647)') > df.groupBy('name').agg(median.alias('median'))# gives the median as 5 > {code} > I've tested this with Spark v2.4.4, pyspark v2.4.5- although I suspect this > is an issue with the underlying algorithm. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
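The "boundary issue" the reporter suspects is easy to see with a rank-based percentile definition, under which 5 is a defensible answer for the 0.5 percentile of {5, 8} even though the textbook median is 6.5. This is an illustration of one common definition, not a claim about Spark's exact algorithm:

```python
import math

def rank_percentile(values, p):
    """One common rank-based percentile definition (an illustration, not
    Spark's exact algorithm): return the smallest element whose 1-based
    rank in sorted order is >= ceil(p * n)."""
    s = sorted(values)
    k = max(1, math.ceil(p * len(s)))
    return s[k - 1]

# With n=2 and p=0.5, ceil(0.5 * 2) = 1, so the first element is chosen,
# matching the behaviour reported in the ticket:
assert rank_percentile([5, 8], 0.5) == 5
```

Under such a definition no interpolation between 5 and 8 ever happens, so the result sits exactly on a data point rather than at the midpoint.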
[jira] [Updated] (SPARK-32965) pyspark reading csv files with utf_16le encoding
[ https://issues.apache.org/jira/browse/SPARK-32965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Punit Shah updated SPARK-32965: --- Attachment: 16le.csv > pyspark reading csv files with utf_16le encoding > > > Key: SPARK-32965 > URL: https://issues.apache.org/jira/browse/SPARK-32965 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.7, 3.0.0, 3.0.1 >Reporter: Punit Shah >Priority: Major > Attachments: 16le.csv > > > If you have a file encoded in utf_16le or utf_16be and try to use > spark.read.csv("", encoding="utf_16le") the dataframe isn't > rendered properly > if you use python decoding like: > prdd = spark_session._sc.binaryFiles(path_url).values().flatMap(lambda x : > x.decode("utf_16le").splitlines()) > and then do spark.read.csv(prdd), then it works. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32972) Pass all `mllib` module UTs in Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-32972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200732#comment-17200732 ] Yang Jie commented on SPARK-32972: -- Only the "training with sample weights" test in org.apache.spark.ml.regression.RandomForestRegressorSuite remains unfixed, and there is no good way to fix it yet... > Pass all `mllib` module UTs in Scala 2.13 > - > > Key: SPARK-32972 > URL: https://issues.apache.org/jira/browse/SPARK-32972 > Project: Spark > Issue Type: Sub-task > Components: ML, MLlib >Affects Versions: 3.1.0 >Reporter: Yang Jie >Priority: Major > > 51 Scala tests and 3 Java tests in the `mllib` module are failing; details > will be added later.
[jira] [Commented] (SPARK-32974) FeatureHasher transform optimization
[ https://issues.apache.org/jira/browse/SPARK-32974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200721#comment-17200721 ] Apache Spark commented on SPARK-32974: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/29850 > FeatureHasher transform optimization > > > Key: SPARK-32974 > URL: https://issues.apache.org/jira/browse/SPARK-32974 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 3.1.0 >Reporter: zhengruifeng >Priority: Minor > > For a numerical column, the output index is a hash of its col_name, so it can > be pre-computed once up front instead of being computed on every row.
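The optimization is straightforward: a numeric column's bucket depends only on the column name, so the hash can be computed once per transform rather than once per row. A minimal sketch, using `zlib.crc32` as a stand-in for Spark's MurmurHash3 (so indices will not match Spark's):

```python
import zlib

def transform(rows, numeric_cols, num_features=1 << 18):
    """Sketch of the proposed FeatureHasher optimization: hash each
    numeric column name once, outside the per-row loop."""
    # Pre-computed once per transform call, not once per row:
    idx = {c: zlib.crc32(c.encode()) % num_features for c in numeric_cols}
    out = []
    for row in rows:
        vec = {}
        for c in numeric_cols:
            # The value is *added* at the precomputed index, so hash
            # collisions between columns sum their contributions.
            vec[idx[c]] = vec.get(idx[c], 0.0) + float(row[c])
        out.append(vec)
    return out

vecs = transform([{"age": 30.0}, {"age": 40.0}], ["age"])
assert list(vecs[0].values()) == [30.0]
```

Hoisting the hash out of the row loop saves one string hash per numeric column per row, which adds up on wide or large datasets.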
[jira] [Assigned] (SPARK-32974) FeatureHasher transform optimization
[ https://issues.apache.org/jira/browse/SPARK-32974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32974: Assignee: (was: Apache Spark) > FeatureHasher transform optimization > > > Key: SPARK-32974 > URL: https://issues.apache.org/jira/browse/SPARK-32974 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 3.1.0 >Reporter: zhengruifeng >Priority: Minor > > for a numerical column, its output index is a hash of its col_name, we can > pre-compute it at first, instead of computing it on each row. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32974) FeatureHasher transform optimization
[ https://issues.apache.org/jira/browse/SPARK-32974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32974: Assignee: Apache Spark > FeatureHasher transform optimization > > > Key: SPARK-32974 > URL: https://issues.apache.org/jira/browse/SPARK-32974 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 3.1.0 >Reporter: zhengruifeng >Assignee: Apache Spark >Priority: Minor > > for a numerical column, its output index is a hash of its col_name, we can > pre-compute it at first, instead of computing it on each row. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org