[jira] [Commented] (SPARK-35589) BlockManagerMasterEndpoint should not ignore index-only shuffle file during updating
[ https://issues.apache.org/jira/browse/SPARK-35589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354822#comment-17354822 ]

Apache Spark commented on SPARK-35589:
--------------------------------------

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/32727

> BlockManagerMasterEndpoint should not ignore index-only shuffle file during
> updating
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-35589
>                 URL: https://issues.apache.org/jira/browse/SPARK-35589
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Dongjoon Hyun
>            Assignee: Apache Spark
>            Priority: Major

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35589) BlockManagerMasterEndpoint should not ignore index-only shuffle file during updating
[ https://issues.apache.org/jira/browse/SPARK-35589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-35589:
------------------------------------

    Assignee: Apache Spark
[jira] [Commented] (SPARK-35589) BlockManagerMasterEndpoint should not ignore index-only shuffle file during updating
[ https://issues.apache.org/jira/browse/SPARK-35589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354821#comment-17354821 ]

Apache Spark commented on SPARK-35589:
--------------------------------------

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/32727
[jira] [Assigned] (SPARK-35589) BlockManagerMasterEndpoint should not ignore index-only shuffle file during updating
[ https://issues.apache.org/jira/browse/SPARK-35589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-35589:
------------------------------------

    Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-35587) Initial porting of Koalas documentation
[ https://issues.apache.org/jira/browse/SPARK-35587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354820#comment-17354820 ]

Apache Spark commented on SPARK-35587:
--------------------------------------

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/32726

> Initial porting of Koalas documentation
> ---------------------------------------
>
>                 Key: SPARK-35587
>                 URL: https://issues.apache.org/jira/browse/SPARK-35587
>             Project: Spark
>          Issue Type: Sub-task
>          Components: docs, PySpark
>    Affects Versions: 3.2.0
>            Reporter: Hyukjin Kwon
>            Priority: Major
>
> This JIRA aims at the initial porting of the Koalas documentation.
[jira] [Assigned] (SPARK-35587) Initial porting of Koalas documentation
[ https://issues.apache.org/jira/browse/SPARK-35587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-35587:
------------------------------------

    Assignee: (was: Apache Spark)
[jira] [Assigned] (SPARK-35587) Initial porting of Koalas documentation
[ https://issues.apache.org/jira/browse/SPARK-35587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-35587:
------------------------------------

    Assignee: Apache Spark
[jira] [Commented] (SPARK-33933) Broadcast timeout happened unexpectedly in AQE
[ https://issues.apache.org/jira/browse/SPARK-33933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354818#comment-17354818 ]

Apache Spark commented on SPARK-33933:
--------------------------------------

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/32725

> Broadcast timeout happened unexpectedly in AQE
> ----------------------------------------------
>
>                 Key: SPARK-33933
>                 URL: https://issues.apache.org/jira/browse/SPARK-33933
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.0.0, 3.0.1
>            Reporter: Yu Zhong
>            Assignee: Yu Zhong
>            Priority: Major
>             Fix For: 3.2.0
>
> In Spark 3.0, with AQE enabled, broadcast timeouts often occur in otherwise
> normal queries, as below.
>
> {code:java}
> Could not execute broadcast in 300 secs. You can increase the timeout for
> broadcasts via spark.sql.broadcastTimeout or disable broadcast join by
> setting spark.sql.autoBroadcastJoinThreshold to -1
> {code}
>
> This usually happens when a broadcast join (with or without a hint) follows a
> long-running shuffle (more than 5 minutes). With AQE disabled, the issue
> disappears.
> The workaround is to increase spark.sql.broadcastTimeout, and it works, but
> since the data to broadcast is very small, needing a larger timeout doesn't
> make sense.
> After investigation, the root cause appears to be this: with AQE enabled, in
> getFinalPhysicalPlan, Spark traverses the physical plan bottom-up, creates
> query stages for the materializable parts via createQueryStages, and
> materializes those newly created query stages by submitting map stages or
> broadcasts. When a ShuffleQueryStage is materialized before a
> BroadcastQueryStage, the map job and the broadcast job are submitted at
> almost the same time, but the map job holds all the computing resources. If
> the map job runs slowly (when there is a lot of data to process and resources
> are limited), the broadcast job cannot be started (and finished) before
> spark.sql.broadcastTimeout, which causes the whole job to fail (introduced in
> SPARK-31475).
> Code to reproduce:
>
> {code:java}
> import java.util.UUID
> import scala.util.Random
> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.SparkSession
>
> val spark = SparkSession.builder()
>   .master("local[2]")
>   .appName("Test Broadcast").getOrCreate()
> import spark.implicits._
> spark.conf.set("spark.sql.adaptive.enabled", "true")
> val sc = spark.sparkContext
> sc.setLogLevel("INFO")
>
> val uuid = UUID.randomUUID
> val df = sc.parallelize(Range(0, 1), 1).flatMap(x => {
>   for (i <- Range(0, 1 + Random.nextInt(1)))
>     yield (x % 26, x, Random.nextInt(10), UUID.randomUUID.toString)
> }).toDF("index", "part", "pv", "uuid")
>   .withColumn("md5", md5($"uuid"))
>
> val dim_data = Range(0, 26).map(x => (('a' + x).toChar.toString, x))
> val dim = dim_data.toDF("name", "index")
>
> val result = df.groupBy("index")
>   .agg(sum($"pv").alias("pv"), countDistinct("uuid").alias("uv"))
>   .join(dim, Seq("index"))
>   .collect()
> {code}
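The root-cause description above, a map job holding every execution slot so that a tiny broadcast job cannot start before the timeout expires, can be mimicked with a small Python simulation. This is only a sketch of the scheduling interaction under stated assumptions (a two-worker pool stands in for the cluster, and the 0.1s timeout stands in for spark.sql.broadcastTimeout); it is not Spark's actual code:

```python
import concurrent.futures
import time

# A pool with 2 workers stands in for a cluster whose resources are
# fully occupied by the shuffle-map stage.
pool = concurrent.futures.ThreadPoolExecutor(max_workers=2)

def map_task(seconds):
    time.sleep(seconds)          # long-running shuffle-map work
    return "map done"

def broadcast_task():
    return "broadcast done"      # tiny job: instant, once it can run

# The map stage is submitted first and grabs both workers.
maps = [pool.submit(map_task, 0.5) for _ in range(2)]
bcast = pool.submit(broadcast_task)

BROADCAST_TIMEOUT = 0.1          # stands in for spark.sql.broadcastTimeout
try:
    bcast.result(timeout=BROADCAST_TIMEOUT)
    timed_out = False
except concurrent.futures.TimeoutError:
    timed_out = True             # broadcast never got a worker in time

print(timed_out)
pool.shutdown()
```

The broadcast work itself is trivial; it times out purely because it is queued behind the map tasks, which is the starvation the report describes.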
[jira] [Created] (SPARK-35589) BlockManagerMasterEndpoint should not ignore index-only shuffle file during updating
Dongjoon Hyun created SPARK-35589: - Summary: BlockManagerMasterEndpoint should not ignore index-only shuffle file during updating Key: SPARK-35589 URL: https://issues.apache.org/jira/browse/SPARK-35589 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.1.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35588) Merge Binder integration and quickstart notebook.
Hyukjin Kwon created SPARK-35588:
------------------------------------

             Summary: Merge Binder integration and quickstart notebook.
                 Key: SPARK-35588
                 URL: https://issues.apache.org/jira/browse/SPARK-35588
             Project: Spark
          Issue Type: Sub-task
          Components: docs, PySpark
    Affects Versions: 3.2.0
            Reporter: Hyukjin Kwon

We should merge:
https://github.com/apache/spark/blob/master/python/docs/source/getting_started/quickstart.ipynb
https://github.com/databricks/koalas/blob/master/docs/source/getting_started/10min.ipynb
[jira] [Assigned] (SPARK-35585) Support propagate empty relation through project/filter
[ https://issues.apache.org/jira/browse/SPARK-35585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-35585:
------------------------------------

    Assignee: Apache Spark

> Support propagate empty relation through project/filter
> --------------------------------------------------------
>
>                 Key: SPARK-35585
>                 URL: https://issues.apache.org/jira/browse/SPARK-35585
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: XiDuo You
>            Assignee: Apache Spark
>            Priority: Minor
>
> Support propagating an empty local relation through Project and Filter, for a
> SQL case like:
> {code:java}
> Aggregate
>   Project
>     Join
>       ShuffleStage
>       ShuffleStage
> {code}
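The optimization the issue asks for, folding a known-empty relation upward through unary operators, can be illustrated with a hypothetical toy plan tree in Python. The node names and tuple encoding are illustrative only, not Spark's Catalyst classes:

```python
# Toy logical plan: each node is (operator, children[, empty_flag]);
# a LocalRelation additionally records whether it is known to be empty.
def is_empty(node):
    op, children = node[0], node[1]
    if op == "LocalRelation":
        return node[2]                      # the 'empty' flag
    if op in ("Project", "Filter"):
        return is_empty(children[0])        # empty input => empty output
    return False

def propagate_empty(node):
    """Rewrite Project/Filter over an empty relation into an empty relation."""
    op, children = node[0], node[1]
    children = [propagate_empty(c) for c in children]
    if op in ("Project", "Filter") and is_empty(children[0]):
        return ("LocalRelation", [], True)
    return (op, children) + node[2:]

plan = ("Join", [
    ("Project", [("Filter", [("LocalRelation", [], True)])]),
    ("LocalRelation", [], False),
])
print(propagate_empty(plan))
```

After the rewrite, the Project/Filter branch collapses to a single empty LocalRelation, which a later rule could then use to simplify the Join itself.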
[jira] [Commented] (SPARK-35585) Support propagate empty relation through project/filter
[ https://issues.apache.org/jira/browse/SPARK-35585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354815#comment-17354815 ]

Apache Spark commented on SPARK-35585:
--------------------------------------

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/32724
[jira] [Assigned] (SPARK-35585) Support propagate empty relation through project/filter
[ https://issues.apache.org/jira/browse/SPARK-35585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-35585:
------------------------------------

    Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-35585) Support propagate empty relation through project/filter
[ https://issues.apache.org/jira/browse/SPARK-35585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354816#comment-17354816 ]

Apache Spark commented on SPARK-35585:
--------------------------------------

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/32724
[jira] [Commented] (SPARK-35423) The output of PCA is inconsistent
[ https://issues.apache.org/jira/browse/SPARK-35423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354810#comment-17354810 ]

shahid commented on SPARK-35423:
--------------------------------

I would like to analyse this issue.

> The output of PCA is inconsistent
> ---------------------------------
>
>                 Key: SPARK-35423
>                 URL: https://issues.apache.org/jira/browse/SPARK-35423
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 3.1.1
>         Environment: Spark Version: 3.1.1
>            Reporter: cqfrog
>            Priority: Major
>
> 1. The example from the docs:
>
> {code:java}
> import org.apache.spark.ml.feature.PCA
> import org.apache.spark.ml.linalg.Vectors
>
> val data = Array(
>   Vectors.sparse(5, Seq((1, 1.0), (3, 7.0))),
>   Vectors.dense(2.0, 0.0, 3.0, 4.0, 5.0),
>   Vectors.dense(4.0, 0.0, 0.0, 6.0, 7.0)
> )
> val df = spark.createDataFrame(data.map(Tuple1.apply)).toDF("features")
> val pca = new PCA()
>   .setInputCol("features")
>   .setOutputCol("pcaFeatures")
>   .setK(3)
>   .fit(df)
> val result = pca.transform(df).select("pcaFeatures")
> result.show(false)
> {code}
>
> The output shows:
> {code:java}
> +-----------------------------------------------------------+
> |pcaFeatures                                                |
> +-----------------------------------------------------------+
> |[1.6485728230883807,-4.013282700516296,-5.524543751369388] |
> |[-4.645104331781534,-1.1167972663619026,-5.524543751369387]|
> |[-6.428880535676489,-5.337951427775355,-5.524543751369389] |
> +-----------------------------------------------------------+
> {code}
>
> 2. Change the Vector format.
> I changed "Vectors.sparse(5, Seq((1, 1.0), (3, 7.0)))" to
> "Vectors.dense(0.0,1.0,0.0,7.0,0.0)", but the output shows:
> {code:java}
> +------------------------------------------------------------+
> |pcaFeatures                                                 |
> +------------------------------------------------------------+
> |[1.6485728230883814,-4.0132827005162985,-1.0091435193998504]|
> |[-4.645104331781533,-1.1167972663619048,-1.0091435193998501]|
> |[-6.428880535676488,-5.337951427775359,-1.009143519399851]  |
> +------------------------------------------------------------+
> {code}
>
> It's strange that the two outputs are inconsistent. Why?
> Thanks.
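The two inputs in the report are meant to encode the same vector: Vectors.sparse(5, Seq((1, 1.0), (3, 7.0))) densifies to [0, 1, 0, 7, 0]. A plain-Python check of that equivalence (a standalone sketch using a hypothetical sparse_to_dense helper, not MLlib code) confirms the data is identical, which suggests the divergence comes from how the two representations are processed, not from the values themselves:

```python
def sparse_to_dense(size, entries):
    """Expand a list of (index, value) pairs into a dense list of floats."""
    dense = [0.0] * size
    for i, v in entries:
        dense[i] = v
    return dense

# The sparse input from the report, densified...
sparse_form = sparse_to_dense(5, [(1, 1.0), (3, 7.0)])
# ...and the dense input the reporter substituted for it.
dense_form = [0.0, 1.0, 0.0, 7.0, 0.0]

print(sparse_form == dense_form)  # the two encodings carry identical data
```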
[jira] [Updated] (SPARK-35579) Fix a bug in janino or work around it in Spark.
[ https://issues.apache.org/jira/browse/SPARK-35579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-35579: Priority: Blocker (was: Critical) > Fix a bug in janino or work around it in Spark. > --- > > Key: SPARK-35579 > URL: https://issues.apache.org/jira/browse/SPARK-35579 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Priority: Blocker > > See the test in SPARK-35578 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35583) Move JDBC data source options from Python and Scala into a single page
[ https://issues.apache.org/jira/browse/SPARK-35583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354804#comment-17354804 ] Apache Spark commented on SPARK-35583: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/32723 > Move JDBC data source options from Python and Scala into a single page > -- > > Key: SPARK-35583 > URL: https://issues.apache.org/jira/browse/SPARK-35583 > Project: Spark > Issue Type: Sub-task > Components: docs >Affects Versions: 3.2.0 >Reporter: Haejoon Lee >Priority: Major > > Refer to https://issues.apache.org/jira/browse/SPARK-34491 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35583) Move JDBC data source options from Python and Scala into a single page
[ https://issues.apache.org/jira/browse/SPARK-35583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-35583:
------------------------------------

    Assignee: (was: Apache Spark)
[jira] [Assigned] (SPARK-35583) Move JDBC data source options from Python and Scala into a single page
[ https://issues.apache.org/jira/browse/SPARK-35583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-35583:
------------------------------------

    Assignee: Apache Spark
[jira] [Commented] (SPARK-35583) Move JDBC data source options from Python and Scala into a single page
[ https://issues.apache.org/jira/browse/SPARK-35583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354803#comment-17354803 ]

Apache Spark commented on SPARK-35583:
--------------------------------------

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/32723
[jira] [Created] (SPARK-35587) Initial porting of Koalas documentation
Hyukjin Kwon created SPARK-35587:
------------------------------------

             Summary: Initial porting of Koalas documentation
                 Key: SPARK-35587
                 URL: https://issues.apache.org/jira/browse/SPARK-35587
             Project: Spark
          Issue Type: Sub-task
          Components: docs, PySpark
    Affects Versions: 3.2.0
            Reporter: Hyukjin Kwon

This JIRA aims at the initial porting of the Koalas documentation.
[jira] [Updated] (SPARK-34885) Port/integrate Koalas documentation into PySpark
[ https://issues.apache.org/jira/browse/SPARK-34885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-34885: - Target Version/s: 3.2.0 > Port/integrate Koalas documentation into PySpark > > > Key: SPARK-34885 > URL: https://issues.apache.org/jira/browse/SPARK-34885 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Haejoon Lee >Priority: Major > > This JIRA aims to port [Koalas > documentation|https://koalas.readthedocs.io/en/latest/index.html] > appropriately to [PySpark > documentation|https://spark.apache.org/docs/latest/api/python/index.html]. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34885) Port/integrate Koalas documentation into PySpark
[ https://issues.apache.org/jira/browse/SPARK-34885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-34885:
---------------------------------
              Parent: (was: SPARK-34849)
          Issue Type: Improvement (was: Sub-task)
[jira] [Assigned] (SPARK-35586) Set a default value for spark.kubernetes.test.sparkTgz in pom.xml for Kubernetes integration tests
[ https://issues.apache.org/jira/browse/SPARK-35586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35586: Assignee: Kousuke Saruta (was: Apache Spark) > Set a default value for spark.kubernetes.test.sparkTgz in pom.xml for > Kubernetes integration tests > -- > > Key: SPARK-35586 > URL: https://issues.apache.org/jira/browse/SPARK-35586 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Tests >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > In kubernetes/integration-tests/pom.xml, there is no default value for > spark.kubernetes.test.sparkTgz so running tests with the following command > will fail. > {code} > build/mvn -Dspark.kubernetes.test.namespace=default -Pkubernetes > -Pkubernetes-integration-tests -Psparkr -pl > resource-managers/kubernetes/integration-tests integration-test > {code} > + mkdir -p > /home/kou/work/oss/spark/resource-managers/kubernetes/integration-tests/target/spark-dist-unpacked > + tar -xzvf --test-exclude-tags --strip-components=1 -C > /home/kou/work/oss/spark/resource-managers/kubernetes/integration-tests/target/spark-dist-unpacked > tar (child): --test-exclude-tags: Cannot open: No such file or directory > tar (child): Error is not recoverable: exiting now > tar: Child returned status 2 > tar: Error is not recoverable: exiting now > [ERROR] Command execution failed. > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35586) Set a default value for spark.kubernetes.test.sparkTgz in pom.xml for Kubernetes integration tests
[ https://issues.apache.org/jira/browse/SPARK-35586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-35586:
------------------------------------

    Assignee: Apache Spark (was: Kousuke Saruta)
[jira] [Commented] (SPARK-35586) Set a default value for spark.kubernetes.test.sparkTgz in pom.xml for Kubernetes integration tests
[ https://issues.apache.org/jira/browse/SPARK-35586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354798#comment-17354798 ]

Apache Spark commented on SPARK-35586:
--------------------------------------

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/32722
[jira] [Updated] (SPARK-35586) Set a default value for spark.kubernetes.test.sparkTgz in pom.xml for Kubernetes integration tests
[ https://issues.apache.org/jira/browse/SPARK-35586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kousuke Saruta updated SPARK-35586:
-----------------------------------
    Description: wording fix only ("there are no default value" -> "there is no default value"); the description is otherwise unchanged.
[jira] [Created] (SPARK-35586) Set a default value for spark.kubernetes.test.sparkTgz in pom.xml for Kubernetes integration tests
Kousuke Saruta created SPARK-35586: -- Summary: Set a default value for spark.kubernetes.test.sparkTgz in pom.xml for Kubernetes integration tests Key: SPARK-35586 URL: https://issues.apache.org/jira/browse/SPARK-35586 Project: Spark Issue Type: Bug Components: Kubernetes, Tests Affects Versions: 3.2.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta In kubernetes/integration-tests/pom.xml, there are no default value for spark.kubernetes.test.sparkTgz so running tests with the following command will fail. {code} build/mvn -Dspark.kubernetes.test.namespace=default -Pkubernetes -Pkubernetes-integration-tests -Psparkr -pl resource-managers/kubernetes/integration-tests integration-test {code} + mkdir -p /home/kou/work/oss/spark/resource-managers/kubernetes/integration-tests/target/spark-dist-unpacked + tar -xzvf --test-exclude-tags --strip-components=1 -C /home/kou/work/oss/spark/resource-managers/kubernetes/integration-tests/target/spark-dist-unpacked tar (child): --test-exclude-tags: Cannot open: No such file or directory tar (child): Error is not recoverable: exiting now tar: Child returned status 2 tar: Error is not recoverable: exiting now [ERROR] Command execution failed. {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
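The tar error in the report above is the classic symptom of an unquoted, empty shell expansion: with no default for spark.kubernetes.test.sparkTgz, the archive argument expands to nothing and tar consumes the next flag, --test-exclude-tags, as the archive name. A small Python illustration of the word-dropping (the command string is a simplified stand-in for the real integration-test script, not its actual contents):

```python
import shlex

spark_tgz = ""  # no default was set for spark.kubernetes.test.sparkTgz
cmd = f"tar -xzvf {spark_tgz} --test-exclude-tags --strip-components=1"

# shlex.split follows POSIX shell word-splitting: an empty expansion
# produces no word at all, so the argument simply vanishes.
argv = shlex.split(cmd)
print(argv)
# The word right after '-xzvf' is now '--test-exclude-tags', which is
# exactly the "file" tar then fails to open in the reported error.
```

Setting a default value for the property in pom.xml guarantees the expansion is never empty, so tar always receives a real archive operand.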
[jira] [Assigned] (SPARK-35077) Migrate to transformWithPruning for leftover optimizer rules
[ https://issues.apache.org/jira/browse/SPARK-35077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35077: Assignee: Apache Spark > Migrate to transformWithPruning for leftover optimizer rules > > > Key: SPARK-35077 > URL: https://issues.apache.org/jira/browse/SPARK-35077 > Project: Spark > Issue Type: Sub-task > Components: Optimizer >Affects Versions: 3.1.0 >Reporter: Yingyi Bu >Assignee: Apache Spark >Priority: Major > > E.g., PushDownPredicates and a few others. > > Commit > [https://github.com/apache/spark/commit/3db8ec258c4a8438bda73c26fc7b1eb6f9d51631] > contains the framework level change and a few example rule changes. > > Example patterns: > [https://github.com/apache/spark/blob/3db8ec258c4a8438bda73c26fc7b1eb6f9d51631/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreePatterns.scala#L24-L32] > > Example rule: > [https://github.com/apache/spark/blob/3db8ec258c4a8438bda73c26fc7b1eb6f9d51631/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala] > > [https://github.com/apache/spark/pull/32247] is another example -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
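For readers new to this subtask, the migration pattern can be modeled on a toy tree (illustrative names only, not the real Catalyst API): each node caches which patterns occur anywhere in its subtree, and the transform skips whole subtrees that cannot contain a match.

```scala
sealed trait Pattern
case object FILTER extends Pattern
case object PROJECT extends Pattern

case class Node(name: String, pattern: Pattern, children: List[Node] = Nil) {
  // Patterns appearing anywhere in this subtree, computed once and cached,
  // mirroring Catalyst's per-node tree-pattern bits.
  lazy val patternsInSubtree: Set[Pattern] =
    children.flatMap(_.patternsInSubtree).toSet + pattern

  // Skip entire subtrees whose cached bits say the rule cannot fire there.
  def transformWithPruning(cond: Set[Pattern] => Boolean)
                          (rule: PartialFunction[Node, Node]): Node =
    if (!cond(patternsInSubtree)) this
    else {
      val rewritten = copy(children = children.map(_.transformWithPruning(cond)(rule)))
      rule.applyOrElse(rewritten, identity[Node])
    }
}

// A rule gated on FILTER never descends into filter-free subtrees.
val plan = Node("project", PROJECT, List(Node("filter", FILTER)))
val pruned = plan.transformWithPruning(_.contains(FILTER)) {
  case n @ Node(_, FILTER, _) => n.copy(name = n.name + "-rewritten")
}
```

The linked TreePatterns.scala and CostBasedJoinReorder.scala show the real mechanism; the sketch only illustrates why pruning cuts traversal cost for rules like PushDownPredicates.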
[jira] [Commented] (SPARK-35077) Migrate to transformWithPruning for leftover optimizer rules
[ https://issues.apache.org/jira/browse/SPARK-35077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354795#comment-17354795 ] Apache Spark commented on SPARK-35077: -- User 'sigmod' has created a pull request for this issue: https://github.com/apache/spark/pull/32721 > Migrate to transformWithPruning for leftover optimizer rules > > > Key: SPARK-35077 > URL: https://issues.apache.org/jira/browse/SPARK-35077 > Project: Spark > Issue Type: Sub-task > Components: Optimizer >Affects Versions: 3.1.0 >Reporter: Yingyi Bu >Priority: Major > > E.g., PushDownPredicates and a few others. > > Commit > [https://github.com/apache/spark/commit/3db8ec258c4a8438bda73c26fc7b1eb6f9d51631] > contains the framework level change and a few example rule changes. > > Example patterns: > [https://github.com/apache/spark/blob/3db8ec258c4a8438bda73c26fc7b1eb6f9d51631/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreePatterns.scala#L24-L32] > > Example rule: > [https://github.com/apache/spark/blob/3db8ec258c4a8438bda73c26fc7b1eb6f9d51631/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala] > > [https://github.com/apache/spark/pull/32247] is another example -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35077) Migrate to transformWithPruning for leftover optimizer rules
[ https://issues.apache.org/jira/browse/SPARK-35077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35077: Assignee: (was: Apache Spark) > Migrate to transformWithPruning for leftover optimizer rules > > > Key: SPARK-35077 > URL: https://issues.apache.org/jira/browse/SPARK-35077 > Project: Spark > Issue Type: Sub-task > Components: Optimizer >Affects Versions: 3.1.0 >Reporter: Yingyi Bu >Priority: Major > > E.g., PushDownPredicates and a few others. > > Commit > [https://github.com/apache/spark/commit/3db8ec258c4a8438bda73c26fc7b1eb6f9d51631] > contains the framework level change and a few example rule changes. > > Example patterns: > [https://github.com/apache/spark/blob/3db8ec258c4a8438bda73c26fc7b1eb6f9d51631/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreePatterns.scala#L24-L32] > > Example rule: > [https://github.com/apache/spark/blob/3db8ec258c4a8438bda73c26fc7b1eb6f9d51631/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala] > > [https://github.com/apache/spark/pull/32247] is another example -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35077) Migrate to transformWithPruning for leftover optimizer rules
[ https://issues.apache.org/jira/browse/SPARK-35077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35077: Assignee: Apache Spark > Migrate to transformWithPruning for leftover optimizer rules > > > Key: SPARK-35077 > URL: https://issues.apache.org/jira/browse/SPARK-35077 > Project: Spark > Issue Type: Sub-task > Components: Optimizer >Affects Versions: 3.1.0 >Reporter: Yingyi Bu >Assignee: Apache Spark >Priority: Major > > E.g., PushDownPredicates and a few others. > > Commit > [https://github.com/apache/spark/commit/3db8ec258c4a8438bda73c26fc7b1eb6f9d51631] > contains the framework level change and a few example rule changes. > > Example patterns: > [https://github.com/apache/spark/blob/3db8ec258c4a8438bda73c26fc7b1eb6f9d51631/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreePatterns.scala#L24-L32] > > Example rule: > [https://github.com/apache/spark/blob/3db8ec258c4a8438bda73c26fc7b1eb6f9d51631/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala] > > [https://github.com/apache/spark/pull/32247] is another example -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35585) Support propagate empty relation through project/filter
XiDuo You created SPARK-35585: - Summary: Support propagate empty relation through project/filter Key: SPARK-35585 URL: https://issues.apache.org/jira/browse/SPARK-35585 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: XiDuo You Support propagating an empty local relation through project and filter, for a plan such as: {code:java} Aggregate Project Join ShuffleStage ShuffleStage {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
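The intuition behind the improvement, sketched on toy plan nodes (hypothetical types, not Catalyst's): Project and Filter both preserve emptiness, so an empty child can be collapsed upward until a node that may produce rows on empty input is reached.

```scala
sealed trait Plan
case object EmptyRelation extends Plan
case class Scan(table: String) extends Plan
case class Project(child: Plan) extends Plan
case class Filter(child: Plan) extends Plan
case class Aggregate(child: Plan) extends Plan // global agg may emit a row on empty input

def propagateEmpty(plan: Plan): Plan = plan match {
  // Project/Filter over an empty relation cannot produce any rows themselves.
  case Project(child) => propagateEmpty(child) match {
    case EmptyRelation => EmptyRelation
    case c => Project(c)
  }
  case Filter(child) => propagateEmpty(child) match {
    case EmptyRelation => EmptyRelation
    case c => Filter(c)
  }
  // Stop at nodes that do not preserve emptiness, e.g. a global Aggregate.
  case other => other
}

// Filter(Project(EmptyRelation)) collapses to EmptyRelation;
// Aggregate(EmptyRelation) is left alone.
```

In the quoted plan shape, collapsing an empty ShuffleStage through the Project lets the Join above it also be simplified, which is the point of extending the propagation.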
[jira] [Commented] (SPARK-35576) Redact the sensitive info in the result of Set command
[ https://issues.apache.org/jira/browse/SPARK-35576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354783#comment-17354783 ] Apache Spark commented on SPARK-35576: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/32720 > Redact the sensitive info in the result of Set command > -- > > Key: SPARK-35576 > URL: https://issues.apache.org/jira/browse/SPARK-35576 > Project: Spark > Issue Type: Bug > Components: Security, SQL >Affects Versions: 1.6.3, 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.8, 3.0.2, 3.1.2, > 3.2.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 3.2.0 > > > Currently, the results of following SQL queries are not redacted: > ``` > SET [KEY]; > SET; > ``` > For example: > {code:java} > scala> spark.sql("set javax.jdo.option.ConnectionPassword=123456").show() > ++--+ > | key| value| > ++--+ > |javax.jdo.option|123456| > ++--+ > scala> spark.sql("set javax.jdo.option.ConnectionPassword").show() > ++--+ > | key| value| > ++--+ > |javax.jdo.option|123456| > ++--+ > scala> spark.sql("set").show() > +++ > | key| value| > +++ > |javax.jdo.option| 123456| > {code} > We should hide the sensitive information and redact the query output. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35576) Redact the sensitive info in the result of Set command
[ https://issues.apache.org/jira/browse/SPARK-35576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354782#comment-17354782 ] Apache Spark commented on SPARK-35576: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/32720 > Redact the sensitive info in the result of Set command > -- > > Key: SPARK-35576 > URL: https://issues.apache.org/jira/browse/SPARK-35576 > Project: Spark > Issue Type: Bug > Components: Security, SQL >Affects Versions: 1.6.3, 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.8, 3.0.2, 3.1.2, > 3.2.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 3.2.0 > > > Currently, the results of following SQL queries are not redacted: > ``` > SET [KEY]; > SET; > ``` > For example: > {code:java} > scala> spark.sql("set javax.jdo.option.ConnectionPassword=123456").show() > ++--+ > | key| value| > ++--+ > |javax.jdo.option|123456| > ++--+ > scala> spark.sql("set javax.jdo.option.ConnectionPassword").show() > ++--+ > | key| value| > ++--+ > |javax.jdo.option|123456| > ++--+ > scala> spark.sql("set").show() > +++ > | key| value| > +++ > |javax.jdo.option| 123456| > {code} > We should hide the sensitive information and redact the query output. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
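A minimal sketch of the intended behavior (the regex and placeholder string below mirror Spark's redaction defaults but are assumptions here, not the exact patch): match each key against the redaction pattern before emitting SET output.

```scala
import scala.util.matching.Regex

// Assumed stand-in for the default value of spark.redaction.regex.
val redactionPattern: Regex = "(?i)secret|password|token".r

// Redact values whose keys look sensitive; leave ordinary configs untouched.
def redactSetOutput(rows: Seq[(String, String)]): Seq[(String, String)] =
  rows.map {
    case (key, _) if redactionPattern.findFirstIn(key).isDefined =>
      (key, "*********(redacted)")
    case kv => kv
  }

// redactSetOutput(Seq(("javax.jdo.option.ConnectionPassword", "123456")))
// hides the password value, fixing the leak shown in the shell session above.
```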
[jira] [Assigned] (SPARK-35544) Add tree pattern pruning into Analyzer rules
[ https://issues.apache.org/jira/browse/SPARK-35544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-35544: -- Assignee: Yingyi Bu > Add tree pattern pruning into Analyzer rules > > > Key: SPARK-35544 > URL: https://issues.apache.org/jira/browse/SPARK-35544 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yingyi Bu >Assignee: Yingyi Bu >Priority: Major > > Analyzer rules have rule-ID pruning, but do not have tree pattern pruning yet. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35544) Add tree pattern pruning into Analyzer rules
[ https://issues.apache.org/jira/browse/SPARK-35544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-35544. Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32686 [https://github.com/apache/spark/pull/32686] > Add tree pattern pruning into Analyzer rules > > > Key: SPARK-35544 > URL: https://issues.apache.org/jira/browse/SPARK-35544 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yingyi Bu >Assignee: Yingyi Bu >Priority: Major > Fix For: 3.2.0 > > > Analyzer rules have rule-ID pruning, but do not have tree pattern pruning yet. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35584) Increase the timeout in FallbackStorageSuite
[ https://issues.apache.org/jira/browse/SPARK-35584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354765#comment-17354765 ] Apache Spark commented on SPARK-35584: -- User 'Yikun' has created a pull request for this issue: https://github.com/apache/spark/pull/32719 > Increase the timeout in FallbackStorageSuite > > > Key: SPARK-35584 > URL: https://issues.apache.org/jira/browse/SPARK-35584 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 3.1.2 >Reporter: Yikun Jiang >Priority: Minor > > {{Discovery starting. Discovery completed in 2 seconds, 396 milliseconds. Run > starting. Expected test count is: 9 FallbackStorageSuite: - fallback storage > APIs - copy/exists - SPARK-34142: fallback storage API - cleanUp - migrate > shuffle data to fallback storage - Upload from all decommissioned executors}} > {{- Upload multi stages *** FAILED ***}} > {{ The code passed to eventually never returned normally. Attempted 20 times > over 10.011176743 seconds. Last failure message: fallbackStorage.exists(0, > file) was false. (FallbackStorageSuite.scala:243)}} > {{- lz4 - Newly added executors should access old data from remote storage > *** FAILED ***}} > {{ The code passed to eventually never returned normally. Attempted 20 times > over 10.010694845 seconds. Last failure message: fallbackStorage.exists(0, > file) was false. (FallbackStorageSuite.scala:268)}} > {{- lzf - Newly added executors should access old data from remote storage > *** FAILED ***}} > {{ The code passed to eventually never returned normally. Attempted 20 times > over 10.00972281101 seconds. Last failure message: > fallbackStorage.exists(0, file) was false. (FallbackStorageSuite.scala:268)}} > {{- snappy - Newly added executors should access old data from remote storage > *** FAILED ***}} > {{ The code passed to eventually never returned normally. Attempted 20 times > over 10.009750581 seconds. Last failure message: fallbackStorage.exists(0, > file) was false. 
(FallbackStorageSuite.scala:268)}} > {{- zstd - Newly added executors should access old data from remote storage > *** FAILED ***}} > {{ The code passed to eventually never returned normally. Attempted 20 times > over 10.00968885 seconds. Last failure message: fallbackStorage.exists(0, > file) was false. (FallbackStorageSuite.scala:268)}} > {{Run completed in 1 minute, 37 seconds.}} > {{Total number of tests run: 9}} > {{Suites: completed 2, aborted 0}} > {{Tests: succeeded 4, failed 5, canceled 0, ignored 0, pending 0}} > {{*** 5 TESTS FAILED ***}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
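The fix direction is simply to raise ScalaTest's eventually patience for these checks, which currently give up after roughly 10 seconds on slower hosts. A sketch (the 30-second figure and polling interval are illustrative choices, and fallbackStorage/file are the suite's own identifiers shown as-is from the log):

```scala
import org.scalatest.concurrent.Eventually._
import org.scalatest.time.SpanSugar._

// "Attempted 20 times over 10.0… seconds" in the log is the default patience
// being exhausted; a larger timeout keeps the same assertion but tolerates
// slower storage on constrained machines.
eventually(timeout(30.seconds), interval(500.millis)) {
  assert(fallbackStorage.exists(0, file)) // the suite's failing check
}
```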
[jira] [Assigned] (SPARK-35584) Increase the timeout in FallbackStorageSuite
[ https://issues.apache.org/jira/browse/SPARK-35584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35584: Assignee: (was: Apache Spark) > Increase the timeout in FallbackStorageSuite > > > Key: SPARK-35584 > URL: https://issues.apache.org/jira/browse/SPARK-35584 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 3.1.2 >Reporter: Yikun Jiang >Priority: Minor > > {{Discovery starting. Discovery completed in 2 seconds, 396 milliseconds. Run > starting. Expected test count is: 9 FallbackStorageSuite: - fallback storage > APIs - copy/exists - SPARK-34142: fallback storage API - cleanUp - migrate > shuffle data to fallback storage - Upload from all decommissioned executors}} > {{- Upload multi stages *** FAILED ***}} > {{ The code passed to eventually never returned normally. Attempted 20 times > over 10.011176743 seconds. Last failure message: fallbackStorage.exists(0, > file) was false. (FallbackStorageSuite.scala:243)}} > {{- lz4 - Newly added executors should access old data from remote storage > *** FAILED ***}} > {{ The code passed to eventually never returned normally. Attempted 20 times > over 10.010694845 seconds. Last failure message: fallbackStorage.exists(0, > file) was false. (FallbackStorageSuite.scala:268)}} > {{- lzf - Newly added executors should access old data from remote storage > *** FAILED ***}} > {{ The code passed to eventually never returned normally. Attempted 20 times > over 10.00972281101 seconds. Last failure message: > fallbackStorage.exists(0, file) was false. (FallbackStorageSuite.scala:268)}} > {{- snappy - Newly added executors should access old data from remote storage > *** FAILED ***}} > {{ The code passed to eventually never returned normally. Attempted 20 times > over 10.009750581 seconds. Last failure message: fallbackStorage.exists(0, > file) was false. 
(FallbackStorageSuite.scala:268)}} > {{- zstd - Newly added executors should access old data from remote storage > *** FAILED ***}} > {{ The code passed to eventually never returned normally. Attempted 20 times > over 10.00968885 seconds. Last failure message: fallbackStorage.exists(0, > file) was false. (FallbackStorageSuite.scala:268)}} > {{Run completed in 1 minute, 37 seconds.}} > {{Total number of tests run: 9}} > {{Suites: completed 2, aborted 0}} > {{Tests: succeeded 4, failed 5, canceled 0, ignored 0, pending 0}} > {{*** 5 TESTS FAILED ***}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35584) Increase the timeout in FallbackStorageSuite
[ https://issues.apache.org/jira/browse/SPARK-35584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35584: Assignee: Apache Spark > Increase the timeout in FallbackStorageSuite > > > Key: SPARK-35584 > URL: https://issues.apache.org/jira/browse/SPARK-35584 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 3.1.2 >Reporter: Yikun Jiang >Assignee: Apache Spark >Priority: Minor > > {{Discovery starting. Discovery completed in 2 seconds, 396 milliseconds. Run > starting. Expected test count is: 9 FallbackStorageSuite: - fallback storage > APIs - copy/exists - SPARK-34142: fallback storage API - cleanUp - migrate > shuffle data to fallback storage - Upload from all decommissioned executors}} > {{- Upload multi stages *** FAILED ***}} > {{ The code passed to eventually never returned normally. Attempted 20 times > over 10.011176743 seconds. Last failure message: fallbackStorage.exists(0, > file) was false. (FallbackStorageSuite.scala:243)}} > {{- lz4 - Newly added executors should access old data from remote storage > *** FAILED ***}} > {{ The code passed to eventually never returned normally. Attempted 20 times > over 10.010694845 seconds. Last failure message: fallbackStorage.exists(0, > file) was false. (FallbackStorageSuite.scala:268)}} > {{- lzf - Newly added executors should access old data from remote storage > *** FAILED ***}} > {{ The code passed to eventually never returned normally. Attempted 20 times > over 10.00972281101 seconds. Last failure message: > fallbackStorage.exists(0, file) was false. (FallbackStorageSuite.scala:268)}} > {{- snappy - Newly added executors should access old data from remote storage > *** FAILED ***}} > {{ The code passed to eventually never returned normally. Attempted 20 times > over 10.009750581 seconds. Last failure message: fallbackStorage.exists(0, > file) was false. 
(FallbackStorageSuite.scala:268)}} > {{- zstd - Newly added executors should access old data from remote storage > *** FAILED ***}} > {{ The code passed to eventually never returned normally. Attempted 20 times > over 10.00968885 seconds. Last failure message: fallbackStorage.exists(0, > file) was false. (FallbackStorageSuite.scala:268)}} > {{Run completed in 1 minute, 37 seconds.}} > {{Total number of tests run: 9}} > {{Suites: completed 2, aborted 0}} > {{Tests: succeeded 4, failed 5, canceled 0, ignored 0, pending 0}} > {{*** 5 TESTS FAILED ***}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35584) Increase the timeout in FallbackStorageSuite
[ https://issues.apache.org/jira/browse/SPARK-35584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yikun Jiang updated SPARK-35584: Description: The aarch64 case failed due to: {{Discovery starting. Discovery completed in 2 seconds, 396 milliseconds. Run starting. Expected test count is: 9 FallbackStorageSuite: - fallback storage APIs - copy/exists - SPARK-34142: fallback storage API - cleanUp - migrate shuffle data to fallback storage - Upload from all decommissioned executors}} {{- Upload multi stages *** FAILED ***}} \{{ The code passed to eventually never returned normally. Attempted 20 times over 10.011176743 seconds. Last failure message: fallbackStorage.exists(0, file) was false. (FallbackStorageSuite.scala:243)}} {{- lz4 - Newly added executors should access old data from remote storage *** FAILED ***}} \{{ The code passed to eventually never returned normally. Attempted 20 times over 10.010694845 seconds. Last failure message: fallbackStorage.exists(0, file) was false. (FallbackStorageSuite.scala:268)}} {{- lzf - Newly added executors should access old data from remote storage *** FAILED ***}} \{{ The code passed to eventually never returned normally. Attempted 20 times over 10.00972281101 seconds. Last failure message: fallbackStorage.exists(0, file) was false. (FallbackStorageSuite.scala:268)}} {{- snappy - Newly added executors should access old data from remote storage *** FAILED ***}} \{{ The code passed to eventually never returned normally. Attempted 20 times over 10.009750581 seconds. Last failure message: fallbackStorage.exists(0, file) was false. (FallbackStorageSuite.scala:268)}} {{- zstd - Newly added executors should access old data from remote storage *** FAILED ***}} \{{ The code passed to eventually never returned normally. Attempted 20 times over 10.00968885 seconds. Last failure message: fallbackStorage.exists(0, file) was false. 
(FallbackStorageSuite.scala:268)}} {{Run completed in 1 minute, 37 seconds.}} {{Total number of tests run: 9}} {{Suites: completed 2, aborted 0}} {{Tests: succeeded 4, failed 5, canceled 0, ignored 0, pending 0}} {{*** 5 TESTS FAILED ***}} was: {{Discovery starting. Discovery completed in 2 seconds, 396 milliseconds. Run starting. Expected test count is: 9 FallbackStorageSuite: - fallback storage APIs - copy/exists - SPARK-34142: fallback storage API - cleanUp - migrate shuffle data to fallback storage - Upload from all decommissioned executors}} {{- Upload multi stages *** FAILED ***}} {{ The code passed to eventually never returned normally. Attempted 20 times over 10.011176743 seconds. Last failure message: fallbackStorage.exists(0, file) was false. (FallbackStorageSuite.scala:243)}} {{- lz4 - Newly added executors should access old data from remote storage *** FAILED ***}} {{ The code passed to eventually never returned normally. Attempted 20 times over 10.010694845 seconds. Last failure message: fallbackStorage.exists(0, file) was false. (FallbackStorageSuite.scala:268)}} {{- lzf - Newly added executors should access old data from remote storage *** FAILED ***}} {{ The code passed to eventually never returned normally. Attempted 20 times over 10.00972281101 seconds. Last failure message: fallbackStorage.exists(0, file) was false. (FallbackStorageSuite.scala:268)}} {{- snappy - Newly added executors should access old data from remote storage *** FAILED ***}} {{ The code passed to eventually never returned normally. Attempted 20 times over 10.009750581 seconds. Last failure message: fallbackStorage.exists(0, file) was false. (FallbackStorageSuite.scala:268)}} {{- zstd - Newly added executors should access old data from remote storage *** FAILED ***}} {{ The code passed to eventually never returned normally. Attempted 20 times over 10.00968885 seconds. Last failure message: fallbackStorage.exists(0, file) was false. 
(FallbackStorageSuite.scala:268)}} {{Run completed in 1 minute, 37 seconds.}} {{Total number of tests run: 9}} {{Suites: completed 2, aborted 0}} {{Tests: succeeded 4, failed 5, canceled 0, ignored 0, pending 0}} {{*** 5 TESTS FAILED ***}} > Increase the timeout in FallbackStorageSuite > > > Key: SPARK-35584 > URL: https://issues.apache.org/jira/browse/SPARK-35584 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 3.1.2 >Reporter: Yikun Jiang >Priority: Minor > > The aarch64 case failed due to: > > {{Discovery starting. Discovery completed in 2 seconds, 396 milliseconds. Run > starting. Expected test count is: 9 FallbackStorageSuite: - fallback storage > APIs - copy/exists - SPARK-34142: fallback storage API - cleanUp - migrate > shuffle data to fallback storage - Upload from all decommissioned executors}} > {{- Upload multi stages *** FAILED
[jira] [Commented] (SPARK-34059) Use for/foreach rather than map to make sure execute it eagerly
[ https://issues.apache.org/jira/browse/SPARK-34059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354762#comment-17354762 ] Apache Spark commented on SPARK-34059: -- User 'Yikun' has created a pull request for this issue: https://github.com/apache/spark/pull/32719 > Use for/foreach rather than map to make sure execute it eagerly > > > Key: SPARK-34059 > URL: https://issues.apache.org/jira/browse/SPARK-34059 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.2.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Fix For: 2.4.8, 3.0.2, 3.1.1, 3.2.0 > > > This is virtually a clone of SPARK-16694. There are some more new places where > map has to be replaced by foreach. Please see the original ticket and PR for more details. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34059) Use for/foreach rather than map to make sure execute it eagerly
[ https://issues.apache.org/jira/browse/SPARK-34059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354761#comment-17354761 ] Apache Spark commented on SPARK-34059: -- User 'Yikun' has created a pull request for this issue: https://github.com/apache/spark/pull/32719 > Use for/foreach rather than map to make sure execute it eagerly > > > Key: SPARK-34059 > URL: https://issues.apache.org/jira/browse/SPARK-34059 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.2.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Fix For: 2.4.8, 3.0.2, 3.1.1, 3.2.0 > > > This is virtually a clone of SPARK-16694. There are some more new places where > map has to be replaced by foreach. Please see the original ticket and PR for more details. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
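Why SPARK-34059 matters in practice: on lazy collections, map only schedules the function, while foreach always runs it, so side effects written inside a discarded map silently never happen. A self-contained illustration:

```scala
var sideEffects = 0

// On an Iterator, map is lazy: the result is discarded, so the closure
// (and its side effect) never executes.
Iterator(1, 2, 3).map { x => sideEffects += 1; x }
assert(sideEffects == 0)

// foreach consumes the collection eagerly, so the effect always happens.
Iterator(1, 2, 3).foreach { _ => sideEffects += 1 }
assert(sideEffects == 3)
```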
[jira] [Commented] (SPARK-32166) Metastore problem on Spark3.0 with Hive3.0
[ https://issues.apache.org/jira/browse/SPARK-32166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354759#comment-17354759 ] angerszhu commented on SPARK-32166: --- http://apache-spark-user-list.1001560.n3.nabble.com/Re-Metastore-problem-on-Spark2-3-with-Hive3-0-td33474.html > Metastore problem on Spark3.0 with Hive3.0 > --- > > Key: SPARK-32166 > URL: https://issues.apache.org/jira/browse/SPARK-32166 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: hzk >Priority: Major > > When I use spark-sql to create a table, the problem appears. > {code:java} > create table bigbig as select b.user_id , b.name , b.age , c.address , c.city > , a.position , a.object , a.problem , a.complaint_time from ( select user_id > , position , object , problem , complaint_time from > HIVE_COMBINE_7efde4e2dcb34c218b3fb08872e698d5 ) as a left join > HIVE_ODS_17_TEST_DEMO_ODS_USERS_INFO_20200608141945 as b on b.user_id = > a.user_id left join HIVE_ODS_17_TEST_ADDRESS_CITY_20200608141942 as c on > c.address_id = b.address_id; > {code} > It opened a connection to the Hive metastore. > My Hive version is 3.1.0. > {code:java} > org.apache.thrift.TApplicationException: Required field 'filesAdded' is > unset! > Struct:InsertEventRequestData(filesAdded:null)org.apache.thrift.TApplicationException: > Required field 'filesAdded' is unset! 
> Struct:InsertEventRequestData(filesAdded:null) at > org.apache.thrift.TApplicationException.read(TApplicationException.java:111) > at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79) at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_fire_listener_event(ThriftHiveMetastore.java:4182) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.fire_listener_event(ThriftHiveMetastore.java:4169) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.fireListenerEvent(HiveMetaStoreClient.java:1954) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156) > at com.sun.proxy.$Proxy5.fireListenerEvent(Unknown Source) at > org.apache.hadoop.hive.ql.metadata.Hive.fireInsertEvent(Hive.java:1947) at > org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1673) at > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.apache.spark.sql.hive.client.Shim_v0_14.loadTable(HiveShim.scala:847) at > org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadTable$1.apply$mcV$sp(HiveClientImpl.scala:757) > at > org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadTable$1.apply(HiveClientImpl.scala:757) > at > org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadTable$1.apply(HiveClientImpl.scala:757) > at > org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:272) > at > 
org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:210) > at > org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:209) > at > org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:255) > at > org.apache.spark.sql.hive.client.HiveClientImpl.loadTable(HiveClientImpl.scala:756) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadTable$1.apply$mcV$sp(HiveExternalCatalog.scala:829) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadTable$1.apply(HiveExternalCatalog.scala:827) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadTable$1.apply(HiveExternalCatalog.scala:827) > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97) > at > org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:827) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.loadTable(SessionCatalog.scala:416) > at > org.apache.spark.sql.execution.command.LoadDataCommand.run(tables.scala:403) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) > at >
[jira] [Commented] (SPARK-21957) Add current_user function
[ https://issues.apache.org/jira/browse/SPARK-21957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354755#comment-17354755 ] Apache Spark commented on SPARK-21957: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/32718 > Add current_user function > - > > Key: SPARK-21957 > URL: https://issues.apache.org/jira/browse/SPARK-21957 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.2.0 >Reporter: Marco Gaido >Priority: Minor > Labels: bulk-closed > > Spark doesn't support the {{current_user}} function. > Despite the user can be retrieved in other ways, the function would help > making easier to migrate existing Hive queries to Spark and it can also be > convenient for people who are just using SQL to interact with Spark. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21957) Add current_user function
[ https://issues.apache.org/jira/browse/SPARK-21957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354756#comment-17354756 ] Apache Spark commented on SPARK-21957: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/32718 > Add current_user function > - > > Key: SPARK-21957 > URL: https://issues.apache.org/jira/browse/SPARK-21957 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.2.0 >Reporter: Marco Gaido >Priority: Minor > Labels: bulk-closed > > Spark doesn't support the {{current_user}} function. > Although the user can be retrieved in other ways, the function would make it > easier to migrate existing Hive queries to Spark, and it can also be > convenient for people who are just using SQL to interact with Spark. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
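The requested {{current_user}} function would expose the session's authenticated user in SQL, as Hive does. As a rough, non-Spark sketch of the concept, a stand-in can ask the operating system for the user running the current process; this is only an illustration of the idea, not the implementation in the PR above, which resolves the user from Spark's session state:

```python
import getpass

def current_user() -> str:
    """Illustrative stand-in for a SQL current_user() function.

    In Hive/Spark the function resolves the session's authenticated user;
    this sketch simply asks the OS for the user owning the current process.
    """
    try:
        return getpass.getuser()
    except OSError:
        # No login information available (e.g. in a bare container).
        return "unknown"

print(current_user())
```

In real Spark SQL the function would be used as `SELECT current_user()`; the point of the Jira is that queries written for Hive can then run unchanged.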
[jira] [Commented] (SPARK-35584) Increase the timeout in FallbackStorageSuite
[ https://issues.apache.org/jira/browse/SPARK-35584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354753#comment-17354753 ] Yikun Jiang commented on SPARK-35584: - I also see some random timeout test failures on GitHub Actions, e.g. [1][2]: [[1] https://github.com/apache/spark/actions/runs/489319612|https://github.com/apache/spark/actions/runs/489319612] [[2] https://github.com/apache/spark/actions/runs/479317320|https://github.com/apache/spark/actions/runs/479317320] > Increase the timeout in FallbackStorageSuite > > > Key: SPARK-35584 > URL: https://issues.apache.org/jira/browse/SPARK-35584 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 3.1.2 >Reporter: Yikun Jiang >Priority: Minor > > {{Discovery starting. Discovery completed in 2 seconds, 396 milliseconds. Run > starting. Expected test count is: 9 FallbackStorageSuite: - fallback storage > APIs - copy/exists - SPARK-34142: fallback storage API - cleanUp - migrate > shuffle data to fallback storage - Upload from all decommissioned executors}} > {{- Upload multi stages *** FAILED ***}} > {{ The code passed to eventually never returned normally. Attempted 20 times > over 10.011176743 seconds. Last failure message: fallbackStorage.exists(0, > file) was false. (FallbackStorageSuite.scala:243)}} > {{- lz4 - Newly added executors should access old data from remote storage > *** FAILED ***}} > {{ The code passed to eventually never returned normally. Attempted 20 times > over 10.010694845 seconds. Last failure message: fallbackStorage.exists(0, > file) was false. (FallbackStorageSuite.scala:268)}} > {{- lzf - Newly added executors should access old data from remote storage > *** FAILED ***}} > {{ The code passed to eventually never returned normally. Attempted 20 times > over 10.00972281101 seconds. Last failure message: > fallbackStorage.exists(0, file) was false. 
(FallbackStorageSuite.scala:268)}} > {{- snappy - Newly added executors should access old data from remote storage > *** FAILED ***}} > {{ The code passed to eventually never returned normally. Attempted 20 times > over 10.009750581 seconds. Last failure message: fallbackStorage.exists(0, > file) was false. (FallbackStorageSuite.scala:268)}} > {{- zstd - Newly added executors should access old data from remote storage > *** FAILED ***}} > {{ The code passed to eventually never returned normally. Attempted 20 times > over 10.00968885 seconds. Last failure message: fallbackStorage.exists(0, > file) was false. (FallbackStorageSuite.scala:268)}} > {{Run completed in 1 minute, 37 seconds.}} > {{Total number of tests run: 9}} > {{Suites: completed 2, aborted 0}} > {{Tests: succeeded 4, failed 5, canceled 0, ignored 0, pending 0}} > {{*** 5 TESTS FAILED ***}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
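The failures above come from ScalaTest's {{eventually}} helper: the assertion is retried on an interval until a time budget is exhausted, which is why each message reads "Attempted 20 times over 10 seconds" (roughly a 500 ms interval against a 10-second timeout). Increasing the timeout simply gives slow CI hosts more attempts before the test is declared failed. A minimal sketch of the same polling pattern (names are illustrative, not taken from the Spark test code):

```python
import time

def eventually(condition, timeout=10.0, interval=0.5):
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    With timeout=10.0 and interval=0.5 the condition is attempted about
    20 times, matching the failure messages in the log above. Returns the
    number of attempts on success; raises TimeoutError on exhaustion.
    """
    deadline = time.monotonic() + timeout
    attempts = 0
    while True:
        attempts += 1
        if condition():
            return attempts
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition never held after {attempts} attempts")
        time.sleep(interval)
```

A flaky environment shifts how many attempts are needed, not whether the condition eventually holds, which is why bumping the timeout (rather than the assertion) is the appropriate fix here.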
[jira] [Created] (SPARK-35584) Increase the timeout in FallbackStorageSuite
Yikun Jiang created SPARK-35584: --- Summary: Increase the timeout in FallbackStorageSuite Key: SPARK-35584 URL: https://issues.apache.org/jira/browse/SPARK-35584 Project: Spark Issue Type: Test Components: Tests Affects Versions: 3.1.2 Reporter: Yikun Jiang {{Discovery starting. Discovery completed in 2 seconds, 396 milliseconds. Run starting. Expected test count is: 9 FallbackStorageSuite: - fallback storage APIs - copy/exists - SPARK-34142: fallback storage API - cleanUp - migrate shuffle data to fallback storage - Upload from all decommissioned executors}} {{- Upload multi stages *** FAILED ***}} {{ The code passed to eventually never returned normally. Attempted 20 times over 10.011176743 seconds. Last failure message: fallbackStorage.exists(0, file) was false. (FallbackStorageSuite.scala:243)}} {{- lz4 - Newly added executors should access old data from remote storage *** FAILED ***}} {{ The code passed to eventually never returned normally. Attempted 20 times over 10.010694845 seconds. Last failure message: fallbackStorage.exists(0, file) was false. (FallbackStorageSuite.scala:268)}} {{- lzf - Newly added executors should access old data from remote storage *** FAILED ***}} {{ The code passed to eventually never returned normally. Attempted 20 times over 10.00972281101 seconds. Last failure message: fallbackStorage.exists(0, file) was false. (FallbackStorageSuite.scala:268)}} {{- snappy - Newly added executors should access old data from remote storage *** FAILED ***}} {{ The code passed to eventually never returned normally. Attempted 20 times over 10.009750581 seconds. Last failure message: fallbackStorage.exists(0, file) was false. (FallbackStorageSuite.scala:268)}} {{- zstd - Newly added executors should access old data from remote storage *** FAILED ***}} {{ The code passed to eventually never returned normally. Attempted 20 times over 10.00968885 seconds. Last failure message: fallbackStorage.exists(0, file) was false. 
(FallbackStorageSuite.scala:268)}} {{Run completed in 1 minute, 37 seconds.}} {{Total number of tests run: 9}} {{Suites: completed 2, aborted 0}} {{Tests: succeeded 4, failed 5, canceled 0, ignored 0, pending 0}} {{*** 5 TESTS FAILED ***}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35583) Move JDBC data source options from Python and Scala into a single page
Haejoon Lee created SPARK-35583: --- Summary: Move JDBC data source options from Python and Scala into a single page Key: SPARK-35583 URL: https://issues.apache.org/jira/browse/SPARK-35583 Project: Spark Issue Type: Sub-task Components: docs Affects Versions: 3.2.0 Reporter: Haejoon Lee Refer to https://issues.apache.org/jira/browse/SPARK-34491 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35433) Move CSV data source options from Python and Scala into a single page.
[ https://issues.apache.org/jira/browse/SPARK-35433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-35433. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32658 [https://github.com/apache/spark/pull/32658] > Move CSV data source options from Python and Scala into a single page. > -- > > Key: SPARK-35433 > URL: https://issues.apache.org/jira/browse/SPARK-35433 > Project: Spark > Issue Type: Sub-task > Components: docs >Affects Versions: 3.2.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.2.0 > > > Refer to https://issues.apache.org/jira/browse/SPARK-34491 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35433) Move CSV data source options from Python and Scala into a single page.
[ https://issues.apache.org/jira/browse/SPARK-35433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-35433: Assignee: Haejoon Lee > Move CSV data source options from Python and Scala into a single page. > -- > > Key: SPARK-35433 > URL: https://issues.apache.org/jira/browse/SPARK-35433 > Project: Spark > Issue Type: Sub-task > Components: docs >Affects Versions: 3.2.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > > Refer to https://issues.apache.org/jira/browse/SPARK-34491 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35578) Add a test case for a janino bug
[ https://issues.apache.org/jira/browse/SPARK-35578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-35578. -- Fix Version/s: 3.2.0 Assignee: Wenchen Fan Resolution: Fixed Fixed in https://github.com/apache/spark/pull/32716 > Add a test case for a janino bug > > > Key: SPARK-35578 > URL: https://issues.apache.org/jira/browse/SPARK-35578 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Fix For: 3.2.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35582) Remove # noqa in Python API documents.
[ https://issues.apache.org/jira/browse/SPARK-35582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354732#comment-17354732 ] Haejoon Lee commented on SPARK-35582: - I'm working on this > Remove # noqa in Python API documents. > -- > > Key: SPARK-35582 > URL: https://issues.apache.org/jira/browse/SPARK-35582 > Project: Spark > Issue Type: Sub-task > Components: docs, PySpark >Affects Versions: 3.2.0 >Reporter: Haejoon Lee >Priority: Major > > Some unnecessary "# noqa" markers are exposed in the Python API > documentation. > > For example, > [https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.DataFrameReader.parquet.html#pyspark.sql.DataFrameReader.parquet.] > > We should remove them. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35582) Remove # noqa in Python API documents.
Haejoon Lee created SPARK-35582: --- Summary: Remove # noqa in Python API documents. Key: SPARK-35582 URL: https://issues.apache.org/jira/browse/SPARK-35582 Project: Spark Issue Type: Sub-task Components: docs, PySpark Affects Versions: 3.2.0 Reporter: Haejoon Lee Some unnecessary "# noqa" markers are exposed in the Python API documentation. For example, [https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.DataFrameReader.parquet.html#pyspark.sql.DataFrameReader.parquet.] We should remove them. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
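The leaked markers are flake8 {{# noqa}} directives: they silence lint warnings in the source but mean nothing to readers of the rendered API docs. The idea of the cleanup can be sketched as a text filter that strips trailing markers (a hedged illustration only; the actual Spark fix edits the docstring sources rather than post-processing rendered docs):

```python
import re

# `# noqa`, optionally followed by error codes such as `# noqa: F401`,
# is a flake8 suppression directive; strip it from the end of each line.
NOQA_RE = re.compile(r"\s*#\s*noqa(?::\s*[A-Z0-9, ]+)?\s*$", re.MULTILINE)

def strip_noqa(doc: str) -> str:
    """Remove trailing `# noqa` markers from every line of a docstring."""
    return NOQA_RE.sub("", doc)
```

This keeps the code example itself intact and only drops the directive, so the published documentation reads as the author intended.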
[jira] [Assigned] (SPARK-35573) Support R 4.1.0
[ https://issues.apache.org/jira/browse/SPARK-35573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-35573: Assignee: Hyukjin Kwon (was: Dongjoon Hyun) > Support R 4.1.0 > --- > > Key: SPARK-35573 > URL: https://issues.apache.org/jira/browse/SPARK-35573 > Project: Spark > Issue Type: Bug > Components: R >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.0.3, 3.2.0, 3.1.3 > > > Currently, there exists 6 SparkR UT failures in R 4.1.0. > Until R 4.0.5, there was no errors. > {code} > ══ Failed > ══ > ── 1. Failure (test_sparkSQL_arrow.R:71:3): createDataFrame/collect Arrow > optimi > collect(createDataFrame(rdf)) not equal to `expected`. > Component “g”: 'tzone' attributes are inconsistent ('UTC' and '') > ── 2. Failure (test_sparkSQL_arrow.R:143:3): dapply() Arrow optimization - > type > collect(ret) not equal to `rdf`. > Component “b”: 'tzone' attributes are inconsistent ('UTC' and '') > ── 3. Failure (test_sparkSQL_arrow.R:229:3): gapply() Arrow optimization - > type > collect(ret) not equal to `rdf`. > Component “b”: 'tzone' attributes are inconsistent ('UTC' and '') > ── 4. Error (test_sparkSQL.R:1454:3): column functions > ─ > Error: (converted from warning) cannot xtfrm data frames > Backtrace: > 1. base::sort(collect(distinct(select(df, input_file_name() > test_sparkSQL.R:1454:2 > 2. base::sort.default(collect(distinct(select(df, input_file_name() > 5. base::order(x, na.last = na.last, decreasing = decreasing) > 6. base::lapply(z, function(x) if (is.object(x)) as.vector(xtfrm(x)) else x) > 7. base:::FUN(X[[i]], ...) > 10. base::xtfrm.data.frame(x) > ── 5. Failure (test_utils.R:67:3): cleanClosure on R functions > ─ > `actual` not equal to `g`. > names for current but not for target > Length mismatch: comparison on first 0 components > ── 6. Failure (test_utils.R:80:3): cleanClosure on R functions > ─ > `actual` not equal to `g`. 
> names for current but not for target > Length mismatch: comparison on first 0 components > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35573) Make SparkR tests pass with R 4.1+
[ https://issues.apache.org/jira/browse/SPARK-35573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-35573: - Summary: Make SparkR tests pass with R 4.1+ (was: Support R 4.1.0) > Make SparkR tests pass with R 4.1+ > -- > > Key: SPARK-35573 > URL: https://issues.apache.org/jira/browse/SPARK-35573 > Project: Spark > Issue Type: Bug > Components: R >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.0.3, 3.2.0, 3.1.3 > > > Currently, there exists 6 SparkR UT failures in R 4.1.0. > Until R 4.0.5, there was no errors. > {code} > ══ Failed > ══ > ── 1. Failure (test_sparkSQL_arrow.R:71:3): createDataFrame/collect Arrow > optimi > collect(createDataFrame(rdf)) not equal to `expected`. > Component “g”: 'tzone' attributes are inconsistent ('UTC' and '') > ── 2. Failure (test_sparkSQL_arrow.R:143:3): dapply() Arrow optimization - > type > collect(ret) not equal to `rdf`. > Component “b”: 'tzone' attributes are inconsistent ('UTC' and '') > ── 3. Failure (test_sparkSQL_arrow.R:229:3): gapply() Arrow optimization - > type > collect(ret) not equal to `rdf`. > Component “b”: 'tzone' attributes are inconsistent ('UTC' and '') > ── 4. Error (test_sparkSQL.R:1454:3): column functions > ─ > Error: (converted from warning) cannot xtfrm data frames > Backtrace: > 1. base::sort(collect(distinct(select(df, input_file_name() > test_sparkSQL.R:1454:2 > 2. base::sort.default(collect(distinct(select(df, input_file_name() > 5. base::order(x, na.last = na.last, decreasing = decreasing) > 6. base::lapply(z, function(x) if (is.object(x)) as.vector(xtfrm(x)) else x) > 7. base:::FUN(X[[i]], ...) > 10. base::xtfrm.data.frame(x) > ── 5. Failure (test_utils.R:67:3): cleanClosure on R functions > ─ > `actual` not equal to `g`. > names for current but not for target > Length mismatch: comparison on first 0 components > ── 6. 
Failure (test_utils.R:80:3): cleanClosure on R functions > ─ > `actual` not equal to `g`. > names for current but not for target > Length mismatch: comparison on first 0 components > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35573) Support R 4.1.0
[ https://issues.apache.org/jira/browse/SPARK-35573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-35573: - Fix Version/s: 3.1.3 3.0.3 > Support R 4.1.0 > --- > > Key: SPARK-35573 > URL: https://issues.apache.org/jira/browse/SPARK-35573 > Project: Spark > Issue Type: Bug > Components: R >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.0.3, 3.2.0, 3.1.3 > > > Currently, there exists 6 SparkR UT failures in R 4.1.0. > Until R 4.0.5, there was no errors. > {code} > ══ Failed > ══ > ── 1. Failure (test_sparkSQL_arrow.R:71:3): createDataFrame/collect Arrow > optimi > collect(createDataFrame(rdf)) not equal to `expected`. > Component “g”: 'tzone' attributes are inconsistent ('UTC' and '') > ── 2. Failure (test_sparkSQL_arrow.R:143:3): dapply() Arrow optimization - > type > collect(ret) not equal to `rdf`. > Component “b”: 'tzone' attributes are inconsistent ('UTC' and '') > ── 3. Failure (test_sparkSQL_arrow.R:229:3): gapply() Arrow optimization - > type > collect(ret) not equal to `rdf`. > Component “b”: 'tzone' attributes are inconsistent ('UTC' and '') > ── 4. Error (test_sparkSQL.R:1454:3): column functions > ─ > Error: (converted from warning) cannot xtfrm data frames > Backtrace: > 1. base::sort(collect(distinct(select(df, input_file_name() > test_sparkSQL.R:1454:2 > 2. base::sort.default(collect(distinct(select(df, input_file_name() > 5. base::order(x, na.last = na.last, decreasing = decreasing) > 6. base::lapply(z, function(x) if (is.object(x)) as.vector(xtfrm(x)) else x) > 7. base:::FUN(X[[i]], ...) > 10. base::xtfrm.data.frame(x) > ── 5. Failure (test_utils.R:67:3): cleanClosure on R functions > ─ > `actual` not equal to `g`. > names for current but not for target > Length mismatch: comparison on first 0 components > ── 6. Failure (test_utils.R:80:3): cleanClosure on R functions > ─ > `actual` not equal to `g`. 
> names for current but not for target > Length mismatch: comparison on first 0 components > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35573) Support R 4.1.0
[ https://issues.apache.org/jira/browse/SPARK-35573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-35573. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32709 [https://github.com/apache/spark/pull/32709] > Support R 4.1.0 > --- > > Key: SPARK-35573 > URL: https://issues.apache.org/jira/browse/SPARK-35573 > Project: Spark > Issue Type: Bug > Components: R >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.2.0 > > > Currently, there exists 6 SparkR UT failures in R 4.1.0. > Until R 4.0.5, there was no errors. > {code} > ══ Failed > ══ > ── 1. Failure (test_sparkSQL_arrow.R:71:3): createDataFrame/collect Arrow > optimi > collect(createDataFrame(rdf)) not equal to `expected`. > Component “g”: 'tzone' attributes are inconsistent ('UTC' and '') > ── 2. Failure (test_sparkSQL_arrow.R:143:3): dapply() Arrow optimization - > type > collect(ret) not equal to `rdf`. > Component “b”: 'tzone' attributes are inconsistent ('UTC' and '') > ── 3. Failure (test_sparkSQL_arrow.R:229:3): gapply() Arrow optimization - > type > collect(ret) not equal to `rdf`. > Component “b”: 'tzone' attributes are inconsistent ('UTC' and '') > ── 4. Error (test_sparkSQL.R:1454:3): column functions > ─ > Error: (converted from warning) cannot xtfrm data frames > Backtrace: > 1. base::sort(collect(distinct(select(df, input_file_name() > test_sparkSQL.R:1454:2 > 2. base::sort.default(collect(distinct(select(df, input_file_name() > 5. base::order(x, na.last = na.last, decreasing = decreasing) > 6. base::lapply(z, function(x) if (is.object(x)) as.vector(xtfrm(x)) else x) > 7. base:::FUN(X[[i]], ...) > 10. base::xtfrm.data.frame(x) > ── 5. Failure (test_utils.R:67:3): cleanClosure on R functions > ─ > `actual` not equal to `g`. > names for current but not for target > Length mismatch: comparison on first 0 components > ── 6. 
Failure (test_utils.R:80:3): cleanClosure on R functions > ─ > `actual` not equal to `g`. > names for current but not for target > Length mismatch: comparison on first 0 components > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35573) Support R 4.1.0
[ https://issues.apache.org/jira/browse/SPARK-35573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-35573: Assignee: Dongjoon Hyun > Support R 4.1.0 > --- > > Key: SPARK-35573 > URL: https://issues.apache.org/jira/browse/SPARK-35573 > Project: Spark > Issue Type: Bug > Components: R >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > > Currently, there exists 6 SparkR UT failures in R 4.1.0. > Until R 4.0.5, there was no errors. > {code} > ══ Failed > ══ > ── 1. Failure (test_sparkSQL_arrow.R:71:3): createDataFrame/collect Arrow > optimi > collect(createDataFrame(rdf)) not equal to `expected`. > Component “g”: 'tzone' attributes are inconsistent ('UTC' and '') > ── 2. Failure (test_sparkSQL_arrow.R:143:3): dapply() Arrow optimization - > type > collect(ret) not equal to `rdf`. > Component “b”: 'tzone' attributes are inconsistent ('UTC' and '') > ── 3. Failure (test_sparkSQL_arrow.R:229:3): gapply() Arrow optimization - > type > collect(ret) not equal to `rdf`. > Component “b”: 'tzone' attributes are inconsistent ('UTC' and '') > ── 4. Error (test_sparkSQL.R:1454:3): column functions > ─ > Error: (converted from warning) cannot xtfrm data frames > Backtrace: > 1. base::sort(collect(distinct(select(df, input_file_name() > test_sparkSQL.R:1454:2 > 2. base::sort.default(collect(distinct(select(df, input_file_name() > 5. base::order(x, na.last = na.last, decreasing = decreasing) > 6. base::lapply(z, function(x) if (is.object(x)) as.vector(xtfrm(x)) else x) > 7. base:::FUN(X[[i]], ...) > 10. base::xtfrm.data.frame(x) > ── 5. Failure (test_utils.R:67:3): cleanClosure on R functions > ─ > `actual` not equal to `g`. > names for current but not for target > Length mismatch: comparison on first 0 components > ── 6. Failure (test_utils.R:80:3): cleanClosure on R functions > ─ > `actual` not equal to `g`. 
> names for current but not for target > Length mismatch: comparison on first 0 components > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35453) Move Koalas accessor to pandas_on_spark accessor
[ https://issues.apache.org/jira/browse/SPARK-35453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-35453. -- Fix Version/s: 3.2.0 Assignee: Haejoon Lee Resolution: Fixed Fixed in https://github.com/apache/spark/pull/32674 > Move Koalas accessor to pandas_on_spark accessor > > > Key: SPARK-35453 > URL: https://issues.apache.org/jira/browse/SPARK-35453 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.2.0 > > > The existing Koalas has the "Koalas accessor", which is named after the > Koalas project. > > We should rename this accessor to the "pandas-on-Spark accessor". -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35561) Partition result is incorrect when inserting into a partition table with an int partition column
[ https://issues.apache.org/jira/browse/SPARK-35561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354728#comment-17354728 ] YuanGuanhu commented on SPARK-35561: [~Stelyus] I know, but the surprising thing is: if I execute the statement `insert into table orc_part03 partition (p_int=002) select * from partitiontb04 where id > 10006`, the partition is 002. I think we should have the same behavior in both cases. > Partition result is incorrect when inserting into a partition table with an > int partition column > - > > Key: SPARK-35561 > URL: https://issues.apache.org/jira/browse/SPARK-35561 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.1.1, 3.1.2 >Reporter: YuanGuanhu >Priority: Major > > When inserting into a partitioned table with an int-typed partition column, > if the partition value starts with 0 (like 001), the resulting partition is > wrong. > > *How to reproduce the problem:* > CREATE TABLE partitiontb04 (id INT, c_string STRING) STORED AS orc; > insert into table partitiontb04 values (10001,'test1'); > CREATE TABLE orc_part03(id INT, c_string STRING) partitioned by (p_int int) > STORED AS orc; > insert into table orc_part03 partition (p_int=001) select * from > partitiontb04 where id < 10006; > show partitions orc_part03; > expected result: > p_int=001 > > actual result: > p_int=1 > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
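The mechanics behind the report: a partition spec literal is cast to the partition column's declared type before the partition is stored, so an int column cannot preserve the leading zeros of {{001}}; only a string-typed partition column keeps the literal verbatim. The bug being discussed is that this normalization is applied on one code path but not the other. A toy illustration of the normalization itself (a hypothetical helper for explanation, not Spark's actual code path):

```python
def normalize_partition_value(value: str, col_type: str) -> str:
    """Show why `p_int=001` is stored as `p_int=1`.

    The partition spec literal is cast to the partition column's type,
    so an int-typed column drops leading zeros; a string-typed column
    keeps the literal exactly as written.
    """
    if col_type == "int":
        return str(int(value))  # "001" -> 1 -> "1"
    return value                # string columns keep "001"
```

Under this model both `p_int=001` and `p_int=002` should come out as `1` and `2` for an int column; observing `002` survive on one statement is exactly the inconsistency the comment points out.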
[jira] [Reopened] (SPARK-35396) Support manual close/release of entries in MemoryStore and InMemoryRelation instead of relying on GC
[ https://issues.apache.org/jira/browse/SPARK-35396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chendi.Xue reopened SPARK-35396: Reopening this JIRA, since it aims to add manual close to both MemoryStore and InMemoryRelation, and the second PR was just submitted. https://github.com/apache/spark/pull/32717 > Support manual close/release of entries in MemoryStore and InMemoryRelation > instead of relying on GC > - > > Key: SPARK-35396 > URL: https://issues.apache.org/jira/browse/SPARK-35396 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.0.3, 3.1.2, 3.2.0 >Reporter: Chendi.Xue >Assignee: Apache Spark >Priority: Minor > Fix For: 3.2.0 > > > This PR proposes an add-on to support manually closing entries in > MemoryStore and InMemoryRelation. > h3. What changes were proposed in this pull request? > Currently: > MemoryStore uses a LinkedHashMap[BlockId, MemoryEntry[_]] to store all OnHeap > or OffHeap entries. > When memoryStore.remove(blockId) is called, the code simply removes the > entry from the LinkedHashMap and relies on Java GC to do the release work. > This PR: > We propose an add-on that manually closes any object stored in MemoryStore > and InMemoryRelation if the object extends AutoCloseable. > Verification: > In our own use case, we implemented a user-defined off-heap hash relation for > BHJ, and we verified that with this manual close our off-heap hash relation > is released when evict is called. > We also implemented a user-defined cachedBatch that is released when > InMemoryRelation.clearCache() is called by this PR. > h3. Why are the changes needed? > These changes help clean up off-heap user-defined objects that may be > cached in InMemoryRelation or MemoryStore. > h3. Does this PR introduce _any_ user-facing change? > NO > h3. How was this patch tested? 
> WIP > Signed-off-by: Chendi Xue [chendi@intel.com|mailto:chendi@intel.com] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
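The proposal above can be sketched as a store that releases closeable entries deterministically on eviction rather than waiting for the garbage collector; the classes below are illustrative stand-ins (Spark's actual MemoryStore is Scala, keyed by BlockId, and checks for AutoCloseable):

```python
from collections import OrderedDict

class ClosingStore:
    """Sketch of the SPARK-35396 idea: when an entry is removed from the
    block store, call its close() method (if it has one) so off-heap
    resources are released immediately instead of whenever GC runs."""

    def __init__(self):
        # Stand-in for MemoryStore's LinkedHashMap[BlockId, MemoryEntry].
        self._entries = OrderedDict()

    def put(self, block_id, entry):
        self._entries[block_id] = entry

    def remove(self, block_id):
        entry = self._entries.pop(block_id, None)
        # The proposed add-on: release deterministically when supported.
        if entry is not None and hasattr(entry, "close"):
            entry.close()
        return entry

class OffHeapEntry:
    """Toy entry that records whether its native resources were released."""
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True
```

The design point is determinism: an off-heap hash relation holds memory the JVM heap does not track, so relying on finalization can delay the release indefinitely, whereas an explicit close on evict bounds it.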
[jira] [Commented] (SPARK-35396) Support manual close/release of entries in MemoryStore and InMemoryRelation instead of relying on GC
[ https://issues.apache.org/jira/browse/SPARK-35396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354724#comment-17354724 ] Apache Spark commented on SPARK-35396: -- User 'xuechendi' has created a pull request for this issue: https://github.com/apache/spark/pull/32717 > Support to manual close/release entries in MemoryStore and InMemoryRelation > instead of replying on GC > - > > Key: SPARK-35396 > URL: https://issues.apache.org/jira/browse/SPARK-35396 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.0.3, 3.1.2, 3.2.0 >Reporter: Chendi.Xue >Assignee: Apache Spark >Priority: Minor > Fix For: 3.2.0 > > > This PR is proposing a add-on to support to manual close entries in > MemoryStore and InMemoryRelation > h3. What changes were proposed in this pull request? > Currently: > MemoryStore uses a LinkedHashMap[BlockId, MemoryEntry[_]] to store all OnHeap > or OffHeap entries. > And when memoryStore.remove(blockId) is called, codes will simply remove one > entry from LinkedHashMap and leverage Java GC to do release work. > This PR: > We are proposing a add-on to manually close any object stored in MemoryStore > and InMemoryRelation if this object is extended from AutoCloseable. > Veifiication: > In our own use case, we implemented a user-defined off-heap-hashRelation for > BHJ, and we verified that by adding this manual close, we can make sure our > defined off-heap-hashRelation can be released when evict is called. > Also, we implemented user-defined cachedBatch and will be release when > InMemoryRelation.clearCache() is called by this PR > h3. Why are the changes needed? > This changes can help to clean some off-heap user-defined object may be > cached in InMemoryRelation or MemoryStore > h3. Does this PR introduce _any_ user-facing change? > NO > h3. How was this patch tested? 
> WIP > Signed-off-by: Chendi Xue [chendi@intel.com|mailto:chendi@intel.com] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35396) Support to manual close/release entries in MemoryStore and InMemoryRelation instead of replying on GC
[ https://issues.apache.org/jira/browse/SPARK-35396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354723#comment-17354723 ] Apache Spark commented on SPARK-35396: -- User 'xuechendi' has created a pull request for this issue: https://github.com/apache/spark/pull/32717 > Support to manual close/release entries in MemoryStore and InMemoryRelation > instead of replying on GC > - > > Key: SPARK-35396 > URL: https://issues.apache.org/jira/browse/SPARK-35396 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.0.3, 3.1.2, 3.2.0 >Reporter: Chendi.Xue >Assignee: Apache Spark >Priority: Minor > Fix For: 3.2.0 > > > This PR is proposing a add-on to support to manual close entries in > MemoryStore and InMemoryRelation > h3. What changes were proposed in this pull request? > Currently: > MemoryStore uses a LinkedHashMap[BlockId, MemoryEntry[_]] to store all OnHeap > or OffHeap entries. > And when memoryStore.remove(blockId) is called, codes will simply remove one > entry from LinkedHashMap and leverage Java GC to do release work. > This PR: > We are proposing a add-on to manually close any object stored in MemoryStore > and InMemoryRelation if this object is extended from AutoCloseable. > Veifiication: > In our own use case, we implemented a user-defined off-heap-hashRelation for > BHJ, and we verified that by adding this manual close, we can make sure our > defined off-heap-hashRelation can be released when evict is called. > Also, we implemented user-defined cachedBatch and will be release when > InMemoryRelation.clearCache() is called by this PR > h3. Why are the changes needed? > This changes can help to clean some off-heap user-defined object may be > cached in InMemoryRelation or MemoryStore > h3. Does this PR introduce _any_ user-facing change? > NO > h3. How was this patch tested? 
> WIP > Signed-off-by: Chendi Xue [chendi@intel.com|mailto:chendi@intel.com] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
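The proposal above, closing an evicted entry explicitly when it supports it instead of waiting for GC, can be sketched outside Spark. This is a hypothetical Python analogue, not Spark's actual MemoryStore API: the cache checks for a `close()` method (standing in for the JVM's `AutoCloseable`) on removal.

```python
class ManualCloseStore:
    """Toy analogue of a MemoryStore that releases entries eagerly."""

    def __init__(self):
        # insertion-ordered dict, playing the role of LinkedHashMap[BlockId, MemoryEntry]
        self._entries = {}

    def put(self, block_id, entry):
        self._entries[block_id] = entry

    def remove(self, block_id):
        entry = self._entries.pop(block_id, None)
        # Instead of relying on GC, close the entry now if it knows how,
        # mirroring the proposed AutoCloseable check on the JVM side.
        if entry is not None and hasattr(entry, "close"):
            entry.close()
        return entry


class OffHeapEntry:
    """Stands in for a user-defined off-heap structure needing explicit release."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True
```

The same pattern would cover the user-defined hashRelation and cachedBatch cases described in the PR: eviction triggers the release instead of a later, unpredictable GC cycle.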
[jira] [Commented] (SPARK-34731) ConcurrentModificationException in EventLoggingListener when redacting properties
[ https://issues.apache.org/jira/browse/SPARK-34731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354670#comment-17354670 ] John Pugliesi commented on SPARK-34731: --- To clarify, does this issue potentially prevent event logs from being created/written entirely? We're seeing this exception in some of our Spark 3.1.1 applications - namely the applications with particularly large Window queries - where the final event log is never successfully written out (using an s3a:// spark.eventLog.dir, for what it's worth): {code:bash} # spark-defaults.conf spark.eventLog.enabled true spark.eventLog.dir s3a://my-bucket/spark-event-logs/ {code} > ConcurrentModificationException in EventLoggingListener when redacting > properties > - > > Key: SPARK-34731 > URL: https://issues.apache.org/jira/browse/SPARK-34731 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.1, 3.2.0 >Reporter: Bruce Robbins >Assignee: Bruce Robbins >Priority: Major > Fix For: 3.1.2, 3.2.0 > > > Reproduction: > The key elements of reproduction are enabling event logging, setting > spark.executor.cores, and some bad luck: > {noformat} > $ bin/spark-shell --conf spark.ui.showConsoleProgress=false \ > --conf spark.executor.cores=1 --driver-memory 4g --conf \ > "spark.ui.showConsoleProgress=false" \ > --conf spark.eventLog.enabled=true \ > --conf spark.eventLog.dir=/tmp/spark-events > ... 
> scala> (0 to 500).foreach { i => > | val df = spark.range(0, 2).toDF("a") > | df.filter("a > 12").count > | } > 21/03/12 18:16:44 ERROR AsyncEventQueue: Listener EventLoggingListener threw > an exception > java.util.ConcurrentModificationException > at java.util.Hashtable$Enumerator.next(Hashtable.java:1387) > at > scala.collection.convert.Wrappers$JPropertiesWrapper$$anon$6.next(Wrappers.scala:424) > at > scala.collection.convert.Wrappers$JPropertiesWrapper$$anon$6.next(Wrappers.scala:420) > at scala.collection.Iterator.foreach(Iterator.scala:941) > at scala.collection.Iterator.foreach$(Iterator.scala:941) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1429) > at scala.collection.IterableLike.foreach(IterableLike.scala:74) > at scala.collection.IterableLike.foreach$(IterableLike.scala:73) > at scala.collection.AbstractIterable.foreach(Iterable.scala:56) > at scala.collection.mutable.MapLike.toSeq(MapLike.scala:75) > at scala.collection.mutable.MapLike.toSeq$(MapLike.scala:72) > at scala.collection.mutable.AbstractMap.toSeq(Map.scala:82) > at > org.apache.spark.scheduler.EventLoggingListener.redactProperties(EventLoggingListener.scala:290) > at > org.apache.spark.scheduler.EventLoggingListener.onJobStart(EventLoggingListener.scala:162) > at > org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:37) > at > org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28) > at > org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) > at > org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) > at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117) > at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101) > at > org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105) > at > org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105) > at > 
scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23) > at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) > at > org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100) > at > org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96) > at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1379) > at > org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96) > {noformat} > Analysis from quick reading of the code: > DAGScheduler posts a JobSubmitted event containing a clone of a properties > object > [here|https://github.com/apache/spark/blob/4f1e434ec57070b52b28f98c66b53ca6ec4de7a4/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L834]. > This event is handled > [here|https://github.com/apache/spark/blob/4f1e434ec57070b52b28f98c66b53ca6ec4de7a4/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L2394]. > DAGScheduler#handleJobSubmitted stores the properties object in a [Job >
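The crash boils down to iterating a live java.util.Properties (a Hashtable) while another thread mutates it. A Python analogue reproduces the shape of the bug and the usual fix: snapshot the entries before iterating, which is in the spirit of redacting a copy of the properties. The `redact_*` helpers here are illustrative, not Spark code.

```python
def redact_unsafe(props, mutate):
    # Iterating the live mapping while it is mutated mirrors
    # Hashtable$Enumerator throwing ConcurrentModificationException.
    out = {}
    for key, value in props.items():
        out[key] = "*****" if "password" in key.lower() else value
        mutate(props)  # concurrent modification happens mid-iteration
    return out


def redact_safe(props):
    # Snapshot the entries first, then redact the copy, so a later
    # mutation cannot invalidate the iterator.
    snapshot = list(props.items())
    return {k: ("*****" if "password" in k.lower() else v) for k, v in snapshot}
```

In CPython the unsafe variant raises `RuntimeError` when the mapping changes size during iteration; on the JVM the analogous failure is the `ConcurrentModificationException` in the stack trace above.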
[jira] [Resolved] (SPARK-35576) Redact the sensitive info in the result of Set command
[ https://issues.apache.org/jira/browse/SPARK-35576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-35576. --- Fix Version/s: 3.2.0 Resolution: Fixed This is resolved via https://github.com/apache/spark/pull/32712 > Redact the sensitive info in the result of Set command > -- > > Key: SPARK-35576 > URL: https://issues.apache.org/jira/browse/SPARK-35576 > Project: Spark > Issue Type: Bug > Components: Security, SQL >Affects Versions: 1.6.3, 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.8, 3.0.2, 3.1.2, > 3.2.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 3.2.0 > > > Currently, the results of the following SQL queries are not redacted: > ``` > SET [KEY]; > SET; > ``` > For example: > {code:java} > scala> spark.sql("set javax.jdo.option.ConnectionPassword=123456").show() > ++--+ > | key| value| > ++--+ > |javax.jdo.option|123456| > ++--+ > scala> spark.sql("set javax.jdo.option.ConnectionPassword").show() > ++--+ > | key| value| > ++--+ > |javax.jdo.option|123456| > ++--+ > scala> spark.sql("set").show() > +++ > | key| value| > +++ > |javax.jdo.option| 123456| > {code} > We should hide the sensitive information and redact the query output. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
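Key-based redaction of SET output can be sketched like this. The regex mirrors the idea of Spark's `spark.redaction.regex` (matching keys containing secret, password, token); treat the exact default pattern and replacement text as assumptions here, not quotes from Spark.

```python
import re

# Hypothetical key-matching pattern; Spark's real behavior is driven by
# the spark.redaction.regex configuration.
REDACT_KEYS = re.compile(r"(?i)secret|password|token")

def redact_set_output(rows):
    """rows: list of (key, value) pairs as returned by SET."""
    return [
        (key, "*********(redacted)" if REDACT_KEYS.search(key) else value)
        for key, value in rows
    ]
```

Matching on the key rather than the value is what makes `SET;` safe too: every returned pair is filtered, not just the one the user asked for.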
[jira] [Updated] (SPARK-35576) Redact the sensitive info in the result of Set command
[ https://issues.apache.org/jira/browse/SPARK-35576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-35576: -- Affects Version/s: 1.6.3
[jira] [Updated] (SPARK-35576) Redact the sensitive info in the result of Set command
[ https://issues.apache.org/jira/browse/SPARK-35576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-35576: -- Affects Version/s: 2.0.2
[jira] [Updated] (SPARK-35576) Redact the sensitive info in the result of Set command
[ https://issues.apache.org/jira/browse/SPARK-35576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-35576: -- Affects Version/s: 2.1.3
[jira] [Updated] (SPARK-35576) Redact the sensitive info in the result of Set command
[ https://issues.apache.org/jira/browse/SPARK-35576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-35576: -- Affects Version/s: 2.2.3
[jira] [Updated] (SPARK-35576) Redact the sensitive info in the result of Set command
[ https://issues.apache.org/jira/browse/SPARK-35576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-35576: -- Affects Version/s: 2.3.4
[jira] [Updated] (SPARK-35576) Redact the sensitive info in the result of Set command
[ https://issues.apache.org/jira/browse/SPARK-35576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-35576: -- Affects Version/s: 2.4.8 3.0.2
[jira] [Updated] (SPARK-35576) Redact the sensitive info in the result of Set command
[ https://issues.apache.org/jira/browse/SPARK-35576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-35576: -- Issue Type: Bug (was: Task)
[jira] [Assigned] (SPARK-35581) Casting special strings to DATE/TIMESTAMP returns inconsistent results
[ https://issues.apache.org/jira/browse/SPARK-35581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35581: Assignee: Max Gekk (was: Apache Spark) > Casting special strings to DATE/TIMESTAMP returns inconsistent results > -- > > Key: SPARK-35581 > URL: https://issues.apache.org/jira/browse/SPARK-35581 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > When casting the special values "now", "today", "tomorrow", and "yesterday" > to DATE/TIMESTAMP, Spark may return inconsistent results. > Looks like Spark runs the expression on each executor, on every row > independently. So the results could differ across executors if they have > different system time, and across rows because of the resolution of "now". > https://github.com/databricks/runtime/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L876 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35581) Casting special strings to DATE/TIMESTAMP returns inconsistent results
[ https://issues.apache.org/jira/browse/SPARK-35581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35581: Assignee: Apache Spark (was: Max Gekk)
[jira] [Commented] (SPARK-35581) Casting special strings to DATE/TIMESTAMP returns inconsistent results
[ https://issues.apache.org/jira/browse/SPARK-35581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354651#comment-17354651 ] Apache Spark commented on SPARK-35581: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/32714
[jira] [Created] (SPARK-35581) Casting special strings to DATE/TIMESTAMP returns inconsistent results
Max Gekk created SPARK-35581: Summary: Casting special strings to DATE/TIMESTAMP returns inconsistent results Key: SPARK-35581 URL: https://issues.apache.org/jira/browse/SPARK-35581 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0 Reporter: Max Gekk Assignee: Max Gekk When casting the special values "now", "today", "tomorrow", and "yesterday" to DATE/TIMESTAMP, Spark may return inconsistent results. Looks like Spark runs the expression on each executor, on every row independently. So the results could differ across executors if they have different system time, and across rows because of the resolution of "now". https://github.com/databricks/runtime/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L876 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
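The fix direction implied by the report, resolving the special string once per query instead of per row and per executor, can be illustrated with a small sketch. `resolve_special` is a hypothetical resolver, not Spark's DateTimeUtils:

```python
from datetime import datetime, timedelta

def resolve_special(value, now):
    # Hypothetical resolver for the special strings named in the issue.
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return {
        "now": now,
        "today": midnight,
        "tomorrow": midnight + timedelta(days=1),
        "yesterday": midnight - timedelta(days=1),
    }[value]

def cast_specials_consistently(values):
    # Read the clock once for the whole query; every row then agrees,
    # regardless of which executor evaluates it.
    query_now = datetime.now()
    return [resolve_special(v, query_now) for v in values]
```

If each row instead called `datetime.now()` itself, two rows of the same query could disagree, which is exactly the per-row, per-executor inconsistency the issue describes.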
[jira] [Commented] (SPARK-35564) Support subexpression elimination for non-common branches of conditional expressions
[ https://issues.apache.org/jira/browse/SPARK-35564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354637#comment-17354637 ] Adam Binford commented on SPARK-35564: -- A 2x gain would be pretty significant to us, I don't know about others. I'm planning to implement this in our fork and if I get good results I'll put up a PR for further discussion. Could optionally add a config for this if it's workload dependent. Also, the only thing it could likely do to the generated code is reduce the overall size, albeit with more function calls in worst cases. Whether smaller code size adds any value on its own, I don't know enough about Java to say. >Oh, this is another issue. I noticed it last time when I worked on another PR >recently, but don't have time to look at it yet. I created https://issues.apache.org/jira/browse/SPARK-35580 to track what I've figured out so far. Not sure what the right fix is. > Support subexpression elimination for non-common branches of conditional > expressions > > > Key: SPARK-35564 > URL: https://issues.apache.org/jira/browse/SPARK-35564 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.1 >Reporter: Adam Binford >Priority: Major > > https://issues.apache.org/jira/browse/SPARK-7 added support for pulling > subexpressions out of branches of conditional expressions for expressions > present in all branches. We should be able to take this a step further and > pull out subexpressions for any branch, as long as that expression will > definitely be evaluated at least once. > Consider a common data validation example: > {code:java} > from pyspark.sql.functions import * > df = spark.createDataFrame([['word'], ['1234']]) > col = regexp_replace('_1', r'\d', '') > df = df.withColumn('numbers_removed', when(length(col) > 0, col)){code} > We only want to keep the value if it's non-empty with numbers removed, > otherwise we want it to be null. 
> Because we have no otherwise value, `col` is not a candidate for > subexpression elimination (you can see two regular expression replacements in > the codegen). But whenever the length is greater than 0, we will have to > execute the regular expression replacement twice. Since we know we will > always calculate `col` at least once, it makes sense to consider that as a > subexpression since we might need it again in the branch value. So we can > update the logic from: > Create a subexpression if an expression will always be evaluated at least > twice > To: > Create a subexpression if an expression will always be evaluated at least > once AND will either always or conditionally be evaluated at least twice. > The trade off is potentially another subexpression function call (for split > subexpressions) if the second evaluation doesn't happen, but this seems like > it would be worth it for when it is evaluated the second time. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
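The double evaluation the issue describes is easy to see with a counter. Here `strip_digits` stands in for the `regexp_replace` above; the naive version mirrors `CASE WHEN length(expr) > 0 THEN expr END`, and the hoisted version mirrors the proposed rule (the expression is always evaluated at least once, so cache it):

```python
import re

CALLS = {"strip_digits": 0}

def strip_digits(s):
    # stands in for the expensive regexp_replace('_1', r'\d', '')
    CALLS["strip_digits"] += 1
    return re.sub(r"\d", "", s)

def numbers_removed_naive(s):
    # expr appears in both the predicate and the branch value: two evaluations
    return strip_digits(s) if len(strip_digits(s)) > 0 else None

def numbers_removed_hoisted(s):
    # subexpression hoisted: always evaluated exactly once, reused by the branch
    stripped = strip_digits(s)
    return stripped if len(stripped) > 0 else None
```

The counter shows the naive form pays for the expression twice whenever the predicate is true, while the hoisted form always pays exactly once, with the function-call overhead mentioned in the discussion as the trade-off.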
[jira] [Created] (SPARK-35580) Support subexpression elimination for higher order functions
Adam Binford created SPARK-35580: Summary: Support subexpression elimination for higher order functions Key: SPARK-35580 URL: https://issues.apache.org/jira/browse/SPARK-35580 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.1 Reporter: Adam Binford Currently higher order functions are not candidates for subexpression elimination. This is because all higher order functions have different semantic hashes, due to "exprId" and "value" in "NamedLambdaVariable". These always are unique, so the semanticHash of a NamedLambdaVariable is always unique. Also, [https://github.com/apache/spark/pull/32424] might throw a wrench in things some too, depending on how you define your expressions the name could be different. {code:java} scala> var d = transform($"a", x => x + 1) d: org.apache.spark.sql.Column = transform(a, lambdafunction((x_2 + 1), x_2)) scala> var e = transform($"a", x => x + 1) e: org.apache.spark.sql.Column = transform(a, lambdafunction((x_3 + 1), x_3)) scala> struct(d.alias("1"), d.alias("2")).expr res9: org.apache.spark.sql.catalyst.expressions.Expression = struct(NamePlaceholder, transform('a, lambdafunction((lambda 'x_2 + 1), lambda 'x_2, false)) AS 1#4, NamePlaceholder, transform('a, lambdafunction((lambda 'x_2 + 1), lambda 'x_2, false)) AS 2#5) scala> struct(d.alias("1"), e.alias("2")).expr res10: org.apache.spark.sql.catalyst.expressions.Expression = struct(NamePlaceholder, transform('a, lambdafunction((lambda 'x_2 + 1), lambda 'x_2, false)) AS 1#6, NamePlaceholder, transform('a, lambdafunction((lambda 'x_3 + 1), lambda 'x_3, false)) AS 2#7) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
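The mismatch can be reproduced with a toy expression tree: two structurally identical lambdas carry different variable ids, so a naive structural comparison or hash never matches. Position-based renumbering of the bound ids, one possible fix sketched here rather than Spark's implementation, makes them compare equal:

```python
def canonicalize(expr, env=None, counter=None):
    """Renumber lambda-bound variable ids positionally (De Bruijn-style)."""
    if env is None:
        env, counter = {}, [0]
    if not isinstance(expr, tuple):
        return expr
    if expr[0] == "lambda":
        _, var_id, body = expr
        env = dict(env)
        env[var_id] = counter[0]  # fresh positional id for this binder
        counter[0] += 1
        return ("lambda", env[var_id], canonicalize(body, env, counter))
    if expr[0] == "var":
        return ("var", env.get(expr[1], expr[1]))
    return tuple(canonicalize(part, env, counter) for part in expr)
```

With expressions canonicalized this way, the two `transform(a, lambdafunction(...))` trees in the scala example above would hash identically even though one binds `x_2` and the other `x_3`.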
[jira] [Commented] (SPARK-35564) Support subexpression elimination for non-common branches of conditional expressions
[ https://issues.apache.org/jira/browse/SPARK-35564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354590#comment-17354590 ] L. C. Hsieh commented on SPARK-35564: - > I don't really think this is much of a corner case, but a common case of > using a when expression for data validation. Most of our ETL process comes > down to normalizing, cleaning, and validating strings, which at the end of > the day usually looks like: This is a corner case because it simplifies other possible cases, although you might actually use this pattern in your ETL process. For example, when we treat an always-evaluate-at-least-once and optionally-evaluate-at-least-once expression as a subexpression, there are many expressions that qualify. A child expression of the first predicate of when, if it is also part of any conditional predicate/value, might also be treated as a subexpression. We might end up with tons of subexpressions like that flooding the generated code. On the other hand, how much gain can we get from this case? In the example, for the worst case we evaluate it twice, not 5 or 10 times. It may be just a small piece of the entire ETL process. I feel it's not worth it because we might pay a lot of cost, including making the code more complicated and creating tons of subexpressions, but in the end we only get a little bit from it, and only in the worst case. > though currently higher order functions are always semantically different so > they don't get subexpressions regardless I think. That's something I plan to > look into as a follow up. Oh, this is another issue. I noticed it last time when I worked on another PR recently, but don't have time to look at it yet. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35561) partition result is incorrect when inserting into a partitioned table with an int datatype partition column
[ https://issues.apache.org/jira/browse/SPARK-35561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354556#comment-17354556 ] Franck Thang commented on SPARK-35561: -- I personally don't expect 001 because the type is an INT; if I wanted 001, I would have used the type STRING > partition result is incorrect when inserting into a partitioned table with an > int datatype partition column > - > > Key: SPARK-35561 > URL: https://issues.apache.org/jira/browse/SPARK-35561 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.1.1, 3.1.2 >Reporter: YuanGuanhu >Priority: Major > > When inserting into a partitioned table with an int datatype partition > column, if the partition column value starts with 0, like 001, the partition > result is wrong. > > *How to reproduce the problem:* > CREATE TABLE partitiontb04 (id INT, c_string STRING) STORED AS orc; > insert into table partitiontb04 values (10001,'test1'); > CREATE TABLE orc_part03(id INT, c_string STRING) partitioned by (p_int int) > STORED AS orc; > insert into table orc_part03 partition (p_int=001) select * from > partitiontb04 where id < 10006; > show partitions orc_part03; > expected result: > p_int=001 > > actual result: > p_int=1 > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
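The behavior follows from typing the partition value as INT: integer parsing cannot preserve leading zeros, so 001 and 1 denote the same partition value. A hypothetical sketch of the normalization, not Spark's actual partitioning code:

```python
def normalize_partition_value(raw, dtype):
    # An INT partition column stores the parsed integer, so "001" -> "1";
    # only a STRING partition column can preserve the literal "001".
    if dtype == "int":
        return str(int(raw))
    return raw
```

This is why the commenter's point holds: to keep the literal `001` in `show partitions` output, the partition column has to be declared STRING.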
[jira] [Assigned] (SPARK-35567) Explain cost is not showing statistics for all the nodes
[ https://issues.apache.org/jira/browse/SPARK-35567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-35567: --- Assignee: shahid > Explain cost is not showing statistics for all the nodes > > > Key: SPARK-35567 > URL: https://issues.apache.org/jira/browse/SPARK-35567 > Project: Spark > Issue Type: Bug > Components: Optimizer, SQL >Affects Versions: 3.0.0, 3.1.2 >Reporter: shahid >Assignee: shahid >Priority: Minor > Attachments: image-2021-05-31-05-09-09-637.png > > > Explain cost command doesn't show statistics for all the nodes in most of the > TPCDS queries > For eg: Query1 > !image-2021-05-31-05-09-09-637.png! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35567) Explain cost is not showing statistics for all the nodes
[ https://issues.apache.org/jira/browse/SPARK-35567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-35567. - Fix Version/s: 3.2.0 Resolution: Fixed > Explain cost is not showing statistics for all the nodes > > > Key: SPARK-35567 > URL: https://issues.apache.org/jira/browse/SPARK-35567 > Project: Spark > Issue Type: Bug > Components: Optimizer, SQL >Affects Versions: 3.0.0, 3.1.2 >Reporter: shahid >Assignee: shahid >Priority: Minor > Fix For: 3.2.0 > > Attachments: image-2021-05-31-05-09-09-637.png > > > Explain cost command doesn't show statistics for all the nodes in most of the > TPCDS queries > For eg: Query1 > !image-2021-05-31-05-09-09-637.png! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35578) Add a test case for a janino bug
[ https://issues.apache.org/jira/browse/SPARK-35578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354508#comment-17354508 ] Apache Spark commented on SPARK-35578: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/32716 > Add a test case for a janino bug > > > Key: SPARK-35578 > URL: https://issues.apache.org/jira/browse/SPARK-35578 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35578) Add a test case for a janino bug
[ https://issues.apache.org/jira/browse/SPARK-35578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354507#comment-17354507 ] Apache Spark commented on SPARK-35578: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/32716 > Add a test case for a janino bug > > > Key: SPARK-35578 > URL: https://issues.apache.org/jira/browse/SPARK-35578 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35578) Add a test case for a janino bug
[ https://issues.apache.org/jira/browse/SPARK-35578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35578: Assignee: Apache Spark > Add a test case for a janino bug > > > Key: SPARK-35578 > URL: https://issues.apache.org/jira/browse/SPARK-35578 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35578) Add a test case for a janino bug
[ https://issues.apache.org/jira/browse/SPARK-35578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35578: Assignee: (was: Apache Spark) > Add a test case for a janino bug > > > Key: SPARK-35578 > URL: https://issues.apache.org/jira/browse/SPARK-35578 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35579) Fix a bug in janino or work around it in Spark.
[ https://issues.apache.org/jira/browse/SPARK-35579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-35579: Priority: Critical (was: Major) > Fix a bug in janino or work around it in Spark. > --- > > Key: SPARK-35579 > URL: https://issues.apache.org/jira/browse/SPARK-35579 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Priority: Critical > > See the test in SPARK-35578 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35576) Redact the sensitive info in the result of Set command
[ https://issues.apache.org/jira/browse/SPARK-35576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang updated SPARK-35576: --- Affects Version/s: 3.1.2 > Redact the sensitive info in the result of Set command > -- > > Key: SPARK-35576 > URL: https://issues.apache.org/jira/browse/SPARK-35576 > Project: Spark > Issue Type: Task > Components: Security, SQL >Affects Versions: 3.1.2, 3.2.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > Currently, the results of following SQL queries are not redacted: > ``` > SET [KEY]; > SET; > ``` > For example: > {code:java} > scala> spark.sql("set javax.jdo.option.ConnectionPassword=123456").show() > ++--+ > | key| value| > ++--+ > |javax.jdo.option|123456| > ++--+ > scala> spark.sql("set javax.jdo.option.ConnectionPassword").show() > ++--+ > | key| value| > ++--+ > |javax.jdo.option|123456| > ++--+ > scala> spark.sql("set").show() > +++ > | key| value| > +++ > |javax.jdo.option| 123456| > {code} > We should hide the sensitive information and redact the query output. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
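The redaction described in SPARK-35576 can be sketched as a key-pattern match applied before SET results are returned. The regex below mirrors the shape of Spark's `spark.redaction.regex` default, but treat the exact pattern and placeholder text as assumptions for illustration:

```python
import re

# Assumed to resemble Spark's default spark.redaction.regex
SENSITIVE_KEY = re.compile(r"(?i)secret|password|token|access[.]key")
PLACEHOLDER = "*********(redacted)"

def redact_row(key: str, value: str) -> tuple[str, str]:
    """Return the (key, value) pair with the value masked when the key looks sensitive."""
    if SENSITIVE_KEY.search(key):
        return key, PLACEHOLDER
    return key, value

print(redact_row("javax.jdo.option.ConnectionPassword", "123456"))
print(redact_row("spark.sql.shuffle.partitions", "200"))
```

Applying such a filter to every row of `SET`, `SET [KEY]`, and the full `SET` listing would hide the password shown in the reproduction above while leaving ordinary configs untouched.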
[jira] [Created] (SPARK-35579) Fix a bug in janino or work around it in Spark.
Wenchen Fan created SPARK-35579: --- Summary: Fix a bug in janino or work around it in Spark. Key: SPARK-35579 URL: https://issues.apache.org/jira/browse/SPARK-35579 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0 Reporter: Wenchen Fan See the test in SPARK-35578 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35578) Add a test case for a janino bug
Wenchen Fan created SPARK-35578: --- Summary: Add a test case for a janino bug Key: SPARK-35578 URL: https://issues.apache.org/jira/browse/SPARK-35578 Project: Spark Issue Type: Test Components: SQL Affects Versions: 3.2.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35557) Adapt uses of JDK 17 Internal APIs
[ https://issues.apache.org/jira/browse/SPARK-35557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated SPARK-35557: - Summary: Adapt uses of JDK 17 Internal APIs (was: Adapt uses of JDK 17 Internal APIs (Unsafe, etc)) > Adapt uses of JDK 17 Internal APIs > -- > > Key: SPARK-35557 > URL: https://issues.apache.org/jira/browse/SPARK-35557 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Ismaël Mejía >Priority: Major > > I tried to run a Spark pipeline using the most recent 3.2.0-SNAPSHOT with > Spark 2.12.4 on Java 17 and I found this exception: > {code:java} > java.lang.ExceptionInInitializerError > at org.apache.spark.unsafe.array.ByteArrayMethods. > (ByteArrayMethods.java:54) > at org.apache.spark.internal.config.package$. (package.scala:1149) > at org.apache.spark.SparkConf$. (SparkConf.scala:654) > at org.apache.spark.SparkConf.contains (SparkConf.scala:455) > ... > Caused by: java.lang.reflect.InaccessibleObjectException: Unable to make > private java.nio.DirectByteBuffer(long,int) accessible: module java.base does > not "opens java.nio" to unnamed module @110df513 > at java.lang.reflect.AccessibleObject.checkCanSetAccessible > (AccessibleObject.java:357) > at java.lang.reflect.AccessibleObject.checkCanSetAccessible > (AccessibleObject.java:297) > at java.lang.reflect.Constructor.checkCanSetAccessible (Constructor.java:188) > at java.lang.reflect.Constructor.setAccessible (Constructor.java:181) > at org.apache.spark.unsafe.Platform. (Platform.java:56) > at org.apache.spark.unsafe.array.ByteArrayMethods. > (ByteArrayMethods.java:54) > at org.apache.spark.internal.config.package$. (package.scala:1149) > at org.apache.spark.SparkConf$. 
(SparkConf.scala:654) > at org.apache.spark.SparkConf.contains (SparkConf.scala:455)}} > {code} > It seems that Java 17 will be more strict about uses of JDK Internals > [https://openjdk.java.net/jeps/403] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35557) Adapt uses of JDK 17 Internal APIs (Unsafe, etc)
[ https://issues.apache.org/jira/browse/SPARK-35557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated SPARK-35557: - Description: I tried to run a Spark pipeline using the most recent 3.2.0-SNAPSHOT with Spark 2.12.4 on Java 17 and I found this exception: {code:java} java.lang.ExceptionInInitializerError at org.apache.spark.unsafe.array.ByteArrayMethods. (ByteArrayMethods.java:54) at org.apache.spark.internal.config.package$. (package.scala:1149) at org.apache.spark.SparkConf$. (SparkConf.scala:654) at org.apache.spark.SparkConf.contains (SparkConf.scala:455) ... Caused by: java.lang.reflect.InaccessibleObjectException: Unable to make private java.nio.DirectByteBuffer(long,int) accessible: module java.base does not "opens java.nio" to unnamed module @110df513 at java.lang.reflect.AccessibleObject.checkCanSetAccessible (AccessibleObject.java:357) at java.lang.reflect.AccessibleObject.checkCanSetAccessible (AccessibleObject.java:297) at java.lang.reflect.Constructor.checkCanSetAccessible (Constructor.java:188) at java.lang.reflect.Constructor.setAccessible (Constructor.java:181) at org.apache.spark.unsafe.Platform. (Platform.java:56) at org.apache.spark.unsafe.array.ByteArrayMethods. (ByteArrayMethods.java:54) at org.apache.spark.internal.config.package$. (package.scala:1149) at org.apache.spark.SparkConf$. (SparkConf.scala:654) at org.apache.spark.SparkConf.contains (SparkConf.scala:455)}} {code} It seems that Java 17 will be more strict about uses of JDK Internals [https://openjdk.java.net/jeps/403] was: I tried to run a Spark pipeline using the most recent 3.2.0-SNAPSHOT with Spark 2.13 on Java 17 and I found this exception: {code:java} java.lang.ExceptionInInitializerError at org.apache.spark.unsafe.array.ByteArrayMethods. (ByteArrayMethods.java:54) at org.apache.spark.internal.config.package$. (package.scala:1149) at org.apache.spark.SparkConf$. 
(SparkConf.scala:654) at org.apache.spark.SparkConf.contains (SparkConf.scala:455) ... Caused by: java.lang.reflect.InaccessibleObjectException: Unable to make private java.nio.DirectByteBuffer(long,int) accessible: module java.base does not "opens java.nio" to unnamed module @110df513 at java.lang.reflect.AccessibleObject.checkCanSetAccessible (AccessibleObject.java:357) at java.lang.reflect.AccessibleObject.checkCanSetAccessible (AccessibleObject.java:297) at java.lang.reflect.Constructor.checkCanSetAccessible (Constructor.java:188) at java.lang.reflect.Constructor.setAccessible (Constructor.java:181) at org.apache.spark.unsafe.Platform. (Platform.java:56) at org.apache.spark.unsafe.array.ByteArrayMethods. (ByteArrayMethods.java:54) at org.apache.spark.internal.config.package$. (package.scala:1149) at org.apache.spark.SparkConf$. (SparkConf.scala:654) at org.apache.spark.SparkConf.contains (SparkConf.scala:455)}} {code} It seems that Java 17 will be more strict about uses of JDK Internals [https://openjdk.java.net/jeps/403] > Adapt uses of JDK 17 Internal APIs (Unsafe, etc) > > > Key: SPARK-35557 > URL: https://issues.apache.org/jira/browse/SPARK-35557 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Ismaël Mejía >Priority: Major > > I tried to run a Spark pipeline using the most recent 3.2.0-SNAPSHOT with > Spark 2.12.4 on Java 17 and I found this exception: > {code:java} > java.lang.ExceptionInInitializerError > at org.apache.spark.unsafe.array.ByteArrayMethods. > (ByteArrayMethods.java:54) > at org.apache.spark.internal.config.package$. (package.scala:1149) > at org.apache.spark.SparkConf$. (SparkConf.scala:654) > at org.apache.spark.SparkConf.contains (SparkConf.scala:455) > ... 
> Caused by: java.lang.reflect.InaccessibleObjectException: Unable to make > private java.nio.DirectByteBuffer(long,int) accessible: module java.base does > not "opens java.nio" to unnamed module @110df513 > at java.lang.reflect.AccessibleObject.checkCanSetAccessible > (AccessibleObject.java:357) > at java.lang.reflect.AccessibleObject.checkCanSetAccessible > (AccessibleObject.java:297) > at java.lang.reflect.Constructor.checkCanSetAccessible (Constructor.java:188) > at java.lang.reflect.Constructor.setAccessible (Constructor.java:181) > at org.apache.spark.unsafe.Platform. (Platform.java:56) > at org.apache.spark.unsafe.array.ByteArrayMethods. > (ByteArrayMethods.java:54) > at org.apache.spark.internal.config.package$. (package.scala:1149) > at org.apache.spark.SparkConf$. (SparkConf.scala:654) > at org.apache.spark.SparkConf.contains (SparkConf.scala:455)}} > {code} > It seems that Java 17 will be more strict about uses of JDK Internals > [https://openjdk.java.net/jeps/403] -- This message was sent by Atlassian Jira (v8.3.4#803005)
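Until the internal-API uses are adapted, the usual stopgap for this particular `InaccessibleObjectException` is to open the offending module on the launcher command line. This is a configuration sketch, not a fix adopted for this ticket, and `my_pipeline.py` is a placeholder:

```shell
# Workaround sketch: open java.base/java.nio to the unnamed module so that the
# reflective DirectByteBuffer access in Platform stays legal under JEP 403.
spark-submit \
  --driver-java-options "--add-opens=java.base/java.nio=ALL-UNNAMED" \
  --conf "spark.executor.extraJavaOptions=--add-opens=java.base/java.nio=ALL-UNNAMED" \
  my_pipeline.py
```

Other internal packages Spark touches (e.g. `sun.misc`) may need their own `--add-opens`/`--add-exports` flags as further failures surface.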
[jira] [Commented] (SPARK-35564) Support subexpression elimination for non-common branches of conditional expressions
[ https://issues.apache.org/jira/browse/SPARK-35564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354447#comment-17354447 ] Adam Binford commented on SPARK-35564: -- >Do you mean "Create a subexpression if an expression will always be evaluated >at least once AND will be evaluated at least once in conditional expression"? Yeah you can think of it that way in terms of adding to existing functionality. I was trying to word it in a way that encompassed existing functionality as well. >And this looks like a corner case, so I'm not sure if it is worth to do this. I don't really think this is much of a corner case, but a common case of using a when expression for data validation. Most of our ETL process comes down to normalizing, cleaning, and validating strings, which at the end of the day usually looks like: {code:java} column = normalize_value(col('my_raw_value')) result = when(column != '', column){code} where "normalize_value" usually involves some combination of regexp_replace's, lower/upper, and trim. And things get worse when you are dealing with arrays of strings and want to minimize your data: {code:java} column = filter(transform(col('my_raw_array_value'), lambda x: normalize_value(x)), lambda x: x != '') result = when(size(column) > 0, column){code} though currently higher order functions are always semantically different so they don't get subexpressions regardless, I think. That's something I plan to look into as a follow-up. It's natural for users to think that these expressions only get evaluated once, and not that they are doubling their runtime trying to clean their data. To me the edge case is creating a subexpression in this case decreasing throughput. It would require a very large percentage of the rows to not pass the conditional check, since the additional calculation is much more expensive than the additional function call. I'm playing around with an implementation so we'll see how far I can get with it. 
> Support subexpression elimination for non-common branches of conditional > expressions > > > Key: SPARK-35564 > URL: https://issues.apache.org/jira/browse/SPARK-35564 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.1 >Reporter: Adam Binford >Priority: Major > > https://issues.apache.org/jira/browse/SPARK-7 added support for pulling > subexpressions out of branches of conditional expressions for expressions > present in all branches. We should be able to take this a step further and > pull out subexpressions for any branch, as long as that expression will > definitely be evaluated at least once. > Consider a common data validation example: > {code:java} > from pyspark.sql.functions import * > df = spark.createDataFrame([['word'], ['1234']]) > col = regexp_replace('_1', r'\d', '') > df = df.withColumn('numbers_removed', when(length(col) > 0, col)){code} > We only want to keep the value if it's non-empty with numbers removed, > otherwise we want it to be null. > Because we have no otherwise value, `col` is not a candidate for > subexpression elimination (you can see two regular expression replacements in > the codegen). But whenever the length is greater than 0, we will have to > execute the regular expression replacement twice. Since we know we will > always calculate `col` at least once, it makes sense to consider that as a > subexpression since we might need it again in the branch value. So we can > update the logic from: > Create a subexpression if an expression will always be evaluated at least > twice > To: > Create a subexpression if an expression will always be evaluated at least > once AND will either always or conditionally be evaluated at least twice. > The trade off is potentially another subexpression function call (for split > subexpressions) if the second evaluation doesn't happen, but this seems like > it would be worth it for when it is evaluated the second time. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
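The double-evaluation cost discussed in this thread can be made concrete with a toy counter, entirely outside Spark: when the same expression appears in a `when()` predicate and in its branch value, naive evaluation runs it twice per passing row, while the proposed rule runs it once. The functions below are illustrative stand-ins, not Spark internals:

```python
evaluations = []

def expensive(x: str) -> str:
    """Stand-in for a costly expression such as a regexp_replace chain."""
    evaluations.append(x)
    return "".join(ch for ch in x if not ch.isdigit())

def when_without_subexpr(x):
    # CASE WHEN length(expensive(x)) > 0 THEN expensive(x) END, evaluated
    # naively: the expression runs in the predicate and again in the branch.
    return expensive(x) if len(expensive(x)) > 0 else None

def when_with_subexpr(x):
    # With the proposed elimination: the predicate's evaluation is guaranteed,
    # so its result can be cached and reused in the branch value.
    v = expensive(x)
    return v if len(v) > 0 else None

when_without_subexpr("word")
print(len(evaluations))  # 2 evaluations for one passing row

evaluations.clear()
when_with_subexpr("word")
print(len(evaluations))  # 1 evaluation
```

The trade-off named in the issue shows up when the predicate fails: the cached form still paid for one full evaluation, which only loses if failing rows vastly outnumber passing ones.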
[jira] [Updated] (SPARK-35557) Adapt uses of JDK 17 Internal APIs (Unsafe, etc)
[ https://issues.apache.org/jira/browse/SPARK-35557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated SPARK-35557: - Description: I tried to run a Spark pipeline using the most recent 3.2.0-SNAPSHOT with Spark 2.13 on Java 17 and I found this exception: {code:java} java.lang.ExceptionInInitializerError at org.apache.spark.unsafe.array.ByteArrayMethods. (ByteArrayMethods.java:54) at org.apache.spark.internal.config.package$. (package.scala:1149) at org.apache.spark.SparkConf$. (SparkConf.scala:654) at org.apache.spark.SparkConf.contains (SparkConf.scala:455) ... Caused by: java.lang.reflect.InaccessibleObjectException: Unable to make private java.nio.DirectByteBuffer(long,int) accessible: module java.base does not "opens java.nio" to unnamed module @110df513 at java.lang.reflect.AccessibleObject.checkCanSetAccessible (AccessibleObject.java:357) at java.lang.reflect.AccessibleObject.checkCanSetAccessible (AccessibleObject.java:297) at java.lang.reflect.Constructor.checkCanSetAccessible (Constructor.java:188) at java.lang.reflect.Constructor.setAccessible (Constructor.java:181) at org.apache.spark.unsafe.Platform. (Platform.java:56) at org.apache.spark.unsafe.array.ByteArrayMethods. (ByteArrayMethods.java:54) at org.apache.spark.internal.config.package$. (package.scala:1149) at org.apache.spark.SparkConf$. (SparkConf.scala:654) at org.apache.spark.SparkConf.contains (SparkConf.scala:455)}} {code} It seems that Java 17 will be more strict about uses of JDK Internals [https://openjdk.java.net/jeps/403] was: I tried to run a Spark pipeline using the most recent 3.2.0-SNAPSHOT with Spark 2.13 on Java 17 and I found this exception: {code:borderStyle=solid} java.lang.ExceptionInInitializerError at org.apache.spark.unsafe.array.ByteArrayMethods. (ByteArrayMethods.java:54) at org.apache.spark.internal.config.package$. (package.scala:1149) at org.apache.spark.SparkConf$. 
(SparkConf.scala:654) at org.apache.spark.SparkConf.contains (SparkConf.scala:455) ... Caused by: java.lang.reflect.InaccessibleObjectException: Unable to make private java.nio.DirectByteBuffer(long,int) accessible: module java.base does not "opens java.nio" to unnamed module @110df513 at java.lang.reflect.AccessibleObject.checkCanSetAccessible (AccessibleObject.java:357) at java.lang.reflect.AccessibleObject.checkCanSetAccessible (AccessibleObject.java:297) at java.lang.reflect.Constructor.checkCanSetAccessible (Constructor.java:188) at java.lang.reflect.Constructor.setAccessible (Constructor.java:181) at org.apache.spark.unsafe.Platform. (Platform.java:56) at org.apache.spark.unsafe.array.ByteArrayMethods. (ByteArrayMethods.java:54) at org.apache.spark.internal.config.package$. (package.scala:1149) at org.apache.spark.SparkConf$. (SparkConf.scala:654) at org.apache.spark.SparkConf.contains (SparkConf.scala:455)}} {code} Not sure if this is the case here but it seems that Java 17 will be more strict about uses of JDK Internals https://openjdk.java.net/jeps/403 > Adapt uses of JDK 17 Internal APIs (Unsafe, etc) > > > Key: SPARK-35557 > URL: https://issues.apache.org/jira/browse/SPARK-35557 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Ismaël Mejía >Priority: Major > > I tried to run a Spark pipeline using the most recent 3.2.0-SNAPSHOT with > Spark 2.13 on Java 17 and I found this exception: > {code:java} > java.lang.ExceptionInInitializerError > at org.apache.spark.unsafe.array.ByteArrayMethods. > (ByteArrayMethods.java:54) > at org.apache.spark.internal.config.package$. (package.scala:1149) > at org.apache.spark.SparkConf$. (SparkConf.scala:654) > at org.apache.spark.SparkConf.contains (SparkConf.scala:455) > ... 
> Caused by: java.lang.reflect.InaccessibleObjectException: Unable to make > private java.nio.DirectByteBuffer(long,int) accessible: module java.base does > not "opens java.nio" to unnamed module @110df513 > at java.lang.reflect.AccessibleObject.checkCanSetAccessible > (AccessibleObject.java:357) > at java.lang.reflect.AccessibleObject.checkCanSetAccessible > (AccessibleObject.java:297) > at java.lang.reflect.Constructor.checkCanSetAccessible (Constructor.java:188) > at java.lang.reflect.Constructor.setAccessible (Constructor.java:181) > at org.apache.spark.unsafe.Platform. (Platform.java:56) > at org.apache.spark.unsafe.array.ByteArrayMethods. > (ByteArrayMethods.java:54) > at org.apache.spark.internal.config.package$. (package.scala:1149) > at org.apache.spark.SparkConf$. (SparkConf.scala:654) > at org.apache.spark.SparkConf.contains (SparkConf.scala:455)}} > {code} > It seems that Java 17 will be more strict about uses of JDK Internals > [https://openjdk.java.net/jeps/403] -- This message
[jira] [Updated] (SPARK-35557) Adapt uses of JDK 17 Internal APIs (Unsafe, etc)
[ https://issues.apache.org/jira/browse/SPARK-35557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated SPARK-35557: - Summary: Adapt uses of JDK 17 Internal APIs (Unsafe, etc) (was: Adapt uses of JDK Internal APIs (Unsafe, etc)) > Adapt uses of JDK 17 Internal APIs (Unsafe, etc) > > > Key: SPARK-35557 > URL: https://issues.apache.org/jira/browse/SPARK-35557 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Ismaël Mejía >Priority: Major > > I tried to run a Spark pipeline using the most recent 3.2.0-SNAPSHOT with > Spark 2.13 on Java 17 and I found this exception: > {code:borderStyle=solid} > java.lang.ExceptionInInitializerError > at org.apache.spark.unsafe.array.ByteArrayMethods. > (ByteArrayMethods.java:54) > at org.apache.spark.internal.config.package$. (package.scala:1149) > at org.apache.spark.SparkConf$. (SparkConf.scala:654) > at org.apache.spark.SparkConf.contains (SparkConf.scala:455) > ... > Caused by: java.lang.reflect.InaccessibleObjectException: Unable to make > private java.nio.DirectByteBuffer(long,int) accessible: module java.base does > not "opens java.nio" to unnamed module @110df513 > at java.lang.reflect.AccessibleObject.checkCanSetAccessible > (AccessibleObject.java:357) > at java.lang.reflect.AccessibleObject.checkCanSetAccessible > (AccessibleObject.java:297) > at java.lang.reflect.Constructor.checkCanSetAccessible (Constructor.java:188) > at java.lang.reflect.Constructor.setAccessible (Constructor.java:181) > at org.apache.spark.unsafe.Platform. (Platform.java:56) > at org.apache.spark.unsafe.array.ByteArrayMethods. > (ByteArrayMethods.java:54) > at org.apache.spark.internal.config.package$. (package.scala:1149) > at org.apache.spark.SparkConf$. 
(SparkConf.scala:654) > at org.apache.spark.SparkConf.contains (SparkConf.scala:455)}} > {code} > Not sure if this is the case here but it seems that Java 17 will be more > strict about uses of JDK Internals https://openjdk.java.net/jeps/403 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35577) Allow to log container output for docker integration tests
[ https://issues.apache.org/jira/browse/SPARK-35577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354429#comment-17354429 ] Apache Spark commented on SPARK-35577: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/32715 > Allow to log container output for docker integration tests > -- > > Key: SPARK-35577 > URL: https://issues.apache.org/jira/browse/SPARK-35577 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > In the current master, docker integration tests don't log their container > output. > If we have container logs, it's useful to debug especially for GA. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35577) Allow to log container output for docker integration tests
[ https://issues.apache.org/jira/browse/SPARK-35577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35577: Assignee: Kousuke Saruta (was: Apache Spark) > Allow to log container output for docker integration tests > -- > > Key: SPARK-35577 > URL: https://issues.apache.org/jira/browse/SPARK-35577 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > In the current master, docker integration tests don't log their container > output. > If we have container logs, it's useful to debug especially for GA. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35577) Allow to log container output for docker integration tests
[ https://issues.apache.org/jira/browse/SPARK-35577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35577: Assignee: Apache Spark (was: Kousuke Saruta) > Allow to log container output for docker integration tests > -- > > Key: SPARK-35577 > URL: https://issues.apache.org/jira/browse/SPARK-35577 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Apache Spark >Priority: Minor > > In the current master, docker integration tests don't log their container > output. > If we have container logs, it's useful to debug especially for GA. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35577) Allow to log container output for docker integration tests
[ https://issues.apache.org/jira/browse/SPARK-35577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354428#comment-17354428 ] Apache Spark commented on SPARK-35577: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/32715 > Allow to log container output for docker integration tests > -- > > Key: SPARK-35577 > URL: https://issues.apache.org/jira/browse/SPARK-35577 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > In the current master, docker integration tests don't log their container > output. > If we have container logs, it's useful to debug especially for GA. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org