[jira] [Assigned] (SPARK-31521) The fetch size is not correct when merging blocks into a merged block
[ https://issues.apache.org/jira/browse/SPARK-31521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-31521:
-------------------------------------

Assignee: wuyi

> The fetch size is not correct when merging blocks into a merged block
> ----------------------------------------------------------------------
>
> Key: SPARK-31521
> URL: https://issues.apache.org/jira/browse/SPARK-31521
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.0.0
> Reporter: wuyi
> Assignee: wuyi
> Priority: Major
>
> When merging blocks into a merged block, we should count the size of that merged block as well.
[jira] [Resolved] (SPARK-31521) The fetch size is not correct when merging blocks into a merged block
[ https://issues.apache.org/jira/browse/SPARK-31521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-31521.
-----------------------------------

Fix Version/s: 3.0.0
Resolution: Fixed

Issue resolved by pull request 28301
[https://github.com/apache/spark/pull/28301]

> The fetch size is not correct when merging blocks into a merged block
> ----------------------------------------------------------------------
>
> Key: SPARK-31521
> URL: https://issues.apache.org/jira/browse/SPARK-31521
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.0.0
> Reporter: wuyi
> Assignee: wuyi
> Priority: Major
> Fix For: 3.0.0
>
> When merging blocks into a merged block, we should count the size of that merged block as well.
[jira] [Resolved] (SPARK-31516) Non-existent metric hiveClientCalls.count of CodeGenerator in Monitoring Doc
[ https://issues.apache.org/jira/browse/SPARK-31516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-31516.
-----------------------------------

Fix Version/s: 3.0.0
Resolution: Fixed

Issue resolved by pull request 28292
[https://github.com/apache/spark/pull/28292]

> Non-existent metric hiveClientCalls.count of CodeGenerator in Monitoring Doc
> -----------------------------------------------------------------------------
>
> Key: SPARK-31516
> URL: https://issues.apache.org/jira/browse/SPARK-31516
> Project: Spark
> Issue Type: Improvement
> Components: Documentation, Spark Core
> Affects Versions: 3.0.0
> Reporter: ZHANG Wei
> Assignee: ZHANG Wei
> Priority: Minor
> Fix For: 3.0.0
>
> There is a duplicated `hiveClientCalls.count` metric in both `namespace=HiveExternalCatalog` and `namespace=CodeGenerator` bullet lists of the [Spark Monitoring doc|https://spark.apache.org/docs/3.0.0-preview2/monitoring.html#component-instance--executor], but there is only one, inside object HiveCatalogMetrics, in the [source code|https://github.com/apache/spark/blob/6550d0d5283efdbbd838f3aeaf0476c7f52a0fb1/core/src/main/scala/org/apache/spark/metrics/source/StaticSources.scala#L85].
> {quote}
> * namespace=HiveExternalCatalog
> ** *note:* these metrics are conditional on a configuration parameter: {{spark.metrics.staticSources.enabled}} (default is true)
> ** fileCacheHits.count
> ** filesDiscovered.count
> ** +{color:#ff}*hiveClientCalls.count*{color}+
> ** parallelListingJobCount.count
> ** partitionsFetched.count
> * namespace=CodeGenerator
> ** *note:* these metrics are conditional on a configuration parameter: {{spark.metrics.staticSources.enabled}} (default is true)
> ** compilationTime (histogram)
> ** generatedClassSize (histogram)
> ** generatedMethodSize (histogram)
> ** *{color:#ff}+hiveClientCalls.count+{color}*
> ** sourceCodeSize (histogram)
> {quote}
[jira] [Assigned] (SPARK-31516) Non-existent metric hiveClientCalls.count of CodeGenerator in Monitoring Doc
[ https://issues.apache.org/jira/browse/SPARK-31516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-31516:
-------------------------------------

Assignee: ZHANG Wei

> Non-existent metric hiveClientCalls.count of CodeGenerator in Monitoring Doc
> -----------------------------------------------------------------------------
>
> Key: SPARK-31516
> URL: https://issues.apache.org/jira/browse/SPARK-31516
> Project: Spark
> Issue Type: Improvement
> Components: Documentation, Spark Core
> Affects Versions: 3.0.0
> Reporter: ZHANG Wei
> Assignee: ZHANG Wei
> Priority: Minor
>
> There is a duplicated `hiveClientCalls.count` metric in both `namespace=HiveExternalCatalog` and `namespace=CodeGenerator` bullet lists of the [Spark Monitoring doc|https://spark.apache.org/docs/3.0.0-preview2/monitoring.html#component-instance--executor], but there is only one, inside object HiveCatalogMetrics, in the [source code|https://github.com/apache/spark/blob/6550d0d5283efdbbd838f3aeaf0476c7f52a0fb1/core/src/main/scala/org/apache/spark/metrics/source/StaticSources.scala#L85].
> {quote}
> * namespace=HiveExternalCatalog
> ** *note:* these metrics are conditional on a configuration parameter: {{spark.metrics.staticSources.enabled}} (default is true)
> ** fileCacheHits.count
> ** filesDiscovered.count
> ** +{color:#ff}*hiveClientCalls.count*{color}+
> ** parallelListingJobCount.count
> ** partitionsFetched.count
> * namespace=CodeGenerator
> ** *note:* these metrics are conditional on a configuration parameter: {{spark.metrics.staticSources.enabled}} (default is true)
> ** compilationTime (histogram)
> ** generatedClassSize (histogram)
> ** generatedMethodSize (histogram)
> ** *{color:#ff}+hiveClientCalls.count+{color}*
> ** sourceCodeSize (histogram)
> {quote}
[jira] [Resolved] (SPARK-31560) Add V1/V2 tests for TextSuite and WholeTextFileSuite
[ https://issues.apache.org/jira/browse/SPARK-31560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-31560.
-----------------------------------

Fix Version/s: 3.0.0
Resolution: Fixed

Issue resolved by pull request 28335
[https://github.com/apache/spark/pull/28335]

> Add V1/V2 tests for TextSuite and WholeTextFileSuite
> -----------------------------------------------------
>
> Key: SPARK-31560
> URL: https://issues.apache.org/jira/browse/SPARK-31560
> Project: Spark
> Issue Type: Sub-task
> Components: SQL, Tests
> Affects Versions: 3.0.0, 3.0.1, 3.1.0
> Reporter: Gengliang Wang
> Assignee: Gengliang Wang
> Priority: Major
> Fix For: 3.0.0
[jira] [Reopened] (SPARK-20732) Copy cache data when node is being shut down
[ https://issues.apache.org/jira/browse/SPARK-20732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reopened SPARK-20732:
-----------------------------------

> Copy cache data when node is being shut down
> ---------------------------------------------
>
> Key: SPARK-20732
> URL: https://issues.apache.org/jira/browse/SPARK-20732
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 3.0.0
> Reporter: Holden Karau
> Assignee: Prakhar Jain
> Priority: Major
> Fix For: 3.1.0
[jira] [Updated] (SPARK-20732) Copy cache data when node is being shut down
[ https://issues.apache.org/jira/browse/SPARK-20732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-20732:
----------------------------------

Fix Version/s: (was: 3.1.0)

> Copy cache data when node is being shut down
> ---------------------------------------------
>
> Key: SPARK-20732
> URL: https://issues.apache.org/jira/browse/SPARK-20732
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 3.0.0
> Reporter: Holden Karau
> Assignee: Prakhar Jain
> Priority: Major
[jira] [Commented] (SPARK-31554) Flaky test suite org.apache.spark.sql.hive.thriftserver.CliSuite
[ https://issues.apache.org/jira/browse/SPARK-31554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17092012#comment-17092012 ]

Jungtaek Lim commented on SPARK-31554:
--------------------------------------

There are two existing PRs addressing the test suite:
https://github.com/apache/spark/pull/28156
https://github.com/apache/spark/pull/28055

> Flaky test suite org.apache.spark.sql.hive.thriftserver.CliSuite
> -----------------------------------------------------------------
>
> Key: SPARK-31554
> URL: https://issues.apache.org/jira/browse/SPARK-31554
> Project: Spark
> Issue Type: Test
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: Maxim Gekk
> Priority: Major
>
> The test org.apache.spark.sql.hive.thriftserver.CliSuite fails very often, for example:
> * https://github.com/apache/spark/pull/28328#issuecomment-618992335
> The error message:
> {code}
> org.apache.spark.sql.hive.thriftserver.CliSuite.SPARK-11188 Analysis error reporting
> Caused by: sbt.ForkMain$ForkError: java.lang.RuntimeException: Failed with error line 'Exception in thread "main" org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient;'
>   at org.apache.spark.sql.hive.thriftserver.CliSuite.$anonfun$runCliWithin$4(CliSuite.scala:138)
>   at scala.collection.immutable.List.foreach(List.scala:392)
>   at org.apache.spark.sql.hive.thriftserver.CliSuite.captureOutput$1(CliSuite.scala:135)
>   at org.apache.spark.sql.hive.thriftserver.CliSuite.$anonfun$runCliWithin$6(CliSuite.scala:152)
>   at org.apache.spark.sql.hive.thriftserver.CliSuite.$anonfun$runCliWithin$6$adapted(CliSuite.scala:152)
>   at scala.sys.process.BasicIO$.readFully$1(BasicIO.scala:188)
>   at scala.sys.process.BasicIO$.$anonfun$processFully$1$adapted(BasicIO.scala:192)
>   at org.apache.spark.sql.test.ProcessTestUtils$ProcessOutputCapturer.run(ProcessTestUtils.scala:30)
> {code}
> * https://github.com/apache/spark/pull/28261#issuecomment-618950225
> * https://github.com/apache/spark/pull/28261#issuecomment-618950225
> * https://github.com/apache/spark/pull/27617#issuecomment-614318644
[jira] [Created] (SPARK-31561) Add QUALIFY Clause
Yuming Wang created SPARK-31561:
-----------------------------------

Summary: Add QUALIFY Clause
Key: SPARK-31561
URL: https://issues.apache.org/jira/browse/SPARK-31561
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.1.0
Reporter: Yuming Wang

In a SELECT statement, the QUALIFY clause filters the results of window functions. QUALIFY does with window functions what HAVING does with aggregate functions and GROUP BY clauses. In the execution order of a query, QUALIFY is therefore evaluated after window functions are computed.

Examples: https://docs.snowflake.com/en/sql-reference/constructs/qualify.html#examples

More details:
https://docs.snowflake.com/en/sql-reference/constructs/qualify.html
https://docs.teradata.com/reader/2_MC9vCtAJRlKle2Rpb0mA/19NnI91neorAi7LX6SJXBw
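For readers new to the construct, the following sketch contrasts the proposed syntax with the rewrite Spark requires today. The table {{qt}} and columns {{i}}, {{p}}, {{o}} come from the Snowflake examples linked above; QUALIFY itself is not yet valid Spark SQL, so only the second query runs:

{code:scala}
// Proposed QUALIFY syntax (Snowflake/Teradata style, not yet accepted by Spark):
//   SELECT i, p, o
//   FROM qt
//   QUALIFY ROW_NUMBER() OVER (PARTITION BY p ORDER BY o) = 1
// Equivalent Spark SQL today: materialize the window column in a subquery and
// filter on it, since WHERE cannot reference window functions directly.
spark.sql("""
  SELECT i, p, o
  FROM (
    SELECT i, p, o, ROW_NUMBER() OVER (PARTITION BY p ORDER BY o) AS rn
    FROM qt
  ) t
  WHERE rn = 1
""").show()
{code}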
[jira] [Created] (SPARK-31560) Add V1/V2 tests for TextSuite and WholeTextFileSuite
Gengliang Wang created SPARK-31560:
--------------------------------------

Summary: Add V1/V2 tests for TextSuite and WholeTextFileSuite
Key: SPARK-31560
URL: https://issues.apache.org/jira/browse/SPARK-31560
Project: Spark
Issue Type: Sub-task
Components: SQL, Tests
Affects Versions: 3.0.0, 3.0.1, 3.1.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang
[jira] [Created] (SPARK-31559) AM starts with initial fetched tokens in any attempt
Jungtaek Lim created SPARK-31559:
------------------------------------

Summary: AM starts with initial fetched tokens in any attempt
Key: SPARK-31559
URL: https://issues.apache.org/jira/browse/SPARK-31559
Project: Spark
Issue Type: Bug
Components: YARN
Affects Versions: 3.0.0
Reporter: Jungtaek Lim

The issue only occurs in yarn-cluster mode.

The submitter obtains delegation tokens for yarn-cluster mode and adds these credentials to the launch context. The AM is launched with these credentials, and the AM and the driver are able to leverage these tokens.

In yarn-cluster mode, the driver is launched in the AM, which in turn initializes the token manager (while initializing SparkContext) and obtains delegation tokens (and schedules their renewal) if both principal and keytab are available.

That said, even if we provide a principal and keytab to run the application in yarn-cluster mode, the AM always starts with the initial tokens from the launch context until the token manager runs and obtains delegation tokens. So there is a "gap", and if user code (the driver) accesses an external system that requires delegation tokens (e.g. HDFS) before initializing SparkContext, it cannot leverage the tokens the token manager will obtain. This makes the application fail if the AM is killed "after" the initial tokens have expired and is then relaunched; a sketch of the pattern follows below.
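A minimal sketch of the problematic ordering described above (the HDFS path is hypothetical); the point is only that the filesystem access happens before SparkContext initialization, so it runs with the possibly-expired launch-context tokens:

{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

object Main {
  def main(args: Array[String]): Unit = {
    // Runs with only the tokens shipped in the AM launch context. If the AM
    // was relaunched after those tokens expired, this call fails even though
    // a principal/keytab pair was provided at submit time.
    val fs = FileSystem.get(new Configuration())
    fs.exists(new Path("hdfs:///user/app/input"))  // hypothetical path

    // Only here does the driver-side token manager start and obtain fresh
    // delegation tokens (and schedule their renewal).
    val spark = SparkSession.builder().getOrCreate()
    spark.stop()
  }
}
{code}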
[jira] [Resolved] (SPARK-31552) Fix potential ClassCastException in ScalaReflection arrayClassFor
[ https://issues.apache.org/jira/browse/SPARK-31552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-31552.
-----------------------------------

Fix Version/s: 3.0.0
Assignee: Kent Yao
Resolution: Fixed

This is resolved via https://github.com/apache/spark/pull/28324

> Fix potential ClassCastException in ScalaReflection arrayClassFor
> ------------------------------------------------------------------
>
> Key: SPARK-31552
> URL: https://issues.apache.org/jira/browse/SPARK-31552
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.5, 3.0.0, 3.1.0
> Reporter: Kent Yao
> Assignee: Kent Yao
> Priority: Major
> Fix For: 3.0.0
>
> arrayClassFor and dataTypeFor in ScalaReflection call each other recursively, but the cases handled in dataTypeFor are not fully covered in arrayClassFor.
> For example:
> {code:java}
> scala> import scala.reflect.runtime.universe.TypeTag
> scala> import org.apache.spark.sql._
> scala> import org.apache.spark.sql.catalyst.encoders._
> scala> import org.apache.spark.sql.types._
> scala> implicit def newArrayEncoder[T <: Array[_] : TypeTag]: Encoder[T] = ExpressionEncoder()
> newArrayEncoder: [T <: Array[_]](implicit evidence$1: reflect.runtime.universe.TypeTag[T])org.apache.spark.sql.Encoder[T]
> scala> val decOne = Decimal(1, 38, 18)
> decOne: org.apache.spark.sql.types.Decimal = 1E-18
> scala> val decTwo = Decimal(2, 38, 18)
> decTwo: org.apache.spark.sql.types.Decimal = 2E-18
> scala> val decSpark = Array(decOne, decTwo)
> decSpark: Array[org.apache.spark.sql.types.Decimal] = Array(1E-18, 2E-18)
> scala> Seq(decSpark).toDF()
> java.lang.ClassCastException: org.apache.spark.sql.types.DecimalType cannot be cast to org.apache.spark.sql.types.ObjectType
>   at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$arrayClassFor$1(ScalaReflection.scala:131)
>   at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69)
>   at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects(ScalaReflection.scala:879)
>   at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects$(ScalaReflection.scala:878)
>   at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:49)
>   at org.apache.spark.sql.catalyst.ScalaReflection$.arrayClassFor(ScalaReflection.scala:120)
>   at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$dataTypeFor$1(ScalaReflection.scala:105)
>   at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69)
>   at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects(ScalaReflection.scala:879)
>   at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects$(ScalaReflection.scala:878)
>   at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:49)
>   at org.apache.spark.sql.catalyst.ScalaReflection$.dataTypeFor(ScalaReflection.scala:88)
>   at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$serializerForType$1(ScalaReflection.scala:399)
>   at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69)
>   at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects(ScalaReflection.scala:879)
>   at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects$(ScalaReflection.scala:878)
>   at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:49)
>   at org.apache.spark.sql.catalyst.ScalaReflection$.serializerForType(ScalaReflection.scala:393)
>   at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:57)
>   at newArrayEncoder(<console>:57)
>   ... 53 elided
> scala>
> {code}
[jira] [Assigned] (SPARK-31533) Enable DB2IntegrationSuite test and upgrade the DB2 docker inside
[ https://issues.apache.org/jira/browse/SPARK-31533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-31533:
-------------------------------------

Assignee: Gabor Somogyi

> Enable DB2IntegrationSuite test and upgrade the DB2 docker inside
> ------------------------------------------------------------------
>
> Key: SPARK-31533
> URL: https://issues.apache.org/jira/browse/SPARK-31533
> Project: Spark
> Issue Type: Sub-task
> Components: SQL, Tests
> Affects Versions: 3.1.0
> Reporter: Gabor Somogyi
> Assignee: Gabor Somogyi
> Priority: Major
[jira] [Resolved] (SPARK-31533) Enable DB2IntegrationSuite test and upgrade the DB2 docker inside
[ https://issues.apache.org/jira/browse/SPARK-31533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-31533.
-----------------------------------

Fix Version/s: 3.1.0
Resolution: Fixed

Issue resolved by pull request 28325
[https://github.com/apache/spark/pull/28325]

> Enable DB2IntegrationSuite test and upgrade the DB2 docker inside
> ------------------------------------------------------------------
>
> Key: SPARK-31533
> URL: https://issues.apache.org/jira/browse/SPARK-31533
> Project: Spark
> Issue Type: Sub-task
> Components: SQL, Tests
> Affects Versions: 3.1.0
> Reporter: Gabor Somogyi
> Assignee: Gabor Somogyi
> Priority: Major
> Fix For: 3.1.0
[jira] [Commented] (SPARK-31546) Backport SPARK-25595 Ignore corrupt Avro file if flag IGNORE_CORRUPT_FILES enabled
[ https://issues.apache.org/jira/browse/SPARK-31546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091991#comment-17091991 ]

Gengliang Wang commented on SPARK-31546:
----------------------------------------

I have created a backport PR for this: https://github.com/apache/spark/pull/28334

> Backport SPARK-25595 Ignore corrupt Avro file if flag IGNORE_CORRUPT_FILES enabled
> ------------------------------------------------------------------------------------
>
> Key: SPARK-31546
> URL: https://issues.apache.org/jira/browse/SPARK-31546
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.6
> Reporter: Holden Karau
> Priority: Major
>
> Backport SPARK-25595 Ignore corrupt Avro file if flag IGNORE_CORRUPT_FILES enabled
> cc [~Gengliang.Wang] & [~hyukjin.kwon] for comments
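For context, the flag in question is the existing {{spark.sql.files.ignoreCorruptFiles}} configuration; what the backported SPARK-25595 change adds is that the Avro reader honors it. A usage sketch (the load path is hypothetical):

{code:scala}
// With the backport applied, corrupt Avro files are skipped instead of
// failing the whole job when this flag is set.
spark.conf.set("spark.sql.files.ignoreCorruptFiles", "true")
val df = spark.read.format("avro").load("/data/events")  // hypothetical path
{code}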
[jira] [Updated] (SPARK-31558) Code cleanup in spark-sql-viz.js
[ https://issues.apache.org/jira/browse/SPARK-31558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gengliang Wang updated SPARK-31558:
-----------------------------------

Summary: Code cleanup in spark-sql-viz.js  (was: Code clean up in spark-sql-viz.js)

> Code cleanup in spark-sql-viz.js
> --------------------------------
>
> Key: SPARK-31558
> URL: https://issues.apache.org/jira/browse/SPARK-31558
> Project: Spark
> Issue Type: Task
> Components: Web UI
> Affects Versions: 3.0.0, 3.1.0
> Reporter: Gengliang Wang
> Assignee: Gengliang Wang
> Priority: Major
>
> 1. Remove console.log(), which seems unnecessary in a release.
> 2. Replace double equals with triple equals.
> 3. Reuse the jQuery selector.
[jira] [Created] (SPARK-31558) Code clean up in spark-sql-viz.js
Gengliang Wang created SPARK-31558:
--------------------------------------

Summary: Code clean up in spark-sql-viz.js
Key: SPARK-31558
URL: https://issues.apache.org/jira/browse/SPARK-31558
Project: Spark
Issue Type: Task
Components: Web UI
Affects Versions: 3.0.0, 3.1.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang

1. Remove console.log(), which seems unnecessary in a release.
2. Replace double equals with triple equals.
3. Reuse the jQuery selector.
[jira] [Created] (SPARK-31557) Legacy parser incorrectly interprets pre-Gregorian dates
Bruce Robbins created SPARK-31557:
-------------------------------------

Summary: Legacy parser incorrectly interprets pre-Gregorian dates
Key: SPARK-31557
URL: https://issues.apache.org/jira/browse/SPARK-31557
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.0.0, 3.1.0
Reporter: Bruce Robbins

With CSV:
{noformat}
scala> sql("set spark.sql.legacy.timeParserPolicy=LEGACY")
res0: org.apache.spark.sql.DataFrame = [key: string, value: string]

scala> val seq = Seq("0002-01-01", "1000-01-01", "1500-01-01", "1800-01-01").map(x => s"$x,$x")
seq: Seq[String] = List(0002-01-01,0002-01-01, 1000-01-01,1000-01-01, 1500-01-01,1500-01-01, 1800-01-01,1800-01-01)

scala> val ds = seq.toDF("value").as[String]
ds: org.apache.spark.sql.Dataset[String] = [value: string]

scala> spark.read.schema("expected STRING, actual DATE").csv(ds).show
+----------+----------+
|  expected|    actual|
+----------+----------+
|0002-01-01|0001-12-30|
|1000-01-01|1000-01-06|
|1500-01-01|1500-01-10|
|1800-01-01|1800-01-01|
+----------+----------+

scala>
{noformat}
Similarly, with JSON:
{noformat}
scala> sql("set spark.sql.legacy.timeParserPolicy=LEGACY")
res0: org.apache.spark.sql.DataFrame = [key: string, value: string]

scala> val seq = Seq("0002-01-01", "1000-01-01", "1500-01-01", "1800-01-01").map { x =>
     |   s"""{"expected": "$x", "actual": "$x"}"""
     | }
seq: Seq[String] = List({"expected": "0002-01-01", "actual": "0002-01-01"}, {"expected": "1000-01-01", "actual": "1000-01-01"}, {"expected": "1500-01-01", "actual": "1500-01-01"}, {"expected": "1800-01-01", "actual": "1800-01-01"})

scala> val ds = seq.toDF("value").as[String]
ds: org.apache.spark.sql.Dataset[String] = [value: string]

scala> spark.read.schema("expected STRING, actual DATE").json(ds).show
+----------+----------+
|  expected|    actual|
+----------+----------+
|0002-01-01|0001-12-30|
|1000-01-01|1000-01-06|
|1500-01-01|1500-01-10|
|1800-01-01|1800-01-01|
+----------+----------+

scala>
{noformat}
[jira] [Resolved] (SPARK-31491) Re-arrange Data Types page to document Floating Point Special Values
[ https://issues.apache.org/jira/browse/SPARK-31491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takeshi Yamamuro resolved SPARK-31491.
--------------------------------------

Fix Version/s: 3.0.0
Assignee: Huaxin Gao
Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/28264

> Re-arrange Data Types page to document Floating Point Special Values
> ----------------------------------------------------------------------
>
> Key: SPARK-31491
> URL: https://issues.apache.org/jira/browse/SPARK-31491
> Project: Spark
> Issue Type: Sub-task
> Components: Documentation, SQL
> Affects Versions: 3.0.0
> Reporter: Huaxin Gao
> Assignee: Huaxin Gao
> Priority: Minor
> Fix For: 3.0.0
[jira] [Resolved] (SPARK-31532) SparkSessionBuilder should not propagate static sql configurations to the existing active/default SparkSession
[ https://issues.apache.org/jira/browse/SPARK-31532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takeshi Yamamuro resolved SPARK-31532.
--------------------------------------

Fix Version/s: 2.4.6
Assignee: Kent Yao
Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/28316

> SparkSessionBuilder should not propagate static sql configurations to the existing active/default SparkSession
> ----------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-31532
> URL: https://issues.apache.org/jira/browse/SPARK-31532
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.5, 3.0.0, 3.1.0
> Reporter: Kent Yao
> Assignee: Kent Yao
> Priority: Major
> Fix For: 2.4.6
>
> Clearly, this is a bug.
> {code:java}
> scala> spark.sql("set spark.sql.warehouse.dir").show
> +--------------------+--------------------+
> |                 key|               value|
> +--------------------+--------------------+
> |spark.sql.warehou...|file:/Users/kenty...|
> +--------------------+--------------------+
>
> scala> spark.sql("set spark.sql.warehouse.dir=2");
> org.apache.spark.sql.AnalysisException: Cannot modify the value of a static config: spark.sql.warehouse.dir;
>   at org.apache.spark.sql.RuntimeConfig.requireNonStaticConf(RuntimeConfig.scala:154)
>   at org.apache.spark.sql.RuntimeConfig.set(RuntimeConfig.scala:42)
>   at org.apache.spark.sql.execution.command.SetCommand.$anonfun$x$7$6(SetCommand.scala:100)
>   at org.apache.spark.sql.execution.command.SetCommand.run(SetCommand.scala:156)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
>   at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:229)
>   at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3644)
>   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
>   at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
>   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
>   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
>   at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3642)
>   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:229)
>   at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
>   at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:607)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:602)
>   ... 47 elided
>
> scala> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.SparkSession
>
> scala> SparkSession.builder.config("spark.sql.warehouse.dir", "xyz").get
> getClass   getOrCreate
>
> scala> SparkSession.builder.config("spark.sql.warehouse.dir", "xyz").getOrCreate
> 20/04/23 23:49:13 WARN SparkSession$Builder: Using an existing SparkSession; some configuration may not take effect.
> res7: org.apache.spark.sql.SparkSession = org.apache.spark.sql.SparkSession@6403d574
>
> scala> spark.sql("set spark.sql.warehouse.dir").show
> +--------------------+-----+
> |                 key|value|
> +--------------------+-----+
> |spark.sql.warehou...|  xyz|
> +--------------------+-----+
>
> scala>
> {code}
[jira] [Commented] (SPARK-31556) Document LIKE clause in SQL Reference
[ https://issues.apache.org/jira/browse/SPARK-31556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091965#comment-17091965 ]

Huaxin Gao commented on SPARK-31556:
------------------------------------

https://github.com/apache/spark/pull/28332

> Document LIKE clause in SQL Reference
> --------------------------------------
>
> Key: SPARK-31556
> URL: https://issues.apache.org/jira/browse/SPARK-31556
> Project: Spark
> Issue Type: Sub-task
> Components: Documentation, SQL
> Affects Versions: 3.0.0
> Reporter: Huaxin Gao
> Priority: Minor
>
> Document LIKE clause in SQL Reference.
[jira] [Created] (SPARK-31556) Document LIKE clause in SQL Reference
Huaxin Gao created SPARK-31556:
----------------------------------

Summary: Document LIKE clause in SQL Reference
Key: SPARK-31556
URL: https://issues.apache.org/jira/browse/SPARK-31556
Project: Spark
Issue Type: Sub-task
Components: Documentation, SQL
Affects Versions: 3.0.0
Reporter: Huaxin Gao

Document LIKE clause in SQL Reference.
[jira] [Resolved] (SPARK-31364) Benchmark Nested Parquet Predicate Pushdown
[ https://issues.apache.org/jira/browse/SPARK-31364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

DB Tsai resolved SPARK-31364.
-----------------------------

Fix Version/s: 3.0.0
Resolution: Fixed

Issue resolved by pull request 28319
[https://github.com/apache/spark/pull/28319]

> Benchmark Nested Parquet Predicate Pushdown
> ---------------------------------------------
>
> Key: SPARK-31364
> URL: https://issues.apache.org/jira/browse/SPARK-31364
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: DB Tsai
> Priority: Major
> Fix For: 3.0.0
>
> We would like to benchmark the best and worst scenarios, such as when no record matches the predicate, and measure how much extra overhead is added.
[jira] [Updated] (SPARK-31364) Benchmark Nested Parquet Predicate Pushdown
[ https://issues.apache.org/jira/browse/SPARK-31364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

DB Tsai updated SPARK-31364:
----------------------------

Summary: Benchmark Nested Parquet Predicate Pushdown  (was: Benchmark Parquet Predicate Pushdown)

> Benchmark Nested Parquet Predicate Pushdown
> ---------------------------------------------
>
> Key: SPARK-31364
> URL: https://issues.apache.org/jira/browse/SPARK-31364
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: DB Tsai
> Priority: Major
>
> We would like to benchmark the best and worst scenarios, such as when no record matches the predicate, and measure how much extra overhead is added.
[jira] [Updated] (SPARK-31377) Add unit tests for "number of output rows" metric for joins in SQLMetricsSuite
[ https://issues.apache.org/jira/browse/SPARK-31377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Srinivas Rishindra Pothireddi updated SPARK-31377:
--------------------------------------------------

Description:
For some combinations of join algorithm and join types there are no unit tests for the "number of output rows" metric.

A list of missing unit tests includes the following.
* ShuffledHashJoin: LeftOuter, RightOuter, LeftAnti, LeftSemi
* BroadcastNestedLoopJoin: RightOuter
* BroadcastHashJoin: LeftAnti

was:
For some combinations of join algorithm and join types there are no unit tests for the "number of output rows" metric.

A list of missing unit tests includes the following.
* SortMergeJoin: ExistenceJoin
* ShuffledHashJoin: LeftOuter, RightOuter, LeftAnti, LeftSemi, ExistenceJoin
* BroadcastNestedLoopJoin: RightOuter, InnerJoin, ExistenceJoin
* BroadcastHashJoin: LeftAnti, ExistenceJoin

> Add unit tests for "number of output rows" metric for joins in SQLMetricsSuite
> -------------------------------------------------------------------------------
>
> Key: SPARK-31377
> URL: https://issues.apache.org/jira/browse/SPARK-31377
> Project: Spark
> Issue Type: Improvement
> Components: SQL, Tests
> Affects Versions: 3.1.0
> Reporter: Srinivas Rishindra Pothireddi
> Priority: Minor
>
> For some combinations of join algorithm and join types there are no unit tests for the "number of output rows" metric.
> A list of missing unit tests includes the following (a sketch of the kind of check to add follows below).
> * ShuffledHashJoin: LeftOuter, RightOuter, LeftAnti, LeftSemi
> * BroadcastNestedLoopJoin: RightOuter
> * BroadcastHashJoin: LeftAnti
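A hedged sketch of the kind of check such a test adds (this is not the exact SQLMetricsSuite helper): run one of the uncovered combinations and assert on the join node's "number of output rows" SQLMetric.

{code:scala}
// BroadcastHashJoin + LeftAnti is one of the uncovered combinations above:
// ids 0..9 anti-joined against ids 0..4 leaves the 5 rows 5..9.
val df = spark.range(10).join(spark.range(5), Seq("id"), "left_anti")
df.collect()
// Find the join node in the executed plan and read its metric.
val joinNode = df.queryExecution.executedPlan.collectFirst {
  case p if p.nodeName.contains("Join") => p
}.get
assert(joinNode.metrics("numOutputRows").value == 5)
{code}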
[jira] [Commented] (SPARK-31500) collect_set() of BinaryType returns duplicate elements
[ https://issues.apache.org/jira/browse/SPARK-31500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091833#comment-17091833 ]

Pablo Langa Blanco commented on SPARK-31500:
--------------------------------------------

Hi [~ewasserman],

This is a problem in base Scala: equality between arrays does not behave as expected.
[https://blog.bruchez.name/2013/05/scala-array-comparison-without-phd.html]

I'm going to work to find a solution, but here is a workaround: change the definition of the case class to use Seq instead of Array, and it will work as expected.
{code:java}
case class R(id: String, value: String, bytes: Seq[Byte]){code}

> collect_set() of BinaryType returns duplicate elements
> -------------------------------------------------------
>
> Key: SPARK-31500
> URL: https://issues.apache.org/jira/browse/SPARK-31500
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.4, 2.4.5
> Reporter: Eric Wasserman
> Priority: Major
>
> The collect_set() aggregate function should produce a set of distinct elements. When the column argument's type is BinaryType this is not the case.
>
> Example:
> {{import org.apache.spark.sql.functions._}}
> {{import org.apache.spark.sql.expressions.Window}}
> {{case class R(id: String, value: String, bytes: Array[Byte])}}
> {{def makeR(id: String, value: String) = R(id, value, value.getBytes)}}
> {{val df = Seq(makeR("a", "dog"), makeR("a", "cat"), makeR("a", "cat"), makeR("b", "fish")).toDF()}}
>
> {{// In the example below "bytesSet" erroneously has duplicates but "stringSet" does not (as expected).}}
> {{df.agg(collect_set('value) as "stringSet", collect_set('bytes) as "byteSet").show(truncate=false)}}
>
> {{// The same problem is displayed when using window functions.}}
> {{val win = Window.partitionBy('id).rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)}}
> {{val result = df.select(}}
>   collect_set('value).over(win) as "stringSet",
>   collect_set('bytes).over(win) as "bytesSet"
> {{)}}
> {{.select('stringSet, 'bytesSet, size('stringSet) as "stringSetSize", size('bytesSet) as "bytesSetSize")}}
> {{.show()}}
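The root cause is easy to reproduce in a plain Scala REPL: {{Array}} is a JVM array, so {{==}} falls back to reference equality, while {{Seq}} compares elements, which is why the workaround above restores set semantics:

{code:scala}
scala> Array[Byte](1, 2) == Array[Byte](1, 2)
res0: Boolean = false   // reference equality: two equal arrays stay distinct in a set

scala> Seq[Byte](1, 2) == Seq[Byte](1, 2)
res1: Boolean = true    // structural equality: duplicates collapse as expected
{code}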
[jira] [Resolved] (SPARK-20732) Copy cache data when node is being shut down
[ https://issues.apache.org/jira/browse/SPARK-20732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Holden Karau resolved SPARK-20732.
----------------------------------

Fix Version/s: 3.1.0
Target Version/s: 3.1.0
Resolution: Fixed

Fixed, thank you!

> Copy cache data when node is being shut down
> ---------------------------------------------
>
> Key: SPARK-20732
> URL: https://issues.apache.org/jira/browse/SPARK-20732
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 3.0.0
> Reporter: Holden Karau
> Assignee: Prakhar Jain
> Priority: Major
> Fix For: 3.1.0
[jira] [Assigned] (SPARK-20732) Copy cache data when node is being shut down
[ https://issues.apache.org/jira/browse/SPARK-20732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Holden Karau reassigned SPARK-20732:
------------------------------------

Assignee: Prakhar Jain

> Copy cache data when node is being shut down
> ---------------------------------------------
>
> Key: SPARK-20732
> URL: https://issues.apache.org/jira/browse/SPARK-20732
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 3.0.0
> Reporter: Holden Karau
> Assignee: Prakhar Jain
> Priority: Major
[jira] [Created] (SPARK-31555) Improve cache block migration
Holden Karau created SPARK-31555:
------------------------------------

Summary: Improve cache block migration
Key: SPARK-31555
URL: https://issues.apache.org/jira/browse/SPARK-31555
Project: Spark
Issue Type: Sub-task
Components: Spark Core
Affects Versions: 3.1.0
Reporter: Holden Karau

We should explore the following improvements to cache block migration:
1) Peer selection (right now we may overbalance on certain peers)
2) Do we need to configure the number of blocks to be migrated at the same time?
3) Do we want to prioritize migrating blocks with no replicas?
[jira] [Resolved] (SPARK-31007) KMeans optimization based on triangle-inequality
[ https://issues.apache.org/jira/browse/SPARK-31007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean R. Owen resolved SPARK-31007.
----------------------------------

Fix Version/s: 3.1.0
Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/27758

> KMeans optimization based on triangle-inequality
> -------------------------------------------------
>
> Key: SPARK-31007
> URL: https://issues.apache.org/jira/browse/SPARK-31007
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 3.1.0
> Reporter: zhengruifeng
> Assignee: zhengruifeng
> Priority: Major
> Fix For: 3.1.0
> Attachments: ICML03-022.pdf
>
> In the current impl, the following lemma is used in KMeans:
> 0. Let x be a point, let c be a center and o be the origin; then d(x,c) >= |d(x,o) - d(c,o)| = |norm(x) - norm(c)|.
> This can be applied with {{EuclideanDistance}}, but not with {{CosineDistance}}.
> According to [Using the Triangle Inequality to Accelerate K-Means|https://www.aaai.org/Papers/ICML/2003/ICML03-022.pdf], we can go further, and there are two more lemmas that can be used:
> 1. Let x be a point, and let b and c be centers. If d(b,c) >= 2 d(x,b), then d(x,c) >= d(x,b).
> This can be applied with {{EuclideanDistance}}, but not with {{CosineDistance}}. However, luckily, for CosineDistance we can get a variant in the space of radians/angles.
> 2. Let x be a point, and let b and c be centers. Then d(x,c) >= max{0, d(x,b) - d(b,c)}.
> The application of Lemma 2 is a little complex: it needs to cache/update the distances/lower bounds to previous centers, and thus can only be applied in training, not in prediction.
> So this ticket is mainly for Lemma 1. Its idea is quite simple: if point x is close enough to center b (closer than a pre-computed radius), then we can say point x belongs to center b without computing the distances between x and the other centers. It can be used in both training and prediction.
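For reference, Lemma 1 follows in one line from the triangle inequality (a sketch in the notation above):

{noformat}
d(b,c) <= d(b,x) + d(x,c)                   (triangle inequality)
=> d(x,c) >= d(b,c) - d(x,b)
          >= 2*d(x,b) - d(x,b)              (assumption: d(b,c) >= 2*d(x,b))
           = d(x,b)
{noformat}

So if x is within half the distance from its current center b to every other center c, no other center can be closer, and the distance computations to the other centers can be skipped.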
[jira] [Commented] (SPARK-31539) Backport SPARK-27138 Remove AdminUtils calls (fixes deprecation)
[ https://issues.apache.org/jira/browse/SPARK-31539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091705#comment-17091705 ]

Dylan Guedes commented on SPARK-31539:
--------------------------------------

Agreed, I think it is not worth it.

> Backport SPARK-27138 Remove AdminUtils calls (fixes deprecation)
> -----------------------------------------------------------------
>
> Key: SPARK-31539
> URL: https://issues.apache.org/jira/browse/SPARK-31539
> Project: Spark
> Issue Type: Improvement
> Components: Tests
> Affects Versions: 2.4.6
> Reporter: Holden Karau
> Priority: Major
>
> SPARK-27138 Remove AdminUtils calls (fixes deprecation)
[jira] [Commented] (SPARK-31538) Backport SPARK-25338 Ensure to call super.beforeAll() and super.afterAll() in test cases
[ https://issues.apache.org/jira/browse/SPARK-31538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091699#comment-17091699 ]

Kazuaki Ishizaki commented on SPARK-31538:
------------------------------------------

We could backport this. On the other hand, this is not a bug fix. As far as I know, this change does not find new issues immediately. If we had already found problems related to this, they would have been backported to the 2.4 branch. I think that this is a nice-to-have in the maintenance branch.

> Backport SPARK-25338 Ensure to call super.beforeAll() and super.afterAll() in test cases
> ------------------------------------------------------------------------------------------
>
> Key: SPARK-31538
> URL: https://issues.apache.org/jira/browse/SPARK-31538
> Project: Spark
> Issue Type: Bug
> Components: Tests
> Affects Versions: 2.4.6
> Reporter: Holden Karau
> Priority: Major
>
> Backport SPARK-25338 Ensure to call super.beforeAll() and super.afterAll() in test cases
[jira] [Updated] (SPARK-31553) Wrong result of isInCollection for large collections
[ https://issues.apache.org/jira/browse/SPARK-31553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-31553:
----------------------------------

Labels: correctness  (was: )

> Wrong result of isInCollection for large collections
> -----------------------------------------------------
>
> Key: SPARK-31553
> URL: https://issues.apache.org/jira/browse/SPARK-31553
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0, 3.1.0
> Reporter: Maxim Gekk
> Priority: Major
> Labels: correctness
>
> If the size of a collection passed to isInCollection is bigger than spark.sql.optimizer.inSetConversionThreshold, the method can return wrong results for some inputs. For example:
> {code:scala}
> val set = (0 to 20).map(_.toString).toSet
> val data = Seq("1").toDF("x")
> println(set.contains("1"))
> data.select($"x".isInCollection(set).as("isInCollection")).show()
> {code}
> {code}
> true
> +--------------+
> |isInCollection|
> +--------------+
> |         false|
> +--------------+
> {code}
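A possible mitigation while a fix is pending, derived directly from the trigger condition above and not a verified fix: keep the conversion threshold above the collection size so the faulty path is never taken.

{code:scala}
// The wrong results only appear once set.size exceeds the threshold, so
// bumping the threshold above the collection size sidesteps the bad path.
spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", (set.size + 1).toString)
data.select($"x".isInCollection(set).as("isInCollection")).show()
{code}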
[jira] [Commented] (SPARK-31554) Flaky test suite org.apache.spark.sql.hive.thriftserver.CliSuite
[ https://issues.apache.org/jira/browse/SPARK-31554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091636#comment-17091636 ]

Wenchen Fan commented on SPARK-31554:
-------------------------------------

[~Qin Yao] do you have any clue?

> Flaky test suite org.apache.spark.sql.hive.thriftserver.CliSuite
> -----------------------------------------------------------------
>
> Key: SPARK-31554
> URL: https://issues.apache.org/jira/browse/SPARK-31554
> Project: Spark
> Issue Type: Test
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: Maxim Gekk
> Priority: Major
>
> The test org.apache.spark.sql.hive.thriftserver.CliSuite fails very often, for example:
> * https://github.com/apache/spark/pull/28328#issuecomment-618992335
> The error message:
> {code}
> org.apache.spark.sql.hive.thriftserver.CliSuite.SPARK-11188 Analysis error reporting
> Caused by: sbt.ForkMain$ForkError: java.lang.RuntimeException: Failed with error line 'Exception in thread "main" org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient;'
>   at org.apache.spark.sql.hive.thriftserver.CliSuite.$anonfun$runCliWithin$4(CliSuite.scala:138)
>   at scala.collection.immutable.List.foreach(List.scala:392)
>   at org.apache.spark.sql.hive.thriftserver.CliSuite.captureOutput$1(CliSuite.scala:135)
>   at org.apache.spark.sql.hive.thriftserver.CliSuite.$anonfun$runCliWithin$6(CliSuite.scala:152)
>   at org.apache.spark.sql.hive.thriftserver.CliSuite.$anonfun$runCliWithin$6$adapted(CliSuite.scala:152)
>   at scala.sys.process.BasicIO$.readFully$1(BasicIO.scala:188)
>   at scala.sys.process.BasicIO$.$anonfun$processFully$1$adapted(BasicIO.scala:192)
>   at org.apache.spark.sql.test.ProcessTestUtils$ProcessOutputCapturer.run(ProcessTestUtils.scala:30)
> {code}
> * https://github.com/apache/spark/pull/28261#issuecomment-618950225
> * https://github.com/apache/spark/pull/28261#issuecomment-618950225
> * https://github.com/apache/spark/pull/27617#issuecomment-614318644
[jira] [Commented] (SPARK-31552) Fix potential ClassCastException in ScalaReflection arrayClassFor
[ https://issues.apache.org/jira/browse/SPARK-31552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091633#comment-17091633 ]

Dongjoon Hyun commented on SPARK-31552:
---------------------------------------

Hi, [~Qin Yao]. I updated the Affected Versions by adding 2.0.2 ~ 2.4.5.

> Fix potential ClassCastException in ScalaReflection arrayClassFor
> ------------------------------------------------------------------
>
> Key: SPARK-31552
> URL: https://issues.apache.org/jira/browse/SPARK-31552
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.5, 3.0.0, 3.1.0
> Reporter: Kent Yao
> Priority: Major
>
> arrayClassFor and dataTypeFor in ScalaReflection call each other recursively, but the cases handled in dataTypeFor are not fully covered in arrayClassFor.
> For example:
> {code:java}
> scala> import scala.reflect.runtime.universe.TypeTag
> scala> import org.apache.spark.sql._
> scala> import org.apache.spark.sql.catalyst.encoders._
> scala> import org.apache.spark.sql.types._
> scala> implicit def newArrayEncoder[T <: Array[_] : TypeTag]: Encoder[T] = ExpressionEncoder()
> newArrayEncoder: [T <: Array[_]](implicit evidence$1: reflect.runtime.universe.TypeTag[T])org.apache.spark.sql.Encoder[T]
> scala> val decOne = Decimal(1, 38, 18)
> decOne: org.apache.spark.sql.types.Decimal = 1E-18
> scala> val decTwo = Decimal(2, 38, 18)
> decTwo: org.apache.spark.sql.types.Decimal = 2E-18
> scala> val decSpark = Array(decOne, decTwo)
> decSpark: Array[org.apache.spark.sql.types.Decimal] = Array(1E-18, 2E-18)
> scala> Seq(decSpark).toDF()
> java.lang.ClassCastException: org.apache.spark.sql.types.DecimalType cannot be cast to org.apache.spark.sql.types.ObjectType
>   at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$arrayClassFor$1(ScalaReflection.scala:131)
>   at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69)
>   at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects(ScalaReflection.scala:879)
>   at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects$(ScalaReflection.scala:878)
>   at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:49)
>   at org.apache.spark.sql.catalyst.ScalaReflection$.arrayClassFor(ScalaReflection.scala:120)
>   at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$dataTypeFor$1(ScalaReflection.scala:105)
>   at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69)
>   at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects(ScalaReflection.scala:879)
>   at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects$(ScalaReflection.scala:878)
>   at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:49)
>   at org.apache.spark.sql.catalyst.ScalaReflection$.dataTypeFor(ScalaReflection.scala:88)
>   at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$serializerForType$1(ScalaReflection.scala:399)
>   at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69)
>   at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects(ScalaReflection.scala:879)
>   at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects$(ScalaReflection.scala:878)
>   at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:49)
>   at org.apache.spark.sql.catalyst.ScalaReflection$.serializerForType(ScalaReflection.scala:393)
>   at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:57)
>   at newArrayEncoder(<console>:57)
>   ... 53 elided
> scala>
> {code}
[jira] [Updated] (SPARK-31552) Fix potential ClassCastException in ScalaReflection arrayClassFor
[ https://issues.apache.org/jira/browse/SPARK-31552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-31552:
----------------------------------

Affects Version/s: 2.0.2
                   2.1.3

> Fix potential ClassCastException in ScalaReflection arrayClassFor
> ------------------------------------------------------------------
>
> Key: SPARK-31552
> URL: https://issues.apache.org/jira/browse/SPARK-31552
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.5, 3.0.0, 3.1.0
> Reporter: Kent Yao
> Priority: Major
>
> arrayClassFor and dataTypeFor in ScalaReflection call each other recursively, but the cases handled in dataTypeFor are not fully covered in arrayClassFor.
> For example:
> {code:java}
> scala> import scala.reflect.runtime.universe.TypeTag
> scala> import org.apache.spark.sql._
> scala> import org.apache.spark.sql.catalyst.encoders._
> scala> import org.apache.spark.sql.types._
> scala> implicit def newArrayEncoder[T <: Array[_] : TypeTag]: Encoder[T] = ExpressionEncoder()
> newArrayEncoder: [T <: Array[_]](implicit evidence$1: reflect.runtime.universe.TypeTag[T])org.apache.spark.sql.Encoder[T]
> scala> val decOne = Decimal(1, 38, 18)
> decOne: org.apache.spark.sql.types.Decimal = 1E-18
> scala> val decTwo = Decimal(2, 38, 18)
> decTwo: org.apache.spark.sql.types.Decimal = 2E-18
> scala> val decSpark = Array(decOne, decTwo)
> decSpark: Array[org.apache.spark.sql.types.Decimal] = Array(1E-18, 2E-18)
> scala> Seq(decSpark).toDF()
> java.lang.ClassCastException: org.apache.spark.sql.types.DecimalType cannot be cast to org.apache.spark.sql.types.ObjectType
>   at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$arrayClassFor$1(ScalaReflection.scala:131)
>   at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69)
>   at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects(ScalaReflection.scala:879)
>   at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects$(ScalaReflection.scala:878)
>   at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:49)
>   at org.apache.spark.sql.catalyst.ScalaReflection$.arrayClassFor(ScalaReflection.scala:120)
>   at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$dataTypeFor$1(ScalaReflection.scala:105)
>   at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69)
>   at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects(ScalaReflection.scala:879)
>   at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects$(ScalaReflection.scala:878)
>   at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:49)
>   at org.apache.spark.sql.catalyst.ScalaReflection$.dataTypeFor(ScalaReflection.scala:88)
>   at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$serializerForType$1(ScalaReflection.scala:399)
>   at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69)
>   at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects(ScalaReflection.scala:879)
>   at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects$(ScalaReflection.scala:878)
>   at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:49)
>   at org.apache.spark.sql.catalyst.ScalaReflection$.serializerForType(ScalaReflection.scala:393)
>   at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:57)
>   at newArrayEncoder(<console>:57)
>   ... 53 elided
> scala>
> {code}
[jira] [Updated] (SPARK-31552) Fix potential ClassCastException in ScalaReflection arrayClassFor
[ https://issues.apache.org/jira/browse/SPARK-31552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-31552: -- Affects Version/s: 2.2.3 > Fix potential ClassCastException in ScalaReflection arrayClassFor > - > > Key: SPARK-31552 > URL: https://issues.apache.org/jira/browse/SPARK-31552 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.3, 2.3.4, 2.4.5, 3.0.0, 3.1.0 >Reporter: Kent Yao >Priority: Major > > arrayClassFor and dataTypeFor in ScalaReflection call each other circularly, > the cases in dataTypeFor are not fully handled in arrayClassFor > For example: > {code:java} > scala> import scala.reflect.runtime.universe.TypeTag > scala> import org.apache.spark.sql._ > scala> import org.apache.spark.sql.catalyst.encoders._ > scala> import org.apache.spark.sql.types._ > scala> implicit def newArrayEncoder[T <: Array[_] : TypeTag]: Encoder[T] = > ExpressionEncoder() > newArrayEncoder: [T <: Array[_]](implicit evidence$1: > reflect.runtime.universe.TypeTag[T])org.apache.spark.sql.Encoder[T] > scala> val decOne = Decimal(1, 38, 18) > decOne: org.apache.spark.sql.types.Decimal = 1E-18 > scala> val decTwo = Decimal(2, 38, 18) > decTwo: org.apache.spark.sql.types.Decimal = 2E-18 > scala> val decSpark = Array(decOne, decTwo) > decSpark: Array[org.apache.spark.sql.types.Decimal] = Array(1E-18, 2E-18) > scala> Seq(decSpark).toDF() > java.lang.ClassCastException: org.apache.spark.sql.types.DecimalType cannot > be cast to org.apache.spark.sql.types.ObjectType > at > org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$arrayClassFor$1(ScalaReflection.scala:131) > at > scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69) > at > org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects(ScalaReflection.scala:879) > at > org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects$(ScalaReflection.scala:878) > at > org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:49) > at > org.apache.spark.sql.catalyst.ScalaReflection$.arrayClassFor(ScalaReflection.scala:120) > at > org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$dataTypeFor$1(ScalaReflection.scala:105) > at > scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69) > at > org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects(ScalaReflection.scala:879) > at > org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects$(ScalaReflection.scala:878) > at > org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:49) > at > org.apache.spark.sql.catalyst.ScalaReflection$.dataTypeFor(ScalaReflection.scala:88) > at > org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$serializerForType$1(ScalaReflection.scala:399) > at > scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69) > at > org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects(ScalaReflection.scala:879) > at > org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects$(ScalaReflection.scala:878) > at > org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:49) > at > org.apache.spark.sql.catalyst.ScalaReflection$.serializerForType(ScalaReflection.scala:393) > at > org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:57) > at newArrayEncoder(:57) > ... 
53 elided > scala> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
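The root cause is visible in the trace above: arrayClassFor unconditionally casts the result of dataTypeFor to ObjectType, so any Catalyst type without an ObjectType representation (such as DecimalType here) blows up. Below is a minimal sketch of the kind of guard that would avoid the cast; the helper name and the explicit Decimal mapping are illustrative assumptions, not the committed fix.
{code:scala}
import org.apache.spark.sql.types._

// Hypothetical guard: map each Catalyst type returned by dataTypeFor to an
// external Java class explicitly instead of casting to ObjectType.
def elementClassFor(dt: DataType): Class[_] = dt match {
  case ObjectType(cls) => cls              // the only case handled today
  case _: DecimalType  => classOf[Decimal] // external class of Decimal values
  case other =>
    // every remaining type needs a mapping too; failing loudly beats a bad cast
    throw new UnsupportedOperationException(s"no external class for $other")
}
{code}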
[jira] [Updated] (SPARK-31552) Fix potential ClassCastException in ScalaReflection arrayClassFor
[ https://issues.apache.org/jira/browse/SPARK-31552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-31552: -- Affects Version/s: 2.3.4 > Fix potential ClassCastException in ScalaReflection arrayClassFor > - > > Key: SPARK-31552 > URL: https://issues.apache.org/jira/browse/SPARK-31552 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.4, 2.4.5, 3.0.0, 3.1.0 >Reporter: Kent Yao >Priority: Major > > arrayClassFor and dataTypeFor in ScalaReflection call each other circularly, > the cases in dataTypeFor are not fully handled in arrayClassFor > For example: > {code:java} > scala> import scala.reflect.runtime.universe.TypeTag > scala> import org.apache.spark.sql._ > scala> import org.apache.spark.sql.catalyst.encoders._ > scala> import org.apache.spark.sql.types._ > scala> implicit def newArrayEncoder[T <: Array[_] : TypeTag]: Encoder[T] = > ExpressionEncoder() > newArrayEncoder: [T <: Array[_]](implicit evidence$1: > reflect.runtime.universe.TypeTag[T])org.apache.spark.sql.Encoder[T] > scala> val decOne = Decimal(1, 38, 18) > decOne: org.apache.spark.sql.types.Decimal = 1E-18 > scala> val decTwo = Decimal(2, 38, 18) > decTwo: org.apache.spark.sql.types.Decimal = 2E-18 > scala> val decSpark = Array(decOne, decTwo) > decSpark: Array[org.apache.spark.sql.types.Decimal] = Array(1E-18, 2E-18) > scala> Seq(decSpark).toDF() > java.lang.ClassCastException: org.apache.spark.sql.types.DecimalType cannot > be cast to org.apache.spark.sql.types.ObjectType > at > org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$arrayClassFor$1(ScalaReflection.scala:131) > at > scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69) > at > org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects(ScalaReflection.scala:879) > at > org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects$(ScalaReflection.scala:878) > at > org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:49) > at > org.apache.spark.sql.catalyst.ScalaReflection$.arrayClassFor(ScalaReflection.scala:120) > at > org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$dataTypeFor$1(ScalaReflection.scala:105) > at > scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69) > at > org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects(ScalaReflection.scala:879) > at > org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects$(ScalaReflection.scala:878) > at > org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:49) > at > org.apache.spark.sql.catalyst.ScalaReflection$.dataTypeFor(ScalaReflection.scala:88) > at > org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$serializerForType$1(ScalaReflection.scala:399) > at > scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69) > at > org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects(ScalaReflection.scala:879) > at > org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects$(ScalaReflection.scala:878) > at > org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:49) > at > org.apache.spark.sql.catalyst.ScalaReflection$.serializerForType(ScalaReflection.scala:393) > at > org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:57) > at newArrayEncoder(:57) > ... 
53 elided > scala> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31552) Fix potential ClassCastException in ScalaReflection arrayClassFor
[ https://issues.apache.org/jira/browse/SPARK-31552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-31552: -- Description: arrayClassFor and dataTypeFor in ScalaReflection call each other circularly, the cases in dataTypeFor are not fully handled in arrayClassFor For example: {code:java} scala> import scala.reflect.runtime.universe.TypeTag scala> import org.apache.spark.sql._ scala> import org.apache.spark.sql.catalyst.encoders._ scala> import org.apache.spark.sql.types._ scala> implicit def newArrayEncoder[T <: Array[_] : TypeTag]: Encoder[T] = ExpressionEncoder() newArrayEncoder: [T <: Array[_]](implicit evidence$1: reflect.runtime.universe.TypeTag[T])org.apache.spark.sql.Encoder[T] scala> val decOne = Decimal(1, 38, 18) decOne: org.apache.spark.sql.types.Decimal = 1E-18 scala> val decTwo = Decimal(2, 38, 18) decTwo: org.apache.spark.sql.types.Decimal = 2E-18 scala> val decSpark = Array(decOne, decTwo) decSpark: Array[org.apache.spark.sql.types.Decimal] = Array(1E-18, 2E-18) scala> Seq(decSpark).toDF() java.lang.ClassCastException: org.apache.spark.sql.types.DecimalType cannot be cast to org.apache.spark.sql.types.ObjectType at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$arrayClassFor$1(ScalaReflection.scala:131) at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69) at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects(ScalaReflection.scala:879) at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects$(ScalaReflection.scala:878) at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:49) at org.apache.spark.sql.catalyst.ScalaReflection$.arrayClassFor(ScalaReflection.scala:120) at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$dataTypeFor$1(ScalaReflection.scala:105) at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69) at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects(ScalaReflection.scala:879) at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects$(ScalaReflection.scala:878) at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:49) at org.apache.spark.sql.catalyst.ScalaReflection$.dataTypeFor(ScalaReflection.scala:88) at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$serializerForType$1(ScalaReflection.scala:399) at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69) at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects(ScalaReflection.scala:879) at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects$(ScalaReflection.scala:878) at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:49) at org.apache.spark.sql.catalyst.ScalaReflection$.serializerForType(ScalaReflection.scala:393) at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:57) at newArrayEncoder(:57) ... 
53 elided scala> {code} was: arrayClassFor and dataTypeFor in ScalaReflection call each other circularly, the cases in dataTypeFor are not fully handled in arrayClassFor For example: {code:java} scala> import scala.reflect.runtime.universe.TypeTag scala> import org.apache.spark.sql._ scala> import org.apache.spark.sql.catalyst.encoders._ scala> implicit def newArrayEncoder[T <: Array[_] : TypeTag]: Encoder[T] = ExpressionEncoder() newArrayEncoder: [T <: Array[_]](implicit evidence$1: reflect.runtime.universe.TypeTag[T])org.apache.spark.sql.Encoder[T] scala> val decOne = Decimal(1, 38, 18) decOne: org.apache.spark.sql.types.Decimal = 1E-18 scala> val decTwo = Decimal(2, 38, 18) decTwo: org.apache.spark.sql.types.Decimal = 2E-18 scala> val decSpark = Array(decOne, decTwo) decSpark: Array[org.apache.spark.sql.types.Decimal] = Array(1E-18, 2E-18) scala> Seq(decSpark).toDF() java.lang.ClassCastException: org.apache.spark.sql.types.DecimalType cannot be cast to org.apache.spark.sql.types.ObjectType at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$arrayClassFor$1(ScalaReflection.scala:131) at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69) at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects(ScalaReflection.scala:879) at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects$(ScalaReflection.scala:878) at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:49) at org.apache.spark.sql.catalyst.ScalaReflection$.arrayClassFor(ScalaReflection.scala:120) at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$dataTypeFor$1(ScalaReflection.scala:105) at
[jira] [Updated] (SPARK-31552) Fix potential ClassCastException in ScalaReflection arrayClassFor
[ https://issues.apache.org/jira/browse/SPARK-31552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-31552: -- Affects Version/s: 2.4.5 > Fix potential ClassCastException in ScalaReflection arrayClassFor > - > > Key: SPARK-31552 > URL: https://issues.apache.org/jira/browse/SPARK-31552 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.5, 3.0.0, 3.1.0 >Reporter: Kent Yao >Priority: Major > > arrayClassFor and dataTypeFor in ScalaReflection call each other circularly, > the cases in dataTypeFor are not fully handled in arrayClassFor > For example: > {code:java} > scala> import scala.reflect.runtime.universe.TypeTag > scala> import org.apache.spark.sql._ > scala> import org.apache.spark.sql.catalyst.encoders._ > scala> import org.apache.spark.sql.types._ > scala> implicit def newArrayEncoder[T <: Array[_] : TypeTag]: Encoder[T] = > ExpressionEncoder() > newArrayEncoder: [T <: Array[_]](implicit evidence$1: > reflect.runtime.universe.TypeTag[T])org.apache.spark.sql.Encoder[T] > scala> val decOne = Decimal(1, 38, 18) > decOne: org.apache.spark.sql.types.Decimal = 1E-18 > scala> val decTwo = Decimal(2, 38, 18) > decTwo: org.apache.spark.sql.types.Decimal = 2E-18 > scala> val decSpark = Array(decOne, decTwo) > decSpark: Array[org.apache.spark.sql.types.Decimal] = Array(1E-18, 2E-18) > scala> Seq(decSpark).toDF() > java.lang.ClassCastException: org.apache.spark.sql.types.DecimalType cannot > be cast to org.apache.spark.sql.types.ObjectType > at > org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$arrayClassFor$1(ScalaReflection.scala:131) > at > scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69) > at > org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects(ScalaReflection.scala:879) > at > org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects$(ScalaReflection.scala:878) > at > org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:49) > at > org.apache.spark.sql.catalyst.ScalaReflection$.arrayClassFor(ScalaReflection.scala:120) > at > org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$dataTypeFor$1(ScalaReflection.scala:105) > at > scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69) > at > org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects(ScalaReflection.scala:879) > at > org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects$(ScalaReflection.scala:878) > at > org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:49) > at > org.apache.spark.sql.catalyst.ScalaReflection$.dataTypeFor(ScalaReflection.scala:88) > at > org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$serializerForType$1(ScalaReflection.scala:399) > at > scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69) > at > org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects(ScalaReflection.scala:879) > at > org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects$(ScalaReflection.scala:878) > at > org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:49) > at > org.apache.spark.sql.catalyst.ScalaReflection$.serializerForType(ScalaReflection.scala:393) > at > org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:57) > at newArrayEncoder(:57) > ... 
53 elided > scala> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31552) Fix potential ClassCastException in ScalaReflection arrayClassFor
[ https://issues.apache.org/jira/browse/SPARK-31552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-31552: -- Description: arrayClassFor and dataTypeFor in ScalaReflection call each other circularly, the cases in dataTypeFor are not fully handled in arrayClassFor For example: {code:java} scala> import scala.reflect.runtime.universe.TypeTag scala> import org.apache.spark.sql._ scala> import org.apache.spark.sql.catalyst.encoders._ scala> implicit def newArrayEncoder[T <: Array[_] : TypeTag]: Encoder[T] = ExpressionEncoder() newArrayEncoder: [T <: Array[_]](implicit evidence$1: reflect.runtime.universe.TypeTag[T])org.apache.spark.sql.Encoder[T] scala> val decOne = Decimal(1, 38, 18) decOne: org.apache.spark.sql.types.Decimal = 1E-18 scala> val decTwo = Decimal(2, 38, 18) decTwo: org.apache.spark.sql.types.Decimal = 2E-18 scala> val decSpark = Array(decOne, decTwo) decSpark: Array[org.apache.spark.sql.types.Decimal] = Array(1E-18, 2E-18) scala> Seq(decSpark).toDF() java.lang.ClassCastException: org.apache.spark.sql.types.DecimalType cannot be cast to org.apache.spark.sql.types.ObjectType at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$arrayClassFor$1(ScalaReflection.scala:131) at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69) at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects(ScalaReflection.scala:879) at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects$(ScalaReflection.scala:878) at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:49) at org.apache.spark.sql.catalyst.ScalaReflection$.arrayClassFor(ScalaReflection.scala:120) at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$dataTypeFor$1(ScalaReflection.scala:105) at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69) at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects(ScalaReflection.scala:879) at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects$(ScalaReflection.scala:878) at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:49) at org.apache.spark.sql.catalyst.ScalaReflection$.dataTypeFor(ScalaReflection.scala:88) at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$serializerForType$1(ScalaReflection.scala:399) at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69) at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects(ScalaReflection.scala:879) at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects$(ScalaReflection.scala:878) at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:49) at org.apache.spark.sql.catalyst.ScalaReflection$.serializerForType(ScalaReflection.scala:393) at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:57) at newArrayEncoder(:57) ... 
53 elided scala> {code} was: arrayClassFor and dataTypeFor in ScalaReflection call each other circularly, the cases in dataTypeFor are not fully handled in arrayClassFor For example: {code:java} import scala.reflect.runtime.universe.TypeTag import org.apache.spark.sql._ import org.apache.spark.sql.catalyst.encoders._ {code:java} {code:java} scala> implicit def newArrayEncoder[T <: Array[_] : TypeTag]: Encoder[T] = ExpressionEncoder() newArrayEncoder: [T <: Array[_]](implicit evidence$1: reflect.runtime.universe.TypeTag[T])org.apache.spark.sql.Encoder[T] scala> val decOne = Decimal(1, 38, 18) decOne: org.apache.spark.sql.types.Decimal = 1E-18 scala> val decTwo = Decimal(2, 38, 18) decTwo: org.apache.spark.sql.types.Decimal = 2E-18 scala> val decSpark = Array(decOne, decTwo) decSpark: Array[org.apache.spark.sql.types.Decimal] = Array(1E-18, 2E-18) scala> Seq(decSpark).toDF() java.lang.ClassCastException: org.apache.spark.sql.types.DecimalType cannot be cast to org.apache.spark.sql.types.ObjectType at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$arrayClassFor$1(ScalaReflection.scala:131) at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69) at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects(ScalaReflection.scala:879) at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects$(ScalaReflection.scala:878) at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:49) at org.apache.spark.sql.catalyst.ScalaReflection$.arrayClassFor(ScalaReflection.scala:120) at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$dataTypeFor$1(ScalaReflection.scala:105) at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69) at
[jira] [Updated] (SPARK-31552) Fix potential ClassCastException in ScalaReflection arrayClassFor
[ https://issues.apache.org/jira/browse/SPARK-31552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-31552: -- Description: arrayClassFor and dataTypeFor in ScalaReflection call each other circularly, the cases in dataTypeFor are not fully handled in arrayClassFor For example: {code:java} import scala.reflect.runtime.universe.TypeTag import org.apache.spark.sql._ import org.apache.spark.sql.catalyst.encoders._ {code:java} {code:java} scala> implicit def newArrayEncoder[T <: Array[_] : TypeTag]: Encoder[T] = ExpressionEncoder() newArrayEncoder: [T <: Array[_]](implicit evidence$1: reflect.runtime.universe.TypeTag[T])org.apache.spark.sql.Encoder[T] scala> val decOne = Decimal(1, 38, 18) decOne: org.apache.spark.sql.types.Decimal = 1E-18 scala> val decTwo = Decimal(2, 38, 18) decTwo: org.apache.spark.sql.types.Decimal = 2E-18 scala> val decSpark = Array(decOne, decTwo) decSpark: Array[org.apache.spark.sql.types.Decimal] = Array(1E-18, 2E-18) scala> Seq(decSpark).toDF() java.lang.ClassCastException: org.apache.spark.sql.types.DecimalType cannot be cast to org.apache.spark.sql.types.ObjectType at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$arrayClassFor$1(ScalaReflection.scala:131) at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69) at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects(ScalaReflection.scala:879) at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects$(ScalaReflection.scala:878) at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:49) at org.apache.spark.sql.catalyst.ScalaReflection$.arrayClassFor(ScalaReflection.scala:120) at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$dataTypeFor$1(ScalaReflection.scala:105) at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69) at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects(ScalaReflection.scala:879) at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects$(ScalaReflection.scala:878) at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:49) at org.apache.spark.sql.catalyst.ScalaReflection$.dataTypeFor(ScalaReflection.scala:88) at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$serializerForType$1(ScalaReflection.scala:399) at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69) at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects(ScalaReflection.scala:879) at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects$(ScalaReflection.scala:878) at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:49) at org.apache.spark.sql.catalyst.ScalaReflection$.serializerForType(ScalaReflection.scala:393) at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:57) at newArrayEncoder(:57) ... 
53 elided scala> {code} was: arrayClassFor and dataTypeFor in ScalaReflection call each other circularly, the cases in dataTypeFor are not fully handled in arrayClassFor For example: {code:java} scala> implicit def newArrayEncoder[T <: Array[_] : TypeTag]: Encoder[T] = ExpressionEncoder() newArrayEncoder: [T <: Array[_]](implicit evidence$1: reflect.runtime.universe.TypeTag[T])org.apache.spark.sql.Encoder[T] scala> val decOne = Decimal(1, 38, 18) decOne: org.apache.spark.sql.types.Decimal = 1E-18 scala> val decTwo = Decimal(2, 38, 18) decTwo: org.apache.spark.sql.types.Decimal = 2E-18 scala> val decSpark = Array(decOne, decTwo) decSpark: Array[org.apache.spark.sql.types.Decimal] = Array(1E-18, 2E-18) scala> Seq(decSpark).toDF() java.lang.ClassCastException: org.apache.spark.sql.types.DecimalType cannot be cast to org.apache.spark.sql.types.ObjectType at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$arrayClassFor$1(ScalaReflection.scala:131) at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69) at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects(ScalaReflection.scala:879) at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects$(ScalaReflection.scala:878) at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:49) at org.apache.spark.sql.catalyst.ScalaReflection$.arrayClassFor(ScalaReflection.scala:120) at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$dataTypeFor$1(ScalaReflection.scala:105) at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69) at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects(ScalaReflection.scala:879) at
[jira] [Commented] (SPARK-31554) Flaky test suite org.apache.spark.sql.hive.thriftserver.CliSuite
[ https://issues.apache.org/jira/browse/SPARK-31554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091614#comment-17091614 ] Maxim Gekk commented on SPARK-31554: [~cloud_fan] [~hyukjin.kwon] Can we disable the flaky test till someone makes it stable? > Flaky test suite org.apache.spark.sql.hive.thriftserver.CliSuite > > > Key: SPARK-31554 > URL: https://issues.apache.org/jira/browse/SPARK-31554 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Priority: Major > > The test org.apache.spark.sql.hive.thriftserver.CliSuite fails very often, > for example: > * https://github.com/apache/spark/pull/28328#issuecomment-618992335 > The error message: > {code} > org.apache.spark.sql.hive.thriftserver.CliSuite.SPARK-11188 Analysis error > reporting > Caused by: sbt.ForkMain$ForkError: java.lang.RuntimeException: Failed with > error line 'Exception in thread "main" > org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: > Unable to instantiate > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient;' > at > org.apache.spark.sql.hive.thriftserver.CliSuite.$anonfun$runCliWithin$4(CliSuite.scala:138) > at scala.collection.immutable.List.foreach(List.scala:392) > at > org.apache.spark.sql.hive.thriftserver.CliSuite.captureOutput$1(CliSuite.scala:135) > at > org.apache.spark.sql.hive.thriftserver.CliSuite.$anonfun$runCliWithin$6(CliSuite.scala:152) > at > org.apache.spark.sql.hive.thriftserver.CliSuite.$anonfun$runCliWithin$6$adapted(CliSuite.scala:152) > at scala.sys.process.BasicIO$.readFully$1(BasicIO.scala:188) > at > scala.sys.process.BasicIO$.$anonfun$processFully$1$adapted(BasicIO.scala:192) > at > org.apache.spark.sql.test.ProcessTestUtils$ProcessOutputCapturer.run(ProcessTestUtils.scala:30) > {code} > * https://github.com/apache/spark/pull/28261#issuecomment-618950225 > * https://github.com/apache/spark/pull/27617#issuecomment-614318644 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
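For reference, disabling a flaky case in Spark's ScalaTest suites is a one-word change. A minimal sketch follows; the suite class name here is hypothetical, and the test name is taken from the failure above.
{code:scala}
import org.scalatest.FunSuite

class CliSuiteSketch extends FunSuite {
  // Swapping `test` for `ignore` skips the case while keeping it visible in
  // test reports, so it is easy to re-enable once the flakiness is fixed.
  ignore("SPARK-11188 Analysis error reporting") {
    // the original test body would stay unchanged here
  }
}
{code}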
[jira] [Created] (SPARK-31554) Flaky test suite org.apache.spark.sql.hive.thriftserver.CliSuite
Maxim Gekk created SPARK-31554: -- Summary: Flaky test suite org.apache.spark.sql.hive.thriftserver.CliSuite Key: SPARK-31554 URL: https://issues.apache.org/jira/browse/SPARK-31554 Project: Spark Issue Type: Test Components: SQL Affects Versions: 3.1.0 Reporter: Maxim Gekk The test org.apache.spark.sql.hive.thriftserver.CliSuite fails very often, for example: * https://github.com/apache/spark/pull/28328#issuecomment-618992335 The error message: {code} org.apache.spark.sql.hive.thriftserver.CliSuite.SPARK-11188 Analysis error reporting Caused by: sbt.ForkMain$ForkError: java.lang.RuntimeException: Failed with error line 'Exception in thread "main" org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient;' at org.apache.spark.sql.hive.thriftserver.CliSuite.$anonfun$runCliWithin$4(CliSuite.scala:138) at scala.collection.immutable.List.foreach(List.scala:392) at org.apache.spark.sql.hive.thriftserver.CliSuite.captureOutput$1(CliSuite.scala:135) at org.apache.spark.sql.hive.thriftserver.CliSuite.$anonfun$runCliWithin$6(CliSuite.scala:152) at org.apache.spark.sql.hive.thriftserver.CliSuite.$anonfun$runCliWithin$6$adapted(CliSuite.scala:152) at scala.sys.process.BasicIO$.readFully$1(BasicIO.scala:188) at scala.sys.process.BasicIO$.$anonfun$processFully$1$adapted(BasicIO.scala:192) at org.apache.spark.sql.test.ProcessTestUtils$ProcessOutputCapturer.run(ProcessTestUtils.scala:30) {code} * https://github.com/apache/spark/pull/28261#issuecomment-618950225 * https://github.com/apache/spark/pull/27617#issuecomment-614318644 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-31550) nondeterministic configurations with general meanings in sql configuration doc
[ https://issues.apache.org/jira/browse/SPARK-31550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] JinxinTang updated SPARK-31550: --- Comment: was deleted (was: try specifying the conf in spark-defaults.conf: spark.sql.warehouse.dir /tmp spark.sql.session.timeZone America/New_York It does not seem to be a bug) > nondeterministic configurations with general meanings in sql configuration doc > -- > > Key: SPARK-31550 > URL: https://issues.apache.org/jira/browse/SPARK-31550 > Project: Spark > Issue Type: Bug > Components: Documentation, SQL >Affects Versions: 3.0.0, 3.1.0 >Reporter: Kent Yao >Priority: Major > > spark.sql.session.timeZone > spark.sql.warehouse.dir > > these two configs are nondeterministic and vary across environments -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-31532) SparkSessionBuilder should not propagate static sql configurations to the existing active/default SparkSession
[ https://issues.apache.org/jira/browse/SPARK-31532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] JinxinTang updated SPARK-31532: --- Comment: was deleted (was: Thanks for your issue; the following configs may not be allowed to be modified after SparkSession startup, by design: [spark.sql.codegen.comments, spark.sql.queryExecutionListeners, spark.sql.catalogImplementation, spark.sql.subquery.maxThreadThreshold, spark.sql.globalTempDatabase, spark.sql.codegen.cache.maxEntries, spark.sql.filesourceTableRelationCacheSize, spark.sql.streaming.streamingQueryListeners, spark.sql.ui.retainedExecutions, spark.sql.hive.thriftServer.singleSession, spark.sql.extensions, spark.sql.debug, spark.sql.sources.schemaStringLengthThreshold, spark.sql.warehouse.dir] So it might not be a bug.) > SparkSessionBuilder should not propagate static sql configurations to the > existing active/default SparkSession > - > > Key: SPARK-31532 > URL: https://issues.apache.org/jira/browse/SPARK-31532 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.5, 3.0.0, 3.1.0 >Reporter: Kent Yao >Priority: Major > > Clearly, this is a bug. > {code:java} > scala> spark.sql("set spark.sql.warehouse.dir").show > +++ > | key| value| > +++ > |spark.sql.warehou...|file:/Users/kenty...| > +++ > scala> spark.sql("set spark.sql.warehouse.dir=2"); > org.apache.spark.sql.AnalysisException: Cannot modify the value of a static > config: spark.sql.warehouse.dir; > at > org.apache.spark.sql.RuntimeConfig.requireNonStaticConf(RuntimeConfig.scala:154) > at org.apache.spark.sql.RuntimeConfig.set(RuntimeConfig.scala:42) > at > org.apache.spark.sql.execution.command.SetCommand.$anonfun$x$7$6(SetCommand.scala:100) > at > org.apache.spark.sql.execution.command.SetCommand.run(SetCommand.scala:156) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79) > at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:229) > at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3644) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) > at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3642) > at org.apache.spark.sql.Dataset.(Dataset.scala:229) > at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97) > at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:607) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764) > at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:602) > ... 
47 elided > scala> import org.apache.spark.sql.SparkSession > import org.apache.spark.sql.SparkSession > scala> SparkSession.builder.config("spark.sql.warehouse.dir", "xyz").get > getClass getOrCreate > scala> SparkSession.builder.config("spark.sql.warehouse.dir", > "xyz").getOrCreate > 20/04/23 23:49:13 WARN SparkSession$Builder: Using an existing SparkSession; > some configuration may not take effect. > res7: org.apache.spark.sql.SparkSession = > org.apache.spark.sql.SparkSession@6403d574 > scala> spark.sql("set spark.sql.warehouse.dir").show > ++-+ > | key|value| > ++-+ > |spark.sql.warehou...| xyz| > ++-+ > scala> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
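One way to see the static/runtime split at the heart of this ticket from user code is RuntimeConfig.isModifiable, which reports false exactly for the static configs the builder should not push into an already-running session. A spark-shell sketch, assuming a Spark version recent enough to have the method:
{code:scala}
// A config is static exactly when it is not modifiable at runtime, which is
// what the builder propagation shown above violates.
spark.conf.isModifiable("spark.sql.warehouse.dir")      // false: static config
spark.conf.isModifiable("spark.sql.shuffle.partitions") // true: runtime config
{code}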
[jira] [Resolved] (SPARK-30724) Support 'like any' and 'like all' operators
[ https://issues.apache.org/jira/browse/SPARK-30724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-30724. -- Fix Version/s: 3.1.0 Assignee: Yuming Wang Resolution: Fixed Resolved by https://github.com/apache/spark/pull/27477 > Support 'like any' and 'like all' operators > --- > > Key: SPARK-30724 > URL: https://issues.apache.org/jira/browse/SPARK-30724 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.1.0 > > > In Teradata/Hive and PostgreSQL, the 'like any' and 'like all' operators are > mostly used when matching a text field against a number of patterns. For > example: > Teradata / Hive 3.0: > {code:sql} > --like any > select 'foo' LIKE ANY ('%foo%','%bar%'); > --like all > select 'foo' LIKE ALL ('%foo%','%bar%'); > {code} > PostgreSQL: > {code:sql} > -- like any > select 'foo' LIKE ANY (array['%foo%','%bar%']); > -- like all > select 'foo' LIKE ALL (array['%foo%','%bar%']); > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
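Assuming the syntax added by the pull request mirrors the Teradata/Hive form quoted above, usage from Spark 3.1 onwards would look roughly like this (spark-shell sketch, not taken from the PR itself):
{code:scala}
// 'foo' matches '%foo%' but not '%bar%', so ANY is true and ALL is false.
spark.sql("SELECT 'foo' LIKE ANY ('%foo%', '%bar%')").show()
spark.sql("SELECT 'foo' LIKE ALL ('%foo%', '%bar%')").show()
{code}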
[jira] [Commented] (SPARK-31553) Wrong result of isInCollection for large collections
[ https://issues.apache.org/jira/browse/SPARK-31553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091490#comment-17091490 ] Maxim Gekk commented on SPARK-31553: I am working on the issue > Wrong result of isInCollection for large collections > > > Key: SPARK-31553 > URL: https://issues.apache.org/jira/browse/SPARK-31553 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.1.0 >Reporter: Maxim Gekk >Priority: Major > > If the size of a collection passed to isInCollection is bigger than > spark.sql.optimizer.inSetConversionThreshold, the method can return wrong > results for some inputs. For example: > {code:scala} > val set = (0 to 20).map(_.toString).toSet > val data = Seq("1").toDF("x") > println(set.contains("1")) > data.select($"x".isInCollection(set).as("isInCollection")).show() > {code} > {code} > true > +--+ > |isInCollection| > +--+ > | false| > +--+ > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31553) Wrong result of isInCollection for large collections
Maxim Gekk created SPARK-31553: -- Summary: Wrong result of isInCollection for large collections Key: SPARK-31553 URL: https://issues.apache.org/jira/browse/SPARK-31553 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0, 3.1.0 Reporter: Maxim Gekk If the size of a collection passed to isInCollection is bigger than spark.sql.optimizer.inSetConversionThreshold, the method can return wrong results for some inputs. For example: {code:scala} val set = (0 to 20).map(_.toString).toSet val data = Seq("1").toDF("x") println(set.contains("1")) data.select($"x".isInCollection(set).as("isInCollection")).show() {code} {code} true +--+ |isInCollection| +--+ | false| +--+ {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
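Assuming the wrong results come from the InSet conversion that the threshold above controls, one possible interim workaround is to keep the predicate on the plain In path by raising the threshold past the collection size. This is a sketch of a mitigation, not a fix from the ticket:
{code:scala}
// spark-shell sketch: with the threshold above the set size, isInCollection
// stays an In expression instead of being converted to InSet.
spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", 100L)

val set = (0 to 20).map(_.toString).toSet
val data = Seq("1").toDF("x")
data.select($"x".isInCollection(set).as("isInCollection")).show() // expected: true
{code}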
[jira] [Commented] (SPARK-31463) Enhance JsonDataSource by replacing jackson with simdjson
[ https://issues.apache.org/jira/browse/SPARK-31463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091389#comment-17091389 ] Maxim Gekk commented on SPARK-31463: Parsing itself takes 10-20% of the total time. The JSON datasource spends significant time on conversions to the desired types according to the schema. Even if you improve the performance of parsing by a few times, the total impact will not be so significant. > Enhance JsonDataSource by replacing jackson with simdjson > - > > Key: SPARK-31463 > URL: https://issues.apache.org/jira/browse/SPARK-31463 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Steven Moy >Priority: Minor > > I came across this VLDB paper: [https://arxiv.org/pdf/1902.08318.pdf] on how > to improve json reading speed. We use Spark to process terabytes of JSON, so > we try to find ways to improve JSON parsing speed. > > [https://lemire.me/blog/2020/03/31/we-released-simdjson-0-3-the-fastest-json-parser-in-the-world-is-even-better/] > > [https://github.com/simdjson/simdjson/issues/93] > > Anyone in the open-source community interested in leading this effort to > integrate simdjson in the Spark JSON data source API? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31552) Fix potential ClassCastException in ScalaReflection arrayClassFor
Kent Yao created SPARK-31552: Summary: Fix potential ClassCastException in ScalaReflection arrayClassFor Key: SPARK-31552 URL: https://issues.apache.org/jira/browse/SPARK-31552 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0, 3.1.0 Reporter: Kent Yao arrayClassFor and dataTypeFor in ScalaReflection call each other circularly, the cases in dataTypeFor are not fully handled in arrayClassFor For example: {code:java} scala> implicit def newArrayEncoder[T <: Array[_] : TypeTag]: Encoder[T] = ExpressionEncoder() newArrayEncoder: [T <: Array[_]](implicit evidence$1: reflect.runtime.universe.TypeTag[T])org.apache.spark.sql.Encoder[T] scala> val decOne = Decimal(1, 38, 18) decOne: org.apache.spark.sql.types.Decimal = 1E-18 scala> val decTwo = Decimal(2, 38, 18) decTwo: org.apache.spark.sql.types.Decimal = 2E-18 scala> val decSpark = Array(decOne, decTwo) decSpark: Array[org.apache.spark.sql.types.Decimal] = Array(1E-18, 2E-18) scala> Seq(decSpark).toDF() java.lang.ClassCastException: org.apache.spark.sql.types.DecimalType cannot be cast to org.apache.spark.sql.types.ObjectType at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$arrayClassFor$1(ScalaReflection.scala:131) at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69) at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects(ScalaReflection.scala:879) at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects$(ScalaReflection.scala:878) at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:49) at org.apache.spark.sql.catalyst.ScalaReflection$.arrayClassFor(ScalaReflection.scala:120) at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$dataTypeFor$1(ScalaReflection.scala:105) at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69) at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects(ScalaReflection.scala:879) at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects$(ScalaReflection.scala:878) at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:49) at org.apache.spark.sql.catalyst.ScalaReflection$.dataTypeFor(ScalaReflection.scala:88) at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$serializerForType$1(ScalaReflection.scala:399) at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:69) at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects(ScalaReflection.scala:879) at org.apache.spark.sql.catalyst.ScalaReflection.cleanUpReflectionObjects$(ScalaReflection.scala:878) at org.apache.spark.sql.catalyst.ScalaReflection$.cleanUpReflectionObjects(ScalaReflection.scala:49) at org.apache.spark.sql.catalyst.ScalaReflection$.serializerForType(ScalaReflection.scala:393) at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:57) at newArrayEncoder(:57) ... 53 elided scala> {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31502) document identifier in SQL Reference
[ https://issues.apache.org/jira/browse/SPARK-31502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-31502: --- Assignee: Huaxin Gao > document identifier in SQL Reference > > > Key: SPARK-31502 > URL: https://issues.apache.org/jira/browse/SPARK-31502 > Project: Spark > Issue Type: Sub-task > Components: Documentation, SQL >Affects Versions: 3.0.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Major > > document identifier in SQL Reference -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31502) document identifier in SQL Reference
[ https://issues.apache.org/jira/browse/SPARK-31502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-31502. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 28277 [https://github.com/apache/spark/pull/28277] > document identifier in SQL Reference > > > Key: SPARK-31502 > URL: https://issues.apache.org/jira/browse/SPARK-31502 > Project: Spark > Issue Type: Sub-task > Components: Documentation, SQL >Affects Versions: 3.0.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Major > Fix For: 3.0.0 > > > document identifier in SQL Reference -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31449) Investigate the difference between JDK and Spark's time zone offset calculation
[ https://issues.apache.org/jira/browse/SPARK-31449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-31449: --- Summary: Investigate the difference between JDK and Spark's time zone offset calculation (was: Is there a difference between JDK and Spark's time zone offset calculation) > Investigate the difference between JDK and Spark's time zone offset > calculation > --- > > Key: SPARK-31449 > URL: https://issues.apache.org/jira/browse/SPARK-31449 > Project: Spark > Issue Type: Question > Components: SQL >Affects Versions: 2.4.5 >Reporter: Maxim Gekk >Priority: Major > > Spark 2.4 calculates time zone offsets from wall clock timestamp using > `DateTimeUtils.getOffsetFromLocalMillis()` (see > https://github.com/apache/spark/blob/branch-2.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L1088-L1118): > {code:scala} > private[sql] def getOffsetFromLocalMillis(millisLocal: Long, tz: TimeZone): > Long = { > var guess = tz.getRawOffset > // the actual offset should be calculated based on milliseconds in UTC > val offset = tz.getOffset(millisLocal - guess) > if (offset != guess) { > guess = tz.getOffset(millisLocal - offset) > if (guess != offset) { > // fallback to do the reverse lookup using java.sql.Timestamp > // this should only happen near the start or end of DST > val days = Math.floor(millisLocal.toDouble / MILLIS_PER_DAY).toInt > val year = getYear(days) > val month = getMonth(days) > val day = getDayOfMonth(days) > var millisOfDay = (millisLocal % MILLIS_PER_DAY).toInt > if (millisOfDay < 0) { > millisOfDay += MILLIS_PER_DAY.toInt > } > val seconds = (millisOfDay / 1000L).toInt > val hh = seconds / 3600 > val mm = seconds / 60 % 60 > val ss = seconds % 60 > val ms = millisOfDay % 1000 > val calendar = Calendar.getInstance(tz) > calendar.set(year, month - 1, day, hh, mm, ss) > calendar.set(Calendar.MILLISECOND, ms) > guess = (millisLocal - calendar.getTimeInMillis()).toInt > } > } > guess > } > {code} > Meanwhile, JDK's GregorianCalendar uses special methods of ZoneInfo, see > https://github.com/AdoptOpenJDK/openjdk-jdk8u/blob/aa318070b27849f1fe00d14684b2a40f7b29bf79/jdk/src/share/classes/java/util/GregorianCalendar.java#L2795-L2801: > {code:java} > if (zone instanceof ZoneInfo) { > ((ZoneInfo)zone).getOffsetsByWall(millis, zoneOffsets); > } else { > int gmtOffset = isFieldSet(fieldMask, ZONE_OFFSET) ? > internalGet(ZONE_OFFSET) : > zone.getRawOffset(); > zone.getOffsets(millis - gmtOffset, zoneOffsets); > } > {code} > Need to investigate are there any differences in results between 2 approaches. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31449) Investigate the difference between JDK and Spark's time zone offset calculation
[ https://issues.apache.org/jira/browse/SPARK-31449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-31449: --- Issue Type: Improvement (was: Question) > Investigate the difference between JDK and Spark's time zone offset > calculation > --- > > Key: SPARK-31449 > URL: https://issues.apache.org/jira/browse/SPARK-31449 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.5 >Reporter: Maxim Gekk >Priority: Major > > Spark 2.4 calculates time zone offsets from wall clock timestamp using > `DateTimeUtils.getOffsetFromLocalMillis()` (see > https://github.com/apache/spark/blob/branch-2.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L1088-L1118): > {code:scala} > private[sql] def getOffsetFromLocalMillis(millisLocal: Long, tz: TimeZone): > Long = { > var guess = tz.getRawOffset > // the actual offset should be calculated based on milliseconds in UTC > val offset = tz.getOffset(millisLocal - guess) > if (offset != guess) { > guess = tz.getOffset(millisLocal - offset) > if (guess != offset) { > // fallback to do the reverse lookup using java.sql.Timestamp > // this should only happen near the start or end of DST > val days = Math.floor(millisLocal.toDouble / MILLIS_PER_DAY).toInt > val year = getYear(days) > val month = getMonth(days) > val day = getDayOfMonth(days) > var millisOfDay = (millisLocal % MILLIS_PER_DAY).toInt > if (millisOfDay < 0) { > millisOfDay += MILLIS_PER_DAY.toInt > } > val seconds = (millisOfDay / 1000L).toInt > val hh = seconds / 3600 > val mm = seconds / 60 % 60 > val ss = seconds % 60 > val ms = millisOfDay % 1000 > val calendar = Calendar.getInstance(tz) > calendar.set(year, month - 1, day, hh, mm, ss) > calendar.set(Calendar.MILLISECOND, ms) > guess = (millisLocal - calendar.getTimeInMillis()).toInt > } > } > guess > } > {code} > Meanwhile, JDK's GregorianCalendar uses special methods of ZoneInfo, see > https://github.com/AdoptOpenJDK/openjdk-jdk8u/blob/aa318070b27849f1fe00d14684b2a40f7b29bf79/jdk/src/share/classes/java/util/GregorianCalendar.java#L2795-L2801: > {code:java} > if (zone instanceof ZoneInfo) { > ((ZoneInfo)zone).getOffsetsByWall(millis, zoneOffsets); > } else { > int gmtOffset = isFieldSet(fieldMask, ZONE_OFFSET) ? > internalGet(ZONE_OFFSET) : > zone.getRawOffset(); > zone.getOffsets(millis - gmtOffset, zoneOffsets); > } > {code} > Need to investigate are there any differences in results between 2 approaches. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
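A sketch of the suggested comparison that runs without access to the private[sql] helper: offsetByGuess mirrors the first two steps of the Spark code quoted above, and offsetByCalendar does the Calendar-based reverse lookup used by the fallback branch. Both function names are local to this snippet.
{code:scala}
import java.util.{Calendar, TimeZone}

// Strategy A: Spark's guess-and-retry on the raw offset (without the fallback).
def offsetByGuess(millisLocal: Long, tz: TimeZone): Long = {
  var guess: Long = tz.getRawOffset
  val offset = tz.getOffset(millisLocal - guess)
  if (offset != guess) guess = tz.getOffset(millisLocal - offset)
  guess
}

// Strategy B: reverse lookup of the wall-clock fields via Calendar, as in the
// fallback branch of the quoted Spark code.
def offsetByCalendar(millisLocal: Long, tz: TimeZone): Long = {
  val wall = Calendar.getInstance(TimeZone.getTimeZone("UTC"))
  wall.setTimeInMillis(millisLocal)
  val cal = Calendar.getInstance(tz)
  cal.set(wall.get(Calendar.YEAR), wall.get(Calendar.MONTH), wall.get(Calendar.DAY_OF_MONTH),
    wall.get(Calendar.HOUR_OF_DAY), wall.get(Calendar.MINUTE), wall.get(Calendar.SECOND))
  cal.set(Calendar.MILLISECOND, wall.get(Calendar.MILLISECOND))
  millisLocal - cal.getTimeInMillis
}

// Scan a day around the 2019 US spring-forward transition, minute by minute.
val tz = TimeZone.getTimeZone("America/Los_Angeles")
val start = 1552190400000L // 2019-03-10T04:00:00Z, near the DST switch
(0L until 24L * 60).foreach { i =>
  val m = start + i * 60000
  val (a, b) = (offsetByGuess(m, tz), offsetByCalendar(m, tz))
  if (a != b) println(s"mismatch at local millis $m: guess=$a, calendar=$b")
}
{code}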
[jira] [Commented] (SPARK-31535) Fix nested CTE substitution
[ https://issues.apache.org/jira/browse/SPARK-31535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091299#comment-17091299 ] Peter Toth commented on SPARK-31535: Hmm, for some reason my PR ([https://github.com/apache/spark/pull/28318]) didn't get linked to this ticket automatically. > Fix nested CTE substitution > --- > > Key: SPARK-31535 > URL: https://issues.apache.org/jira/browse/SPARK-31535 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Peter Toth >Priority: Blocker > Labels: correctness > > The following nested CTE should return an empty result instead of {{1}}: the > inner {{t}} should shadow the outer one, so the subquery yields {{2}} and the > outer filter {{c IN (2)}} matches nothing. > {noformat} > WITH t(c) AS (SELECT 1) > SELECT * FROM t > WHERE c IN ( > WITH t(c) AS (SELECT 2) > SELECT * FROM t > ) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31551) createSparkUser lost user's non-Hadoop credentials
[ https://issues.apache.org/jira/browse/SPARK-31551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuqi Wang updated SPARK-31551: -- Description: See current *[createSparkUser|https://github.com/apache/spark/blob/263f04db865920d9c10251517b00a1b477b58ff1/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L66-L76]*: {code:java} def createSparkUser(): UserGroupInformation = { val user = Utils.getCurrentUserName() logDebug("creating UGI for user: " + user) val ugi = UserGroupInformation.createRemoteUser(user) transferCredentials(UserGroupInformation.getCurrentUser(), ugi) ugi } def transferCredentials(source: UserGroupInformation, dest: UserGroupInformation): Unit = { dest.addCredentials(source.getCredentials()) } def getCurrentUserName(): String = { Option(System.getenv("SPARK_USER")) .getOrElse(UserGroupInformation.getCurrentUser().getShortUserName()) } {code} The *transferCredentials* func can only transfer Hadoop creds such as Delegation Tokens. However, other creds stored in UGI.subject.getPrivateCredentials, will be lost here, such as: # Non-Hadoop creds: Such as, [Kafka creds |https://github.com/apache/kafka/blob/f3c8bff311b0e4c4d0e316ac949fe4491f9b107f/clients/src/main/java/org/apache/kafka/common/security/oauthbearer/OAuthBearerLoginModule.java#L395] # Newly supported or 3rd party supported Hadoop creds: Such as to support OAuth/JWT token authn on Hadoop, we need to store the OAuth/JWT token into UGI.subject.getPrivateCredentials. However, these tokens are not supposed to be managed by Hadoop Credentials (currently it is only for Hadoop secret keys and delegation tokens) Another issue is that the *SPARK_USER* only gets the UserGroupInformation.getCurrentUser().getShortUserName() of the user, which may lost the user's fully qualified user name. We should better use the *getUserName* to get fully qualified user name in our client side, which is aligned to *[HADOOP_PROXY_USER|https://github.com/apache/hadoop/blob/30ef8d0f1a1463931fe581a46c739dad4c8260e4/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L716-L720]*. Related to https://issues.apache.org/jira/browse/SPARK-1051 was: See current *[createSparkUser|https://github.com/apache/spark/blob/263f04db865920d9c10251517b00a1b477b58ff1/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L66-L76]*: {code:java} def createSparkUser(): UserGroupInformation = { val user = Utils.getCurrentUserName() logDebug("creating UGI for user: " + user) val ugi = UserGroupInformation.createRemoteUser(user) transferCredentials(UserGroupInformation.getCurrentUser(), ugi) ugi } def transferCredentials(source: UserGroupInformation, dest: UserGroupInformation): Unit = { dest.addCredentials(source.getCredentials()) } def getCurrentUserName(): String = { Option(System.getenv("SPARK_USER")) .getOrElse(UserGroupInformation.getCurrentUser().getShortUserName()) } {code} The *transferCredentials* func can only transfer Hadoop creds such as Delegation Tokens. However, other creds stored in UGI.subject.getPrivateCredentials, will be lost here, such as: # Non-Hadoop creds: Such as, [Kafka creds |https://github.com/apache/kafka/blob/f3c8bff311b0e4c4d0e316ac949fe4491f9b107f/clients/src/main/java/org/apache/kafka/common/security/oauthbearer/OAuthBearerLoginModule.java#L395] # Newly supported or 3rd party supported Hadoop creds: Such as to support OAuth/JWT token authn on Hadoop, we need to store the OAuth/JWT token into UGI.subject.getPrivateCredentials. 
However, these tokens are not supposed to be managed by Hadoop Credentials (currently it is only for Hadoop secret keys and delegation tokens) Another issue is that the *SPARK_USER* only returns the getShortUserName of the user, which may lost the user's fully qualified user name that need to be passed to PRC server (such as YARN, HDFS, Kafka). We should better use the *getUserName* to get fully qualified user name in our client side, which is aligned to *[HADOOP_PROXY_USER|https://github.com/apache/hadoop/blob/30ef8d0f1a1463931fe581a46c739dad4c8260e4/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L716-L720]*. Related to https://issues.apache.org/jira/browse/SPARK-1051 > createSparkUser lost user's non-Hadoop credentials > -- > > Key: SPARK-31551 > URL: https://issues.apache.org/jira/browse/SPARK-31551 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.4, 2.4.5 >Reporter: Yuqi Wang >Priority: Major > > See current >
[jira] [Updated] (SPARK-31551) createSparkUser lost user's non-Hadoop credentials and fully qualified user name
[ https://issues.apache.org/jira/browse/SPARK-31551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuqi Wang updated SPARK-31551: -- Description: See current *[createSparkUser|https://github.com/apache/spark/blob/263f04db865920d9c10251517b00a1b477b58ff1/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L66-L76]*: {code:java} def createSparkUser(): UserGroupInformation = { val user = Utils.getCurrentUserName() logDebug("creating UGI for user: " + user) val ugi = UserGroupInformation.createRemoteUser(user) transferCredentials(UserGroupInformation.getCurrentUser(), ugi) ugi } def transferCredentials(source: UserGroupInformation, dest: UserGroupInformation): Unit = { dest.addCredentials(source.getCredentials()) } def getCurrentUserName(): String = { Option(System.getenv("SPARK_USER")) .getOrElse(UserGroupInformation.getCurrentUser().getShortUserName()) } {code} The *transferCredentials* func can only transfer Hadoop creds such as Delegation Tokens. However, other creds stored in UGI.subject.getPrivateCredentials, will be lost here, such as: # Non-Hadoop creds: Such as, [Kafka creds |https://github.com/apache/kafka/blob/f3c8bff311b0e4c4d0e316ac949fe4491f9b107f/clients/src/main/java/org/apache/kafka/common/security/oauthbearer/OAuthBearerLoginModule.java#L395] # Newly supported or 3rd party supported Hadoop creds: Such as to support OAuth/JWT token authn on Hadoop, we need to store the OAuth/JWT token into UGI.subject.getPrivateCredentials. However, these tokens are not supposed to be managed by Hadoop Credentials (currently it is only for Hadoop secret keys and delegation tokens) Another issue is that the *SPARK_USER* only returns the getShortUserName of the user, which may lost the user's fully qualified user name that need to be passed to PRC server (such as YARN, HDFS, Kafka). We should better use the *getUserName* to get fully qualified user name in our client side, which is aligned to *[HADOOP_PROXY_USER|https://github.com/apache/hadoop/blob/30ef8d0f1a1463931fe581a46c739dad4c8260e4/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L716-L720]*. Related to https://issues.apache.org/jira/browse/SPARK-1051 was: See current *[createSparkUser|https://github.com/apache/spark/blob/263f04db865920d9c10251517b00a1b477b58ff1/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L66-L76]*: {code:java} def createSparkUser(): UserGroupInformation = { val user = Utils.getCurrentUserName() logDebug("creating UGI for user: " + user) val ugi = UserGroupInformation.createRemoteUser(user) transferCredentials(UserGroupInformation.getCurrentUser(), ugi) ugi } def transferCredentials(source: UserGroupInformation, dest: UserGroupInformation): Unit = { dest.addCredentials(source.getCredentials()) } def getCurrentUserName(): String = { Option(System.getenv("SPARK_USER")) .getOrElse(UserGroupInformation.getCurrentUser().getShortUserName()) } {code} The *transferCredentials* func can only transfer Hadoop creds such as Delegation Tokens. 
However, other creds stored in UGI.subject.getPrivateCredentials, will be lost here, such as: # Non-Hadoop creds: Such as, [Kafka creds |https://github.com/apache/kafka/blob/f3c8bff311b0e4c4d0e316ac949fe4491f9b107f/clients/src/main/java/org/apache/kafka/common/security/oauthbearer/OAuthBearerLoginModule.java#L395] # Newly supported or 3rd party supported Hadoop creds: Such as to support OAuth/JWT token authn on Hadoop, we need to store the OAuth/JWT token into UGI.subject.getPrivateCredentials. However, these tokens are not supposed to be managed by Hadoop Credentials (currently it is only for Hadoop secret keys and delegation tokens) Another issue is that the *getCurrentUserName* only returns the getShortUserName of the user, which may lost the user's fully qualified user name that need to be passed to PRC server (such as YARN, HDFS, Kafka). We should better use the *getUserName* to get fully qualified user name in our client side, which is aligned to *[HADOOP_PROXY_USER|https://github.com/apache/hadoop/blob/30ef8d0f1a1463931fe581a46c739dad4c8260e4/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L716-L720]*. Related to https://issues.apache.org/jira/browse/SPARK-1051 > createSparkUser lost user's non-Hadoop credentials and fully qualified user > name > > > Key: SPARK-31551 > URL: https://issues.apache.org/jira/browse/SPARK-31551 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.4, 2.4.5 >Reporter: Yuqi Wang >Priority: Major > > See current >
[jira] [Updated] (SPARK-31551) createSparkUser lost user's non-Hadoop credentials
[ https://issues.apache.org/jira/browse/SPARK-31551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuqi Wang updated SPARK-31551: -- Summary: createSparkUser lost user's non-Hadoop credentials (was: createSparkUser lost user's non-Hadoop credentials and fully qualified user name)
> createSparkUser lost user's non-Hadoop credentials
> --------------------------------------------------
>
>                 Key: SPARK-31551
>                 URL: https://issues.apache.org/jira/browse/SPARK-31551
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.4, 2.4.5
>            Reporter: Yuqi Wang
>            Priority: Major
>
> See current *[createSparkUser|https://github.com/apache/spark/blob/263f04db865920d9c10251517b00a1b477b58ff1/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L66-L76]*:
> {code:java}
> def createSparkUser(): UserGroupInformation = {
>   val user = Utils.getCurrentUserName()
>   logDebug("creating UGI for user: " + user)
>   val ugi = UserGroupInformation.createRemoteUser(user)
>   transferCredentials(UserGroupInformation.getCurrentUser(), ugi)
>   ugi
> }
>
> def transferCredentials(source: UserGroupInformation, dest: UserGroupInformation): Unit = {
>   dest.addCredentials(source.getCredentials())
> }
>
> def getCurrentUserName(): String = {
>   Option(System.getenv("SPARK_USER"))
>     .getOrElse(UserGroupInformation.getCurrentUser().getShortUserName())
> }
> {code}
> The *transferCredentials* function can only transfer Hadoop credentials such as delegation tokens.
> However, other credentials stored in UGI.subject.getPrivateCredentials will be lost here, such as:
> # Non-Hadoop credentials, e.g. [Kafka creds|https://github.com/apache/kafka/blob/f3c8bff311b0e4c4d0e316ac949fe4491f9b107f/clients/src/main/java/org/apache/kafka/common/security/oauthbearer/OAuthBearerLoginModule.java#L395]
> # Newly supported or third-party Hadoop credentials: for example, to support OAuth/JWT token authentication on Hadoop, we need to store the OAuth/JWT token in UGI.subject.getPrivateCredentials. However, such tokens are not supposed to be managed by Hadoop Credentials (which currently covers only Hadoop secret keys and delegation tokens).
> Another issue is that *SPARK_USER* only provides the short user name (getShortUserName) of the user, which may lose the user's fully qualified user name that needs to be passed to RPC servers (such as YARN, HDFS, Kafka). We should instead use *getUserName* to get the fully qualified user name on the client side, which is aligned with *[HADOOP_PROXY_USER|https://github.com/apache/hadoop/blob/30ef8d0f1a1463931fe581a46c739dad4c8260e4/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L716-L720]*.
> Related to https://issues.apache.org/jira/browse/SPARK-1051
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
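To make the naming half of the original report concrete, the two UGI accessors differ as sketched below for a Kerberos login (the principal is a made-up example value):
{code:scala}
import org.apache.hadoop.security.UserGroupInformation

val ugi = UserGroupInformation.getCurrentUser()
// What getCurrentUserName() falls back to today, i.e. the short name only:
val shortName = ugi.getShortUserName // e.g. "alice"
// The fully qualified name, aligned with the HADOOP_PROXY_USER handling:
val fullName = ugi.getUserName       // e.g. "alice@EXAMPLE.COM"
{code}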
[jira] [Commented] (SPARK-31463) Enhance JsonDataSource by replacing jackson with simdjson
[ https://issues.apache.org/jira/browse/SPARK-31463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091232#comment-17091232 ] Shashanka Balakuntala Srinivasa commented on SPARK-31463: - Hi [~hyukjin.kwon], I will start looking into this. Thanks.
> Enhance JsonDataSource by replacing jackson with simdjson
> ---------------------------------------------------------
>
>                 Key: SPARK-31463
>                 URL: https://issues.apache.org/jira/browse/SPARK-31463
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Steven Moy
>            Priority: Minor
>
> I came across this VLDB paper: [https://arxiv.org/pdf/1902.08318.pdf] on how to improve JSON reading speed. We use Spark to process terabytes of JSON, so we try to find ways to improve JSON parsing speed.
>
> [https://lemire.me/blog/2020/03/31/we-released-simdjson-0-3-the-fastest-json-parser-in-the-world-is-even-better/]
>
> [https://github.com/simdjson/simdjson/issues/93]
>
> Is anyone in the open-source community interested in leading this effort to integrate simdjson into the Spark JSON data source API?
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31463) Enhance JsonDataSource by replacing jackson with simdjson
[ https://issues.apache.org/jira/browse/SPARK-31463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091230#comment-17091230 ] Hyukjin Kwon commented on SPARK-31463: -- A separate source might be ideal. We can start it as a separate project and gradually move it into Apache Spark once it has proven very useful.
> Enhance JsonDataSource by replacing jackson with simdjson
> ---------------------------------------------------------
>
>                 Key: SPARK-31463
>                 URL: https://issues.apache.org/jira/browse/SPARK-31463
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Steven Moy
>            Priority: Minor
>
> I came across this VLDB paper: [https://arxiv.org/pdf/1902.08318.pdf] on how to improve JSON reading speed. We use Spark to process terabytes of JSON, so we try to find ways to improve JSON parsing speed.
>
> [https://lemire.me/blog/2020/03/31/we-released-simdjson-0-3-the-fastest-json-parser-in-the-world-is-even-better/]
>
> [https://github.com/simdjson/simdjson/issues/93]
>
> Is anyone in the open-source community interested in leading this effort to integrate simdjson into the Spark JSON data source API?
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
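As a sketch of the "separate project" route suggested above: a simdjson-backed reader published as its own package could be plugged in through the external data source API without any change to Spark itself. The format name below is made up for illustration; a real package would register itself via DataSourceRegister.
{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("simdjson-demo").getOrCreate()

// "com.example.simdjson" is a hypothetical third-party source; today one
// would use the built-in Jackson-based reader via .format("json").
val df = spark.read
  .format("com.example.simdjson")
  .load("/data/events/*.json")
df.printSchema()
{code}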
[jira] [Commented] (SPARK-31438) Support JobCleaned Status in SparkListener
[ https://issues.apache.org/jira/browse/SPARK-31438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091225#comment-17091225 ] Hyukjin Kwon commented on SPARK-31438: -- PR https://github.com/apache/spark/pull/28280
> Support JobCleaned Status in SparkListener
> ------------------------------------------
>
>                 Key: SPARK-31438
>                 URL: https://issues.apache.org/jira/browse/SPARK-31438
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Jackey Lee
>            Priority: Major
>
> In Spark, we need to run hooks after a job is cleaned, such as cleaning Hive external temporary paths. This has already been discussed in SPARK-31346 and [GitHub Pull Request #28129|https://github.com/apache/spark/pull/28129].
> The JobEnd status is not suitable for this. JobEnd marks the job as finished as soon as all results have been generated; after that, the scheduler leaves still-running tasks as zombie tasks and deletes abnormal tasks asynchronously.
> Thus, we add a JobCleaned status to let users run hooks after all tasks of a job have been cleaned. The JobCleaned status can be derived from the TaskSetManagers, each of which is related to a stage; once all stages of the job have been cleaned, the job is cleaned.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
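For context, the closest hook available today is onJobEnd; a rough sketch of a cleanup listener follows (paths and class names are made up). It also shows the gap the issue describes: onJobEnd fires once all results exist, which can be before zombie tasks are cleaned up, so a delete placed there can race with still-running tasks.
{code:scala}
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.scheduler.{SparkListener, SparkListenerJobEnd}

// Deletes a job's temporary directory when the job ends. Because onJobEnd
// can fire while zombie tasks are still running, the delete may race with
// them; a dedicated JobCleaned event would avoid that.
class TempPathCleaner(fs: FileSystem, tmpDir: Path) extends SparkListener {
  override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit = {
    fs.delete(tmpDir, true)
  }
}

// Usage, given a SparkContext sc:
//   sc.addSparkListener(new TempPathCleaner(fs, new Path("/tmp/hive-staging")))
{code}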
[jira] [Resolved] (SPARK-31453) Error while converting JavaRDD to Dataframe
[ https://issues.apache.org/jira/browse/SPARK-31453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-31453. -- Resolution: Duplicate It duplicates SPARK-23862. See SPARK-21255 for the workaround.
> Error while converting JavaRDD to Dataframe
> -------------------------------------------
>
>                 Key: SPARK-31453
>                 URL: https://issues.apache.org/jira/browse/SPARK-31453
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 2.4.5
>            Reporter: Sachit Sharma
>            Priority: Trivial
>
> Please refer to this: [https://stackoverflow.com/questions/61172007/error-while-converting-javardd-to-dataframe]
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org