[jira] [Updated] (SPARK-31716) Use a fallback version in HiveExternalCatalogVersionsSuite
[ https://issues.apache.org/jira/browse/SPARK-31716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-31716: -- Description: Currently, HiveExternalCatalogVersionsSuite is aborted in all Jenkins jobs except JDK11 Jenkins jobs, which don't have old Spark releases supporting JDK11. {code} HiveExternalCatalogVersionsSuite: org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite *** ABORTED *** Exception encountered when invoking run on a nested suite - Fail to get the latest Spark versions to test. (HiveExternalCatalogVersionsSuite.scala:180) {code} was: Currently, HiveExternalCatalogVersionsSuite is aborted in all Jenkins jobs except JDK11 Jenkins jobs, which don't have old Spark releases supporting JDK11. {code} HiveExternalCatalogVersionsSuite: org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite *** ABORTED *** Exception encountered when invoking run on a nested suite - Fail to get the latest Spark versions to test. (HiveExternalCatalogVersionsSuite.scala:180) {code} > Use a fallback version in HiveExternalCatalogVersionsSuite > -- > > Key: SPARK-31716 > URL: https://issues.apache.org/jira/browse/SPARK-31716 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 2.4.6, 3.0.0, 3.1.0 >Reporter: Dongjoon Hyun >Priority: Major > > Currently, HiveExternalCatalogVersionsSuite is aborted in all Jenkins jobs > except JDK11 Jenkins jobs, which don't have old Spark releases supporting > JDK11. > {code} > HiveExternalCatalogVersionsSuite: > org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite *** ABORTED *** > Exception encountered when invoking run on a nested suite - Fail to get the > latest Spark versions to test. (HiveExternalCatalogVersionsSuite.scala:180) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
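The direction named in the title, falling back to a pinned version when the release list cannot be downloaded, could look roughly like the following Scala sketch. Everything here (`VersionFetchSketch`, `fallbackVersion`, the regex-based scrape) is an illustrative assumption, not the actual suite code; the real change is in the linked pull request.

{code:java}
import scala.io.Source
import scala.util.Try

// Sketch only: guard the version fetch with a hardcoded fallback so a
// mirror outage degrades the coverage instead of aborting the whole suite.
object VersionFetchSketch {
  // Hypothetical fallback; a real suite would pin a known-good release.
  val fallbackVersion = "2.4.5"

  def testingVersions: Seq[String] = Try {
    val html = Source.fromURL("https://dist.apache.org/repos/dist/release/spark/").mkString
    // Crude scrape of version strings from the directory listing.
    """\d+\.\d+\.\d+""".r.findAllIn(html).toSeq.distinct
  }.getOrElse(Seq(fallbackVersion)) // fall back rather than fail the suite
}
{code}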
[jira] [Updated] (SPARK-31716) Use a fallback version in HiveExternalCatalogVersionsSuite
[ https://issues.apache.org/jira/browse/SPARK-31716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-31716: -- Description: Currently, HiveExternalCatalogVersionsSuite is aborted in all Jenkins jobs except JDK11 Jenkins jobs, which don't have old Spark releases supporting JDK11. {code} HiveExternalCatalogVersionsSuite: org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite *** ABORTED *** Exception encountered when invoking run on a nested suite - Fail to get the latest Spark versions to test. (HiveExternalCatalogVersionsSuite.scala:180) {code} > Use a fallback version in HiveExternalCatalogVersionsSuite > -- > > Key: SPARK-31716 > URL: https://issues.apache.org/jira/browse/SPARK-31716 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 2.4.6, 3.0.0, 3.1.0 >Reporter: Dongjoon Hyun >Priority: Major > > Currently, HiveExternalCatalogVersionsSuite is aborted in all Jenkins jobs > except JDK11 Jenkins jobs, which don't have old Spark releases supporting > JDK11. > {code} > HiveExternalCatalogVersionsSuite: > org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite *** ABORTED *** > Exception encountered when invoking run on a nested suite - Fail to get the > latest Spark versions to test. (HiveExternalCatalogVersionsSuite.scala:180) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31717) Remove a fallback version of HiveExternalCatalogVersionsSuite
[ https://issues.apache.org/jira/browse/SPARK-31717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-31717: - Assignee: Dongjoon Hyun > Remove a fallback version of HiveExternalCatalogVersionsSuite > - > > Key: SPARK-31717 > URL: https://issues.apache.org/jira/browse/SPARK-31717 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 2.4.6, 3.0.0, 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > > After we verify that there is no network issue, this issue aims to decide > whether we will revert SPARK-31716 or not. > We may find another, more robust way. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31717) Remove a fallback version of HiveExternalCatalogVersionsSuite
[ https://issues.apache.org/jira/browse/SPARK-31717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-31717: -- Description: After we verify that there is no network issue, this issue aims to decide whether we will revert SPARK-31716 or not. We may find another, more robust way. was: After we verify that there is no network issue, this issue aims to revert SPARK-31716. > Remove a fallback version of HiveExternalCatalogVersionsSuite > - > > Key: SPARK-31717 > URL: https://issues.apache.org/jira/browse/SPARK-31717 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 2.4.6, 3.0.0, 3.1.0 >Reporter: Dongjoon Hyun >Priority: Major > > After we verify that there is no network issue, this issue aims to decide > whether we will revert SPARK-31716 or not. > We may find another, more robust way. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31717) Remove a fallback version of HiveExternalCatalogVersionsSuite
Dongjoon Hyun created SPARK-31717: - Summary: Remove a fallback version of HiveExternalCatalogVersionsSuite Key: SPARK-31717 URL: https://issues.apache.org/jira/browse/SPARK-31717 Project: Spark Issue Type: Bug Components: SQL, Tests Affects Versions: 2.4.6, 3.0.0, 3.1.0 Reporter: Dongjoon Hyun After we verify that there is no network issue, this issue aims to revert SPARK-31716. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31716) Use a fallback version in HiveExternalCatalogVersionsSuite
[ https://issues.apache.org/jira/browse/SPARK-31716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31716: Assignee: (was: Apache Spark) > Use a fallback version in HiveExternalCatalogVersionsSuite > -- > > Key: SPARK-31716 > URL: https://issues.apache.org/jira/browse/SPARK-31716 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 2.4.6, 3.0.0, 3.1.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31716) Use a fallback version in HiveExternalCatalogVersionsSuite
[ https://issues.apache.org/jira/browse/SPARK-31716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107963#comment-17107963 ] Apache Spark commented on SPARK-31716: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/28536 > Use a fallback version in HiveExternalCatalogVersionsSuite > -- > > Key: SPARK-31716 > URL: https://issues.apache.org/jira/browse/SPARK-31716 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 2.4.6, 3.0.0, 3.1.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31716) Use a fallback version in HiveExternalCatalogVersionsSuite
[ https://issues.apache.org/jira/browse/SPARK-31716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31716: Assignee: Apache Spark > Use a fallback version in HiveExternalCatalogVersionsSuite > -- > > Key: SPARK-31716 > URL: https://issues.apache.org/jira/browse/SPARK-31716 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 2.4.6, 3.0.0, 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31716) Use a fallback version in HiveExternalCatalogVersionsSuite
Dongjoon Hyun created SPARK-31716: - Summary: Use a fallback version in HiveExternalCatalogVersionsSuite Key: SPARK-31716 URL: https://issues.apache.org/jira/browse/SPARK-31716 Project: Spark Issue Type: Bug Components: SQL, Tests Affects Versions: 2.4.6, 3.0.0, 3.1.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31712) Check casting timestamps to byte/short/int/long before 1970-01-01
[ https://issues.apache.org/jira/browse/SPARK-31712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-31712. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 28531 [https://github.com/apache/spark/pull/28531] > Check casting timestamps to byte/short/int/long before 1970-01-01 > - > > Key: SPARK-31712 > URL: https://issues.apache.org/jira/browse/SPARK-31712 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.0.0, 3.1.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.0.0 > > > There are tests for casting timestamps to byte/short/int/long after the epoch > 1970-01-01 00:00:00Z. However, the existing tests only check casting of > positive values; there is no test for "negative" timestamps before the epoch. > This ticket aims to add such tests. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
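The missing coverage can be illustrated with a short spark-shell sketch. This assumes a running `SparkSession` named `spark` and pins the session time zone so the literal is unambiguous; it is an illustration of the behavior under test, not the suite code added by the pull request.

{code:java}
// Pre-epoch ("negative") timestamp: casting a timestamp to LONG yields
// whole seconds since 1970-01-01 00:00:00Z, so one second before the
// epoch should come out as -1.
spark.conf.set("spark.sql.session.timeZone", "UTC")
spark.sql("SELECT CAST(TIMESTAMP '1969-12-31 23:59:59' AS LONG)").show()
// should print -1; casts to BYTE/SHORT/INT should agree for in-range values
{code}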
[jira] [Assigned] (SPARK-31712) Check casting timestamps to byte/short/int/long before 1970-01-01
[ https://issues.apache.org/jira/browse/SPARK-31712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-31712: --- Assignee: Maxim Gekk > Check casting timestamps to byte/short/int/long before 1970-01-01 > - > > Key: SPARK-31712 > URL: https://issues.apache.org/jira/browse/SPARK-31712 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.0.0, 3.1.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > > There are tests for casting timestamps to byte/short/int/long after the epoch > 1970-01-01 00:00:00Z. However, the existing tests only check casting of > positive values; there is no test for "negative" timestamps before the epoch. > This ticket aims to add such tests. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31715) Fix flaky SparkSQLEnvSuite that sometimes varies single derby instance standard
[ https://issues.apache.org/jira/browse/SPARK-31715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31715: Assignee: Apache Spark > Fix flaky SparkSQLEnvSuite that sometimes varies single derby instance > standard > --- > > Key: SPARK-31715 > URL: https://issues.apache.org/jira/browse/SPARK-31715 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 3.0.0, 3.1.0 >Reporter: Kent Yao >Assignee: Apache Spark >Priority: Major > > {code:java} > Caused by: sbt.ForkMain$ForkError: > org.apache.derby.iapi.error.StandardException: Another instance of Derby may > have already booted the database > /home/jenkins/workspace/SparkPullRequestBuilder/sql/hive-thriftserver/metastore_db. > at org.apache.derby.iapi.error.StandardException.newException(Unknown > Source) > at > org.apache.derby.impl.store.raw.data.BaseDataFileFactory.privGetJBMSLockOnDB(Unknown > Source) > at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.run(Unknown > Source) > at java.security.AccessController.doPrivileged(Native Method) > at > org.apache.derby.impl.store.raw.data.BaseDataFileFactory.getJBMSLockOnDB(Unknown > Source) > at > org.apache.derby.impl.store.raw.data.BaseDataFileFactory.boot(Unknown Source) > at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown > Source) > at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown > Source) > at > org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source) > at > org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown > Source) > at org.apache.derby.impl.store.raw.RawStore.boot(Unknown Source) > at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown > Source) > at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown > Source) > at > org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source) > at > 
org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown > Source) > at org.apache.derby.impl.store.access.RAMAccessManager.boot(Unknown > Source) > at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown > Source) > at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown > Source) > at > org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source) > at > org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown > Source) > at org.apache.derby.impl.db.BasicDatabase.bootStore(Unknown Source) > at org.apache.derby.impl.db.BasicDatabase.boot(Unknown Source) > at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown > Source) > at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown > Source) > at > org.apache.derby.impl.services.monitor.BaseMonitor.bootService(Unknown Source) > at > org.apache.derby.impl.services.monitor.BaseMonitor.startProviderService(Unknown > Source) > at > org.apache.derby.impl.services.monitor.BaseMonitor.findProviderAndStartService(Unknown > Source) > at > org.apache.derby.impl.services.monitor.BaseMonitor.startPersistentService(Unknown > Source) > at > org.apache.derby.iapi.services.monitor.Monitor.startPersistentService(Unknown > Source) > ... 138 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31715) Fix flaky SparkSQLEnvSuite that sometimes varies single derby instance standard
[ https://issues.apache.org/jira/browse/SPARK-31715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31715: Assignee: (was: Apache Spark) > Fix flaky SparkSQLEnvSuite that sometimes varies single derby instance > standard > --- > > Key: SPARK-31715 > URL: https://issues.apache.org/jira/browse/SPARK-31715 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 3.0.0, 3.1.0 >Reporter: Kent Yao >Priority: Major > > {code:java} > Caused by: sbt.ForkMain$ForkError: > org.apache.derby.iapi.error.StandardException: Another instance of Derby may > have already booted the database > /home/jenkins/workspace/SparkPullRequestBuilder/sql/hive-thriftserver/metastore_db. > at org.apache.derby.iapi.error.StandardException.newException(Unknown > Source) > at > org.apache.derby.impl.store.raw.data.BaseDataFileFactory.privGetJBMSLockOnDB(Unknown > Source) > at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.run(Unknown > Source) > at java.security.AccessController.doPrivileged(Native Method) > at > org.apache.derby.impl.store.raw.data.BaseDataFileFactory.getJBMSLockOnDB(Unknown > Source) > at > org.apache.derby.impl.store.raw.data.BaseDataFileFactory.boot(Unknown Source) > at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown > Source) > at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown > Source) > at > org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source) > at > org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown > Source) > at org.apache.derby.impl.store.raw.RawStore.boot(Unknown Source) > at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown > Source) > at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown > Source) > at > org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source) > at > org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown > Source) > 
at org.apache.derby.impl.store.access.RAMAccessManager.boot(Unknown > Source) > at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown > Source) > at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown > Source) > at > org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source) > at > org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown > Source) > at org.apache.derby.impl.db.BasicDatabase.bootStore(Unknown Source) > at org.apache.derby.impl.db.BasicDatabase.boot(Unknown Source) > at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown > Source) > at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown > Source) > at > org.apache.derby.impl.services.monitor.BaseMonitor.bootService(Unknown Source) > at > org.apache.derby.impl.services.monitor.BaseMonitor.startProviderService(Unknown > Source) > at > org.apache.derby.impl.services.monitor.BaseMonitor.findProviderAndStartService(Unknown > Source) > at > org.apache.derby.impl.services.monitor.BaseMonitor.startPersistentService(Unknown > Source) > at > org.apache.derby.iapi.services.monitor.Monitor.startPersistentService(Unknown > Source) > ... 138 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31715) Fix flaky SparkSQLEnvSuite that sometimes varies single derby instance standard
[ https://issues.apache.org/jira/browse/SPARK-31715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107907#comment-17107907 ] Apache Spark commented on SPARK-31715: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/28537 > Fix flaky SparkSQLEnvSuite that sometimes varies single derby instance > standard > --- > > Key: SPARK-31715 > URL: https://issues.apache.org/jira/browse/SPARK-31715 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 3.0.0, 3.1.0 >Reporter: Kent Yao >Priority: Major > > {code:java} > Caused by: sbt.ForkMain$ForkError: > org.apache.derby.iapi.error.StandardException: Another instance of Derby may > have already booted the database > /home/jenkins/workspace/SparkPullRequestBuilder/sql/hive-thriftserver/metastore_db. > at org.apache.derby.iapi.error.StandardException.newException(Unknown > Source) > at > org.apache.derby.impl.store.raw.data.BaseDataFileFactory.privGetJBMSLockOnDB(Unknown > Source) > at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.run(Unknown > Source) > at java.security.AccessController.doPrivileged(Native Method) > at > org.apache.derby.impl.store.raw.data.BaseDataFileFactory.getJBMSLockOnDB(Unknown > Source) > at > org.apache.derby.impl.store.raw.data.BaseDataFileFactory.boot(Unknown Source) > at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown > Source) > at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown > Source) > at > org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source) > at > org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown > Source) > at org.apache.derby.impl.store.raw.RawStore.boot(Unknown Source) > at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown > Source) > at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown > Source) > at > 
org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source) > at > org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown > Source) > at org.apache.derby.impl.store.access.RAMAccessManager.boot(Unknown > Source) > at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown > Source) > at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown > Source) > at > org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source) > at > org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown > Source) > at org.apache.derby.impl.db.BasicDatabase.bootStore(Unknown Source) > at org.apache.derby.impl.db.BasicDatabase.boot(Unknown Source) > at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown > Source) > at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown > Source) > at > org.apache.derby.impl.services.monitor.BaseMonitor.bootService(Unknown Source) > at > org.apache.derby.impl.services.monitor.BaseMonitor.startProviderService(Unknown > Source) > at > org.apache.derby.impl.services.monitor.BaseMonitor.findProviderAndStartService(Unknown > Source) > at > org.apache.derby.impl.services.monitor.BaseMonitor.startPersistentService(Unknown > Source) > at > org.apache.derby.iapi.services.monitor.Monitor.startPersistentService(Unknown > Source) > ... 138 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31714) Performance test on java vectorization vs dot vs gemv vs gemm
[ https://issues.apache.org/jira/browse/SPARK-31714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107905#comment-17107905 ] zhengruifeng commented on SPARK-31714: -- test code:

{code:java}
test("performance: gemv vs dot") {
  for (numRows <- Seq(16, 64, 256, 1024, 4096); numCols <- Seq(16, 64, 256, 1024, 4096)) {
    val rng = new Random(123)
    val matrix = Matrices.dense(numRows, numCols,
      Array.fill(numRows * numCols)(rng.nextDouble)).toDense
    val vectors = matrix.rowIter.toArray
    val vector = Vectors.dense(Array.fill(numCols)(rng.nextDouble))
    val start1 = System.nanoTime
    Seq.range(0, 100).foreach { _ => matrix.multiply(vector) }
    val dur1 = System.nanoTime - start1
    val start2 = System.nanoTime
    Seq.range(0, 100).foreach { _ => vectors.map(vector.dot) }
    val dur2 = System.nanoTime - start2
    println(s"numRows=$numRows, numCols=$numCols, gemv: $dur1, dot: $dur2, " +
      s"dot/gemv: ${dur2.toDouble / dur1}")
  }
}

test("performance: gemv vs foreachNonZero") {
  for (numRows <- Seq(16, 64, 256, 1024, 4096); numCols <- Seq(16, 64, 256, 1024, 4096)) {
    val rng = new Random(123)
    val matrix = Matrices.dense(numRows, numCols,
      Array.fill(numRows * numCols)(rng.nextDouble)).toDense
    val vectors = matrix.rowIter.toArray
    val coefVec = Vectors.dense(Array.fill(numCols)(rng.nextDouble))
    val coefArr = coefVec.toArray
    val start1 = System.nanoTime
    Seq.range(0, 100).foreach { _ => matrix.multiply(coefVec) }
    val dur1 = System.nanoTime - start1
    val start2 = System.nanoTime
    Seq.range(0, 100).foreach { _ =>
      vectors.map { vector =>
        var sum = 0.0
        vector.foreachNonZero((i, v) => sum += coefArr(i) * v)
        sum
      }
    }
    val dur2 = System.nanoTime - start2
    println(s"numRows=$numRows, numCols=$numCols, gemv: $dur1, foreachNonZero: $dur2, " +
      s"foreachNonZero/gemv: ${dur2.toDouble / dur1}")
  }
}

test("performance: gemv vs foreachNonZero(std)") {
  for (numRows <- Seq(16, 64, 256, 1024, 4096); numCols <- Seq(16, 64, 256, 1024, 4096)) {
    val rng = new Random(123)
    val matrix = Matrices.dense(numRows, numCols,
      Array.fill(numRows * numCols)(rng.nextDouble)).toDense
    val vectors = matrix.rowIter.toArray
    val coefVec = Vectors.dense(Array.fill(numCols)(rng.nextDouble))
    val coefArr = coefVec.toArray
    val stdVec = Vectors.dense(Array.fill(numCols)(rng.nextDouble))
    val stdArr = stdVec.toArray
    val start1 = System.nanoTime
    Seq.range(0, 100).foreach { _ => matrix.multiply(coefVec) }
    val dur1 = System.nanoTime - start1
    val start2 = System.nanoTime
    Seq.range(0, 100).foreach { _ =>
      vectors.map { vector =>
        var sum = 0.0
        vector.foreachNonZero { (i, v) =>
          val std = stdArr(i)
          if (std != 0) sum += coefArr(i) * v
        }
        sum
      }
    }
    val dur2 = System.nanoTime - start2
    println(s"numRows=$numRows, numCols=$numCols, gemv: $dur1, foreachNonZero(std): $dur2, " +
      s"foreachNonZero(std)/gemv: ${dur2.toDouble / dur1}")
  }
}

test("performance: gemv vs while") {
  for (numRows <- Seq(16, 64, 256, 1024, 4096); numCols <- Seq(16, 64, 256, 1024, 4096)) {
    val rng = new Random(123)
    val matrix = Matrices.dense(numRows, numCols,
      Array.fill(numRows * numCols)(rng.nextDouble)).toDense
    val vectors = matrix.rowIter.toArray
    val coefVec = Vectors.dense(Array.fill(numCols)(rng.nextDouble))
    val coefArr = coefVec.toArray
    val start1 = System.nanoTime
    Seq.range(0, 100).foreach { _ => matrix.multiply(coefVec) }
    val dur1 = System.nanoTime - start1
    val start2 = System.nanoTime
    Seq.range(0, 100).foreach { _ =>
      vectors.map { case DenseVector(values) =>
        var sum = 0.0
        var i = 0
        while (i < values.length) {
          sum += values(i) * coefArr(i)
          i += 1
        }
        sum
      }
    }
    val dur2 = System.nanoTime - start2
    println(s"numRows=$numRows, numCols=$numCols, gemv: $dur1, while: $dur2, " +
      s"while/gemv: ${dur2.toDouble / dur1}")
  }
}

test("performance: gemv vs while(std)") {
  for (numRows <- Seq(16, 64, 256, 1024, 4096); numCols <- Seq(16, 64, 256, 1024, 4096)) {
    val rng = new Random(123)
    val matrix = Matrices.dense(numRows, numCols,
      Array.fill(numRows * numCols)(rng.nextDouble)).toDense
    val vectors = matrix.rowIter.toArray
    val coefVec = Vectors.dense(Array.fill(numCols)(rng.nextDouble))
    val coefArr = coefVec.toArray
    val stdVec = Vectors.dense(Array.fill(numCols)(rng.nextDouble))
    val stdArr = stdVec.toArray
    val start1 = System.nanoTime
    Seq.range(0, 100).foreach { _ => matrix.multiply(coefVec) }
    val dur1 = System.nanoTime - start1
    val start2 = System.nanoTime
    Seq.range(0, 100).foreach { _ =>
      vectors.map { case DenseVector(values) =>
        var sum =
[jira] [Updated] (SPARK-31714) Performance test on java vectorization vs dot vs gemv vs gemm
[ https://issues.apache.org/jira/browse/SPARK-31714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-31714: - Attachment: blas-perf > Performance test on java vectorization vs dot vs gemv vs gemm > - > > Key: SPARK-31714 > URL: https://issues.apache.org/jira/browse/SPARK-31714 > Project: Spark > Issue Type: Sub-task > Components: ML >Affects Versions: 3.1.0 >Reporter: zhengruifeng >Assignee: zhengruifeng >Priority: Minor > Attachments: BLASSuite.scala, blas-perf > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31714) Performance test on java vectorization vs dot vs gemv vs gemm
[ https://issues.apache.org/jira/browse/SPARK-31714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-31714: - Attachment: BLASSuite.scala > Performance test on java vectorization vs dot vs gemv vs gemm > - > > Key: SPARK-31714 > URL: https://issues.apache.org/jira/browse/SPARK-31714 > Project: Spark > Issue Type: Sub-task > Components: ML >Affects Versions: 3.1.0 >Reporter: zhengruifeng >Assignee: zhengruifeng >Priority: Minor > Attachments: BLASSuite.scala, blas-perf > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31715) Fix flaky SparkSQLEnvSuite that sometimes varies single derby instance standard
[ https://issues.apache.org/jira/browse/SPARK-31715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-31715: - Summary: Fix flaky SparkSQLEnvSuite that sometimes varies single derby instance standard (was: Fix flaky SparkSQLEnvSuite that sometimes via single derby instance standard) > Fix flaky SparkSQLEnvSuite that sometimes varies single derby instance > standard > --- > > Key: SPARK-31715 > URL: https://issues.apache.org/jira/browse/SPARK-31715 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 3.0.0, 3.1.0 >Reporter: Kent Yao >Priority: Major > > {code:java} > Caused by: sbt.ForkMain$ForkError: > org.apache.derby.iapi.error.StandardException: Another instance of Derby may > have already booted the database > /home/jenkins/workspace/SparkPullRequestBuilder/sql/hive-thriftserver/metastore_db. > at org.apache.derby.iapi.error.StandardException.newException(Unknown > Source) > at > org.apache.derby.impl.store.raw.data.BaseDataFileFactory.privGetJBMSLockOnDB(Unknown > Source) > at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.run(Unknown > Source) > at java.security.AccessController.doPrivileged(Native Method) > at > org.apache.derby.impl.store.raw.data.BaseDataFileFactory.getJBMSLockOnDB(Unknown > Source) > at > org.apache.derby.impl.store.raw.data.BaseDataFileFactory.boot(Unknown Source) > at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown > Source) > at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown > Source) > at > org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source) > at > org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown > Source) > at org.apache.derby.impl.store.raw.RawStore.boot(Unknown Source) > at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown > Source) > at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown > Source) > at > 
org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source) > at > org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown > Source) > at org.apache.derby.impl.store.access.RAMAccessManager.boot(Unknown > Source) > at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown > Source) > at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown > Source) > at > org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source) > at > org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown > Source) > at org.apache.derby.impl.db.BasicDatabase.bootStore(Unknown Source) > at org.apache.derby.impl.db.BasicDatabase.boot(Unknown Source) > at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown > Source) > at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown > Source) > at > org.apache.derby.impl.services.monitor.BaseMonitor.bootService(Unknown Source) > at > org.apache.derby.impl.services.monitor.BaseMonitor.startProviderService(Unknown > Source) > at > org.apache.derby.impl.services.monitor.BaseMonitor.findProviderAndStartService(Unknown > Source) > at > org.apache.derby.impl.services.monitor.BaseMonitor.startPersistentService(Unknown > Source) > at > org.apache.derby.iapi.services.monitor.Monitor.startPersistentService(Unknown > Source) > ... 138 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31715) Fix flaky SparkSQLEnvSuite that sometimes via single derby instance standard
Kent Yao created SPARK-31715: Summary: Fix flaky SparkSQLEnvSuite that sometimes via single derby instance standard Key: SPARK-31715 URL: https://issues.apache.org/jira/browse/SPARK-31715 Project: Spark Issue Type: Bug Components: SQL, Tests Affects Versions: 3.0.0, 3.1.0 Reporter: Kent Yao {code:java} Caused by: sbt.ForkMain$ForkError: org.apache.derby.iapi.error.StandardException: Another instance of Derby may have already booted the database /home/jenkins/workspace/SparkPullRequestBuilder/sql/hive-thriftserver/metastore_db. at org.apache.derby.iapi.error.StandardException.newException(Unknown Source) at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.privGetJBMSLockOnDB(Unknown Source) at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.run(Unknown Source) at java.security.AccessController.doPrivileged(Native Method) at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.getJBMSLockOnDB(Unknown Source) at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.boot(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source) at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source) at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source) at org.apache.derby.impl.store.raw.RawStore.boot(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source) at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source) at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source) at org.apache.derby.impl.store.access.RAMAccessManager.boot(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source) at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source) at 
org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source) at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source) at org.apache.derby.impl.db.BasicDatabase.bootStore(Unknown Source) at org.apache.derby.impl.db.BasicDatabase.boot(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source) at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.bootService(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.startProviderService(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.findProviderAndStartService(Unknown Source) at org.apache.derby.impl.services.monitor.BaseMonitor.startPersistentService(Unknown Source) at org.apache.derby.iapi.services.monitor.Monitor.startPersistentService(Unknown Source) ... 138 more {code}
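The stack trace above bottoms out in Derby's boot-time lock check (privGetJBMSLockOnDB): embedded Derby allows at most one JVM to boot a given database directory, so two suites sharing sql/hive-thriftserver/metastore_db collide. The guard behaves roughly like the following sketch (a hypothetical lock-file analogy in plain Python, not Derby's actual implementation):

```python
import os
import tempfile

class AlreadyBootedError(Exception):
    """Raised when a second 'instance' tries to boot the same database directory."""

def boot_database(db_dir):
    # Sketch of the single-instance guard: create an exclusive lock file in
    # the database directory. A second boot attempt finds it and fails,
    # mirroring "Another instance of Derby may have already booted the
    # database ...metastore_db".
    os.makedirs(db_dir, exist_ok=True)
    lock_path = os.path.join(db_dir, "db.lck")
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise AlreadyBootedError(
            f"Another instance may have already booted the database {db_dir}")
    os.close(fd)
    return lock_path

def shutdown_database(lock_path):
    # Releasing the lock lets the next boot succeed.
    os.remove(lock_path)

# Demo: two boots of the same metastore_db directory; the second one fails.
db_dir = os.path.join(tempfile.mkdtemp(), "metastore_db")
lock = boot_database(db_dir)
try:
    boot_database(db_dir)
    second_boot_failed = False
except AlreadyBootedError:
    second_boot_failed = True
shutdown_database(lock)
print(second_boot_failed)  # True
```

A fix along these lines would point each test's metastore at its own temporary directory (for example via the `javax.jdo.option.ConnectionURL` Hive setting) so that concurrently running suites never contend for one lock.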
[jira] [Assigned] (SPARK-31714) Performance test on java vectorization vs dot vs gemv vs gemm
[ https://issues.apache.org/jira/browse/SPARK-31714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-31714: Assignee: zhengruifeng > Performance test on java vectorization vs dot vs gemv vs gemm > - > > Key: SPARK-31714 > URL: https://issues.apache.org/jira/browse/SPARK-31714 > Project: Spark > Issue Type: Sub-task > Components: ML >Affects Versions: 3.1.0 >Reporter: zhengruifeng >Assignee: zhengruifeng >Priority: Minor >
[jira] [Created] (SPARK-31714) Performance test on java vectorization vs dot vs gemv vs gemm
zhengruifeng created SPARK-31714: Summary: Performance test on java vectorization vs dot vs gemv vs gemm Key: SPARK-31714 URL: https://issues.apache.org/jira/browse/SPARK-31714 Project: Spark Issue Type: Sub-task Components: ML Affects Versions: 3.1.0 Reporter: zhengruifeng
[jira] [Resolved] (SPARK-31713) Make test-dependencies.sh detect version string correctly
[ https://issues.apache.org/jira/browse/SPARK-31713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-31713. --- Fix Version/s: 3.0.0 2.4.6 Resolution: Fixed Issue resolved by pull request 28532 [https://github.com/apache/spark/pull/28532] > Make test-dependencies.sh detect version string correctly > - > > Key: SPARK-31713 > URL: https://issues.apache.org/jira/browse/SPARK-31713 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 2.4.6, 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 2.4.6, 3.0.0 > > > Currently, SBT jobs are broken like the following. > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-3.0-test-sbt-hadoop-3.2-hive-2.3/476/console > {code} > [error] running > /home/jenkins/workspace/spark-branch-3.0-test-sbt-hadoop-3.2-hive-2.3/dev/test-dependencies.sh > ; received return code 1 > Build step 'Execute shell' marked build as failure > {code}
[jira] [Assigned] (SPARK-31713) Make test-dependencies.sh detect version string correctly
[ https://issues.apache.org/jira/browse/SPARK-31713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-31713: - Assignee: Dongjoon Hyun > Make test-dependencies.sh detect version string correctly > - > > Key: SPARK-31713 > URL: https://issues.apache.org/jira/browse/SPARK-31713 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 2.4.6, 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > > Currently, SBT jobs are broken like the following. > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-3.0-test-sbt-hadoop-3.2-hive-2.3/476/console > {code} > [error] running > /home/jenkins/workspace/spark-branch-3.0-test-sbt-hadoop-3.2-hive-2.3/dev/test-dependencies.sh > ; received return code 1 > Build step 'Execute shell' marked build as failure > {code}
[jira] [Comment Edited] (SPARK-31693) Investigate AmpLab Jenkins server network issue
[ https://issues.apache.org/jira/browse/SPARK-31693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107856#comment-17107856 ] Dongjoon Hyun edited comment on SPARK-31693 at 5/15/20, 2:24 AM: - Thank you, [~shaneknapp]. Although this is another issue, `amp-jenkins-worker-05` has a corrupted Maven local repo and fails consistently. {code} Using `mvn` from path: /home/jenkins/workspace/SparkPullRequestBuilder/build/apache-maven-3.6.3/bin/mvn [ERROR] Failed to execute goal org.apache.maven.plugins:maven-install-plugin:3.0.0-M1:install (default-cli) on project spark-parent_2.12: ArtifactInstallerException: Failed to install metadata org.apache.spark:spark-parent_2.12/maven-metadata.xml: Could not parse metadata /home/jenkins/.m2/repository/org/apache/spark/spark-parent_2.12/maven-metadata-local.xml: in epilog non whitespace content is not allowed but got > (position: END_TAG seen ...\n>... @13:2) -> [Help 1] {code} Could you nuke the local Maven repository directory in this machine. And maybe the other machine which fails consistently, too. was (Author: dongjoon): Thank you, [~shaneknapp]. Although this is another issue, `amp-jenkins-worker-05` has a corrupted Maven local repo and fails consistently. {code} Using `mvn` from path: /home/jenkins/workspace/SparkPullRequestBuilder/build/apache-maven-3.6.3/bin/mvn [ERROR] Failed to execute goal org.apache.maven.plugins:maven-install-plugin:3.0.0-M1:install (default-cli) on project spark-parent_2.12: ArtifactInstallerException: Failed to install metadata org.apache.spark:spark-parent_2.12/maven-metadata.xml: Could not parse metadata /home/jenkins/.m2/repository/org/apache/spark/spark-parent_2.12/maven-metadata-local.xml: in epilog non whitespace content is not allowed but got > (position: END_TAG seen ...\n>... @13:2) -> [Help 1] {code} Could you nuke all local Maven repository directory in Spark workers? 
> Investigate AmpLab Jenkins server network issue > --- > > Key: SPARK-31693 > URL: https://issues.apache.org/jira/browse/SPARK-31693 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Priority: Critical > > Given the series of failures in Spark packaging Jenkins job, it seems that > there is a network issue in AmpLab Jenkins cluster. > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/ > - The node failed to talk to GitBox. (SPARK-31687) -> GitHub is okay. > - The node failed to download the maven mirror. (SPARK-31691) -> The primary > host is okay. > - The node failed to communicate repository.apache.org. (Current master > branch Jenkins job failure) > {code} > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-deploy-plugin:3.0.0-M1:deploy (default-deploy) > on project spark-parent_2.12: ArtifactDeployerException: Failed to retrieve > remote metadata > org.apache.spark:spark-parent_2.12:3.1.0-SNAPSHOT/maven-metadata.xml: Could > not transfer metadata > org.apache.spark:spark-parent_2.12:3.1.0-SNAPSHOT/maven-metadata.xml from/to > apache.snapshots.https > (https://repository.apache.org/content/repositories/snapshots): Transfer > failed for > https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-parent_2.12/3.1.0-SNAPSHOT/maven-metadata.xml: > Connect to repository.apache.org:443 [repository.apache.org/207.244.88.140] > failed: Connection timed out (Connection timed out) -> [Help 1] > {code}
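For reference, the "in epilog non whitespace content is not allowed" failure above means the worker's maven-metadata-local.xml has stray text after the closing root tag; any strict XML parser rejects that, which is why only deleting the corrupted local repository helps. A small illustration of the failure mode (using Python's stdlib parser, not Maven's):

```python
import xml.etree.ElementTree as ET

GOOD = "<metadata><groupId>org.apache.spark</groupId></metadata>"
# A stray '>' after the document element, like the corrupted
# maven-metadata-local.xml on amp-jenkins-worker-05.
CORRUPTED = GOOD + "\n>"

def is_well_formed(text):
    # Strict XML parsing: any non-whitespace content after the root
    # element ("epilog") is a parse error.
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good_ok = is_well_formed(GOOD)
bad_ok = is_well_formed(CORRUPTED)
print(good_ok, bad_ok)  # True False
```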
[jira] [Commented] (SPARK-31693) Investigate AmpLab Jenkins server network issue
[ https://issues.apache.org/jira/browse/SPARK-31693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107856#comment-17107856 ] Dongjoon Hyun commented on SPARK-31693: --- Thank you, [~shaneknapp]. Although this is another issue, `amp-jenkins-worker-05` has a corrupted Maven local repo and fails consistently. {code} Using `mvn` from path: /home/jenkins/workspace/SparkPullRequestBuilder/build/apache-maven-3.6.3/bin/mvn [ERROR] Failed to execute goal org.apache.maven.plugins:maven-install-plugin:3.0.0-M1:install (default-cli) on project spark-parent_2.12: ArtifactInstallerException: Failed to install metadata org.apache.spark:spark-parent_2.12/maven-metadata.xml: Could not parse metadata /home/jenkins/.m2/repository/org/apache/spark/spark-parent_2.12/maven-metadata-local.xml: in epilog non whitespace content is not allowed but got > (position: END_TAG seen ...\n>... @13:2) -> [Help 1] {code} Could you nuke all local Maven repository directory in Spark workers? > Investigate AmpLab Jenkins server network issue > --- > > Key: SPARK-31693 > URL: https://issues.apache.org/jira/browse/SPARK-31693 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Priority: Critical > > Given the series of failures in Spark packaging Jenkins job, it seems that > there is a network issue in AmpLab Jenkins cluster. > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/ > - The node failed to talk to GitBox. (SPARK-31687) -> GitHub is okay. > - The node failed to download the maven mirror. (SPARK-31691) -> The primary > host is okay. > - The node failed to communicate repository.apache.org. 
(Current master > branch Jenkins job failure) > {code} > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-deploy-plugin:3.0.0-M1:deploy (default-deploy) > on project spark-parent_2.12: ArtifactDeployerException: Failed to retrieve > remote metadata > org.apache.spark:spark-parent_2.12:3.1.0-SNAPSHOT/maven-metadata.xml: Could > not transfer metadata > org.apache.spark:spark-parent_2.12:3.1.0-SNAPSHOT/maven-metadata.xml from/to > apache.snapshots.https > (https://repository.apache.org/content/repositories/snapshots): Transfer > failed for > https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-parent_2.12/3.1.0-SNAPSHOT/maven-metadata.xml: > Connect to repository.apache.org:443 [repository.apache.org/207.244.88.140] > failed: Connection timed out (Connection timed out) -> [Help 1] > {code}
[jira] [Assigned] (SPARK-31710) result is not the same when query and execute jobs
[ https://issues.apache.org/jira/browse/SPARK-31710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31710: Assignee: (was: Apache Spark) > result is not the same when query and execute jobs > -- > > Key: SPARK-31710 > URL: https://issues.apache.org/jira/browse/SPARK-31710 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.5 > Environment: hdp:2.7.7 > spark:2.4.5 >Reporter: philipse >Priority: Major > > Hi Team > Steps to reproduce. > {code:java} > create table test(id bigint); > insert into test select 1586318188000; > create table test1(id bigint) partitioned by (year string); > insert overwrite table test1 partition(year) select 234,cast(id as TIMESTAMP) > from test; > {code} > let's check the result. > Case 1: > *select * from test1;* > 234 | 52238-06-04 13:06:400.0 > --the result is wrong > Case 2: > *select 234,cast(id as TIMESTAMP) from test;* > > java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd > hh:mm:ss[.fffffffff] > at java.sql.Timestamp.valueOf(Timestamp.java:237) > at > org.apache.hive.jdbc.HiveBaseResultSet.evaluate(HiveBaseResultSet.java:441) > at > org.apache.hive.jdbc.HiveBaseResultSet.getColumnValue(HiveBaseResultSet.java:421) > at > org.apache.hive.jdbc.HiveBaseResultSet.getString(HiveBaseResultSet.java:530) > at org.apache.hive.beeline.Rows$Row.<init>(Rows.java:166) > at org.apache.hive.beeline.BufferedRows.<init>(BufferedRows.java:43) > at org.apache.hive.beeline.BeeLine.print(BeeLine.java:1756) > at org.apache.hive.beeline.Commands.execute(Commands.java:826) > at org.apache.hive.beeline.Commands.sql(Commands.java:670) > at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:974) > at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:810) > at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:767) > at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:480) > at org.apache.hive.beeline.BeeLine.main(BeeLine.java:463) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at org.apache.hadoop.util.RunJar.run(RunJar.java:226) > at org.apache.hadoop.util.RunJar.main(RunJar.java:141) > Error: Unrecognized column type:TIMESTAMP_TYPE (state=,code=0) > > I tried hive, it works well, and the conversion is fine and correct > {code:java} > select 234,cast(id as TIMESTAMP) from test; > 234 2020-04-08 11:56:28 > {code} > Two questions: > q1: > if we forbid this conversion, should we keep all cases the same? > q2: > if we allow the conversion in some cases, should we check the length of the long value? The code seems to force the conversion to us with times*1000000 no matter how long > the data is; if it converts to a timestamp from a value of incorrect length, we could raise > an error. > {code:java} > // converting seconds to us > private[this] def longToTimestamp(t: Long): Long = t * 1000000L{code} > > Thanks! >
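The wrong year in case 1 is consistent with the cast treating the stored bigint 1586318188000 (an epoch value in milliseconds) as seconds. A rough check of the arithmetic in plain Python (an illustration, not Spark code):

```python
from datetime import datetime, timezone

raw = 1586318188000  # epoch milliseconds for 2020-04-08 03:56:28 UTC

# Interpreted correctly as milliseconds:
correct = datetime.fromtimestamp(raw / 1000, tz=timezone.utc)

# Interpreted as *seconds* (what the cast appears to do), the value lands
# tens of thousands of years in the future. datetime cannot represent a
# year that large, so estimate it arithmetically instead.
SECONDS_PER_YEAR = 365.2425 * 24 * 3600
wrong_year = 1970 + raw / SECONDS_PER_YEAR

print(correct.date())    # 2020-04-08, matching the Hive result
print(round(wrong_year)) # 52238, matching the bad output above
```

The Hive output 11:56:28 vs 03:56:28 UTC is just the session time zone; the date and year are what matter here.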
[jira] [Commented] (SPARK-31710) result is not the same when query and execute jobs
[ https://issues.apache.org/jira/browse/SPARK-31710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107823#comment-17107823 ] Apache Spark commented on SPARK-31710: -- User 'TJX2014' has created a pull request for this issue: https://github.com/apache/spark/pull/28534 > result is not the same when query and execute jobs > -- > > Key: SPARK-31710 > URL: https://issues.apache.org/jira/browse/SPARK-31710 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.5 > Environment: hdp:2.7.7 > spark:2.4.5 >Reporter: philipse >Priority: Major > > Hi Team > Steps to reproduce. > {code:java} > create table test(id bigint); > insert into test select 1586318188000; > create table test1(id bigint) partitioned by (year string); > insert overwrite table test1 partition(year) select 234,cast(id as TIMESTAMP) > from test; > {code} > let's check the result. > Case 1: > *select * from test1;* > 234 | 52238-06-04 13:06:400.0 > --the result is wrong > Case 2: > *select 234,cast(id as TIMESTAMP) from test;* > > java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd > hh:mm:ss[.fffffffff] > at java.sql.Timestamp.valueOf(Timestamp.java:237) > at > org.apache.hive.jdbc.HiveBaseResultSet.evaluate(HiveBaseResultSet.java:441) > at > org.apache.hive.jdbc.HiveBaseResultSet.getColumnValue(HiveBaseResultSet.java:421) > at > org.apache.hive.jdbc.HiveBaseResultSet.getString(HiveBaseResultSet.java:530) > at org.apache.hive.beeline.Rows$Row.<init>(Rows.java:166) > at org.apache.hive.beeline.BufferedRows.<init>(BufferedRows.java:43) > at org.apache.hive.beeline.BeeLine.print(BeeLine.java:1756) > at org.apache.hive.beeline.Commands.execute(Commands.java:826) > at org.apache.hive.beeline.Commands.sql(Commands.java:670) > at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:974) > at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:810) > at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:767) > at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:480) > at org.apache.hive.beeline.BeeLine.main(BeeLine.java:463) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at org.apache.hadoop.util.RunJar.run(RunJar.java:226) > at org.apache.hadoop.util.RunJar.main(RunJar.java:141) > Error: Unrecognized column type:TIMESTAMP_TYPE (state=,code=0) > > I tried hive, it works well, and the conversion is fine and correct > {code:java} > select 234,cast(id as TIMESTAMP) from test; > 234 2020-04-08 11:56:28 > {code} > Two questions: > q1: > if we forbid this conversion, should we keep all cases the same? > q2: > if we allow the conversion in some cases, should we check the length of the long value? The code seems to force the conversion to us with times*1000000 no matter how long > the data is; if it converts to a timestamp from a value of incorrect length, we could raise > an error. > {code:java} > // converting seconds to us > private[this] def longToTimestamp(t: Long): Long = t * 1000000L{code} > > Thanks! >
[jira] [Assigned] (SPARK-31710) result is not the same when query and execute jobs
[ https://issues.apache.org/jira/browse/SPARK-31710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31710: Assignee: Apache Spark > result is not the same when query and execute jobs > -- > > Key: SPARK-31710 > URL: https://issues.apache.org/jira/browse/SPARK-31710 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.5 > Environment: hdp:2.7.7 > spark:2.4.5 >Reporter: philipse >Assignee: Apache Spark >Priority: Major > > Hi Team > Steps to reproduce. > {code:java} > create table test(id bigint); > insert into test select 1586318188000; > create table test1(id bigint) partitioned by (year string); > insert overwrite table test1 partition(year) select 234,cast(id as TIMESTAMP) > from test; > {code} > let's check the result. > Case 1: > *select * from test1;* > 234 | 52238-06-04 13:06:400.0 > --the result is wrong > Case 2: > *select 234,cast(id as TIMESTAMP) from test;* > > java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd > hh:mm:ss[.fffffffff] > at java.sql.Timestamp.valueOf(Timestamp.java:237) > at > org.apache.hive.jdbc.HiveBaseResultSet.evaluate(HiveBaseResultSet.java:441) > at > org.apache.hive.jdbc.HiveBaseResultSet.getColumnValue(HiveBaseResultSet.java:421) > at > org.apache.hive.jdbc.HiveBaseResultSet.getString(HiveBaseResultSet.java:530) > at org.apache.hive.beeline.Rows$Row.<init>(Rows.java:166) > at org.apache.hive.beeline.BufferedRows.<init>(BufferedRows.java:43) > at org.apache.hive.beeline.BeeLine.print(BeeLine.java:1756) > at org.apache.hive.beeline.Commands.execute(Commands.java:826) > at org.apache.hive.beeline.Commands.sql(Commands.java:670) > at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:974) > at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:810) > at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:767) > at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:480) > at org.apache.hive.beeline.BeeLine.main(BeeLine.java:463) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at org.apache.hadoop.util.RunJar.run(RunJar.java:226) > at org.apache.hadoop.util.RunJar.main(RunJar.java:141) > Error: Unrecognized column type:TIMESTAMP_TYPE (state=,code=0) > > I tried hive, it works well, and the conversion is fine and correct > {code:java} > select 234,cast(id as TIMESTAMP) from test; > 234 2020-04-08 11:56:28 > {code} > Two questions: > q1: > if we forbid this conversion, should we keep all cases the same? > q2: > if we allow the conversion in some cases, should we check the length of the long value? The code seems to force the conversion to us with times*1000000 no matter how long > the data is; if it converts to a timestamp from a value of incorrect length, we could raise > an error. > {code:java} > // converting seconds to us > private[this] def longToTimestamp(t: Long): Long = t * 1000000L{code} > > Thanks! >
[jira] [Commented] (SPARK-29358) Make unionByName optionally fill missing columns with nulls
[ https://issues.apache.org/jira/browse/SPARK-29358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107775#comment-17107775 ] Andreas Neumann commented on SPARK-29358: - I would like to put a vote for this feature: * It makes life so much easier when you have multiple inputs with slightly varying schema, which is quite common for data that evolved over time. * The work-around described at the top where you explicitly add the missing columns is really cumbersome if the schema is large. * With the approach of an extra argument the compatibility concerns should be lifted. > Make unionByName optionally fill missing columns with nulls > --- > > Key: SPARK-29358 > URL: https://issues.apache.org/jira/browse/SPARK-29358 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Mukul Murthy >Priority: Major > > Currently, unionByName requires two DataFrames to have the same set of > columns (even though the order can be different). It would be good to add > either an option to unionByName or a new type of union which fills in missing > columns with nulls. > {code:java} > val df1 = Seq(1, 2, 3).toDF("x") > val df2 = Seq("a", "b", "c").toDF("y") > df1.unionByName(df2){code} > This currently throws > {code:java} > org.apache.spark.sql.AnalysisException: Cannot resolve column name "x" among > (y); > {code} > Ideally, there would be a way to make this return a DataFrame containing: > {code:java} > +----+----+ > | x| y| > +----+----+ > | 1|null| > | 2|null| > | 3|null| > |null| a| > |null| b| > |null| c| > +----+----+ > {code} > Currently the workaround to make this possible is by using unionByName, but > this is clunky: > {code:java} > df1.withColumn("y", lit(null)).unionByName(df2.withColumn("x", lit(null))) > {code}
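The requested behavior amounts to taking the union of the two column sets and padding each side's missing columns with nulls before unioning. The semantics can be sketched on plain Python records (a hypothetical helper for illustration, not Spark's API):

```python
def union_by_name_fill_nulls(rows1, cols1, rows2, cols2):
    """Union two 'tables' (lists of per-row value lists) by column name,
    filling columns absent from one side with None (null)."""
    # Result columns: left columns first, then right-only columns.
    out_cols = list(cols1) + [c for c in cols2 if c not in cols1]

    def pad(rows, cols):
        idx = {c: i for i, c in enumerate(cols)}
        # For each row, pick the value by column name, or None if the
        # column does not exist on this side.
        return [[row[idx[c]] if c in idx else None for c in out_cols]
                for row in rows]

    return out_cols, pad(rows1, cols1) + pad(rows2, cols2)

# The example from the ticket: df1 has column x, df2 has column y.
cols, rows = union_by_name_fill_nulls([[1], [2], [3]], ["x"],
                                      [["a"], ["b"], ["c"]], ["y"])
print(cols, rows)
```

This matches the DataFrame shown in the ticket: columns x and y, with nulls filling each side's missing column.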
[jira] [Commented] (SPARK-31713) Make test-dependencies.sh detect version string correctly
[ https://issues.apache.org/jira/browse/SPARK-31713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107712#comment-17107712 ] Apache Spark commented on SPARK-31713: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/28532 > Make test-dependencies.sh detect version string correctly > - > > Key: SPARK-31713 > URL: https://issues.apache.org/jira/browse/SPARK-31713 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 2.4.6, 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > Currently, SBT jobs are broken like the following. > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-3.0-test-sbt-hadoop-3.2-hive-2.3/476/console > {code} > [error] running > /home/jenkins/workspace/spark-branch-3.0-test-sbt-hadoop-3.2-hive-2.3/dev/test-dependencies.sh > ; received return code 1 > Build step 'Execute shell' marked build as failure > {code}
[jira] [Assigned] (SPARK-31713) Make test-dependencies.sh detect version string correctly
[ https://issues.apache.org/jira/browse/SPARK-31713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31713: Assignee: (was: Apache Spark) > Make test-dependencies.sh detect version string correctly > - > > Key: SPARK-31713 > URL: https://issues.apache.org/jira/browse/SPARK-31713 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 2.4.6, 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > Currently, SBT jobs are broken like the following. > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-3.0-test-sbt-hadoop-3.2-hive-2.3/476/console > {code} > [error] running > /home/jenkins/workspace/spark-branch-3.0-test-sbt-hadoop-3.2-hive-2.3/dev/test-dependencies.sh > ; received return code 1 > Build step 'Execute shell' marked build as failure > {code}
[jira] [Commented] (SPARK-31713) Make test-dependencies.sh detect version string correctly
[ https://issues.apache.org/jira/browse/SPARK-31713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107711#comment-17107711 ] Apache Spark commented on SPARK-31713: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/28532 > Make test-dependencies.sh detect version string correctly > - > > Key: SPARK-31713 > URL: https://issues.apache.org/jira/browse/SPARK-31713 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 2.4.6, 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > Currently, SBT jobs are broken like the following. > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-3.0-test-sbt-hadoop-3.2-hive-2.3/476/console > {code} > [error] running > /home/jenkins/workspace/spark-branch-3.0-test-sbt-hadoop-3.2-hive-2.3/dev/test-dependencies.sh > ; received return code 1 > Build step 'Execute shell' marked build as failure > {code}
[jira] [Assigned] (SPARK-31713) Make test-dependencies.sh detect version string correctly
[ https://issues.apache.org/jira/browse/SPARK-31713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31713: Assignee: Apache Spark > Make test-dependencies.sh detect version string correctly > - > > Key: SPARK-31713 > URL: https://issues.apache.org/jira/browse/SPARK-31713 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 2.4.6, 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > > Currently, SBT jobs are broken like the following. > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-3.0-test-sbt-hadoop-3.2-hive-2.3/476/console > {code} > [error] running > /home/jenkins/workspace/spark-branch-3.0-test-sbt-hadoop-3.2-hive-2.3/dev/test-dependencies.sh > ; received return code 1 > Build step 'Execute shell' marked build as failure > {code}
[jira] [Updated] (SPARK-31713) Make test-dependencies.sh detect version string correctly
[ https://issues.apache.org/jira/browse/SPARK-31713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-31713: -- Summary: Make test-dependencies.sh detect version string correctly (was: Make test-dependencies.sh detect version string only) > Make test-dependencies.sh detect version string correctly > - > > Key: SPARK-31713 > URL: https://issues.apache.org/jira/browse/SPARK-31713 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 2.4.6, 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > Currently, SBT jobs are broken like the following. > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-3.0-test-sbt-hadoop-3.2-hive-2.3/476/console > {code} > [error] running > /home/jenkins/workspace/spark-branch-3.0-test-sbt-hadoop-3.2-hive-2.3/dev/test-dependencies.sh > ; received return code 1 > Build step 'Execute shell' marked build as failure > {code}
[jira] [Created] (SPARK-31713) Make test-dependencies.sh detect version string only
Dongjoon Hyun created SPARK-31713: - Summary: Make test-dependencies.sh detect version string only Key: SPARK-31713 URL: https://issues.apache.org/jira/browse/SPARK-31713 Project: Spark Issue Type: Bug Components: Project Infra Affects Versions: 2.4.6, 3.0.0 Reporter: Dongjoon Hyun Currently, SBT jobs are broken like the following. - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-3.0-test-sbt-hadoop-3.2-hive-2.3/476/console {code} [error] running /home/jenkins/workspace/spark-branch-3.0-test-sbt-hadoop-3.2-hive-2.3/dev/test-dependencies.sh ; received return code 1 Build step 'Execute shell' marked build as failure {code}
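A common failure mode for scripts like test-dependencies.sh is capturing extra lines (download logs, warnings) alongside the output of `mvn help:evaluate`, so the captured "version" is not a clean version string. The filtering idea can be sketched in plain Python (the actual change in dev/test-dependencies.sh may differ; names here are illustrative):

```python
import re

# Accept only lines that are a bare Maven-style version string,
# e.g. "3.0.0" or "3.1.0-SNAPSHOT".
VERSION_RE = re.compile(r"^\d+\.\d+\.\d+(-SNAPSHOT)?$")

def detect_version(mvn_output):
    """Return the first line of mvn output that is a clean version string."""
    for line in mvn_output.splitlines():
        if VERSION_RE.match(line.strip()):
            return line.strip()
    raise ValueError("no version string found in mvn output")

# Noisy output like Jenkins produces: log lines mixed with the version.
sample = ("Downloading from central: https://repo.maven.apache.org/...\n"
          "[WARNING] some plugin warning\n"
          "3.1.0-SNAPSHOT")
version = detect_version(sample)
print(version)  # 3.1.0-SNAPSHOT
```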
[jira] [Commented] (SPARK-31693) Investigate AmpLab Jenkins server network issue
[ https://issues.apache.org/jira/browse/SPARK-31693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107641#comment-17107641 ] Shane Knapp commented on SPARK-31693: - filed https://issues.apache.org/jira/browse/INFRA-20267 i don't think it's us. i could be wrong as IANANE (i am not a network engineer). :) > Investigate AmpLab Jenkins server network issue > --- > > Key: SPARK-31693 > URL: https://issues.apache.org/jira/browse/SPARK-31693 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Priority: Critical > > Given the series of failures in Spark packaging Jenkins job, it seems that > there is a network issue in AmpLab Jenkins cluster. > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/ > - The node failed to talk to GitBox. (SPARK-31687) -> GitHub is okay. > - The node failed to download the maven mirror. (SPARK-31691) -> The primary > host is okay. > - The node failed to communicate repository.apache.org. 
(Current master > branch Jenkins job failure) > {code} > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-deploy-plugin:3.0.0-M1:deploy (default-deploy) > on project spark-parent_2.12: ArtifactDeployerException: Failed to retrieve > remote metadata > org.apache.spark:spark-parent_2.12:3.1.0-SNAPSHOT/maven-metadata.xml: Could > not transfer metadata > org.apache.spark:spark-parent_2.12:3.1.0-SNAPSHOT/maven-metadata.xml from/to > apache.snapshots.https > (https://repository.apache.org/content/repositories/snapshots): Transfer > failed for > https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-parent_2.12/3.1.0-SNAPSHOT/maven-metadata.xml: > Connect to repository.apache.org:443 [repository.apache.org/207.244.88.140] > failed: Connection timed out (Connection timed out) -> [Help 1] > {code}
[jira] [Assigned] (SPARK-31712) Check casting timestamps to byte/short/int/long before 1970-01-01
[ https://issues.apache.org/jira/browse/SPARK-31712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31712: Assignee: Apache Spark > Check casting timestamps to byte/short/int/long before 1970-01-01 > - > > Key: SPARK-31712 > URL: https://issues.apache.org/jira/browse/SPARK-31712 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.0.0, 3.1.0 >Reporter: Maxim Gekk >Assignee: Apache Spark >Priority: Major > > There are tests for casting timestamps to byte/short/int/long after the epoch > 1970-01-01 00:00:00Z. Actually, the tests check casting of positive values, but > there are no tests for "negative" timestamps before the epoch. The ticket aims > to add such tests. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31712) Check casting timestamps to byte/short/int/long before 1970-01-01
[ https://issues.apache.org/jira/browse/SPARK-31712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31712: Assignee: (was: Apache Spark) > Check casting timestamps to byte/short/int/long before 1970-01-01 > - > > Key: SPARK-31712 > URL: https://issues.apache.org/jira/browse/SPARK-31712 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.0.0, 3.1.0 >Reporter: Maxim Gekk >Priority: Major > > There are tests for casting timestamps to byte/short/int/long after the epoch > 1970-01-01 00:00:00Z. Actually, the tests check casting of positive values, but > there are no tests for "negative" timestamps before the epoch. The ticket aims > to add such tests. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31712) Check casting timestamps to byte/short/int/long before 1970-01-01
[ https://issues.apache.org/jira/browse/SPARK-31712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107625#comment-17107625 ] Apache Spark commented on SPARK-31712: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/28531 > Check casting timestamps to byte/short/int/long before 1970-01-01 > - > > Key: SPARK-31712 > URL: https://issues.apache.org/jira/browse/SPARK-31712 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.0.0, 3.1.0 >Reporter: Maxim Gekk >Priority: Major > > There are tests for casting timestamps to byte/short/int/long after the epoch > 1970-01-01 00:00:00Z. Actually, the tests check casting of positive values, but > there are no tests for "negative" timestamps before the epoch. The ticket aims > to add such tests. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31712) Check casting timestamps to byte/short/int/long before 1970-01-01
Maxim Gekk created SPARK-31712: -- Summary: Check casting timestamps to byte/short/int/long before 1970-01-01 Key: SPARK-31712 URL: https://issues.apache.org/jira/browse/SPARK-31712 Project: Spark Issue Type: Test Components: SQL Affects Versions: 3.0.0, 3.1.0 Reporter: Maxim Gekk There are tests for casting timestamps to byte/short/int/long after the epoch 1970-01-01 00:00:00Z. Actually, the tests check casting of positive values, but there are no tests for "negative" timestamps before the epoch. The ticket aims to add such tests. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
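The "negative" timestamps in question are simply instants before 1970-01-01 00:00:00Z, whose epoch value is below zero. A small Python illustration (a stand-in for Spark SQL's cast of TIMESTAMP to LONG, which yields epoch seconds; this is not Spark code):

```python
from datetime import datetime, timezone

def to_epoch_seconds(ts: datetime) -> int:
    # Stand-in for Spark SQL's CAST(timestamp AS LONG): seconds since the epoch.
    return int(ts.timestamp())

# One second after and one second before the epoch:
after = datetime(1970, 1, 1, 0, 0, 1, tzinfo=timezone.utc)
before = datetime(1969, 12, 31, 23, 59, 59, tzinfo=timezone.utc)

assert to_epoch_seconds(after) == 1
assert to_epoch_seconds(before) == -1  # pre-epoch casts are negative
```

Tests covering only the positive branch would never exercise values like the `-1` above.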
[jira] [Commented] (SPARK-31579) Replace floorDiv by / in localRebaseGregorianToJulianDays()
[ https://issues.apache.org/jira/browse/SPARK-31579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107609#comment-17107609 ] Sudharshann D. commented on SPARK-31579: Just a small update. I have the design for the issue. It's something similar to this: # Write a duplicate of localRebaseGregorianToJulianDays() with / instead of floorDiv and with extra parameters days: Int, tz: TimeZone, hr: Int. # Write test cases that iterate over all days, all timezones, and each hour, and compare the results of floorDiv and /. # Send the PR with this modification. If you think it's fine, I'll clear all the edits and replace floorDiv by /. I have the code but there's something wrong with my dev environment... Figuring it out... > Replace floorDiv by / in localRebaseGregorianToJulianDays() > --- > > Key: SPARK-31579 > URL: https://issues.apache.org/jira/browse/SPARK-31579 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Priority: Minor > Labels: starter > > Most likely utcCal.getTimeInMillis % MILLIS_PER_DAY == 0 but need to check > that for all available time zones in the range of [0001, 2100] years with the > step of 1 hour or maybe smaller. If this hypothesis is confirmed, floorDiv > can be replaced by /, and this should improve performance of > RebaseDateTime.localRebaseGregorianToJulianDays. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
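For context on why the hypothesis `utcCal.getTimeInMillis % MILLIS_PER_DAY == 0` matters: Java's `Math.floorDiv` and its truncating integer `/` agree exactly when the remainder is zero, and diverge for negative dividends that are not exact multiples of the divisor. A sketch of the two behaviors, emulated in Python (whose `//` floors like `Math.floorDiv`; this illustrates the arithmetic, not Spark's code):

```python
MILLIS_PER_DAY = 86_400_000

def floor_div(a: int, b: int) -> int:
    # Python's // floors toward negative infinity, like Java's Math.floorDiv.
    return a // b

def trunc_div(a: int, b: int) -> int:
    # Java's integer `/` truncates toward zero instead.
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q

# The two agree whenever the remainder is zero (the ticket's hypothesis)...
millis = 3 * MILLIS_PER_DAY
assert floor_div(millis, MILLIS_PER_DAY) == trunc_div(millis, MILLIS_PER_DAY) == 3

# ...but differ for negative values that are not exact multiples of a day:
assert floor_div(-1, MILLIS_PER_DAY) == -1
assert trunc_div(-1, MILLIS_PER_DAY) == 0
```

So the replacement is only safe if the exhaustive check over time zones and hours confirms the remainder is always zero.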
[jira] [Commented] (SPARK-31387) HiveThriftServer2Listener update methods fail with unknown operation/session id
[ https://issues.apache.org/jira/browse/SPARK-31387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107587#comment-17107587 ] Dongjoon Hyun commented on SPARK-31387: --- This was inevitably reverted from branch-3.0/master because it breaks all Maven jobs in both branches. Please see the comments on the original PR. > HiveThriftServer2Listener update methods fail with unknown operation/session > id > --- > > Key: SPARK-31387 > URL: https://issues.apache.org/jira/browse/SPARK-31387 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.4, 2.4.5, 3.0.0 >Reporter: Ali Smesseim >Priority: Major > > HiveThriftServer2Listener update methods, such as onSessionClosed and > onOperationError throw a NullPointerException (in Spark 3) or a > NoSuchElementException (in Spark 2) when the input session/operation id is > unknown. In Spark 2, this can cause control flow issues with the caller of > the listener. In Spark 3, the listener is called by a ListenerBus which > catches the exception, but it would still be nicer if an invalid update is > logged and does not throw an exception. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
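The defensive behavior the ticket asks for — log and ignore updates with unknown session/operation ids instead of throwing — can be sketched as follows. This is an illustrative Python sketch, not Spark's Scala listener; the class and method names are made up to mirror the report:

```python
import logging

logger = logging.getLogger("thriftserver.listener")

class SessionTracker:
    """Toy analogue of a listener's session map (hypothetical names)."""

    def __init__(self):
        self.sessions = {}

    def on_session_created(self, session_id):
        self.sessions[session_id] = {"state": "open"}

    def on_session_closed(self, session_id):
        info = self.sessions.get(session_id)
        if info is None:
            # Unknown id: log and return, rather than letting a
            # KeyError (the analogue of the NPE/NoSuchElementException)
            # propagate to the caller.
            logger.warning("onSessionClosed called with unknown id %s", session_id)
            return
        info["state"] = "closed"

tracker = SessionTracker()
tracker.on_session_closed("unknown-id")  # no exception, just a warning
```

The point is only the guard: an invalid update becomes a log line instead of an exception that the caller (or ListenerBus) has to absorb.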
[jira] [Assigned] (SPARK-31387) HiveThriftServer2Listener update methods fail with unknown operation/session id
[ https://issues.apache.org/jira/browse/SPARK-31387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31387: Assignee: Apache Spark > HiveThriftServer2Listener update methods fail with unknown operation/session > id > --- > > Key: SPARK-31387 > URL: https://issues.apache.org/jira/browse/SPARK-31387 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.4, 2.4.5, 3.0.0 >Reporter: Ali Smesseim >Assignee: Apache Spark >Priority: Major > > HiveThriftServer2Listener update methods, such as onSessionClosed and > onOperationError throw a NullPointerException (in Spark 3) or a > NoSuchElementException (in Spark 2) when the input session/operation id is > unknown. In Spark 2, this can cause control flow issues with the caller of > the listener. In Spark 3, the listener is called by a ListenerBus which > catches the exception, but it would still be nicer if an invalid update is > logged and does not throw an exception. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31387) HiveThriftServer2Listener update methods fail with unknown operation/session id
[ https://issues.apache.org/jira/browse/SPARK-31387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-31387: -- Fix Version/s: (was: 3.0.0) > HiveThriftServer2Listener update methods fail with unknown operation/session > id > --- > > Key: SPARK-31387 > URL: https://issues.apache.org/jira/browse/SPARK-31387 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.4, 2.4.5, 3.0.0 >Reporter: Ali Smesseim >Priority: Major > > HiveThriftServer2Listener update methods, such as onSessionClosed and > onOperationError throw a NullPointerException (in Spark 3) or a > NoSuchElementException (in Spark 2) when the input session/operation id is > unknown. In Spark 2, this can cause control flow issues with the caller of > the listener. In Spark 3, the listener is called by a ListenerBus which > catches the exception, but it would still be nicer if an invalid update is > logged and does not throw an exception. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31387) HiveThriftServer2Listener update methods fail with unknown operation/session id
[ https://issues.apache.org/jira/browse/SPARK-31387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31387: Assignee: (was: Apache Spark) > HiveThriftServer2Listener update methods fail with unknown operation/session > id > --- > > Key: SPARK-31387 > URL: https://issues.apache.org/jira/browse/SPARK-31387 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.4, 2.4.5, 3.0.0 >Reporter: Ali Smesseim >Priority: Major > > HiveThriftServer2Listener update methods, such as onSessionClosed and > onOperationError throw a NullPointerException (in Spark 3) or a > NoSuchElementException (in Spark 2) when the input session/operation id is > unknown. In Spark 2, this can cause control flow issues with the caller of > the listener. In Spark 3, the listener is called by a ListenerBus which > catches the exception, but it would still be nicer if an invalid update is > logged and does not throw an exception. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-31387) HiveThriftServer2Listener update methods fail with unknown operation/session id
[ https://issues.apache.org/jira/browse/SPARK-31387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reopened SPARK-31387: --- Assignee: (was: Ali Smesseim) > HiveThriftServer2Listener update methods fail with unknown operation/session > id > --- > > Key: SPARK-31387 > URL: https://issues.apache.org/jira/browse/SPARK-31387 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.4, 2.4.5, 3.0.0 >Reporter: Ali Smesseim >Priority: Major > Fix For: 3.0.0 > > > HiveThriftServer2Listener update methods, such as onSessionClosed and > onOperationError throw a NullPointerException (in Spark 3) or a > NoSuchElementException (in Spark 2) when the input session/operation id is > unknown. In Spark 2, this can cause control flow issues with the caller of > the listener. In Spark 3, the listener is called by a ListenerBus which > catches the exception, but it would still be nicer if an invalid update is > logged and does not throw an exception. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31696) Support spark.kubernetes.driver.service.annotation
[ https://issues.apache.org/jira/browse/SPARK-31696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-31696: -- Fix Version/s: (was: 3.1.0) 3.0.0 > Support spark.kubernetes.driver.service.annotation > -- > > Key: SPARK-31696 > URL: https://issues.apache.org/jira/browse/SPARK-31696 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31696) Support spark.kubernetes.driver.service.annotation
[ https://issues.apache.org/jira/browse/SPARK-31696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-31696: -- Affects Version/s: (was: 3.1.0) 3.0.0 > Support spark.kubernetes.driver.service.annotation > -- > > Key: SPARK-31696 > URL: https://issues.apache.org/jira/browse/SPARK-31696 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31696) Support spark.kubernetes.driver.service.annotation
[ https://issues.apache.org/jira/browse/SPARK-31696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107483#comment-17107483 ] Apache Spark commented on SPARK-31696: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/28530 > Support spark.kubernetes.driver.service.annotation > -- > > Key: SPARK-31696 > URL: https://issues.apache.org/jira/browse/SPARK-31696 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 3.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.1.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31692) Hadoop confs passed via spark config are not set in URLStream Handler Factory
[ https://issues.apache.org/jira/browse/SPARK-31692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107478#comment-17107478 ] Apache Spark commented on SPARK-31692: -- User 'karuppayya' has created a pull request for this issue: https://github.com/apache/spark/pull/28529 > Hadoop confs passed via spark config are not set in URLStream Handler Factory > - > > Key: SPARK-31692 > URL: https://issues.apache.org/jira/browse/SPARK-31692 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Karuppayya >Assignee: Karuppayya >Priority: Major > Fix For: 3.0.0 > > > Hadoop conf passed via spark config(as "spark.hadoop.*") are not set in > URLStreamHandlerFactory -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31681) Python multiclass logistic regression evaluate should return LogisticRegressionSummary
[ https://issues.apache.org/jira/browse/SPARK-31681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-31681: - Affects Version/s: (was: 3.1.0) 3.0.0 > Python multiclass logistic regression evaluate should return > LogisticRegressionSummary > -- > > Key: SPARK-31681 > URL: https://issues.apache.org/jira/browse/SPARK-31681 > Project: Spark > Issue Type: Bug > Components: ML, PySpark >Affects Versions: 3.0.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Minor > Labels: release-notes > Fix For: 3.0.0 > > > {code:java} > def evaluate(self, dataset): > .. > java_blr_summary = self._call_java("evaluate", dataset) > return BinaryLogisticRegressionSummary(java_blr_summary) > {code} > We should return LogisticRegressionSummary instead of > BinaryLogisticRegressionSummary for multiclass LogisticRegression -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31681) Python multiclass logistic regression evaluate should return LogisticRegressionSummary
[ https://issues.apache.org/jira/browse/SPARK-31681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-31681. -- Fix Version/s: 3.0.0 Assignee: Huaxin Gao Resolution: Fixed Resolved by https://github.com/apache/spark/pull/28503 > Python multiclass logistic regression evaluate should return > LogisticRegressionSummary > -- > > Key: SPARK-31681 > URL: https://issues.apache.org/jira/browse/SPARK-31681 > Project: Spark > Issue Type: Bug > Components: ML, PySpark >Affects Versions: 3.1.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Minor > Labels: release-notes > Fix For: 3.0.0 > > > {code:java} > def evaluate(self, dataset): > .. > java_blr_summary = self._call_java("evaluate", dataset) > return BinaryLogisticRegressionSummary(java_blr_summary) > {code} > We should return LogisticRegressionSummary instead of > BinaryLogisticRegressionSummary for multiclass LogisticRegression -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
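The fix direction described above — returning the base summary class for multiclass models instead of the binary-only subclass — can be sketched with toy stand-in classes. These are not PySpark's real implementations; the `num_classes` dispatch is the point:

```python
class LogisticRegressionSummary:
    """Toy stand-in for the base summary class."""
    def __init__(self, java_summary):
        self.java_summary = java_summary

class BinaryLogisticRegressionSummary(LogisticRegressionSummary):
    """Toy stand-in; binary-only metrics (e.g. ROC) would live here."""
    pass

def evaluate(num_classes, java_summary):
    # Only binary models get the subclass with binary-only metrics;
    # multiclass models get the base class, as the ticket proposes.
    cls = (BinaryLogisticRegressionSummary if num_classes <= 2
           else LogisticRegressionSummary)
    return cls(java_summary)

assert isinstance(evaluate(2, None), BinaryLogisticRegressionSummary)
assert type(evaluate(3, None)) is LogisticRegressionSummary
```

This mirrors why the release note calls the change "correct": the binary-only methods could never work for a multiclass model anyway.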
[jira] [Updated] (SPARK-31681) Python multiclass logistic regression evaluate should return LogisticRegressionSummary
[ https://issues.apache.org/jira/browse/SPARK-31681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-31681: - Docs Text: In Spark 3.0, a multiclass logistic regression in Pyspark will now (correctly) return LogisticRegressionSummary, not the subclass BinaryLogisticRegressionSummary. The BinaryLogisticRegressionSummary would not work in this case anyway. Description: {code:java} def evaluate(self, dataset): .. java_blr_summary = self._call_java("evaluate", dataset) return BinaryLogisticRegressionSummary(java_blr_summary) {code} We should return LogisticRegressionSummary instead of BinaryLogisticRegressionSummary for multiclass LogisticRegression was: {code:java} def evaluate(self, dataset): .. java_blr_summary = self._call_java("evaluate", dataset) return BinaryLogisticRegressionSummary(java_blr_summary) {code} We should return LogisticRegressionSummary instead of BinaryLogisticRegressionSummary for multiclass LogisticRegression > Python multiclass logistic regression evaluate should return > LogisticRegressionSummary > -- > > Key: SPARK-31681 > URL: https://issues.apache.org/jira/browse/SPARK-31681 > Project: Spark > Issue Type: Bug > Components: ML, PySpark >Affects Versions: 3.1.0 >Reporter: Huaxin Gao >Priority: Minor > Labels: release-notes > > {code:java} > def evaluate(self, dataset): > .. > java_blr_summary = self._call_java("evaluate", dataset) > return BinaryLogisticRegressionSummary(java_blr_summary) > {code} > We should return LogisticRegressionSummary instead of > BinaryLogisticRegressionSummary for multiclass LogisticRegression -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31681) Python multiclass logistic regression evaluate should return LogisticRegressionSummary
[ https://issues.apache.org/jira/browse/SPARK-31681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-31681: - Docs Text: In Spark 3.0, a multiclass logistic regression in Pyspark will now (correctly) return LogisticRegressionSummary, not the subclass BinaryLogisticRegressionSummary. The additional methods exposed by BinaryLogisticRegressionSummary would not work in this case anyway. (was: In Spark 3.0, a multiclass logistic regression in Pyspark will now (correctly) return LogisticRegressionSummary, not the subclass BinaryLogisticRegressionSummary. The BinaryLogisticRegressionSummary would not work in this case anyway.) > Python multiclass logistic regression evaluate should return > LogisticRegressionSummary > -- > > Key: SPARK-31681 > URL: https://issues.apache.org/jira/browse/SPARK-31681 > Project: Spark > Issue Type: Bug > Components: ML, PySpark >Affects Versions: 3.1.0 >Reporter: Huaxin Gao >Priority: Minor > Labels: release-notes > > {code:java} > def evaluate(self, dataset): > .. > java_blr_summary = self._call_java("evaluate", dataset) > return BinaryLogisticRegressionSummary(java_blr_summary) > {code} > We should return LogisticRegressionSummary instead of > BinaryLogisticRegressionSummary for multiclass LogisticRegression -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31681) Python multiclass logistic regression evaluate should return LogisticRegressionSummary
[ https://issues.apache.org/jira/browse/SPARK-31681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-31681: - Labels: release-notes (was: ) > Python multiclass logistic regression evaluate should return > LogisticRegressionSummary > -- > > Key: SPARK-31681 > URL: https://issues.apache.org/jira/browse/SPARK-31681 > Project: Spark > Issue Type: Bug > Components: ML, PySpark >Affects Versions: 3.1.0 >Reporter: Huaxin Gao >Priority: Minor > Labels: release-notes > > {code:java} > def evaluate(self, dataset): > .. > java_blr_summary = self._call_java("evaluate", dataset) > return BinaryLogisticRegressionSummary(java_blr_summary) > {code} > We should return LogisticRegressionSummary instead of > BinaryLogisticRegressionSummary for multiclass LogisticRegression -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20427) Issue with Spark interpreting Oracle datatype NUMBER
[ https://issues.apache.org/jira/browse/SPARK-20427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107376#comment-17107376 ] Rafael commented on SPARK-20427: Hey guys, I encountered an issue related to precision handling. The code now expects that, for the Decimal type, precision and scale are present in the JDBC metadata. [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L402-L414] I found out that in OracleDB it is valid to have a Decimal column without this metadata. When I query the metadata for such a column, I get DATA_PRECISION = Null and DATA_SCALE = Null. Then when I run `spark-sql` I get this error: {code:java} java.lang.IllegalArgumentException: requirement failed: Decimal precision 45 exceeds max precision 38 at scala.Predef$.require(Predef.scala:224) at org.apache.spark.sql.types.Decimal.set(Decimal.scala:114) at org.apache.spark.sql.types.Decimal$.apply(Decimal.scala:465) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$3$$anonfun$12.apply(JdbcUtils.scala:407) {code} Do you have a workaround for how spark-sql can handle such cases? > Issue with Spark interpreting Oracle datatype NUMBER > > > Key: SPARK-20427 > URL: https://issues.apache.org/jira/browse/SPARK-20427 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 >Reporter: Alexander Andrushenko >Assignee: Yuming Wang >Priority: Major > Fix For: 2.3.0 > > > In Oracle exists data type NUMBER. When defining a field in a table of type > NUMBER the field has two components, precision and scale. > For example, NUMBER(p,s) has precision p and scale s. > Precision can range from 1 to 38. > Scale can range from -84 to 127. > When reading such a field Spark can create numbers with precision exceeding > 38.
In our case it has created fields with precision 44, > calculated as the sum of the precision (in our case 34 digits) and the scale (10): > "...java.lang.IllegalArgumentException: requirement failed: Decimal precision > 44 exceeds max precision 38...". > The result was that a data frame was read from a table on one schema but > could not be inserted into the identical table on another schema. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
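The failure mode in this report can be reproduced outside Spark: the JDBC reader derives a precision larger than Spark's DecimalType maximum of 38 (here 34 integer digits + 10 scale digits = 44) and the `require(...)` in `Decimal.set` rejects it. A minimal Python sketch of that check — the constant and digit counting are an illustration of Spark's behavior, not its code:

```python
from decimal import Decimal

MAX_PRECISION = 38  # Spark's DecimalType maximum precision

def decimal_precision(d: Decimal) -> int:
    # Total count of significant digits, roughly how Spark measures precision.
    return len(d.as_tuple().digits)

# 34 integer digits plus scale 10, as in the report above: precision 44.
value = Decimal("1234567890123456789012345678901234.5678901234")

assert decimal_precision(value) == 44
assert decimal_precision(value) > MAX_PRECISION  # Spark's require(...) fails here
```

As a possible workaround for columns whose Oracle metadata lacks precision/scale, Spark's JDBC reader accepts a `customSchema` option to override the inferred type per column, e.g. `.option("customSchema", "COL DECIMAL(38, 10)")` (column name here is illustrative).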
[jira] [Comment Edited] (SPARK-30100) Decimal Precision Inferred from JDBC via Spark
[ https://issues.apache.org/jira/browse/SPARK-30100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107365#comment-17107365 ] Rafael edited comment on SPARK-30100 at 5/14/20, 2:48 PM: -- Hey guys, I encountered an issue related to precision handling. The code now expects that, for the Decimal type, precision and scale are present in the JDBC metadata. [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L402-L414] I found out that in OracleDB it is valid to have a Decimal column without this metadata. When I query the metadata for such a column, I get DATA_PRECISION = Null and DATA_SCALE = Null. Then when I run `spark-sql` I get this error: {code:java} java.lang.IllegalArgumentException: requirement failed: Decimal precision 45 exceeds max precision 38 at scala.Predef$.require(Predef.scala:224) at org.apache.spark.sql.types.Decimal.set(Decimal.scala:114) at org.apache.spark.sql.types.Decimal$.apply(Decimal.scala:465) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$3$$anonfun$12.apply(JdbcUtils.scala:407) {code} Do you have a workaround for how spark-sql can handle such cases? was (Author: kyrdan): Hey guys, I encountered an issue related to precision handling. The code now expects that, for the Decimal type, precision and scale are present in the JDBC metadata. [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L402-L414] I found out that in OracleDB it is valid to have a Decimal column without this metadata. When I query the metadata for such a column, I get DATA_PRECISION = Null and DATA_SCALE = Null.
Then when I run `spark-sql` I get this error: {code:java} java.lang.IllegalArgumentException: requirement failed: Decimal precision 45 exceeds max precision 38 at scala.Predef$.require(Predef.scala:224) at org.apache.spark.sql.types.Decimal.set(Decimal.scala:114) at org.apache.spark.sql.types.Decimal$.apply(Decimal.scala:465) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$3$$anonfun$12.apply(JdbcUtils.scala:407) {code} Do you have a workaround for how spark-sql can handle such cases? > Decimal Precision Inferred from JDBC via Spark > -- > > Key: SPARK-30100 > URL: https://issues.apache.org/jira/browse/SPARK-30100 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 2.4.0 >Reporter: Joby Joje >Priority: Major > > When trying to load data from JDBC (Oracle) into Spark, there seems to be > precision loss in the decimal field; as per my understanding Spark supports > *DECIMAL(38,18)*. The field from Oracle is DECIMAL(38,14), whereas Spark > rounds off the last four digits, making it a precision of DECIMAL(38,10). This > is happening to a few fields in the dataframe where the column is fetched using > a CASE statement, whereas in the same query another field populates the right > schema. > Tried to pass the > {code:java} > spark.sql.decimalOperations.allowPrecisionLoss=false{code} > conf in the Spark-submit though didn't get the desired results. > {code:java} > jdbcDF = spark.read \ > .format("jdbc") \ > .option("url", "ORACLE") \ > .option("dbtable", "QUERY") \ > .option("user", "USERNAME") \ > .option("password", "PASSWORD") \ > .load(){code} > So considering that Spark infers the schema from sample records, how > does this work here? Does it use the results of the query i.e (SELECT * FROM > TABLE_NAME JOIN ...)
or does it take a different route to guess the schema > for itself? Can someone throw some light on this and advise how to achieve > the right decimal precision in this regard without manipulating the query? > Doing a CAST in the query does solve the issue, but I would prefer some > alternatives. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30100) Decimal Precision Inferred from JDBC via Spark
[ https://issues.apache.org/jira/browse/SPARK-30100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107365#comment-17107365 ] Rafael commented on SPARK-30100: Hey guys, I encountered an issue related to precision handling. The code now expects that, for the Decimal type, precision and scale are present in the JDBC metadata. [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L402-L414] I found out that in OracleDB it is valid to have a Decimal column without this metadata. When I query the metadata for such a column, I get DATA_PRECISION = Null and DATA_SCALE = Null. Then when I run `spark-sql` I get this error: {code:java} java.lang.IllegalArgumentException: requirement failed: Decimal precision 45 exceeds max precision 38 at scala.Predef$.require(Predef.scala:224) at org.apache.spark.sql.types.Decimal.set(Decimal.scala:114) at org.apache.spark.sql.types.Decimal$.apply(Decimal.scala:465) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$3$$anonfun$12.apply(JdbcUtils.scala:407) {code} Do you have a workaround for how spark-sql can handle such cases? > Decimal Precision Inferred from JDBC via Spark > -- > > Key: SPARK-30100 > URL: https://issues.apache.org/jira/browse/SPARK-30100 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 2.4.0 >Reporter: Joby Joje >Priority: Major > > When trying to load data from JDBC (Oracle) into Spark, there seems to be > precision loss in the decimal field; as per my understanding Spark supports > *DECIMAL(38,18)*. The field from Oracle is DECIMAL(38,14), whereas Spark > rounds off the last four digits, making it a precision of DECIMAL(38,10).
This > is happening to a few fields in the dataframe where the column is fetched using > a CASE statement, whereas in the same query another field populates the right > schema. > Tried to pass the > {code:java} > spark.sql.decimalOperations.allowPrecisionLoss=false{code} > conf in the Spark-submit though didn't get the desired results. > {code:java} > jdbcDF = spark.read \ > .format("jdbc") \ > .option("url", "ORACLE") \ > .option("dbtable", "QUERY") \ > .option("user", "USERNAME") \ > .option("password", "PASSWORD") \ > .load(){code} > So considering that Spark infers the schema from sample records, how > does this work here? Does it use the results of the query i.e (SELECT * FROM > TABLE_NAME JOIN ...) or does it take a different route to guess the schema > for itself? Can someone throw some light on this and advise how to achieve > the right decimal precision in this regard without manipulating the query? > Doing a CAST in the query does solve the issue, but I would prefer some > alternatives. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31676) QuantileDiscretizer raise error parameter splits given invalid value (splits array includes -0.0 and 0.0)
[ https://issues.apache.org/jira/browse/SPARK-31676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-31676. -- Fix Version/s: 2.4.7 3.0.0 Assignee: Weichen Xu Resolution: Fixed Resolved by https://github.com/apache/spark/pull/28498 > QuantileDiscretizer raise error parameter splits given invalid value (splits > array includes -0.0 and 0.0) > - > > Key: SPARK-31676 > URL: https://issues.apache.org/jira/browse/SPARK-31676 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 2.4.5, 3.0.0 >Reporter: Weichen Xu >Assignee: Weichen Xu >Priority: Major > Fix For: 3.0.0, 2.4.7 > > > Code to reproduce: > {code} > import scala.util.Random > val rng = new Random(3) > val a1 = Array.tabulate(200)(_=>rng.nextDouble * 2.0 - 1.0) ++ > Array.fill(20)(0.0) ++ Array.fill(20)(-0.0) > import spark.implicits._ > val df1 = sc.parallelize(a1, 2).toDF("id") > import org.apache.spark.ml.feature.QuantileDiscretizer > val qd = new > QuantileDiscretizer().setInputCol("id").setOutputCol("out").setNumBuckets(200).setRelativeError(0.0) > val model = qd.fit(df1) > {code} > This raises an error like: > java.lang.IllegalArgumentException: quantileDiscretizer_479bb5a3ca99 > parameter splits given invalid value [-Infinity,-0.9986765732730827,..., > -0.0, 0.0, ..., 0.9907184077958491,Infinity] > at org.apache.spark.ml.param.Param.validate(params.scala:76) > at org.apache.spark.ml.param.ParamPair.(params.scala:634) > at org.apache.spark.ml.param.Param.$minus$greater(params.scala:85) > at org.apache.spark.ml.param.Params.set(params.scala:713) > at org.apache.spark.ml.param.Params.set$(params.scala:712) > at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:41) > at org.apache.spark.ml.feature.Bucketizer.setSplits(Bucketizer.scala:77) > at > org.apache.spark.ml.feature.QuantileDiscretizer.fit(QuantileDiscretizer.scala:231) > ... 49 elided > The splits array contains both -0.0 and 0.0, and 0.0 > -0.0 is false, which breaks the parameter validation check. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
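[Editor's note] The -0.0/0.0 failure above boils down to an IEEE-754 signed-zero quirk: -0.0 and 0.0 compare as equal, so a splits array containing both can never satisfy a strictly-increasing validation. A minimal Python sketch (Python doubles use the same comparison semantics as the JVM here; the dedupe helper is an illustration, not the actual patch in PR 28498):

```python
def strictly_increasing(splits):
    # The Param validator requires each split to be strictly greater than
    # the previous one; -0.0 and 0.0 compare as equal, so a splits array
    # containing both fails this check.
    return all(a < b for a, b in zip(splits, splits[1:]))

def dedupe_signed_zero(splits):
    # Illustrative fix: normalize -0.0 to 0.0 and drop non-increasing
    # duplicates, so signed zeros collapse to a single split.
    out = []
    for s in splits:
        s = 0.0 if s == 0.0 else s  # maps -0.0 to 0.0
        if not out or out[-1] < s:
            out.append(s)
    return out
```

With `[-1.0, -0.0, 0.0, 1.0]` the raw check fails, while the deduplicated splits `[-1.0, 0.0, 1.0]` pass.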
[jira] [Reopened] (SPARK-31686) Return of String instead of array in function get_json_object
[ https://issues.apache.org/jira/browse/SPARK-31686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Touopi Touopi reopened SPARK-31686: --- I don't really understand the purpose of changing the return type. {code:sql} select v1.brandedcustomernumber as brandedcustomernumber from uniquecustomer.UniqueCustomer lateral view explode(from_json(get_json_object(string(brandedCustomerInfoAggregate), '$.brandedCustomers[*].customerNumber'), 'array')) v1 as brandedcustomernumber {code} Look at this example: since I am using the wildcard [*], I can have 0..n elements returned. Luckily, my brandedCustomerInfoAggregate object has more than one brandedCustomers element, so the result of the get_json_object function will be ["customer1","customer2"], for instance. Now, the explode function expects an array, so what happens if only one brandedCustomers element is filled? The object will be returned as a plain string like "customer1" (I actually discovered the " characters added to the value), and the from_json function will break. I expect that, when parsing and selecting a node with [*], an array should be returned. 
Currently, when one element is returned for another query, I convert it to an array and cast to string (from_json(cast(array(get_json_object(string(customer),'$.addresses[*].location')) as string),'array')), but the results are not good when more elements are returned. > Return of String instead of array in function get_json_object > - > > Key: SPARK-31686 > URL: https://issues.apache.org/jira/browse/SPARK-31686 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.5 > Environment: {code:json} > { > "customer": { > "addresses": [ > { "location": "arizona" } > ] > } > } > {code} > get_json_object(string(customer),'$.addresses[*].location') > returns "arizona" > the expected result should be > ["arizona"] >Reporter: Touopi Touopi >Priority: Major > > When selecting a node of a json object that is an array, > and the array contains one element, get_json_object returns a String > with " characters instead of an array of one element. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
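[Editor's note] To make the complaint concrete, here is a toy Python evaluator (a hypothetical helper, not Spark's get_json_object implementation) contrasting the reported behavior, where a single [*] match collapses to a quoted scalar, with the always-an-array behavior the reporter expects:

```python
import json

def json_path_wildcard(doc, array_field, key):
    """Toy model of '$.<array_field>[*].<key>' as the reporter observes it."""
    values = [item[key] for item in doc.get(array_field, [])]
    # Reported behavior: a single match is returned as a bare quoted
    # scalar, which later breaks from_json(..., 'array<...>') / explode.
    if len(values) == 1:
        return json.dumps(values[0])
    return json.dumps(values)

def json_path_wildcard_always_array(doc, array_field, key):
    # Behavior the reporter expects: a [*] wildcard always yields an
    # array, even when it matches exactly one element.
    return json.dumps([item[key] for item in doc.get(array_field, [])])
```

For the single-address customer from the issue's Environment section, the first helper yields `"arizona"` while the second yields `["arizona"]`, which is what a downstream explode/from_json pipeline can consume uniformly.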
[jira] [Issue Comment Deleted] (SPARK-31686) Return of String instead of array in function get_json_object
[ https://issues.apache.org/jira/browse/SPARK-31686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Touopi Touopi updated SPARK-31686: -- Comment: was deleted (was: I don't really understand the purpose to change the return type. {code:sql} select v1.brandedcustomernumber as brandedcustomernumber from uniquecustomer.UniqueCustomer lateral view explode(from_json(get_json_object(string(brandedCustomerInfoAggregate), '$.brandedCustomers[*].customerNumber'), 'array')) v1 as brandedcustomernumber {code} Look this example, Since i am using the wilcard [*] it means that i can have 0..n elements returned. Lucky my brandedCustomerInfoAggregate object has more than one brandedCustomers elements so the result of the get_json_object function will be ["customer1","customer2"] for instance. So now the function explode is waiting an array,what will happens if in any case i have just one brandedCustomers filled ? the Object like String (actually i discover the " characters added on the chain) will be return liked this "customer1" an the function from_json will break. I am expecting that during the parsing and selection of node if we have [*] we should return an array. 
Actually when One element is returned for another query,i am converting to array and cast to string (from_json(cast(array(get_json_object(string(customer),'$.addresses[*].location')) as string),'array')) But the result are not good when more elements are returned) > Return of String instead of array in function get_json_object > - > > Key: SPARK-31686 > URL: https://issues.apache.org/jira/browse/SPARK-31686 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.5 > Environment: {code:json} > // code placeholder > { > customer:{ > addesses:[ { {code} > location : arizona > } > ] > } > } > get_json_object(string(customer),'$addresses[*].location') > return "arizona" > result expected should be > ["arizona"] >Reporter: Touopi Touopi >Priority: Major > > when we selecting a node of a json object that is array, > When the array contains One element , the get_json_object return a String > with " characters instead of an array of One element. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31686) Return of String instead of array in function get_json_object
[ https://issues.apache.org/jira/browse/SPARK-31686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107344#comment-17107344 ] Touopi Touopi commented on SPARK-31686: --- I don't really understand the purpose of changing the return type. {code:sql} select v1.brandedcustomernumber as brandedcustomernumber from uniquecustomer.UniqueCustomer lateral view explode(from_json(get_json_object(string(brandedCustomerInfoAggregate), '$.brandedCustomers[*].customerNumber'), 'array')) v1 as brandedcustomernumber {code} Look at this example: since I am using the wildcard [*], I can have 0..n elements returned. Luckily, my brandedCustomerInfoAggregate object has more than one brandedCustomers element, so the result of the get_json_object function will be ["customer1","customer2"], for instance. Now, the explode function expects an array, so what happens if only one brandedCustomers element is filled? The object will be returned as a plain string like "customer1" (I actually discovered the " characters added to the value), and the from_json function will break. I expect that, when parsing and selecting a node with [*], an array should be returned. 
Currently, when one element is returned for another query, I convert it to an array and cast to string (from_json(cast(array(get_json_object(string(customer),'$.addresses[*].location')) as string),'array')), but the results are not good when more elements are returned. > Return of String instead of array in function get_json_object > - > > Key: SPARK-31686 > URL: https://issues.apache.org/jira/browse/SPARK-31686 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.5 > Environment: {code:json} > { > "customer": { > "addresses": [ > { "location": "arizona" } > ] > } > } > {code} > get_json_object(string(customer),'$.addresses[*].location') > returns "arizona" > the expected result should be > ["arizona"] >Reporter: Touopi Touopi >Priority: Major > > When selecting a node of a json object that is an array, > and the array contains one element, get_json_object returns a String > with " characters instead of an array of one element. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17351) Refactor JDBCRDD to expose JDBC -> SparkSQL conversion functionality
[ https://issues.apache.org/jira/browse/SPARK-17351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107338#comment-17107338 ] Rafael commented on SPARK-17351: Hey guys, I know that this is a very old ticket, but I encountered an issue related to these changes, so let me ask my question here. The code now expects that, for the Decimal type, the JDBC metadata provides both precision and scale. [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L402-L414] I found out that in Oracle it is valid to have a Decimal column without this data. When I query the metadata for such a column, I get DATA_PRECISION = Null and DATA_SCALE = Null. Then when I run `spark-sql` I get this error: {code:java} java.lang.IllegalArgumentException: requirement failed: Decimal precision 45 exceeds max precision 38 at scala.Predef$.require(Predef.scala:224) at org.apache.spark.sql.types.Decimal.set(Decimal.scala:114) at org.apache.spark.sql.types.Decimal$.apply(Decimal.scala:465) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$3$$anonfun$12.apply(JdbcUtils.scala:407) {code} Is there a workaround so that spark-sql can handle such cases? > Refactor JDBCRDD to expose JDBC -> SparkSQL conversion functionality > > > Key: SPARK-17351 > URL: https://issues.apache.org/jira/browse/SPARK-17351 > Project: Spark > Issue Type: New Feature > Components: SQL >Reporter: Josh Rosen >Assignee: Josh Rosen >Priority: Major > Fix For: 2.1.0 > > > It would be useful if more of JDBCRDD's JDBC -> Spark SQL functionality was > usable from outside of JDBCRDD; this would make it easier to write test > harnesses comparing Spark output against other JDBC databases. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31711) Register the executor source with the metrics system when running in local mode.
[ https://issues.apache.org/jira/browse/SPARK-31711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31711: Assignee: Apache Spark > Register the executor source with the metrics system when running in local > mode. > > > Key: SPARK-31711 > URL: https://issues.apache.org/jira/browse/SPARK-31711 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Luca Canali >Assignee: Apache Spark >Priority: Minor > > The Apache Spark metrics system provides many useful insights on the Spark > workload. In particular, the executor source metrics > (https://github.com/apache/spark/blob/master/docs/monitoring.md#component-instance--executor) > provide detailed info, including the number of active tasks, some I/O > metrics, and task metrics details. Executor source metrics, contrary to other > sources (for example ExecutorMetrics source), are not yet available when > running in local mode. > This JIRA proposes to register the executor source with the Spark metrics > system when running in local mode, as this can be very useful when testing > and troubleshooting Spark workloads. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31711) Register the executor source with the metrics system when running in local mode.
[ https://issues.apache.org/jira/browse/SPARK-31711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31711: Assignee: (was: Apache Spark) > Register the executor source with the metrics system when running in local > mode. > > > Key: SPARK-31711 > URL: https://issues.apache.org/jira/browse/SPARK-31711 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Luca Canali >Priority: Minor > > The Apache Spark metrics system provides many useful insights on the Spark > workload. In particular, the executor source metrics > (https://github.com/apache/spark/blob/master/docs/monitoring.md#component-instance--executor) > provide detailed info, including the number of active tasks, some I/O > metrics, and task metrics details. Executor source metrics, contrary to other > sources (for example ExecutorMetrics source), are not yet available when > running in local mode. > This JIRA proposes to register the executor source with the Spark metrics > system when running in local mode, as this can be very useful when testing > and troubleshooting Spark workloads. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31711) Register the executor source with the metrics system when running in local mode.
[ https://issues.apache.org/jira/browse/SPARK-31711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107336#comment-17107336 ] Apache Spark commented on SPARK-31711: -- User 'LucaCanali' has created a pull request for this issue: https://github.com/apache/spark/pull/28528 > Register the executor source with the metrics system when running in local > mode. > > > Key: SPARK-31711 > URL: https://issues.apache.org/jira/browse/SPARK-31711 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Luca Canali >Priority: Minor > > The Apache Spark metrics system provides many useful insights on the Spark > workload. In particular, the executor source metrics > (https://github.com/apache/spark/blob/master/docs/monitoring.md#component-instance--executor) > provide detailed info, including the number of active tasks, some I/O > metrics, and task metrics details. Executor source metrics, contrary to other > sources (for example ExecutorMetrics source), are not yet available when > running in local mode. > This JIRA proposes to register the executor source with the Spark metrics > system when running in local mode, as this can be very useful when testing > and troubleshooting Spark workloads. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31711) Register the executor source with the metrics system when running in local mode.
[ https://issues.apache.org/jira/browse/SPARK-31711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luca Canali updated SPARK-31711: Description: The Apache Spark metrics system provides many useful insights on the Spark workload. In particular, the executor source metrics (https://github.com/apache/spark/blob/master/docs/monitoring.md#component-instance--executor) provide detailed info, including the number of active tasks, some I/O metrics, and task metrics details. Executor source metrics, contrary to other sources (for example ExecutorMetrics source), are not yet available when running in local mode. This JIRA proposes to register the executor source with the Spark metrics system when running in local mode, as this can be very useful when testing and troubleshooting Spark workloads. was: The Apache Spark metrics system provides many useful insights on the Spark workload. In particular, the [executor source metrics](https://github.com/apache/spark/blob/master/docs/monitoring.md#component-instance--executor) provide detailed info, including number of active tasks, some I/O metrics, and task metrics details. Executor source, contrary to other sources (for example ExecutorMetrics source), are not yet available when running in local mode. This JIRA proposes to register the executor source with the Spark metrics system when running in local mode, as this can be very useful when testing and troubleshooting Spark workloads. > Register the executor source with the metrics system when running in local > mode. > > > Key: SPARK-31711 > URL: https://issues.apache.org/jira/browse/SPARK-31711 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Luca Canali >Priority: Minor > > The Apache Spark metrics system provides many useful insights on the Spark > workload. 
In particular, the executor source metrics > (https://github.com/apache/spark/blob/master/docs/monitoring.md#component-instance--executor) > provide detailed info, including the number of active tasks, some I/O > metrics, and task metrics details. Executor source metrics, contrary to other > sources (for example ExecutorMetrics source), are not yet available when > running in local mode. > This JIRA proposes to register the executor source with the Spark metrics > system when running in local mode, as this can be very useful when testing > and troubleshooting Spark workloads. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-30973) ScriptTransformationExec should wait for the termination of process when scriptOutputReader hasNext return false
[ https://issues.apache.org/jira/browse/SPARK-30973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-30973: --- Assignee: Sun Ke > ScriptTransformationExec should wait for the termination of process when > scriptOutputReader hasNext return false > > > Key: SPARK-30973 > URL: https://issues.apache.org/jira/browse/SPARK-30973 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4, 2.4.5 >Reporter: Sun Ke >Assignee: Sun Ke >Priority: Minor > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-30973) ScriptTransformationExec should wait for the termination of process when scriptOutputReader hasNext return false
[ https://issues.apache.org/jira/browse/SPARK-30973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-30973. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 27724 [https://github.com/apache/spark/pull/27724] > ScriptTransformationExec should wait for the termination of process when > scriptOutputReader hasNext return false > > > Key: SPARK-30973 > URL: https://issues.apache.org/jira/browse/SPARK-30973 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4, 2.4.5 >Reporter: Sun Ke >Priority: Minor > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31711) Register the executor source with the metrics system when running in local mode.
Luca Canali created SPARK-31711: --- Summary: Register the executor source with the metrics system when running in local mode. Key: SPARK-31711 URL: https://issues.apache.org/jira/browse/SPARK-31711 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.0.0 Reporter: Luca Canali The Apache Spark metrics system provides many useful insights on the Spark workload. In particular, the [executor source metrics](https://github.com/apache/spark/blob/master/docs/monitoring.md#component-instance--executor) provide detailed info, including number of active tasks, some I/O metrics, and task metrics details. Executor source, contrary to other sources (for example ExecutorMetrics source), are not yet available when running in local mode. This JIRA proposes to register the executor source with the Spark metrics system when running in local mode, as this can be very useful when testing and troubleshooting Spark workloads. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31338) Spark SQL JDBC Data Source partitioned read : Spark SQL does not honor for NOT NULL table definition of partition key.
[ https://issues.apache.org/jira/browse/SPARK-31338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107242#comment-17107242 ] Oleg Kuznetsov commented on SPARK-31338: [~minfa] query = "table where ..." will generate "select * from table where ..." > Spark SQL JDBC Data Source partitioned read : Spark SQL does not honor for > NOT NULL table definition of partition key. > -- > > Key: SPARK-31338 > URL: https://issues.apache.org/jira/browse/SPARK-31338 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.5 >Reporter: Mohit Dave >Priority: Major > > h2. *Our Use-case Details:* > While reading from a JDBC source using Spark SQL, we use the read signature below: > jdbc(url: String, table: String, columnName: String, lowerBound: Long, > upperBound: Long, numPartitions: Int, connectionProperties: Properties). > *Table definition:* > postgres=> \d lineitem_sf1000 > Table "public.lineitem_sf1000" > Column | Type | Modifiers > -++-- > *l_orderkey | bigint | not null* > l_partkey | bigint | not null > l_suppkey | bigint | not null > l_linenumber | bigint | not null > l_quantity | numeric(10,2) | not null > l_extendedprice | numeric(10,2) | not null > l_discount | numeric(10,2) | not null > l_tax | numeric(10,2) | not null > l_returnflag | character varying(1) | not null > l_linestatus | character varying(1) | not null > l_shipdate | character varying(29) | not null > l_commitdate | character varying(29) | not null > l_receiptdate | character varying(29) | not null > l_shipinstruct | character varying(25) | not null > l_shipmode | character varying(10) | not null > l_comment | character varying(44) | not null > Indexes: > "l_order_sf1000_idx" btree (l_orderkey) > > *Partition column*: l_orderkey > *numPartitions*: 16 > h2. *Problem details:* > > {code:java} > SELECT > "l_orderkey","l_shipinstruct","l_quantity","l_partkey","l_discount","l_commitdate","l_receiptdate","l_comment","l_shipmode","l_linestatus","l_suppkey","l_shipdate","l_tax","l_extendedprice","l_linenumber","l_returnflag" > FROM (SELECT > l_orderkey,l_partkey,l_suppkey,l_linenumber,l_quantity,l_extendedprice,l_discount,l_tax,l_returnflag,l_linestatus,l_shipdate,l_commitdate,l_receiptdate,l_shipinstruct,l_shipmode,l_comment > FROM public.lineitem_sf1000) query_alias WHERE l_orderkey >= 150001 AND > l_orderkey < 187501 {code} > 15 queries are generated with range clauses like the one above. The remaining query looks > like this: > {code:java} > SELECT > "l_orderkey","l_shipinstruct","l_quantity","l_partkey","l_discount","l_commitdate","l_receiptdate","l_comment","l_shipmode","l_linestatus","l_suppkey","l_shipdate","l_tax","l_extendedprice","l_linenumber","l_returnflag" > FROM (SELECT > l_orderkey,l_partkey,l_suppkey,l_linenumber,l_quantity,l_extendedprice,l_discount,l_tax,l_returnflag,l_linestatus,l_shipdate,l_commitdate,l_receiptdate,l_shipinstruct,l_shipmode,l_comment > FROM public.lineitem_sf1000) query_alias WHERE l_orderkey < 37501 or > l_orderkey is null {code} > *In this last query, we are trying to get the remaining records, along with > any data in the table for the partition key having NULL values.* > This hurts performance badly. While the first 15 SQLs took approximately 10 > minutes to execute, the SQL with the NULL check takes 45 minutes, because > it has to evaluate a second scan (OR clause) of the table for NULL values of > the partition key. > *Note that I have defined the partition key of the table to be NOT NULL at > the database. Therefore, the SQL for this partition need not have the > NULL check; Spark SQL should be able to avoid such a condition, and this Jira is > intended to fix this behavior.* > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
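[Editor's note] For context, a rough Python sketch of how the partition predicates described above come about, loosely modeled on Spark's JDBCRelation.columnPartition (simplified; not the exact implementation). Note that Spark also offers a `jdbc(url, table, predicates, connectionProperties)` overload that accepts explicit WHERE clauses, which a caller can use today to drop the IS NULL disjunct for a NOT NULL key:

```python
def column_partition(column, lower, upper, num_partitions):
    # Simplified model of Spark's JDBC range partitioning: contiguous
    # strides over [lower, upper). The open-ended lower partition also
    # absorbs NULL keys, producing the "... or <col> is null" clause
    # this issue complains about.
    stride = (upper - lower) // num_partitions
    preds = []
    for i in range(num_partitions):
        lo = lower + i * stride
        hi = lo + stride
        if i == 0:
            preds.append(f"{column} < {hi} or {column} is null")
        elif i == num_partitions - 1:
            preds.append(f"{column} >= {lo}")
        else:
            preds.append(f"{column} >= {lo} AND {column} < {hi}")
    return preds

def column_partition_not_null(column, lower, upper, num_partitions):
    # Hypothetical variant for a NOT NULL partition column: drop the
    # IS NULL disjunct so no partition triggers a second NULL-probing scan.
    preds = column_partition(column, lower, upper, num_partitions)
    preds[0] = preds[0].split(" or ")[0]
    return preds
```

With 16 partitions over a toy range [0, 160), the first generated predicate is `l_orderkey < 10 or l_orderkey is null`; the NOT NULL variant emits just `l_orderkey < 10`, which is the behavior this Jira asks for when the column is declared NOT NULL.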
[jira] [Commented] (SPARK-31338) Spark SQL JDBC Data Source partitioned read : Spark SQL does not honor for NOT NULL table definition of partition key.
[ https://issues.apache.org/jira/browse/SPARK-31338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107203#comment-17107203 ] Mohit Dave commented on SPARK-31338: [~olkuznsmith] The queries get generated by the Spark framework for the read: jdbcRead = spark.read .option("fetchsize", fetchSize) .jdbc( url = s"${connectionURL}", table = s"${query}", columnName = s"${partKey}", lowerBound = lBound, upperBound = hBound, numPartitions = numParts, connectionProperties = connProps); So we don't have control over which query is executed; this Jira was raised to fix the way the query gets generated for the last partition, as mentioned in the description. > Spark SQL JDBC Data Source partitioned read : Spark SQL does not honor for > NOT NULL table definition of partition key. > -- > > Key: SPARK-31338 > URL: https://issues.apache.org/jira/browse/SPARK-31338 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.5 >Reporter: Mohit Dave >Priority: Major > > h2. *Our Use-case Details:* > While reading from a jdbc source using spark sql, we are using below read > format : > jdbc(url: String, table: String, columnName: String, lowerBound: Long, > upperBound: Long, numPartitions: Int, connectionProperties: Properties). 
> *Table defination :* > postgres=> \d lineitem_sf1000 > Table "public.lineitem_sf1000" > Column | Type | Modifiers > -++-- > *l_orderkey | bigint | not null* > l_partkey | bigint | not null > l_suppkey | bigint | not null > l_linenumber | bigint | not null > l_quantity | numeric(10,2) | not null > l_extendedprice | numeric(10,2) | not null > l_discount | numeric(10,2) | not null > l_tax | numeric(10,2) | not null > l_returnflag | character varying(1) | not null > l_linestatus | character varying(1) | not null > l_shipdate | character varying(29) | not null > l_commitdate | character varying(29) | not null > l_receiptdate | character varying(29) | not null > l_shipinstruct | character varying(25) | not null > l_shipmode | character varying(10) | not null > l_comment | character varying(44) | not null > Indexes: > "l_order_sf1000_idx" btree (l_orderkey) > > *Partition column* : l_orderkey > *numpartion* : 16 > h2. *Problem details :* > > {code:java} > SELECT > "l_orderkey","l_shipinstruct","l_quantity","l_partkey","l_discount","l_commitdate","l_receiptdate","l_comment","l_shipmode","l_linestatus","l_suppkey","l_shipdate","l_tax","l_extendedprice","l_linenumber","l_returnflag" > FROM (SELECT > l_orderkey,l_partkey,l_suppkey,l_linenumber,l_quantity,l_extendedprice,l_discount,l_tax,l_returnflag,l_linestatus,l_shipdate,l_commitdate,l_receiptdate,l_shipinstruct,l_shipmode,l_comment > FROM public.lineitem_sf1000) query_alias WHERE l_orderkey >= 150001 AND > l_orderkey < 187501 {code} > 15 queries are generated with the above BETWEEN clauses. 
The last query looks > like this below: > {code:java} > SELECT > "l_orderkey","l_shipinstruct","l_quantity","l_partkey","l_discount","l_commitdate","l_receiptdate","l_comment","l_shipmode","l_linestatus","l_suppkey","l_shipdate","l_tax","l_extendedprice","l_linenumber","l_returnflag" > FROM (SELECT > l_orderkey,l_partkey,l_suppkey,l_linenumber,l_quantity,l_extendedprice,l_discount,l_tax,l_returnflag,l_linestatus,l_shipdate,l_commitdate,l_receiptdate,l_shipinstruct,l_shipmode,l_comment > FROM public.lineitem_sf1000) query_alias WHERE l_orderkey < 37501 or > l_orderkey is null {code} > I*n the last query, we are trying to get the remaining records, along with > any data in the table for the partition key having NULL values.* > This hurts performance badly. While the first 15 SQLs took approximately 10 > minutes to execute, the last SQL with the NULL check takes 45 minutes because > it has to evaluate a second scan(OR clause) of the table for NULL values for > the partition key. > *Note that I have defined the partition key of the table to be NOT NULL, at > the database. Therefore, the SQL for the last partition need not have this > NULL check, Spark SQl should be able to avoid such condition and this Jira is > intended to fix this behavior.* > {code:java} > {code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31338) Spark SQL JDBC Data Source partitioned read : Spark SQL does not honor for NOT NULL table definition of partition key.
[ https://issues.apache.org/jira/browse/SPARK-31338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107198#comment-17107198 ] Mohit Dave commented on SPARK-31338: [~hyukjin.kwon] it can be reproduced at will with given details. Let me know what is missing in given info, I can help you to get the details. > Spark SQL JDBC Data Source partitioned read : Spark SQL does not honor for > NOT NULL table definition of partition key. > -- > > Key: SPARK-31338 > URL: https://issues.apache.org/jira/browse/SPARK-31338 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.5 >Reporter: Mohit Dave >Priority: Major > > h2. *Our Use-case Details:* > While reading from a jdbc source using spark sql, we are using below read > format : > jdbc(url: String, table: String, columnName: String, lowerBound: Long, > upperBound: Long, numPartitions: Int, connectionProperties: Properties). > *Table defination :* > postgres=> \d lineitem_sf1000 > Table "public.lineitem_sf1000" > Column | Type | Modifiers > -++-- > *l_orderkey | bigint | not null* > l_partkey | bigint | not null > l_suppkey | bigint | not null > l_linenumber | bigint | not null > l_quantity | numeric(10,2) | not null > l_extendedprice | numeric(10,2) | not null > l_discount | numeric(10,2) | not null > l_tax | numeric(10,2) | not null > l_returnflag | character varying(1) | not null > l_linestatus | character varying(1) | not null > l_shipdate | character varying(29) | not null > l_commitdate | character varying(29) | not null > l_receiptdate | character varying(29) | not null > l_shipinstruct | character varying(25) | not null > l_shipmode | character varying(10) | not null > l_comment | character varying(44) | not null > Indexes: > "l_order_sf1000_idx" btree (l_orderkey) > > *Partition column* : l_orderkey > *numpartion* : 16 > h2. 
*Problem details:* > > {code:java} > SELECT > "l_orderkey","l_shipinstruct","l_quantity","l_partkey","l_discount","l_commitdate","l_receiptdate","l_comment","l_shipmode","l_linestatus","l_suppkey","l_shipdate","l_tax","l_extendedprice","l_linenumber","l_returnflag" > FROM (SELECT > l_orderkey,l_partkey,l_suppkey,l_linenumber,l_quantity,l_extendedprice,l_discount,l_tax,l_returnflag,l_linestatus,l_shipdate,l_commitdate,l_receiptdate,l_shipinstruct,l_shipmode,l_comment > FROM public.lineitem_sf1000) query_alias WHERE l_orderkey >= 150001 AND > l_orderkey < 187501 {code} > Fifteen queries are generated with range clauses like the one above. The last query looks > like this: > {code:java} > SELECT > "l_orderkey","l_shipinstruct","l_quantity","l_partkey","l_discount","l_commitdate","l_receiptdate","l_comment","l_shipmode","l_linestatus","l_suppkey","l_shipdate","l_tax","l_extendedprice","l_linenumber","l_returnflag" > FROM (SELECT > l_orderkey,l_partkey,l_suppkey,l_linenumber,l_quantity,l_extendedprice,l_discount,l_tax,l_returnflag,l_linestatus,l_shipdate,l_commitdate,l_receiptdate,l_shipinstruct,l_shipmode,l_comment > FROM public.lineitem_sf1000) query_alias WHERE l_orderkey < 37501 or > l_orderkey is null {code} > *In the last query, we are trying to get the remaining records, along with > any rows whose partition key is NULL.* > This hurts performance badly. While the first 15 SQLs took approximately 10 > minutes to execute, the last SQL with the NULL check takes 45 minutes because > it has to evaluate a second scan (the OR clause) of the table for NULL values of > the partition key. > *Note that I have defined the partition key of the table to be NOT NULL at > the database. 
Therefore, the SQL for the last partition need not include this > NULL check; Spark SQL should be able to avoid such a condition, and this Jira is > intended to fix this behavior.* > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
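The partition predicates quoted in SPARK-31338 above follow Spark's JDBC range-partitioning scheme. Below is a simplified sketch of how such WHERE clauses are derived; it is written in Python for illustration, the function name and simplified stride formula are not Spark's actual code, and the assumed bounds (lowerBound=1, upperBound=600001) are a guess that happens to reproduce the clauses quoted in the ticket:

```python
# Illustrative sketch of Spark's JDBC range partitioning
# (JDBCRelation.columnPartition), simplified. Not Spark's actual API.
def partition_where_clauses(column, lower, upper, num_partitions):
    # Simplified stride; Spark's real formula differs slightly.
    stride = (upper - lower) // num_partitions
    clauses = []
    current = lower + stride
    for i in range(num_partitions):
        if i == 0:
            # The first partition also collects NULL keys -- the clause the
            # ticket argues is redundant when the column is NOT NULL.
            clauses.append(f"{column} < {current} or {column} is null")
        elif i == num_partitions - 1:
            clauses.append(f"{column} >= {current - stride}")
        else:
            clauses.append(f"{column} >= {current - stride} AND {column} < {current}")
        current += stride
    return clauses

clauses = partition_where_clauses("l_orderkey", 1, 600001, 16)
clauses[0]  # "l_orderkey < 37501 or l_orderkey is null"
clauses[4]  # "l_orderkey >= 150001 AND l_orderkey < 187501"
```

With a NOT NULL partition column, the `or ... is null` branch forces the second table scan the reporter measures; dropping it when the database schema guarantees NOT NULL is the fix this Jira requests.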
[jira] [Updated] (SPARK-31710) result is not the same when querying and executing jobs
[ https://issues.apache.org/jira/browse/SPARK-31710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] philipse updated SPARK-31710: - Description: Hi Team Steps to reproduce. {code:java} create table test(id bigint); insert into test select 1586318188000; create table test1(id bigint) partitioned by (year string); insert overwrite table test1 partition(year) select 234,cast(id as TIMESTAMP) from test; {code} Let's check the result. Case 1: *select * from test1;* 234 | 52238-06-04 13:06:400.0 --the result is wrong Case 2: *select 234,cast(id as TIMESTAMP) from test;* java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff] at java.sql.Timestamp.valueOf(Timestamp.java:237) at org.apache.hive.jdbc.HiveBaseResultSet.evaluate(HiveBaseResultSet.java:441) at org.apache.hive.jdbc.HiveBaseResultSet.getColumnValue(HiveBaseResultSet.java:421) at org.apache.hive.jdbc.HiveBaseResultSet.getString(HiveBaseResultSet.java:530) at org.apache.hive.beeline.Rows$Row.<init>(Rows.java:166) at org.apache.hive.beeline.BufferedRows.<init>(BufferedRows.java:43) at org.apache.hive.beeline.BeeLine.print(BeeLine.java:1756) at org.apache.hive.beeline.Commands.execute(Commands.java:826) at org.apache.hive.beeline.Commands.sql(Commands.java:670) at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:974) at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:810) at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:767) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:480) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:463) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.util.RunJar.run(RunJar.java:226) at org.apache.hadoop.util.RunJar.main(RunJar.java:141) Error: Unrecognized 
column type:TIMESTAMP_TYPE (state=,code=0) I tried Hive; it works well, and the conversion is correct: {code:java} select 234,cast(id as TIMESTAMP) from test; 234 2020-04-08 11:56:28 {code} Two questions: Q1: if we forbid this conversion, should we keep all cases consistent? Q2: if we allow the conversion in some cases, should we validate the magnitude of the long value? The code seems to force the conversion to microseconds by multiplying, no matter how large the value is; if a value of the wrong magnitude is converted to a timestamp, we could raise an error. {code:java} // converting seconds to us private[this] def longToTimestamp(t: Long): Long = t * 1000000L{code} Thanks!
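The year-52238 result in SPARK-31710 above comes from a unit mismatch: the stored long is epoch *milliseconds*, but the cast interprets it as epoch *seconds* and scales it to microseconds. A rough check of both interpretations (Python for illustration; `epoch_year` and the average-year constant are illustrative helpers, not Spark code):

```python
# Average Gregorian year in seconds (365.2425 days).
SECONDS_PER_YEAR = 31_556_952

def epoch_year(value, unit_scale=1):
    """Approximate calendar year of an epoch value given its unit scale
    (1 = seconds, 1000 = milliseconds)."""
    seconds = value // unit_scale
    return 1970 + seconds // SECONDS_PER_YEAR

raw = 1586318188000
epoch_year(raw)        # treated as seconds -> 52238, the wrong result above
epoch_year(raw, 1000)  # treated as milliseconds -> 2020, matching Hive
```

Dividing by 1,000 first (i.e., treating the value as milliseconds) lands in 2020, which matches Hive's `2020-04-08 11:56:28` output quoted above.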
[jira] [Created] (SPARK-31710) result is not the same when querying and executing jobs
philipse created SPARK-31710: Summary: result is not the same when querying and executing jobs Key: SPARK-31710 URL: https://issues.apache.org/jira/browse/SPARK-31710 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.5 Environment: hdp:2.7.7 spark:2.4.5 Reporter: philipse Hi Team Steps to reproduce. {code:java} create table test(id bigint); insert into test select 1586318188000; create table test1(id bigint) partitioned by (year string); insert overwrite table test1 partition(year) select 234,cast(id as TIMESTAMP) from test; {code} Let's check the result. Case 1: *select * from test1;* 234 | 52238-06-04 13:06:400.0 Case 2: *select 234,cast(id as TIMESTAMP) from test;* java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff] at java.sql.Timestamp.valueOf(Timestamp.java:237) at org.apache.hive.jdbc.HiveBaseResultSet.evaluate(HiveBaseResultSet.java:441) at org.apache.hive.jdbc.HiveBaseResultSet.getColumnValue(HiveBaseResultSet.java:421) at org.apache.hive.jdbc.HiveBaseResultSet.getString(HiveBaseResultSet.java:530) at org.apache.hive.beeline.Rows$Row.<init>(Rows.java:166) at org.apache.hive.beeline.BufferedRows.<init>(BufferedRows.java:43) at org.apache.hive.beeline.BeeLine.print(BeeLine.java:1756) at org.apache.hive.beeline.Commands.execute(Commands.java:826) at org.apache.hive.beeline.Commands.sql(Commands.java:670) at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:974) at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:810) at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:767) at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:480) at org.apache.hive.beeline.BeeLine.main(BeeLine.java:463) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at 
org.apache.hadoop.util.RunJar.run(RunJar.java:226) at org.apache.hadoop.util.RunJar.main(RunJar.java:141) Error: Unrecognized column type:TIMESTAMP_TYPE (state=,code=0) I tried Hive; it works well, and the conversion is correct. Two questions: Q1: if we forbid this conversion, should we keep all cases consistent? Q2: if we allow the conversion in some cases, should we validate the magnitude of the long value? The code seems to force the conversion to microseconds by multiplying, no matter how large the value is; if a value of the wrong magnitude is converted to a timestamp, we could raise an error. {code:java} // converting seconds to us private[this] def longToTimestamp(t: Long): Long = t * 1000000L{code} Thanks! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31709) Proper base path for location when it is a relative path
[ https://issues.apache.org/jira/browse/SPARK-31709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107144#comment-17107144 ] Apache Spark commented on SPARK-31709: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/28527 > Proper base path for location when it is a relative path > > > Key: SPARK-31709 > URL: https://issues.apache.org/jira/browse/SPARK-31709 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.5, 3.0.0, 3.1.0 >Reporter: Kent Yao >Priority: Major > > Currently, the user home directory is used as the base path for the database > and table locations when their location is specified with a relative path, > e.g. > {code:sql} > > set spark.sql.warehouse.dir; > spark.sql.warehouse.dir > file:/Users/kentyao/Downloads/spark/spark-3.1.0-SNAPSHOT-bin-20200512/spark-warehouse/ > spark-sql> create database loctest location 'loctestdbdir'; > spark-sql> desc database loctest; > Database Name loctest > Comment > Location > file:/Users/kentyao/Downloads/spark/spark-3.1.0-SNAPSHOT-bin-20200512/loctestdbdir > Owner kentyao > spark-sql> create table loctest(id int) location 'loctestdbdir'; > spark-sql> desc formatted loctest; > id int NULL > # Detailed Table Information > Database default > Table loctest > Owner kentyao > Created Time Thu May 14 16:29:05 CST 2020 > Last Access UNKNOWN > Created By Spark 3.1.0-SNAPSHOT > Type EXTERNAL > Provider parquet > Location > file:/Users/kentyao/Downloads/spark/spark-3.1.0-SNAPSHOT-bin-20200512/loctestdbdir > Serde Library org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe > InputFormat org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat > OutputFormat org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat > {code} > The user home directory is not always warehouse-related, cannot be changed at > runtime, and is shared by both databases and tables as the parent directory. 
Meanwhile, we use > the table path as the parent directory for relative partition locations. > The config `spark.sql.warehouse.dir` represents the default location for > managed databases and tables. For databases, the case above does not seem to > follow its semantics. For tables it does, but I suggest enriching its > meaning so that it also covers external tables whose locations are given as > relative paths. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31709) Proper base path for location when it is a relative path
[ https://issues.apache.org/jira/browse/SPARK-31709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31709: Assignee: Apache Spark > Proper base path for location when it is a relative path > > > Key: SPARK-31709 > URL: https://issues.apache.org/jira/browse/SPARK-31709 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.5, 3.0.0, 3.1.0 >Reporter: Kent Yao >Assignee: Apache Spark >Priority: Major > > Currently, the user home directory is used as the base path for the database > and table locations when their location is specified with a relative path, > e.g. > {code:sql} > > set spark.sql.warehouse.dir; > spark.sql.warehouse.dir > file:/Users/kentyao/Downloads/spark/spark-3.1.0-SNAPSHOT-bin-20200512/spark-warehouse/ > spark-sql> create database loctest location 'loctestdbdir'; > spark-sql> desc database loctest; > Database Name loctest > Comment > Location > file:/Users/kentyao/Downloads/spark/spark-3.1.0-SNAPSHOT-bin-20200512/loctestdbdir > Owner kentyao > spark-sql> create table loctest(id int) location 'loctestdbdir'; > spark-sql> desc formatted loctest; > id int NULL > # Detailed Table Information > Database default > Table loctest > Owner kentyao > Created Time Thu May 14 16:29:05 CST 2020 > Last Access UNKNOWN > Created By Spark 3.1.0-SNAPSHOT > Type EXTERNAL > Provider parquet > Location > file:/Users/kentyao/Downloads/spark/spark-3.1.0-SNAPSHOT-bin-20200512/loctestdbdir > Serde Library org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe > InputFormat org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat > OutputFormat org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat > {code} > The user home directory is not always warehouse-related, cannot be changed at > runtime, and is shared by both databases and tables as the parent directory. Meanwhile, we use > the table path as the parent directory for relative partition locations. 
> The config `spark.sql.warehouse.dir` represents the default location for > managed databases and tables. For databases, the case above does not seem to > follow its semantics. For tables it does, but I suggest enriching its > meaning so that it also covers external tables whose locations are given as > relative paths. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31709) Proper base path for location when it is a relative path
[ https://issues.apache.org/jira/browse/SPARK-31709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31709: Assignee: (was: Apache Spark) > Proper base path for location when it is a relative path > > > Key: SPARK-31709 > URL: https://issues.apache.org/jira/browse/SPARK-31709 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.5, 3.0.0, 3.1.0 >Reporter: Kent Yao >Priority: Major > > Currently, the user home directory is used as the base path for the database > and table locations when their location is specified with a relative path, > e.g. > {code:sql} > > set spark.sql.warehouse.dir; > spark.sql.warehouse.dir > file:/Users/kentyao/Downloads/spark/spark-3.1.0-SNAPSHOT-bin-20200512/spark-warehouse/ > spark-sql> create database loctest location 'loctestdbdir'; > spark-sql> desc database loctest; > Database Name loctest > Comment > Location > file:/Users/kentyao/Downloads/spark/spark-3.1.0-SNAPSHOT-bin-20200512/loctestdbdir > Owner kentyao > spark-sql> create table loctest(id int) location 'loctestdbdir'; > spark-sql> desc formatted loctest; > id int NULL > # Detailed Table Information > Database default > Table loctest > Owner kentyao > Created Time Thu May 14 16:29:05 CST 2020 > Last Access UNKNOWN > Created By Spark 3.1.0-SNAPSHOT > Type EXTERNAL > Provider parquet > Location > file:/Users/kentyao/Downloads/spark/spark-3.1.0-SNAPSHOT-bin-20200512/loctestdbdir > Serde Library org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe > InputFormat org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat > OutputFormat org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat > {code} > The user home directory is not always warehouse-related, cannot be changed at > runtime, and is shared by both databases and tables as the parent directory. Meanwhile, we use > the table path as the parent directory for relative partition locations. 
> The config `spark.sql.warehouse.dir` represents the default location for > managed databases and tables. For databases, the case above does not seem to > follow its semantics. For tables it does, but I suggest enriching its > meaning so that it also covers external tables whose locations are given as > relative paths. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31709) Proper base path for location when it is a relative path
Kent Yao created SPARK-31709: Summary: Proper base path for location when it is a relative path Key: SPARK-31709 URL: https://issues.apache.org/jira/browse/SPARK-31709 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.5, 3.0.0, 3.1.0 Reporter: Kent Yao Currently, the user home directory is used as the base path for the database and table locations when their location is specified with a relative path, e.g. {code:sql} > set spark.sql.warehouse.dir; spark.sql.warehouse.dir file:/Users/kentyao/Downloads/spark/spark-3.1.0-SNAPSHOT-bin-20200512/spark-warehouse/ spark-sql> create database loctest location 'loctestdbdir'; spark-sql> desc database loctest; Database Name loctest Comment Location file:/Users/kentyao/Downloads/spark/spark-3.1.0-SNAPSHOT-bin-20200512/loctestdbdir Owner kentyao spark-sql> create table loctest(id int) location 'loctestdbdir'; spark-sql> desc formatted loctest; id int NULL # Detailed Table Information Database default Table loctest Owner kentyao Created Time Thu May 14 16:29:05 CST 2020 Last Access UNKNOWN Created By Spark 3.1.0-SNAPSHOT Type EXTERNAL Provider parquet Location file:/Users/kentyao/Downloads/spark/spark-3.1.0-SNAPSHOT-bin-20200512/loctestdbdir Serde Library org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe InputFormat org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat OutputFormat org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat {code} The user home directory is not always warehouse-related, cannot be changed at runtime, and is shared by both databases and tables as the parent directory. Meanwhile, we use the table path as the parent directory for relative partition locations. The config `spark.sql.warehouse.dir` represents the default location for managed databases and tables. For databases, the case above does not seem to follow its semantics. For tables it does, but I suggest enriching its meaning so that it also covers external tables whose locations are given as relative paths. 
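The SPARK-31709 proposal above boils down to choosing the warehouse directory, not the user home, as the base when a LOCATION clause is a relative path. A minimal sketch of that resolution rule (Python for illustration; the function name is hypothetical, and the `file:` URI scheme shown in the ticket output is omitted here):

```python
# Hypothetical sketch: resolve a LOCATION against the warehouse directory
# when it is relative, as the ticket proposes, instead of the user home.
from posixpath import isabs, join, normpath

def resolve_location(location, warehouse_dir):
    if isabs(location):
        # Absolute locations are used as-is.
        return normpath(location)
    # Proposed behavior: relative locations are anchored at the warehouse.
    return normpath(join(warehouse_dir, location))

resolve_location("loctestdbdir", "/opt/spark/spark-warehouse")
# -> "/opt/spark/spark-warehouse/loctestdbdir"
```

Under the current behavior described in the ticket, the same relative `'loctestdbdir'` resolves against the user's home/working directory instead, which is unrelated to the warehouse and cannot be changed at runtime.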
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29436) Support executor for selecting scheduler through scheduler name in the case of k8s multi-scheduler scenario.
[ https://issues.apache.org/jira/browse/SPARK-29436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-29436: Issue Type: Improvement (was: New Feature) > Support executor for selecting scheduler through scheduler name in the case > of k8s multi-scheduler scenario. > > > Key: SPARK-29436 > URL: https://issues.apache.org/jira/browse/SPARK-29436 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: merrily01 >Assignee: merrily01 >Priority: Minor > Fix For: 3.0.0 > > > In a k8s multi-scheduler scenario, support executors selecting a scheduler > by scheduler name. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25876) Simplify configuration types in k8s backend
[ https://issues.apache.org/jira/browse/SPARK-25876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reassigned SPARK-25876: --- Assignee: Marcelo Masiero Vanzin > Simplify configuration types in k8s backend > --- > > Key: SPARK-25876 > URL: https://issues.apache.org/jira/browse/SPARK-25876 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Marcelo Masiero Vanzin >Assignee: Marcelo Masiero Vanzin >Priority: Major > Fix For: 3.0.0 > > > This is a child of SPARK-25874 to deal with the current issues with the > different configuration objects used in the k8s backend. Please refer to the > parent for further discussion of what this means. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31405) fail by default when read/write datetime values and not sure if they need rebase or not
[ https://issues.apache.org/jira/browse/SPARK-31405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17106958#comment-17106958 ] Apache Spark commented on SPARK-31405: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/28526 > fail by default when read/write datetime values and not sure if they need > rebase or not > --- > > Key: SPARK-31405 > URL: https://issues.apache.org/jira/browse/SPARK-31405 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31692) Hadoop confs passed via spark config are not set in URLStream Handler Factory
[ https://issues.apache.org/jira/browse/SPARK-31692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-31692. --- Fix Version/s: 3.0.0 Resolution: Fixed This is resolved via https://github.com/apache/spark/pull/28516 > Hadoop confs passed via spark config are not set in URLStream Handler Factory > - > > Key: SPARK-31692 > URL: https://issues.apache.org/jira/browse/SPARK-31692 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Karuppayya >Priority: Major > Fix For: 3.0.0 > > > Hadoop confs passed via Spark config (as "spark.hadoop.*") are not set in > URLStreamHandlerFactory -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
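For context on SPARK-31692 above: Spark properties prefixed with `spark.hadoop.` are meant to be copied into the Hadoop Configuration with the prefix stripped; the bug is that the Configuration installed through the URL stream handler factory missed them. A sketch of the prefix rule (Python for illustration; plain dicts stand in for SparkConf and Hadoop Configuration objects):

```python
# Sketch of the "spark.hadoop.*" convention: every matching Spark
# property becomes a Hadoop configuration entry with the prefix removed.
PREFIX = "spark.hadoop."

def extract_hadoop_conf(spark_conf):
    return {k[len(PREFIX):]: v
            for k, v in spark_conf.items()
            if k.startswith(PREFIX)}

conf = {"spark.hadoop.fs.s3a.endpoint": "s3.example.com",
        "spark.master": "local[*]"}
extract_hadoop_conf(conf)  # {"fs.s3a.endpoint": "s3.example.com"}
```

Non-prefixed Spark settings (like `spark.master` here) are not Hadoop properties and are left out; the fix ensures the stripped-prefix entries also reach the Configuration used by the URLStreamHandlerFactory.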
[jira] [Assigned] (SPARK-31692) Hadoop confs passed via spark config are not set in URLStream Handler Factory
[ https://issues.apache.org/jira/browse/SPARK-31692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-31692: - Assignee: Karuppayya > Hadoop confs passed via spark config are not set in URLStream Handler Factory > - > > Key: SPARK-31692 > URL: https://issues.apache.org/jira/browse/SPARK-31692 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Karuppayya >Assignee: Karuppayya >Priority: Major > Fix For: 3.0.0 > > > Hadoop confs passed via Spark config (as "spark.hadoop.*") are not set in > URLStreamHandlerFactory -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org