[jira] [Updated] (SPARK-31716) Use a fallback version in HiveExternalCatalogVersionsSuite

2020-05-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-31716:
--
Description: 
Currently, HiveExternalCatalogVersionsSuite is aborted in all Jenkins jobs 
except the JDK11 Jenkins jobs, which don't have old Spark releases supporting JDK11.

{code}
HiveExternalCatalogVersionsSuite:
org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite *** ABORTED ***
  Exception encountered when invoking run on a nested suite - Fail to get the 
latest Spark versions to test. (HiveExternalCatalogVersionsSuite.scala:180)
{code}

  was:
Currently, HiveExternalCatalogVersionsSuite is aborted in all Jenkins jobs 
except the JDK11 Jenkins jobs, which don't have old Spark releases supporting JDK11.

{code}
HiveExternalCatalogVersionsSuite:
org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite *** ABORTED ***
  Exception encountered when invoking run on a nested suite - Fail to get the 
latest Spark versions to test. (HiveExternalCatalogVersionsSuite.scala:180)
{code]


> Use a fallback version in HiveExternalCatalogVersionsSuite
> --
>
> Key: SPARK-31716
> URL: https://issues.apache.org/jira/browse/SPARK-31716
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 2.4.6, 3.0.0, 3.1.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> Currently, HiveExternalCatalogVersionsSuite is aborted in all Jenkins jobs 
> except the JDK11 Jenkins jobs, which don't have old Spark releases supporting 
> JDK11.
> {code}
> HiveExternalCatalogVersionsSuite:
> org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite *** ABORTED ***
>   Exception encountered when invoking run on a nested suite - Fail to get the 
> latest Spark versions to test. (HiveExternalCatalogVersionsSuite.scala:180)
> {code}
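
For illustration only, a minimal Scala sketch of the fallback idea (this is not the actual patch in https://github.com/apache/spark/pull/28536, and the helper names are hypothetical): if fetching the released-version list fails or returns nothing, fall back to a hardcoded known-good list instead of aborting the suite.

{code}
import scala.util.{Success, Try}

// Hypothetical sketch of the SPARK-31716 fallback idea.
// fetchLatest is whatever function downloads the list of released Spark versions;
// if it throws or returns nothing, use the hardcoded fallback instead of aborting.
def versionsToTest(fetchLatest: () => Seq[String], fallback: Seq[String]): Seq[String] =
  Try(fetchLatest()) match {
    case Success(versions) if versions.nonEmpty => versions
    case _ => fallback
  }

// Usage (fetchFromApacheMirror is a hypothetical helper):
// versionsToTest(fetchFromApacheMirror, Seq("2.4.5"))
{code}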



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31716) Use a fallback version in HiveExternalCatalogVersionsSuite

2020-05-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-31716:
--
Description: 
Currently, HiveExternalCatalogVersionsSuite is aborted in all Jenkins jobs 
except the JDK11 Jenkins jobs, which don't have old Spark releases supporting JDK11.

{code}
HiveExternalCatalogVersionsSuite:
org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite *** ABORTED ***
  Exception encountered when invoking run on a nested suite - Fail to get the 
latest Spark versions to test. (HiveExternalCatalogVersionsSuite.scala:180)
{code}

> Use a fallback version in HiveExternalCatalogVersionsSuite
> --
>
> Key: SPARK-31716
> URL: https://issues.apache.org/jira/browse/SPARK-31716
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 2.4.6, 3.0.0, 3.1.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> Currently, HiveExternalCatalogVersionsSuite is aborted in all Jenkins jobs 
> except the JDK11 Jenkins jobs, which don't have old Spark releases supporting 
> JDK11.
> {code}
> HiveExternalCatalogVersionsSuite:
> org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite *** ABORTED ***
>   Exception encountered when invoking run on a nested suite - Fail to get the 
> latest Spark versions to test. (HiveExternalCatalogVersionsSuite.scala:180)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31717) Remove a fallback version of HiveExternalCatalogVersionsSuite

2020-05-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-31717:
-

Assignee: Dongjoon Hyun

> Remove a fallback version of HiveExternalCatalogVersionsSuite
> -
>
> Key: SPARK-31717
> URL: https://issues.apache.org/jira/browse/SPARK-31717
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 2.4.6, 3.0.0, 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>
> After we verify that there is no network issue, this issue aims to decide 
> whether to revert SPARK-31716.
> We may find another, more robust way.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31717) Remove a fallback version of HiveExternalCatalogVersionsSuite

2020-05-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-31717:
--
Description: 
After we verify that there is no network issue, this issue aims to decide 
whether to revert SPARK-31716.
We may find another, more robust way.

  was:After we verify that there is no network issue, this issue aims to 
revert SPARK-31716.


> Remove a fallback version of HiveExternalCatalogVersionsSuite
> -
>
> Key: SPARK-31717
> URL: https://issues.apache.org/jira/browse/SPARK-31717
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 2.4.6, 3.0.0, 3.1.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> After we verify that there is no network issue, this issue aims to decide 
> whether to revert SPARK-31716.
> We may find another, more robust way.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31717) Remove a fallback version of HiveExternalCatalogVersionsSuite

2020-05-14 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-31717:
-

 Summary: Remove a fallback version of 
HiveExternalCatalogVersionsSuite
 Key: SPARK-31717
 URL: https://issues.apache.org/jira/browse/SPARK-31717
 Project: Spark
  Issue Type: Bug
  Components: SQL, Tests
Affects Versions: 2.4.6, 3.0.0, 3.1.0
Reporter: Dongjoon Hyun


After we verify that there is no network issue, this issue aims to revert 
SPARK-31716.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31716) Use a fallback version in HiveExternalCatalogVersionsSuite

2020-05-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31716:


Assignee: (was: Apache Spark)

> Use a fallback version in HiveExternalCatalogVersionsSuite
> --
>
> Key: SPARK-31716
> URL: https://issues.apache.org/jira/browse/SPARK-31716
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 2.4.6, 3.0.0, 3.1.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31716) Use a fallback version in HiveExternalCatalogVersionsSuite

2020-05-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107963#comment-17107963
 ] 

Apache Spark commented on SPARK-31716:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/28536

> Use a fallback version in HiveExternalCatalogVersionsSuite
> --
>
> Key: SPARK-31716
> URL: https://issues.apache.org/jira/browse/SPARK-31716
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 2.4.6, 3.0.0, 3.1.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31716) Use a fallback version in HiveExternalCatalogVersionsSuite

2020-05-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31716:


Assignee: Apache Spark

> Use a fallback version in HiveExternalCatalogVersionsSuite
> --
>
> Key: SPARK-31716
> URL: https://issues.apache.org/jira/browse/SPARK-31716
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 2.4.6, 3.0.0, 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31716) Use a fallback version in HiveExternalCatalogVersionsSuite

2020-05-14 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-31716:
-

 Summary: Use a fallback version in HiveExternalCatalogVersionsSuite
 Key: SPARK-31716
 URL: https://issues.apache.org/jira/browse/SPARK-31716
 Project: Spark
  Issue Type: Bug
  Components: SQL, Tests
Affects Versions: 2.4.6, 3.0.0, 3.1.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31712) Check casting timestamps to byte/short/int/long before 1970-01-01

2020-05-14 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-31712.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 28531
[https://github.com/apache/spark/pull/28531]

> Check casting timestamps to byte/short/int/long before 1970-01-01
> -
>
> Key: SPARK-31712
> URL: https://issues.apache.org/jira/browse/SPARK-31712
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.0.0
>
>
> There are tests for casting timestamps to byte/short/int/long after the epoch 
> 1970-01-01 00:00:00Z. However, the tests only check casting positive values; 
> there are no tests for "negative" timestamps before the epoch. This ticket aims 
> to add such tests.
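
As a rough illustration of the missing coverage, a minimal Scala sketch (assuming a local SparkSession; this is not the test added by the linked pull request): a pre-epoch timestamp cast to BIGINT should produce a negative number of seconds.

{code}
import java.sql.Timestamp
import org.apache.spark.sql.SparkSession

// Sketch only: cast a pre-epoch ("negative") timestamp to an integral type.
val spark = SparkSession.builder().master("local[1]").appName("neg-ts-cast").getOrCreate()
import spark.implicits._

// 1969-12-31 23:59:59Z is one second before the epoch, so casting it to BIGINT
// is expected to yield -1 (seconds since 1970-01-01 00:00:00Z).
val df = Seq(Timestamp.from(java.time.Instant.parse("1969-12-31T23:59:59Z"))).toDF("ts")
df.selectExpr("CAST(ts AS BIGINT) AS secs").show()

spark.stop()
{code}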



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31712) Check casting timestamps to byte/short/int/long before 1970-01-01

2020-05-14 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-31712:
---

Assignee: Maxim Gekk

> Check casting timestamps to byte/short/int/long before 1970-01-01
> -
>
> Key: SPARK-31712
> URL: https://issues.apache.org/jira/browse/SPARK-31712
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
>
> There are tests for casting timestamps to byte/short/int/long after the epoch 
> 1970-01-01 00:00:00Z. However, the tests only check casting positive values; 
> there are no tests for "negative" timestamps before the epoch. This ticket aims 
> to add such tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31715) Fix flaky SparkSQLEnvSuite that sometimes varies single derby instance standard

2020-05-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31715:


Assignee: Apache Spark

> Fix flaky SparkSQLEnvSuite that sometimes varies single derby instance 
> standard
> ---
>
> Key: SPARK-31715
> URL: https://issues.apache.org/jira/browse/SPARK-31715
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Kent Yao
>Assignee: Apache Spark
>Priority: Major
>
> {code:java}
> Caused by: sbt.ForkMain$ForkError: 
> org.apache.derby.iapi.error.StandardException: Another instance of Derby may 
> have already booted the database 
> /home/jenkins/workspace/SparkPullRequestBuilder/sql/hive-thriftserver/metastore_db.
>   at org.apache.derby.iapi.error.StandardException.newException(Unknown 
> Source)
>   at 
> org.apache.derby.impl.store.raw.data.BaseDataFileFactory.privGetJBMSLockOnDB(Unknown
>  Source)
>   at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.run(Unknown 
> Source)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at 
> org.apache.derby.impl.store.raw.data.BaseDataFileFactory.getJBMSLockOnDB(Unknown
>  Source)
>   at 
> org.apache.derby.impl.store.raw.data.BaseDataFileFactory.boot(Unknown Source)
>   at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown 
> Source)
>   at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown 
> Source)
>   at 
> org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
>   at 
> org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown 
> Source)
>   at org.apache.derby.impl.store.raw.RawStore.boot(Unknown Source)
>   at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown 
> Source)
>   at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown 
> Source)
>   at 
> org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
>   at 
> org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown 
> Source)
>   at org.apache.derby.impl.store.access.RAMAccessManager.boot(Unknown 
> Source)
>   at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown 
> Source)
>   at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown 
> Source)
>   at 
> org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
>   at 
> org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown 
> Source)
>   at org.apache.derby.impl.db.BasicDatabase.bootStore(Unknown Source)
>   at org.apache.derby.impl.db.BasicDatabase.boot(Unknown Source)
>   at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown 
> Source)
>   at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown 
> Source)
>   at 
> org.apache.derby.impl.services.monitor.BaseMonitor.bootService(Unknown Source)
>   at 
> org.apache.derby.impl.services.monitor.BaseMonitor.startProviderService(Unknown
>  Source)
>   at 
> org.apache.derby.impl.services.monitor.BaseMonitor.findProviderAndStartService(Unknown
>  Source)
>   at 
> org.apache.derby.impl.services.monitor.BaseMonitor.startPersistentService(Unknown
>  Source)
>   at 
> org.apache.derby.iapi.services.monitor.Monitor.startPersistentService(Unknown 
> Source)
>   ... 138 more
> {code}
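
The error means two test JVMs (or two sessions in one JVM) tried to boot the same embedded Derby directory. Below is a minimal Scala sketch of one common way to avoid the collision, assuming it is acceptable to point a suite at its own metastore directory; it is not necessarily the fix in the linked pull request.

{code}
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Sketch only: give this suite its own Derby metastore directory so it never
// collides with another test booting .../hive-thriftserver/metastore_db.
val metastoreDir = Files.createTempDirectory("metastore_db_").toAbsolutePath
val warehouseDir = Files.createTempDirectory("spark_warehouse_").toAbsolutePath

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("isolated-metastore")
  .enableHiveSupport()  // assumes the Hive support classes are on the classpath
  .config("spark.sql.warehouse.dir", warehouseDir.toString)
  .config("spark.hadoop.javax.jdo.option.ConnectionURL",
    s"jdbc:derby:;databaseName=$metastoreDir/metastore_db;create=true")
  .getOrCreate()
{code}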



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31715) Fix flaky SparkSQLEnvSuite that sometimes varies single derby instance standard

2020-05-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31715:


Assignee: (was: Apache Spark)

> Fix flaky SparkSQLEnvSuite that sometimes varies single derby instance 
> standard
> ---
>
> Key: SPARK-31715
> URL: https://issues.apache.org/jira/browse/SPARK-31715
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Kent Yao
>Priority: Major
>
> {code:java}
> Caused by: sbt.ForkMain$ForkError: 
> org.apache.derby.iapi.error.StandardException: Another instance of Derby may 
> have already booted the database 
> /home/jenkins/workspace/SparkPullRequestBuilder/sql/hive-thriftserver/metastore_db.
>   at org.apache.derby.iapi.error.StandardException.newException(Unknown 
> Source)
>   at 
> org.apache.derby.impl.store.raw.data.BaseDataFileFactory.privGetJBMSLockOnDB(Unknown
>  Source)
>   at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.run(Unknown 
> Source)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at 
> org.apache.derby.impl.store.raw.data.BaseDataFileFactory.getJBMSLockOnDB(Unknown
>  Source)
>   at 
> org.apache.derby.impl.store.raw.data.BaseDataFileFactory.boot(Unknown Source)
>   at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown 
> Source)
>   at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown 
> Source)
>   at 
> org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
>   at 
> org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown 
> Source)
>   at org.apache.derby.impl.store.raw.RawStore.boot(Unknown Source)
>   at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown 
> Source)
>   at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown 
> Source)
>   at 
> org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
>   at 
> org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown 
> Source)
>   at org.apache.derby.impl.store.access.RAMAccessManager.boot(Unknown 
> Source)
>   at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown 
> Source)
>   at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown 
> Source)
>   at 
> org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
>   at 
> org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown 
> Source)
>   at org.apache.derby.impl.db.BasicDatabase.bootStore(Unknown Source)
>   at org.apache.derby.impl.db.BasicDatabase.boot(Unknown Source)
>   at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown 
> Source)
>   at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown 
> Source)
>   at 
> org.apache.derby.impl.services.monitor.BaseMonitor.bootService(Unknown Source)
>   at 
> org.apache.derby.impl.services.monitor.BaseMonitor.startProviderService(Unknown
>  Source)
>   at 
> org.apache.derby.impl.services.monitor.BaseMonitor.findProviderAndStartService(Unknown
>  Source)
>   at 
> org.apache.derby.impl.services.monitor.BaseMonitor.startPersistentService(Unknown
>  Source)
>   at 
> org.apache.derby.iapi.services.monitor.Monitor.startPersistentService(Unknown 
> Source)
>   ... 138 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31715) Fix flaky SparkSQLEnvSuite that sometimes varies single derby instance standard

2020-05-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107907#comment-17107907
 ] 

Apache Spark commented on SPARK-31715:
--

User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/28537

> Fix flaky SparkSQLEnvSuite that sometimes varies single derby instance 
> standard
> ---
>
> Key: SPARK-31715
> URL: https://issues.apache.org/jira/browse/SPARK-31715
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Kent Yao
>Priority: Major
>
> {code:java}
> Caused by: sbt.ForkMain$ForkError: 
> org.apache.derby.iapi.error.StandardException: Another instance of Derby may 
> have already booted the database 
> /home/jenkins/workspace/SparkPullRequestBuilder/sql/hive-thriftserver/metastore_db.
>   at org.apache.derby.iapi.error.StandardException.newException(Unknown 
> Source)
>   at 
> org.apache.derby.impl.store.raw.data.BaseDataFileFactory.privGetJBMSLockOnDB(Unknown
>  Source)
>   at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.run(Unknown 
> Source)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at 
> org.apache.derby.impl.store.raw.data.BaseDataFileFactory.getJBMSLockOnDB(Unknown
>  Source)
>   at 
> org.apache.derby.impl.store.raw.data.BaseDataFileFactory.boot(Unknown Source)
>   at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown 
> Source)
>   at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown 
> Source)
>   at 
> org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
>   at 
> org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown 
> Source)
>   at org.apache.derby.impl.store.raw.RawStore.boot(Unknown Source)
>   at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown 
> Source)
>   at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown 
> Source)
>   at 
> org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
>   at 
> org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown 
> Source)
>   at org.apache.derby.impl.store.access.RAMAccessManager.boot(Unknown 
> Source)
>   at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown 
> Source)
>   at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown 
> Source)
>   at 
> org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
>   at 
> org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown 
> Source)
>   at org.apache.derby.impl.db.BasicDatabase.bootStore(Unknown Source)
>   at org.apache.derby.impl.db.BasicDatabase.boot(Unknown Source)
>   at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown 
> Source)
>   at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown 
> Source)
>   at 
> org.apache.derby.impl.services.monitor.BaseMonitor.bootService(Unknown Source)
>   at 
> org.apache.derby.impl.services.monitor.BaseMonitor.startProviderService(Unknown
>  Source)
>   at 
> org.apache.derby.impl.services.monitor.BaseMonitor.findProviderAndStartService(Unknown
>  Source)
>   at 
> org.apache.derby.impl.services.monitor.BaseMonitor.startPersistentService(Unknown
>  Source)
>   at 
> org.apache.derby.iapi.services.monitor.Monitor.startPersistentService(Unknown 
> Source)
>   ... 138 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31714) Performance test on java vectorization vs dot vs gemv vs gemm

2020-05-14 Thread zhengruifeng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107905#comment-17107905
 ] 

zhengruifeng commented on SPARK-31714:
--

test code:
{code:java}
test("performance: gemv vs dot") {
  for (numRows <- Seq(16, 64, 256, 1024, 4096); numCols <- Seq(16, 64, 256, 
1024, 4096)) {
val rng = new Random(123)
val matrix = Matrices.dense(numRows, numCols,
  Array.fill(numRows * numCols)(rng.nextDouble)).toDense
val vectors = matrix.rowIter.toArray
val vector = Vectors.dense(Array.fill(numCols)(rng.nextDouble))

val start1 = System.nanoTime
Seq.range(0, 100).foreach { _ => matrix.multiply(vector) }
val dur1 = System.nanoTime - start1

val start2 = System.nanoTime
Seq.range(0, 100).foreach { _ => vectors.map(vector.dot) }
val dur2 = System.nanoTime - start2

println(s"numRows=$numRows, numCols=$numCols, gemv: $dur1, dot: $dur2, " +
  s"dot/gemv: ${dur2.toDouble / dur1}")
  }
}

test("performance: gemv vs foreachNonZero") {
  for (numRows <- Seq(16, 64, 256, 1024, 4096); numCols <- Seq(16, 64, 256, 
1024, 4096)) {
val rng = new Random(123)
val matrix = Matrices.dense(numRows, numCols,
  Array.fill(numRows * numCols)(rng.nextDouble)).toDense
val vectors = matrix.rowIter.toArray
val coefVec = Vectors.dense(Array.fill(numCols)(rng.nextDouble))
val coefArr = coefVec.toArray

val start1 = System.nanoTime
Seq.range(0, 100).foreach { _ => matrix.multiply(coefVec) }
val dur1 = System.nanoTime - start1

val start2 = System.nanoTime
Seq.range(0, 100).foreach { _ =>
  vectors.map { vector =>
var sum = 0.0
vector.foreachNonZero((i, v) => sum += coefArr(i) * v)
sum
  }
}
val dur2 = System.nanoTime - start2

println(s"numRows=$numRows, numCols=$numCols, gemv: $dur1, foreachNonZero: 
$dur2, " +
  s"foreachNonZero/gemv: ${dur2.toDouble / dur1}")
  }
}

test("performance: gemv vs foreachNonZero(std)") {
  for (numRows <- Seq(16, 64, 256, 1024, 4096); numCols <- Seq(16, 64, 256, 
1024, 4096)) {
val rng = new Random(123)
val matrix = Matrices.dense(numRows, numCols,
  Array.fill(numRows * numCols)(rng.nextDouble)).toDense
val vectors = matrix.rowIter.toArray
val coefVec = Vectors.dense(Array.fill(numCols)(rng.nextDouble))
val coefArr = coefVec.toArray
val stdVec = Vectors.dense(Array.fill(numCols)(rng.nextDouble))
val stdArr = stdVec.toArray

val start1 = System.nanoTime
Seq.range(0, 100).foreach { _ => matrix.multiply(coefVec) }
val dur1 = System.nanoTime - start1

val start2 = System.nanoTime
Seq.range(0, 100).foreach { _ =>
  vectors.map { vector =>
var sum = 0.0
vector.foreachNonZero { (i, v) =>
  val std = stdArr(i)
  if (std != 0) sum += coefArr(i) * v
}
sum
  }
}
val dur2 = System.nanoTime - start2

println(s"numRows=$numRows, numCols=$numCols, gemv: $dur1, 
foreachNonZero(std): $dur2, " +
  s"foreachNonZero(std)/gemv: ${dur2.toDouble / dur1}")
  }
}

test("performance: gemv vs while") {
  for (numRows <- Seq(16, 64, 256, 1024, 4096); numCols <- Seq(16, 64, 256, 
1024, 4096)) {
val rng = new Random(123)
val matrix = Matrices.dense(numRows, numCols,
  Array.fill(numRows * numCols)(rng.nextDouble)).toDense
val vectors = matrix.rowIter.toArray
val coefVec = Vectors.dense(Array.fill(numCols)(rng.nextDouble))
val coefArr = coefVec.toArray

val start1 = System.nanoTime
Seq.range(0, 100).foreach { _ => matrix.multiply(coefVec) }
val dur1 = System.nanoTime - start1

val start2 = System.nanoTime
Seq.range(0, 100).foreach { _ =>
  vectors.map {
case DenseVector(values) =>
  var sum = 0.0
  var i = 0
  while (i < values.length) {
sum += values(i) * coefArr(i)
i += 1
  }
  sum
  }
}
val dur2 = System.nanoTime - start2

println(s"numRows=$numRows, numCols=$numCols, gemv: $dur1, while: $dur2, " +
  s"while/gemv: ${dur2.toDouble / dur1}")
  }
}

test("performance: gemv vs while(std)") {
  for (numRows <- Seq(16, 64, 256, 1024, 4096); numCols <- Seq(16, 64, 256, 
1024, 4096)) {
val rng = new Random(123)
val matrix = Matrices.dense(numRows, numCols,
  Array.fill(numRows * numCols)(rng.nextDouble)).toDense
val vectors = matrix.rowIter.toArray
val coefVec = Vectors.dense(Array.fill(numCols)(rng.nextDouble))
val coefArr = coefVec.toArray
val stdVec = Vectors.dense(Array.fill(numCols)(rng.nextDouble))
val stdArr = stdVec.toArray

val start1 = System.nanoTime
Seq.range(0, 100).foreach { _ => matrix.multiply(coefVec) }
val dur1 = System.nanoTime - start1

val start2 = System.nanoTime
Seq.range(0, 100).foreach { _ =>
  vectors.map {
case DenseVector(values) =>
  var sum = 

[jira] [Updated] (SPARK-31714) Performance test on java vectorization vs dot vs gemv vs gemm

2020-05-14 Thread zhengruifeng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengruifeng updated SPARK-31714:
-
Attachment: blas-perf

> Performance test on java vectorization vs dot vs gemv vs gemm
> -
>
> Key: SPARK-31714
> URL: https://issues.apache.org/jira/browse/SPARK-31714
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 3.1.0
>Reporter: zhengruifeng
>Assignee: zhengruifeng
>Priority: Minor
> Attachments: BLASSuite.scala, blas-perf
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31714) Performance test on java vectorization vs dot vs gemv vs gemm

2020-05-14 Thread zhengruifeng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengruifeng updated SPARK-31714:
-
Attachment: BLASSuite.scala

> Performance test on java vectorization vs dot vs gemv vs gemm
> -
>
> Key: SPARK-31714
> URL: https://issues.apache.org/jira/browse/SPARK-31714
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 3.1.0
>Reporter: zhengruifeng
>Assignee: zhengruifeng
>Priority: Minor
> Attachments: BLASSuite.scala, blas-perf
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31715) Fix flaky SparkSQLEnvSuite that sometimes varies single derby instance standard

2020-05-14 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao updated SPARK-31715:
-
Summary: Fix flaky SparkSQLEnvSuite that sometimes varies single derby 
instance standard  (was: Fix flaky SparkSQLEnvSuite that sometimes via single 
derby instance standard)

> Fix flaky SparkSQLEnvSuite that sometimes varies single derby instance 
> standard
> ---
>
> Key: SPARK-31715
> URL: https://issues.apache.org/jira/browse/SPARK-31715
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Kent Yao
>Priority: Major
>
> {code:java}
> Caused by: sbt.ForkMain$ForkError: 
> org.apache.derby.iapi.error.StandardException: Another instance of Derby may 
> have already booted the database 
> /home/jenkins/workspace/SparkPullRequestBuilder/sql/hive-thriftserver/metastore_db.
>   at org.apache.derby.iapi.error.StandardException.newException(Unknown 
> Source)
>   at 
> org.apache.derby.impl.store.raw.data.BaseDataFileFactory.privGetJBMSLockOnDB(Unknown
>  Source)
>   at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.run(Unknown 
> Source)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at 
> org.apache.derby.impl.store.raw.data.BaseDataFileFactory.getJBMSLockOnDB(Unknown
>  Source)
>   at 
> org.apache.derby.impl.store.raw.data.BaseDataFileFactory.boot(Unknown Source)
>   at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown 
> Source)
>   at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown 
> Source)
>   at 
> org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
>   at 
> org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown 
> Source)
>   at org.apache.derby.impl.store.raw.RawStore.boot(Unknown Source)
>   at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown 
> Source)
>   at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown 
> Source)
>   at 
> org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
>   at 
> org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown 
> Source)
>   at org.apache.derby.impl.store.access.RAMAccessManager.boot(Unknown 
> Source)
>   at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown 
> Source)
>   at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown 
> Source)
>   at 
> org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
>   at 
> org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown 
> Source)
>   at org.apache.derby.impl.db.BasicDatabase.bootStore(Unknown Source)
>   at org.apache.derby.impl.db.BasicDatabase.boot(Unknown Source)
>   at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown 
> Source)
>   at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown 
> Source)
>   at 
> org.apache.derby.impl.services.monitor.BaseMonitor.bootService(Unknown Source)
>   at 
> org.apache.derby.impl.services.monitor.BaseMonitor.startProviderService(Unknown
>  Source)
>   at 
> org.apache.derby.impl.services.monitor.BaseMonitor.findProviderAndStartService(Unknown
>  Source)
>   at 
> org.apache.derby.impl.services.monitor.BaseMonitor.startPersistentService(Unknown
>  Source)
>   at 
> org.apache.derby.iapi.services.monitor.Monitor.startPersistentService(Unknown 
> Source)
>   ... 138 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31715) Fix flaky SparkSQLEnvSuite that sometimes via single derby instance standard

2020-05-14 Thread Kent Yao (Jira)
Kent Yao created SPARK-31715:


 Summary: Fix flaky SparkSQLEnvSuite that sometimes via single 
derby instance standard
 Key: SPARK-31715
 URL: https://issues.apache.org/jira/browse/SPARK-31715
 Project: Spark
  Issue Type: Bug
  Components: SQL, Tests
Affects Versions: 3.0.0, 3.1.0
Reporter: Kent Yao



{code:java}
Caused by: sbt.ForkMain$ForkError: 
org.apache.derby.iapi.error.StandardException: Another instance of Derby may 
have already booted the database 
/home/jenkins/workspace/SparkPullRequestBuilder/sql/hive-thriftserver/metastore_db.
at org.apache.derby.iapi.error.StandardException.newException(Unknown 
Source)
at 
org.apache.derby.impl.store.raw.data.BaseDataFileFactory.privGetJBMSLockOnDB(Unknown
 Source)
at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.run(Unknown 
Source)
at java.security.AccessController.doPrivileged(Native Method)
at 
org.apache.derby.impl.store.raw.data.BaseDataFileFactory.getJBMSLockOnDB(Unknown
 Source)
at 
org.apache.derby.impl.store.raw.data.BaseDataFileFactory.boot(Unknown Source)
at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown 
Source)
at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown 
Source)
at 
org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
at 
org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
at org.apache.derby.impl.store.raw.RawStore.boot(Unknown Source)
at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown 
Source)
at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown 
Source)
at 
org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
at 
org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
at org.apache.derby.impl.store.access.RAMAccessManager.boot(Unknown 
Source)
at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown 
Source)
at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown 
Source)
at 
org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
at 
org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
at org.apache.derby.impl.db.BasicDatabase.bootStore(Unknown Source)
at org.apache.derby.impl.db.BasicDatabase.boot(Unknown Source)
at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown 
Source)
at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown 
Source)
at 
org.apache.derby.impl.services.monitor.BaseMonitor.bootService(Unknown Source)
at 
org.apache.derby.impl.services.monitor.BaseMonitor.startProviderService(Unknown 
Source)
at 
org.apache.derby.impl.services.monitor.BaseMonitor.findProviderAndStartService(Unknown
 Source)
at 
org.apache.derby.impl.services.monitor.BaseMonitor.startPersistentService(Unknown
 Source)
at 
org.apache.derby.iapi.services.monitor.Monitor.startPersistentService(Unknown 
Source)
... 138 more
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31714) Performance test on java vectorization vs dot vs gemv vs gemm

2020-05-14 Thread zhengruifeng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengruifeng reassigned SPARK-31714:


Assignee: zhengruifeng

> Performance test on java vectorization vs dot vs gemv vs gemm
> -
>
> Key: SPARK-31714
> URL: https://issues.apache.org/jira/browse/SPARK-31714
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 3.1.0
>Reporter: zhengruifeng
>Assignee: zhengruifeng
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31714) Performance test on java vectorization vs dot vs gemv vs gemm

2020-05-14 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-31714:


 Summary: Performance test on java vectorization vs dot vs gemv vs 
gemm
 Key: SPARK-31714
 URL: https://issues.apache.org/jira/browse/SPARK-31714
 Project: Spark
  Issue Type: Sub-task
  Components: ML
Affects Versions: 3.1.0
Reporter: zhengruifeng






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31713) Make test-dependencies.sh detect version string correctly

2020-05-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-31713.
---
Fix Version/s: 3.0.0
   2.4.6
   Resolution: Fixed

Issue resolved by pull request 28532
[https://github.com/apache/spark/pull/28532]

> Make test-dependencies.sh detect version string correctly
> -
>
> Key: SPARK-31713
> URL: https://issues.apache.org/jira/browse/SPARK-31713
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 2.4.6, 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 2.4.6, 3.0.0
>
>
> Currently, SBT jobs are broken like the following.
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-3.0-test-sbt-hadoop-3.2-hive-2.3/476/console
> {code}
> [error] running 
> /home/jenkins/workspace/spark-branch-3.0-test-sbt-hadoop-3.2-hive-2.3/dev/test-dependencies.sh
>  ; received return code 1
> Build step 'Execute shell' marked build as failure
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31713) Make test-dependencies.sh detect version string correctly

2020-05-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-31713:
-

Assignee: Dongjoon Hyun

> Make test-dependencies.sh detect version string correctly
> -
>
> Key: SPARK-31713
> URL: https://issues.apache.org/jira/browse/SPARK-31713
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 2.4.6, 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>
> Currently, SBT jobs are broken like the following.
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-3.0-test-sbt-hadoop-3.2-hive-2.3/476/console
> {code}
> [error] running 
> /home/jenkins/workspace/spark-branch-3.0-test-sbt-hadoop-3.2-hive-2.3/dev/test-dependencies.sh
>  ; received return code 1
> Build step 'Execute shell' marked build as failure
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-31693) Investigate AmpLab Jenkins server network issue

2020-05-14 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107856#comment-17107856
 ] 

Dongjoon Hyun edited comment on SPARK-31693 at 5/15/20, 2:24 AM:
-

Thank you, [~shaneknapp]. 

Although this is another issue, `amp-jenkins-worker-05` has a corrupted Maven 
local repo and fails consistently.
{code}
Using `mvn` from path: 
/home/jenkins/workspace/SparkPullRequestBuilder/build/apache-maven-3.6.3/bin/mvn
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-install-plugin:3.0.0-M1:install (default-cli) on 
project spark-parent_2.12: ArtifactInstallerException: Failed to install 
metadata org.apache.spark:spark-parent_2.12/maven-metadata.xml: Could not parse 
metadata 
/home/jenkins/.m2/repository/org/apache/spark/spark-parent_2.12/maven-metadata-local.xml:
 in epilog non whitespace content is not allowed but got > (position: END_TAG 
seen ...\n>... @13:2) -> [Help 1]
{code}

Could you nuke the local Maven repository directory on this machine? And maybe 
on the other machine that fails consistently, too.


was (Author: dongjoon):
Thank you, [~shaneknapp]. 

Although this is another issue, `amp-jenkins-worker-05` has a corrupted Maven 
local repo and fails consistently.
{code}
Using `mvn` from path: 
/home/jenkins/workspace/SparkPullRequestBuilder/build/apache-maven-3.6.3/bin/mvn
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-install-plugin:3.0.0-M1:install (default-cli) on 
project spark-parent_2.12: ArtifactInstallerException: Failed to install 
metadata org.apache.spark:spark-parent_2.12/maven-metadata.xml: Could not parse 
metadata 
/home/jenkins/.m2/repository/org/apache/spark/spark-parent_2.12/maven-metadata-local.xml:
 in epilog non whitespace content is not allowed but got > (position: END_TAG 
seen ...\n>... @13:2) -> [Help 1]
{code}

Could you nuke all the local Maven repository directories on the Spark workers?

> Investigate AmpLab Jenkins server network issue
> ---
>
> Key: SPARK-31693
> URL: https://issues.apache.org/jira/browse/SPARK-31693
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Priority: Critical
>
> Given the series of failures in the Spark packaging Jenkins job, it seems that 
> there is a network issue in the AmpLab Jenkins cluster.
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/
> - The node failed to talk to GitBox. (SPARK-31687) -> GitHub is okay.
> - The node failed to download the maven mirror. (SPARK-31691) -> The primary 
> host is okay.
> - The node failed to communicate repository.apache.org. (Current master 
> branch Jenkins job failure)
> {code}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-deploy-plugin:3.0.0-M1:deploy (default-deploy) 
> on project spark-parent_2.12: ArtifactDeployerException: Failed to retrieve 
> remote metadata 
> org.apache.spark:spark-parent_2.12:3.1.0-SNAPSHOT/maven-metadata.xml: Could 
> not transfer metadata 
> org.apache.spark:spark-parent_2.12:3.1.0-SNAPSHOT/maven-metadata.xml from/to 
> apache.snapshots.https 
> (https://repository.apache.org/content/repositories/snapshots): Transfer 
> failed for 
> https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-parent_2.12/3.1.0-SNAPSHOT/maven-metadata.xml:
>  Connect to repository.apache.org:443 [repository.apache.org/207.244.88.140] 
> failed: Connection timed out (Connection timed out) -> [Help 1]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31693) Investigate AmpLab Jenkins server network issue

2020-05-14 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107856#comment-17107856
 ] 

Dongjoon Hyun commented on SPARK-31693:
---

Thank you, [~shaneknapp]. 

Although this is another issue, `amp-jenkins-worker-05` has a corrupted Maven 
local repo and fails consistently.
{code}
Using `mvn` from path: 
/home/jenkins/workspace/SparkPullRequestBuilder/build/apache-maven-3.6.3/bin/mvn
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-install-plugin:3.0.0-M1:install (default-cli) on 
project spark-parent_2.12: ArtifactInstallerException: Failed to install 
metadata org.apache.spark:spark-parent_2.12/maven-metadata.xml: Could not parse 
metadata 
/home/jenkins/.m2/repository/org/apache/spark/spark-parent_2.12/maven-metadata-local.xml:
 in epilog non whitespace content is not allowed but got > (position: END_TAG 
seen ...\n>... @13:2) -> [Help 1]
{code}

Could you nuke all the local Maven repository directories on the Spark workers?

> Investigate AmpLab Jenkins server network issue
> ---
>
> Key: SPARK-31693
> URL: https://issues.apache.org/jira/browse/SPARK-31693
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Priority: Critical
>
> Given the series of failures in the Spark packaging Jenkins job, it seems that 
> there is a network issue in the AmpLab Jenkins cluster.
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/
> - The node failed to talk to GitBox. (SPARK-31687) -> GitHub is okay.
> - The node failed to download the maven mirror. (SPARK-31691) -> The primary 
> host is okay.
> - The node failed to communicate repository.apache.org. (Current master 
> branch Jenkins job failure)
> {code}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-deploy-plugin:3.0.0-M1:deploy (default-deploy) 
> on project spark-parent_2.12: ArtifactDeployerException: Failed to retrieve 
> remote metadata 
> org.apache.spark:spark-parent_2.12:3.1.0-SNAPSHOT/maven-metadata.xml: Could 
> not transfer metadata 
> org.apache.spark:spark-parent_2.12:3.1.0-SNAPSHOT/maven-metadata.xml from/to 
> apache.snapshots.https 
> (https://repository.apache.org/content/repositories/snapshots): Transfer 
> failed for 
> https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-parent_2.12/3.1.0-SNAPSHOT/maven-metadata.xml:
>  Connect to repository.apache.org:443 [repository.apache.org/207.244.88.140] 
> failed: Connection timed out (Connection timed out) -> [Help 1]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31710) result is the not the same when query and execute jobs

2020-05-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31710:


Assignee: (was: Apache Spark)

> result is the not the same when query and execute jobs
> --
>
> Key: SPARK-31710
> URL: https://issues.apache.org/jira/browse/SPARK-31710
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.5
> Environment: hdp:2.7.7
> spark:2.4.5
>Reporter: philipse
>Priority: Major
>
> Hi Team
> Steps to reproduce.
> {code:java}
> create table test(id bigint);
> insert into test select 1586318188000;
> create table test1(id bigint) partitioned by (year string);
> insert overwrite table test1 partition(year) select 234,cast(id as TIMESTAMP) 
> from test;
> {code}
> let's check the result. 
> Case 1:
> *select * from test1;*
> 234 | 52238-06-04 13:06:400.0
> --the result is wrong
> Case 2:
> *select 234,cast(id as TIMESTAMP) from test;*
>  
> java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd 
> hh:mm:ss[.fffffffff]
>  at java.sql.Timestamp.valueOf(Timestamp.java:237)
>  at 
> org.apache.hive.jdbc.HiveBaseResultSet.evaluate(HiveBaseResultSet.java:441)
>  at 
> org.apache.hive.jdbc.HiveBaseResultSet.getColumnValue(HiveBaseResultSet.java:421)
>  at 
> org.apache.hive.jdbc.HiveBaseResultSet.getString(HiveBaseResultSet.java:530)
>  at org.apache.hive.beeline.Rows$Row.<init>(Rows.java:166)
>  at org.apache.hive.beeline.BufferedRows.<init>(BufferedRows.java:43)
>  at org.apache.hive.beeline.BeeLine.print(BeeLine.java:1756)
>  at org.apache.hive.beeline.Commands.execute(Commands.java:826)
>  at org.apache.hive.beeline.Commands.sql(Commands.java:670)
>  at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:974)
>  at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:810)
>  at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:767)
>  at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:480)
>  at org.apache.hive.beeline.BeeLine.main(BeeLine.java:463)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at org.apache.hadoop.util.RunJar.run(RunJar.java:226)
>  at org.apache.hadoop.util.RunJar.main(RunJar.java:141)
>  Error: Unrecognized column type:TIMESTAMP_TYPE (state=,code=0)
>  
> I tried Hive; it works well, and the conversion is correct:
> {code:java}
> select 234,cast(id as TIMESTAMP) from test;
>  234   2020-04-08 11:56:28
> {code}
> Two questions:
> q1:
> if we forbid this conversion, should we keep all cases the same?
> q2:
> if we allow the conversion in some cases, should we check the length of the 
> long value? The code seems to force the conversion to ns with times*100 no 
> matter how long the data is; if it converts to a timestamp with an incorrect 
> length, we can raise an error.
> {code:java}
> // // converting seconds to us
> private[this] def longToTimestamp(t: Long): Long = t * 100L{code}
>  
> Thanks!
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31710) result is the not the same when query and execute jobs

2020-05-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107823#comment-17107823
 ] 

Apache Spark commented on SPARK-31710:
--

User 'TJX2014' has created a pull request for this issue:
https://github.com/apache/spark/pull/28534

> result is the not the same when query and execute jobs
> --
>
> Key: SPARK-31710
> URL: https://issues.apache.org/jira/browse/SPARK-31710
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.5
> Environment: hdp:2.7.7
> spark:2.4.5
>Reporter: philipse
>Priority: Major
>
> Hi Team
> Steps to reproduce.
> {code:java}
> create table test(id bigint);
> insert into test select 1586318188000;
> create table test1(id bigint) partitioned by (year string);
> insert overwrite table test1 partition(year) select 234,cast(id as TIMESTAMP) 
> from test;
> {code}
> let's check the result. 
> Case 1:
> *select * from test1;*
> 234 | 52238-06-04 13:06:400.0
> --the result is wrong
> Case 2:
> *select 234,cast(id as TIMESTAMP) from test;*
>  
> java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd 
> hh:mm:ss[.fffffffff]
>  at java.sql.Timestamp.valueOf(Timestamp.java:237)
>  at 
> org.apache.hive.jdbc.HiveBaseResultSet.evaluate(HiveBaseResultSet.java:441)
>  at 
> org.apache.hive.jdbc.HiveBaseResultSet.getColumnValue(HiveBaseResultSet.java:421)
>  at 
> org.apache.hive.jdbc.HiveBaseResultSet.getString(HiveBaseResultSet.java:530)
>  at org.apache.hive.beeline.Rows$Row.<init>(Rows.java:166)
>  at org.apache.hive.beeline.BufferedRows.<init>(BufferedRows.java:43)
>  at org.apache.hive.beeline.BeeLine.print(BeeLine.java:1756)
>  at org.apache.hive.beeline.Commands.execute(Commands.java:826)
>  at org.apache.hive.beeline.Commands.sql(Commands.java:670)
>  at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:974)
>  at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:810)
>  at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:767)
>  at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:480)
>  at org.apache.hive.beeline.BeeLine.main(BeeLine.java:463)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at org.apache.hadoop.util.RunJar.run(RunJar.java:226)
>  at org.apache.hadoop.util.RunJar.main(RunJar.java:141)
>  Error: Unrecognized column type:TIMESTAMP_TYPE (state=,code=0)
>  
> I tried Hive; it works well, and the conversion is correct:
> {code:java}
> select 234,cast(id as TIMESTAMP) from test;
>  234   2020-04-08 11:56:28
> {code}
> Two questions:
> q1:
> if we forbid this conversion, should we keep all cases the same?
> q2:
> if we allow the conversion in some cases, should we check the length of the 
> long value? The code seems to force the conversion to ns with times*100 no 
> matter how long the data is; if it converts to a timestamp with an incorrect 
> length, we can raise an error.
> {code:java}
> // // converting seconds to us
> private[this] def longToTimestamp(t: Long): Long = t * 100L{code}
>  
> Thanks!
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31710) result is the not the same when query and execute jobs

2020-05-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31710:


Assignee: Apache Spark

> result is the not the same when query and execute jobs
> --
>
> Key: SPARK-31710
> URL: https://issues.apache.org/jira/browse/SPARK-31710
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.5
> Environment: hdp:2.7.7
> spark:2.4.5
>Reporter: philipse
>Assignee: Apache Spark
>Priority: Major
>
> Hi Team
> Steps to reproduce.
> {code:java}
> create table test(id bigint);
> insert into test select 1586318188000;
> create table test1(id bigint) partitioned by (year string);
> insert overwrite table test1 partition(year) select 234,cast(id as TIMESTAMP) 
> from test;
> {code}
> let's check the result. 
> Case 1:
> *select * from test1;*
> 234 | 52238-06-04 13:06:400.0
> --the result is wrong
> Case 2:
> *select 234,cast(id as TIMESTAMP) from test;*
>  
> java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd 
> hh:mm:ss[.fffffffff]
>  at java.sql.Timestamp.valueOf(Timestamp.java:237)
>  at 
> org.apache.hive.jdbc.HiveBaseResultSet.evaluate(HiveBaseResultSet.java:441)
>  at 
> org.apache.hive.jdbc.HiveBaseResultSet.getColumnValue(HiveBaseResultSet.java:421)
>  at 
> org.apache.hive.jdbc.HiveBaseResultSet.getString(HiveBaseResultSet.java:530)
>  at org.apache.hive.beeline.Rows$Row.<init>(Rows.java:166)
>  at org.apache.hive.beeline.BufferedRows.<init>(BufferedRows.java:43)
>  at org.apache.hive.beeline.BeeLine.print(BeeLine.java:1756)
>  at org.apache.hive.beeline.Commands.execute(Commands.java:826)
>  at org.apache.hive.beeline.Commands.sql(Commands.java:670)
>  at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:974)
>  at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:810)
>  at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:767)
>  at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:480)
>  at org.apache.hive.beeline.BeeLine.main(BeeLine.java:463)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at org.apache.hadoop.util.RunJar.run(RunJar.java:226)
>  at org.apache.hadoop.util.RunJar.main(RunJar.java:141)
>  Error: Unrecognized column type:TIMESTAMP_TYPE (state=,code=0)
>  
> I tried Hive; it works well, and the conversion is fine and correct
> {code:java}
> select 234,cast(id as TIMESTAMP) from test;
>  234   2020-04-08 11:56:28
> {code}
> Two questions:
> q1:
> If we forbid this conversion, should we keep all cases consistent?
> q2:
> If we allow the conversion in some cases, should we validate the magnitude of the 
> long value? The code seems to force the conversion to microseconds by multiplying, 
> no matter how large the value is; if a value would produce an out-of-range 
> timestamp, we could raise an error.
> {code:java}
> // converting seconds to us
> private[this] def longToTimestamp(t: Long): Long = t * 1000000L{code}
>  
> Thanks!
>  
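To illustrate q2 above, here is a minimal sketch (not Spark's actual implementation; the constant and helper names are assumptions for illustration) of how a seconds value could be range-checked before it is scaled to microseconds, instead of being multiplied unconditionally:

{code:java}
// Hedged sketch: reject seconds values that would overflow when scaled to
// microseconds. MICROS_PER_SECOND and the helper name are illustrative only.
val MICROS_PER_SECOND = 1000000L

def secondsToMicrosChecked(t: Long): Long = {
  require(
    t <= Long.MaxValue / MICROS_PER_SECOND && t >= Long.MinValue / MICROS_PER_SECOND,
    s"seconds value $t would overflow when converted to microseconds")
  t * MICROS_PER_SECOND
}
{code}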



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29358) Make unionByName optionally fill missing columns with nulls

2020-05-14 Thread Andreas Neumann (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107775#comment-17107775
 ] 

Andreas Neumann commented on SPARK-29358:
-

I would like to put in a vote for this feature.
 * It makes life so much easier when you have multiple inputs with slightly 
varying schemas, which is quite common for data that has evolved over time.
 * The work-around described at the top, where you explicitly add the missing 
columns, is really cumbersome if the schema is large.
 * With the approach of an extra argument, the compatibility concerns should be 
addressed. 

> Make unionByName optionally fill missing columns with nulls
> ---
>
> Key: SPARK-29358
> URL: https://issues.apache.org/jira/browse/SPARK-29358
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Mukul Murthy
>Priority: Major
>
> Currently, unionByName requires two DataFrames to have the same set of 
> columns (even though the order can be different). It would be good to add 
> either an option to unionByName or a new type of union which fills in missing 
> columns with nulls. 
> {code:java}
> val df1 = Seq(1, 2, 3).toDF("x")
> val df2 = Seq("a", "b", "c").toDF("y")
> df1.unionByName(df2){code}
> This currently throws 
> {code:java}
> org.apache.spark.sql.AnalysisException: Cannot resolve column name "x" among 
> (y);
> {code}
> Ideally, there would be a way to make this return a DataFrame containing:
> {code:java}
> +----+----+
> |   x|   y|
> +----+----+
> |   1|null|
> |   2|null|
> |   3|null|
> |null|   a|
> |null|   b|
> |null|   c|
> +----+----+
> {code}
> Currently, the workaround is to add the missing columns manually before calling 
> unionByName, but this is clunky:
> {code:java}
> df1.withColumn("y", lit(null)).unionByName(df2.withColumn("x", lit(null)))
> {code}
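As a hedged sketch of how the workaround above could be generalized (this is not the proposed API; it assumes flat, non-duplicated column names):

{code:java}
// Hedged sketch: add each missing column as a null literal on both sides,
// then delegate to unionByName. Does not handle nested or duplicate names.
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.lit

def unionByNameWithNulls(a: DataFrame, b: DataFrame): DataFrame = {
  val aCols = a.columns.toSet
  val bCols = b.columns.toSet
  val aFilled = (bCols -- aCols).foldLeft(a)((df, c) => df.withColumn(c, lit(null)))
  val bFilled = (aCols -- bCols).foldLeft(b)((df, c) => df.withColumn(c, lit(null)))
  aFilled.unionByName(bFilled)
}
{code}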



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31713) Make test-dependencies.sh detect version string correctly

2020-05-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107712#comment-17107712
 ] 

Apache Spark commented on SPARK-31713:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/28532

> Make test-dependencies.sh detect version string correctly
> -
>
> Key: SPARK-31713
> URL: https://issues.apache.org/jira/browse/SPARK-31713
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 2.4.6, 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> Currently, SBT jobs are broken like the following.
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-3.0-test-sbt-hadoop-3.2-hive-2.3/476/console
> {code}
> [error] running 
> /home/jenkins/workspace/spark-branch-3.0-test-sbt-hadoop-3.2-hive-2.3/dev/test-dependencies.sh
>  ; received return code 1
> Build step 'Execute shell' marked build as failure
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31713) Make test-dependencies.sh detect version string correctly

2020-05-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31713:


Assignee: (was: Apache Spark)

> Make test-dependencies.sh detect version string correctly
> -
>
> Key: SPARK-31713
> URL: https://issues.apache.org/jira/browse/SPARK-31713
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 2.4.6, 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> Currently, SBT jobs are broken like the following.
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-3.0-test-sbt-hadoop-3.2-hive-2.3/476/console
> {code}
> [error] running 
> /home/jenkins/workspace/spark-branch-3.0-test-sbt-hadoop-3.2-hive-2.3/dev/test-dependencies.sh
>  ; received return code 1
> Build step 'Execute shell' marked build as failure
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31713) Make test-dependencies.sh detect version string correctly

2020-05-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107711#comment-17107711
 ] 

Apache Spark commented on SPARK-31713:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/28532

> Make test-dependencies.sh detect version string correctly
> -
>
> Key: SPARK-31713
> URL: https://issues.apache.org/jira/browse/SPARK-31713
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 2.4.6, 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> Currently, SBT jobs are broken like the following.
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-3.0-test-sbt-hadoop-3.2-hive-2.3/476/console
> {code}
> [error] running 
> /home/jenkins/workspace/spark-branch-3.0-test-sbt-hadoop-3.2-hive-2.3/dev/test-dependencies.sh
>  ; received return code 1
> Build step 'Execute shell' marked build as failure
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31713) Make test-dependencies.sh detect version string correctly

2020-05-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31713:


Assignee: Apache Spark

> Make test-dependencies.sh detect version string correctly
> -
>
> Key: SPARK-31713
> URL: https://issues.apache.org/jira/browse/SPARK-31713
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 2.4.6, 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>
> Currently, SBT jobs are broken like the following.
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-3.0-test-sbt-hadoop-3.2-hive-2.3/476/console
> {code}
> [error] running 
> /home/jenkins/workspace/spark-branch-3.0-test-sbt-hadoop-3.2-hive-2.3/dev/test-dependencies.sh
>  ; received return code 1
> Build step 'Execute shell' marked build as failure
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31713) Make test-dependencies.sh detect version string correctly

2020-05-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-31713:
--
Summary: Make test-dependencies.sh detect version string correctly  (was: 
Make test-dependencies.sh detect version string only)

> Make test-dependencies.sh detect version string correctly
> -
>
> Key: SPARK-31713
> URL: https://issues.apache.org/jira/browse/SPARK-31713
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 2.4.6, 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> Currently, SBT jobs are broken like the following.
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-3.0-test-sbt-hadoop-3.2-hive-2.3/476/console
> {code}
> [error] running 
> /home/jenkins/workspace/spark-branch-3.0-test-sbt-hadoop-3.2-hive-2.3/dev/test-dependencies.sh
>  ; received return code 1
> Build step 'Execute shell' marked build as failure
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31713) Make test-dependencies.sh detect version string only

2020-05-14 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-31713:
-

 Summary: Make test-dependencies.sh detect version string only
 Key: SPARK-31713
 URL: https://issues.apache.org/jira/browse/SPARK-31713
 Project: Spark
  Issue Type: Bug
  Components: Project Infra
Affects Versions: 2.4.6, 3.0.0
Reporter: Dongjoon Hyun


Currently, SBT jobs are broken like the following.
- 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-3.0-test-sbt-hadoop-3.2-hive-2.3/476/console
{code}
[error] running 
/home/jenkins/workspace/spark-branch-3.0-test-sbt-hadoop-3.2-hive-2.3/dev/test-dependencies.sh
 ; received return code 1
Build step 'Execute shell' marked build as failure
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31693) Investigate AmpLab Jenkins server network issue

2020-05-14 Thread Shane Knapp (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107641#comment-17107641
 ] 

Shane Knapp commented on SPARK-31693:
-

filed https://issues.apache.org/jira/browse/INFRA-20267

i don't think it's us.  i could be wrong as IANANE (i am not a network 
engineer).  :)

> Investigate AmpLab Jenkins server network issue
> ---
>
> Key: SPARK-31693
> URL: https://issues.apache.org/jira/browse/SPARK-31693
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Priority: Critical
>
> Given the series of failures in the Spark packaging Jenkins job, it seems that 
> there is a network issue in the AmpLab Jenkins cluster.
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/
> - The node failed to talk to GitBox. (SPARK-31687) -> GitHub is okay.
> - The node failed to download the maven mirror. (SPARK-31691) -> The primary 
> host is okay.
> - The node failed to communicate repository.apache.org. (Current master 
> branch Jenkins job failure)
> {code}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-deploy-plugin:3.0.0-M1:deploy (default-deploy) 
> on project spark-parent_2.12: ArtifactDeployerException: Failed to retrieve 
> remote metadata 
> org.apache.spark:spark-parent_2.12:3.1.0-SNAPSHOT/maven-metadata.xml: Could 
> not transfer metadata 
> org.apache.spark:spark-parent_2.12:3.1.0-SNAPSHOT/maven-metadata.xml from/to 
> apache.snapshots.https 
> (https://repository.apache.org/content/repositories/snapshots): Transfer 
> failed for 
> https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-parent_2.12/3.1.0-SNAPSHOT/maven-metadata.xml:
>  Connect to repository.apache.org:443 [repository.apache.org/207.244.88.140] 
> failed: Connection timed out (Connection timed out) -> [Help 1]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31712) Check casting timestamps to byte/short/int/long before 1970-01-01

2020-05-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31712:


Assignee: Apache Spark

> Check casting timestamps to byte/short/int/long before 1970-01-01
> -
>
> Key: SPARK-31712
> URL: https://issues.apache.org/jira/browse/SPARK-31712
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Maxim Gekk
>Assignee: Apache Spark
>Priority: Major
>
> There are tests for casting timestamps to byte/short/int/long after the epoch 
> 1970-01-01 00:00:00Z. Actually, the tests check casting of positive values, but 
> there are no tests for "negative" timestamps before the epoch. The ticket aims 
> to add such tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31712) Check casting timestamps to byte/short/int/long before 1970-01-01

2020-05-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31712:


Assignee: (was: Apache Spark)

> Check casting timestamps to byte/short/int/long before 1970-01-01
> -
>
> Key: SPARK-31712
> URL: https://issues.apache.org/jira/browse/SPARK-31712
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> There are tests for casting timestamps to byte/short/int/long after the epoch 
> 1970-01-01 00:00:00Z. Actually, the tests check casting of positive values, but 
> there are no tests for "negative" timestamps before the epoch. The ticket aims 
> to add such tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31712) Check casting timestamps to byte/short/int/long before 1970-01-01

2020-05-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107625#comment-17107625
 ] 

Apache Spark commented on SPARK-31712:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/28531

> Check casting timestamps to byte/short/int/long before 1970-01-01
> -
>
> Key: SPARK-31712
> URL: https://issues.apache.org/jira/browse/SPARK-31712
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> There are tests for casting timestamps to byte/short/int/long after the epoch 
> 1970-01-01 00:00:00Z. Actually, the tests check casting of positive values, but 
> there are no tests for "negative" timestamps before the epoch. The ticket aims 
> to add such tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31712) Check casting timestamps to byte/short/int/long before 1970-01-01

2020-05-14 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-31712:
--

 Summary: Check casting timestamps to byte/short/int/long before 
1970-01-01
 Key: SPARK-31712
 URL: https://issues.apache.org/jira/browse/SPARK-31712
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 3.0.0, 3.1.0
Reporter: Maxim Gekk


There are tests for casting timestamps to byte/short/int/long after the epoch 
1970-01-01 00:00:00Z. Actually, the tests check casting of positive values, but 
there are no tests for "negative" timestamps before the epoch. The ticket aims 
to add such tests.
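As a rough illustration of the kind of check the ticket asks for, assuming a spark-shell session with the session time zone set to UTC, casting a timestamp one second before the epoch to a long should yield -1:

{code:java}
// Hedged sketch of a pre-epoch cast check; values and the UTC assumption are
// illustrative, not the ticket's actual test code.
spark.conf.set("spark.sql.session.timeZone", "UTC")
val secs = spark.sql(
  "SELECT CAST(TIMESTAMP '1969-12-31 23:59:59' AS LONG) AS secs").head().getLong(0)
assert(secs == -1L, s"expected -1 second before the epoch, got $secs")
{code}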



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31579) Replace floorDiv by / in localRebaseGregorianToJulianDays()

2020-05-14 Thread Sudharshann D. (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107609#comment-17107609
 ] 

Sudharshann D. commented on SPARK-31579:


Just a small update. I have the design for the issue. It's something similar to 
this:
 # Write a duplicate of localRebaseGregorianToJulianDays() with / instead of 
floorDiv and with extra parameters days: Int, tz: TimeZone, hr: Int.
 # Write test cases that iterate over all days, all time zones, and each hour, 
and compare the results of floorDiv and /.
 # Send the PR with this modification. If you think it's fine, I'll clear all 
the edits and replace floorDiv by /.

I have the code, but there's something wrong with my dev environment... Figuring 
it out...
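For reference, floorDiv and / only differ when the dividend is negative and not evenly divisible, which is exactly what the hypothesis about utcCal.getTimeInMillis % MILLIS_PER_DAY == 0 would rule out:

{code:java}
// floorDiv rounds toward negative infinity, / truncates toward zero.
// They agree whenever the remainder is zero.
assert(Math.floorDiv(-7L, 2L) == -4L)
assert(-7L / 2L == -3L)
assert(Math.floorDiv(-8L, 2L) == -8L / 2L)  // equal when evenly divisible
{code}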

> Replace floorDiv by / in localRebaseGregorianToJulianDays()
> ---
>
> Key: SPARK-31579
> URL: https://issues.apache.org/jira/browse/SPARK-31579
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Priority: Minor
>  Labels: starter
>
> Most likely utcCal.getTimeInMillis % MILLIS_PER_DAY == 0 but need to check 
> that for all available time zones in the range of [0001, 2100] years with the 
> step of 1 hour or maybe smaller. If this hypothesis is confirmed, floorDiv 
> can be replaced by /, and this should improve performance of 
> RebaseDateTime.localRebaseGregorianToJulianDays.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31387) HiveThriftServer2Listener update methods fail with unknown operation/session id

2020-05-14 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107587#comment-17107587
 ] 

Dongjoon Hyun commented on SPARK-31387:
---

This was reverted from branch-3.0/master, inevitably, because it breaks all 
Maven jobs in both branches. Please see the comments on the original PR.

> HiveThriftServer2Listener update methods fail with unknown operation/session 
> id
> ---
>
> Key: SPARK-31387
> URL: https://issues.apache.org/jira/browse/SPARK-31387
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.4, 2.4.5, 3.0.0
>Reporter: Ali Smesseim
>Priority: Major
>
> HiveThriftServer2Listener update methods, such as onSessionClosed and 
> onOperationError throw a NullPointerException (in Spark 3) or a 
> NoSuchElementException (in Spark 2) when the input session/operation id is 
> unknown. In Spark 2, this can cause control flow issues with the caller of 
> the listener. In Spark 3, the listener is called by a ListenerBus which 
> catches the exception, but it would still be nicer if an invalid update is 
> logged and does not throw an exception.
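A minimal sketch of the defensive handling the ticket suggests; the class, field, and method names below are illustrative assumptions, not the actual HiveThriftServer2Listener internals:

{code:java}
// Hedged sketch: look the id up as an Option and log a warning for unknown ids,
// rather than indexing directly and letting the lookup throw.
import scala.collection.mutable
import org.slf4j.LoggerFactory

class SafeSessionTracker {
  private val log = LoggerFactory.getLogger(getClass)
  private val sessions = mutable.Map[String, Long]() // sessionId -> start time

  def onSessionClosed(sessionId: String): Unit =
    sessions.remove(sessionId) match {
      case Some(_) => // update internal state as needed
      case None    => log.warn(s"onSessionClosed called for unknown session $sessionId")
    }
}
{code}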



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31387) HiveThriftServer2Listener update methods fail with unknown operation/session id

2020-05-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31387:


Assignee: Apache Spark

> HiveThriftServer2Listener update methods fail with unknown operation/session 
> id
> ---
>
> Key: SPARK-31387
> URL: https://issues.apache.org/jira/browse/SPARK-31387
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.4, 2.4.5, 3.0.0
>Reporter: Ali Smesseim
>Assignee: Apache Spark
>Priority: Major
>
> HiveThriftServer2Listener update methods, such as onSessionClosed and 
> onOperationError throw a NullPointerException (in Spark 3) or a 
> NoSuchElementException (in Spark 2) when the input session/operation id is 
> unknown. In Spark 2, this can cause control flow issues with the caller of 
> the listener. In Spark 3, the listener is called by a ListenerBus which 
> catches the exception, but it would still be nicer if an invalid update is 
> logged and does not throw an exception.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31387) HiveThriftServer2Listener update methods fail with unknown operation/session id

2020-05-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-31387:
--
Fix Version/s: (was: 3.0.0)

> HiveThriftServer2Listener update methods fail with unknown operation/session 
> id
> ---
>
> Key: SPARK-31387
> URL: https://issues.apache.org/jira/browse/SPARK-31387
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.4, 2.4.5, 3.0.0
>Reporter: Ali Smesseim
>Priority: Major
>
> HiveThriftServer2Listener update methods, such as onSessionClosed and 
> onOperationError throw a NullPointerException (in Spark 3) or a 
> NoSuchElementException (in Spark 2) when the input session/operation id is 
> unknown. In Spark 2, this can cause control flow issues with the caller of 
> the listener. In Spark 3, the listener is called by a ListenerBus which 
> catches the exception, but it would still be nicer if an invalid update is 
> logged and does not throw an exception.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31387) HiveThriftServer2Listener update methods fail with unknown operation/session id

2020-05-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31387:


Assignee: (was: Apache Spark)

> HiveThriftServer2Listener update methods fail with unknown operation/session 
> id
> ---
>
> Key: SPARK-31387
> URL: https://issues.apache.org/jira/browse/SPARK-31387
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.4, 2.4.5, 3.0.0
>Reporter: Ali Smesseim
>Priority: Major
>
> HiveThriftServer2Listener update methods, such as onSessionClosed and 
> onOperationError throw a NullPointerException (in Spark 3) or a 
> NoSuchElementException (in Spark 2) when the input session/operation id is 
> unknown. In Spark 2, this can cause control flow issues with the caller of 
> the listener. In Spark 3, the listener is called by a ListenerBus which 
> catches the exception, but it would still be nicer if an invalid update is 
> logged and does not throw an exception.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-31387) HiveThriftServer2Listener update methods fail with unknown operation/session id

2020-05-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reopened SPARK-31387:
---
  Assignee: (was: Ali Smesseim)

> HiveThriftServer2Listener update methods fail with unknown operation/session 
> id
> ---
>
> Key: SPARK-31387
> URL: https://issues.apache.org/jira/browse/SPARK-31387
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.4, 2.4.5, 3.0.0
>Reporter: Ali Smesseim
>Priority: Major
> Fix For: 3.0.0
>
>
> HiveThriftServer2Listener update methods, such as onSessionClosed and 
> onOperationError throw a NullPointerException (in Spark 3) or a 
> NoSuchElementException (in Spark 2) when the input session/operation id is 
> unknown. In Spark 2, this can cause control flow issues with the caller of 
> the listener. In Spark 3, the listener is called by a ListenerBus which 
> catches the exception, but it would still be nicer if an invalid update is 
> logged and does not throw an exception.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31696) Support spark.kubernetes.driver.service.annotation

2020-05-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-31696:
--
Fix Version/s: (was: 3.1.0)
   3.0.0

> Support spark.kubernetes.driver.service.annotation
> --
>
> Key: SPARK-31696
> URL: https://issues.apache.org/jira/browse/SPARK-31696
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31696) Support spark.kubernetes.driver.service.annotation

2020-05-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-31696:
--
Affects Version/s: (was: 3.1.0)
   3.0.0

> Support spark.kubernetes.driver.service.annotation
> --
>
> Key: SPARK-31696
> URL: https://issues.apache.org/jira/browse/SPARK-31696
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31696) Support spark.kubernetes.driver.service.annotation

2020-05-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107483#comment-17107483
 ] 

Apache Spark commented on SPARK-31696:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/28530

> Support spark.kubernetes.driver.service.annotation
> --
>
> Key: SPARK-31696
> URL: https://issues.apache.org/jira/browse/SPARK-31696
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31692) Hadoop confs passed via spark config are not set in URLStream Handler Factory

2020-05-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107478#comment-17107478
 ] 

Apache Spark commented on SPARK-31692:
--

User 'karuppayya' has created a pull request for this issue:
https://github.com/apache/spark/pull/28529

> Hadoop confs passed via spark config are not set in URLStream Handler Factory
> -
>
> Key: SPARK-31692
> URL: https://issues.apache.org/jira/browse/SPARK-31692
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Karuppayya
>Assignee: Karuppayya
>Priority: Major
> Fix For: 3.0.0
>
>
> Hadoop confs passed via Spark config (as "spark.hadoop.*") are not set in 
> URLStreamHandlerFactory
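For context, a hedged example (spark-shell style, with an illustrative property name) of where a spark.hadoop.* setting is normally expected to surface; the report above is that the Configuration used by the URLStreamHandlerFactory does not pick these up:

{code:java}
// Hedged example: a spark.hadoop.* entry should appear, without the prefix,
// in the Hadoop Configuration derived from the Spark config.
// The property name here is made up for illustration.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[1]")
  .config("spark.hadoop.fs.custom.key", "custom-value")
  .getOrCreate()

val hadoopConf = spark.sparkContext.hadoopConfiguration
assert(hadoopConf.get("fs.custom.key") == "custom-value")
{code}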



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31681) Python multiclass logistic regression evaluate should return LogisticRegressionSummary

2020-05-14 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-31681:
-
Affects Version/s: (was: 3.1.0)
   3.0.0

> Python multiclass logistic regression evaluate should return 
> LogisticRegressionSummary
> --
>
> Key: SPARK-31681
> URL: https://issues.apache.org/jira/browse/SPARK-31681
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 3.0.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Minor
>  Labels: release-notes
> Fix For: 3.0.0
>
>
> {code:java}
> def evaluate(self, dataset):
> ..
> java_blr_summary = self._call_java("evaluate", dataset)
> return BinaryLogisticRegressionSummary(java_blr_summary)
> {code}
> We should return LogisticRegressionSummary instead of 
> BinaryLogisticRegressionSummary for multiclass LogisticRegression



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31681) Python multiclass logistic regression evaluate should return LogisticRegressionSummary

2020-05-14 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-31681.
--
Fix Version/s: 3.0.0
 Assignee: Huaxin Gao
   Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/28503

> Python multiclass logistic regression evaluate should return 
> LogisticRegressionSummary
> --
>
> Key: SPARK-31681
> URL: https://issues.apache.org/jira/browse/SPARK-31681
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 3.1.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Minor
>  Labels: release-notes
> Fix For: 3.0.0
>
>
> {code:java}
> def evaluate(self, dataset):
> ..
> java_blr_summary = self._call_java("evaluate", dataset)
> return BinaryLogisticRegressionSummary(java_blr_summary)
> {code}
> We should return LogisticRegressionSummary instead of 
> BinaryLogisticRegressionSummary for multiclass LogisticRegression



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31681) Python multiclass logistic regression evaluate should return LogisticRegressionSummary

2020-05-14 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-31681:
-
  Docs Text: In Spark 3.0, a multiclass logistic regression in Pyspark will 
now (correctly) return LogisticRegressionSummary, not the subclass 
BinaryLogisticRegressionSummary. The BinaryLogisticRegressionSummary would not 
work in this case anyway.
Description: 
{code:java}
def evaluate(self, dataset):
    ...
    java_blr_summary = self._call_java("evaluate", dataset)
    return BinaryLogisticRegressionSummary(java_blr_summary)
{code}

We should return LogisticRegressionSummary instead of 
BinaryLogisticRegressionSummary for multiclass LogisticRegression

  was:

{code:java}
def evaluate(self, dataset):
    ...
    java_blr_summary = self._call_java("evaluate", dataset)
    return BinaryLogisticRegressionSummary(java_blr_summary)
{code}

We should return LogisticRegressionSummary instead of 
BinaryLogisticRegressionSummary for multiclass LogisticRegression


> Python multiclass logistic regression evaluate should return 
> LogisticRegressionSummary
> --
>
> Key: SPARK-31681
> URL: https://issues.apache.org/jira/browse/SPARK-31681
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 3.1.0
>Reporter: Huaxin Gao
>Priority: Minor
>  Labels: release-notes
>
> {code:java}
> def evaluate(self, dataset):
> ..
> java_blr_summary = self._call_java("evaluate", dataset)
> return BinaryLogisticRegressionSummary(java_blr_summary)
> {code}
> We should return LogisticRegressionSummary instead of 
> BinaryLogisticRegressionSummary for multiclass LogisticRegression



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31681) Python multiclass logistic regression evaluate should return LogisticRegressionSummary

2020-05-14 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-31681:
-
Docs Text: In Spark 3.0, a multiclass logistic regression in Pyspark will 
now (correctly) return LogisticRegressionSummary, not the subclass 
BinaryLogisticRegressionSummary. The additional methods exposed by 
BinaryLogisticRegressionSummary would not work in this case anyway.  (was: In 
Spark 3.0, a multiclass logistic regression in Pyspark will now (correctly) 
return LogisticRegressionSummary, not the subclass 
BinaryLogisticRegressionSummary. The BinaryLogisticRegressionSummary would not 
work in this case anyway.)

> Python multiclass logistic regression evaluate should return 
> LogisticRegressionSummary
> --
>
> Key: SPARK-31681
> URL: https://issues.apache.org/jira/browse/SPARK-31681
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 3.1.0
>Reporter: Huaxin Gao
>Priority: Minor
>  Labels: release-notes
>
> {code:java}
> def evaluate(self, dataset):
> ..
> java_blr_summary = self._call_java("evaluate", dataset)
> return BinaryLogisticRegressionSummary(java_blr_summary)
> {code}
> We should return LogisticRegressionSummary instead of 
> BinaryLogisticRegressionSummary for multiclass LogisticRegression



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31681) Python multiclass logistic regression evaluate should return LogisticRegressionSummary

2020-05-14 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-31681:
-
Labels: release-notes  (was: )

> Python multiclass logistic regression evaluate should return 
> LogisticRegressionSummary
> --
>
> Key: SPARK-31681
> URL: https://issues.apache.org/jira/browse/SPARK-31681
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 3.1.0
>Reporter: Huaxin Gao
>Priority: Minor
>  Labels: release-notes
>
> {code:java}
> def evaluate(self, dataset):
> ..
> java_blr_summary = self._call_java("evaluate", dataset)
> return BinaryLogisticRegressionSummary(java_blr_summary)
> {code}
> We should return LogisticRegressionSummary instead of 
> BinaryLogisticRegressionSummary for multiclass LogisticRegression



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20427) Issue with Spark interpreting Oracle datatype NUMBER

2020-05-14 Thread Rafael (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-20427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107376#comment-17107376
 ] 

Rafael commented on SPARK-20427:


Hey guys, 
I encountered an issue related to decimal precision.

The code now expects the JDBC metadata for a Decimal column to carry both 
precision and scale. 

[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L402-L414]

I found out that in Oracle it is valid to have a Decimal column without them. 
When I query the metadata for such a column, I get DATA_PRECISION = Null and 
DATA_SCALE = Null.

Then when I run `spark-sql` I get this error:
{code:java}
java.lang.IllegalArgumentException: requirement failed: Decimal precision 45 
exceeds max precision 38
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.sql.types.Decimal.set(Decimal.scala:114)
at org.apache.spark.sql.types.Decimal$.apply(Decimal.scala:465)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$3$$anonfun$12.apply(JdbcUtils.scala:407)
{code}
Do you have a workaround for how spark-sql can handle such cases?
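As a hedged sketch of the CAST workaround mentioned in the related reports (connection details, table, and column names are placeholders, not from this report), the cast can be pushed into the dbtable subquery so that the JDBC metadata reports an explicit precision and scale:

{code:java}
// Hedged sketch: force an explicit precision/scale on the database side so the
// driver no longer reports NUMBER with null precision/scale.
// URL, table, and column names are placeholders.
val jdbcDF = spark.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//host:1521/service")
  .option("dbtable", "(SELECT CAST(amount AS NUMBER(38, 10)) AS amount FROM my_table) t")
  .option("user", "username")
  .option("password", "password")
  .load()
{code}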

> Issue with Spark interpreting Oracle datatype NUMBER
> 
>
> Key: SPARK-20427
> URL: https://issues.apache.org/jira/browse/SPARK-20427
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Alexander Andrushenko
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 2.3.0
>
>
> In Oracle there is a data type NUMBER. When defining a field in a table of type 
> NUMBER, the field has two components, precision and scale.
> For example, NUMBER(p,s) has precision p and scale s. 
> Precision can range from 1 to 38.
> Scale can range from -84 to 127.
> When reading such a field, Spark can create numbers with precision exceeding 
> 38. In our case it has created fields with precision 44,
> calculated as sum of the precision (in our case 34 digits) and the scale (10):
> "...java.lang.IllegalArgumentException: requirement failed: Decimal precision 
> 44 exceeds max precision 38...".
> The result was, that a data frame was read from a table on one schema but 
> could not be inserted in the identical table on other schema.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-30100) Decimal Precision Inferred from JDBC via Spark

2020-05-14 Thread Rafael (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107365#comment-17107365
 ] 

Rafael edited comment on SPARK-30100 at 5/14/20, 2:48 PM:
--

Hey guys, 
 I encountered an issue related to the precision issues.

Now the code expects the for the Decimal type we need to have in JDBC metadata 
precision and scale. 

[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L402-L414]

 

I found out that in the OracleDB it is valid to have Decimal without these 
data. When I do a query read metadata for such column I'm getting 
DATA_PRECISION = Null, and DATA_SCALE = Null.

Then when I run the `spark-sql` I'm getting such error:
{code:java}
java.lang.IllegalArgumentException: requirement failed: Decimal precision 45 
exceeds max precision 38
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.sql.types.Decimal.set(Decimal.scala:114)
at org.apache.spark.sql.types.Decimal$.apply(Decimal.scala:465)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$3$$anonfun$12.apply(JdbcUtils.scala:407)
{code}
Do you have a work around how spark-sql can work with such cases?


was (Author: kyrdan):
Hey guys, 
I encountered an issue related to the precision issues.

Now the code expects the for the Decimal type we need to have in JDBC metadata 
precision and scale. 

[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L402-L414]

 

I found out that in the OracleDB it is valid to have Decimal without these 
data. When I do a query read metadata for such column I'm getting 
DATA_PRECISION = Null, and DATA_SCALE = Null.

Then when I run the `spark-sql` I'm getting such error:
{code:java}
java.lang.IllegalArgumentException: requirement failed: Decimal precision 45 
exceeds max precision 38
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.sql.types.Decimal.set(Decimal.scala:114)
at org.apache.spark.sql.types.Decimal$.apply(Decimal.scala:465)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$3$$anonfun$12.apply(JdbcUtils.scala:407)
{code}
Do you have a work around how spark-sql can work with such cases?

> Decimal Precision Inferred from JDBC via Spark
> --
>
> Key: SPARK-30100
> URL: https://issues.apache.org/jira/browse/SPARK-30100
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 2.4.0
>Reporter: Joby Joje
>Priority: Major
>
> When trying to load data from JDBC(Oracle) into Spark, there seems to be 
> precision loss in the decimal field, as per my understanding Spark supports 
> *DECIMAL(38,18)*. The field from the Oracle is DECIMAL(38,14), whereas Spark 
> rounds off the last four digits making it a precision of DECIMAL(38,10). This 
> is happening to few fields in the dataframe where the column is fetched using 
> a CASE statement whereas in the same query another field populates the right 
> schema.
> Tried to pass the
> {code:java}
> spark.sql.decimalOperations.allowPrecisionLoss=false{code}
> conf in the Spark-submit though didn't get the desired results.
> {code:java}
> jdbcDF = spark.read \ 
> .format("jdbc") \ 
> .option("url", "ORACLE") \ 
> .option("dbtable", "QUERY") \ 
> .option("user", "USERNAME") \ 
> .option("password", "PASSWORD") \ 
> .load(){code}
> So considering that the Spark infers the schema from a sample records, how 
> does this work here? Does it use the results of the query i.e (SELECT * FROM 
> TABLE_NAME JOIN ...) or does it take a different route to guess the schema 
> for itself? Can someone throw some light on this and advise how to achieve 
> the right decimal precision on this regards without manipulating the query as 
> doing a CAST on the query does solve the issue, but would prefer to get some 
> alternatives.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30100) Decimal Precision Inferred from JDBC via Spark

2020-05-14 Thread Rafael (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107365#comment-17107365
 ] 

Rafael commented on SPARK-30100:


Hey guys, 
I encountered an issue related to decimal precision.

The code now expects the JDBC metadata for a Decimal column to carry both 
precision and scale. 

[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L402-L414]

I found out that in Oracle it is valid to have a Decimal column without them. 
When I query the metadata for such a column, I get DATA_PRECISION = Null and 
DATA_SCALE = Null.

Then when I run `spark-sql` I get this error:
{code:java}
java.lang.IllegalArgumentException: requirement failed: Decimal precision 45 
exceeds max precision 38
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.sql.types.Decimal.set(Decimal.scala:114)
at org.apache.spark.sql.types.Decimal$.apply(Decimal.scala:465)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$3$$anonfun$12.apply(JdbcUtils.scala:407)
{code}
Do you have a workaround for how spark-sql can handle such cases?

> Decimal Precision Inferred from JDBC via Spark
> --
>
> Key: SPARK-30100
> URL: https://issues.apache.org/jira/browse/SPARK-30100
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 2.4.0
>Reporter: Joby Joje
>Priority: Major
>
> When trying to load data from JDBC(Oracle) into Spark, there seems to be 
> precision loss in the decimal field, as per my understanding Spark supports 
> *DECIMAL(38,18)*. The field from the Oracle is DECIMAL(38,14), whereas Spark 
> rounds off the last four digits making it a precision of DECIMAL(38,10). This 
> is happening to few fields in the dataframe where the column is fetched using 
> a CASE statement whereas in the same query another field populates the right 
> schema.
> Tried to pass the
> {code:java}
> spark.sql.decimalOperations.allowPrecisionLoss=false{code}
> conf in the Spark-submit though didn't get the desired results.
> {code:java}
> jdbcDF = spark.read \ 
> .format("jdbc") \ 
> .option("url", "ORACLE") \ 
> .option("dbtable", "QUERY") \ 
> .option("user", "USERNAME") \ 
> .option("password", "PASSWORD") \ 
> .load(){code}
> So considering that the Spark infers the schema from a sample records, how 
> does this work here? Does it use the results of the query i.e (SELECT * FROM 
> TABLE_NAME JOIN ...) or does it take a different route to guess the schema 
> for itself? Can someone throw some light on this and advise how to achieve 
> the right decimal precision on this regards without manipulating the query as 
> doing a CAST on the query does solve the issue, but would prefer to get some 
> alternatives.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31676) QuantileDiscretizer raise error parameter splits given invalid value (splits array includes -0.0 and 0.0)

2020-05-14 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-31676.
--
Fix Version/s: 2.4.7
   3.0.0
 Assignee: Weichen Xu
   Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/28498

> QuantileDiscretizer raise error parameter splits given invalid value (splits 
> array includes -0.0 and 0.0)
> -
>
> Key: SPARK-31676
> URL: https://issues.apache.org/jira/browse/SPARK-31676
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 2.4.5, 3.0.0
>Reporter: Weichen Xu
>Assignee: Weichen Xu
>Priority: Major
> Fix For: 3.0.0, 2.4.7
>
>
> Reproduce code
> {code}
> import scala.util.Random
> val rng = new Random(3)
> val a1 = Array.tabulate(200)(_=>rng.nextDouble * 2.0 - 1.0) ++ 
> Array.fill(20)(0.0) ++ Array.fill(20)(-0.0)
> import spark.implicits._
> val df1 = sc.parallelize(a1, 2).toDF("id")
> import org.apache.spark.ml.feature.QuantileDiscretizer
> val qd = new 
> QuantileDiscretizer().setInputCol("id").setOutputCol("out").setNumBuckets(200).setRelativeError(0.0)
> val model = qd.fit(df1)
> {code}
> Raise error like:
>   at org.apache.spark.ml.param.Param.validate(params.scala:76)
>   at org.apache.spark.ml.param.ParamPair.(params.scala:634)
>   at org.apache.spark.ml.param.Param.$minus$greater(params.scala:85)
>   at org.apache.spark.ml.param.Params.set(params.scala:713)
>   at org.apache.spark.ml.param.Params.set$(params.scala:712)
>   at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:41)
>   at org.apache.spark.ml.feature.Bucketizer.setSplits(Bucketizer.scala:77)
>   at 
> org.apache.spark.ml.feature.QuantileDiscretizer.fit(QuantileDiscretizer.scala:231)
>   ... 49 elided
> java.lang.IllegalArgumentException: quantileDiscretizer_479bb5a3ca99 
> parameter splits given invalid value [-Infinity,-0.9986765732730827,..., 
> -0.0, 0.0, ..., 0.9907184077958491,Infinity]
> 0.0 > -0.0 is False, which breaks the parameter validation check.
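For reference, a small illustration of the comparison quirk behind the failure: 0.0 and -0.0 are numerically equal, so a strictly-increasing check over a splits array containing both must fail, even though java.lang.Double.compare distinguishes them:

{code:java}
// 0.0 and -0.0 compare equal with ==, so "strictly greater" is false,
// while Double.compare still orders -0.0 before 0.0.
assert(0.0 == -0.0)
assert(!(0.0 > -0.0))
assert(java.lang.Double.compare(0.0, -0.0) > 0)
{code}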



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-31686) Return of String instead of array in function get_json_object

2020-05-14 Thread Touopi Touopi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Touopi Touopi reopened SPARK-31686:
---

I don't really understand the purpose of changing the return type.
{code:sql}
select
    v1.brandedcustomernumber as brandedcustomernumber
from
    uniquecustomer.UniqueCustomer
lateral view
    explode(from_json(get_json_object(string(brandedCustomerInfoAggregate),
        '$.brandedCustomers[*].customerNumber'), 'array<string>')) v1 as brandedcustomernumber
{code}
Look at this example. Since I am using the wildcard [*], it means that I can have 
0..n elements returned. Luckily, my brandedCustomerInfoAggregate object has more 
than one brandedCustomers element, so the result of the get_json_object function 
will be ["customer1","customer2"], for instance.

So now the explode function expects an array; what happens if in some case only 
one brandedCustomers element is filled?

The object is returned as a string (this is actually where I discovered the " 
characters added around the value), like "customer1", and the from_json function 
will break.

I am expecting that, during the parsing and selection of nodes, if we have [*] we 
should return an array.
Actually, when one element is returned for another query, I am converting to an 
array and casting to a string:

(from_json(cast(array(get_json_object(string(customer),'$.addresses[*].location'))
 as string),'array<string>'))

But the results are not good when more elements are returned
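A hedged sketch (spark-shell style, with made-up sample rows) of an alternative that avoids the single-element ambiguity by parsing the whole JSON with an explicit schema instead of round-tripping through get_json_object:

{code:java}
// Hedged sketch: the field names follow the example above; the sample rows are
// invented for illustration. With an explicit schema, customerNumber is always
// an array<string>, whether one element matched or many.
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import spark.implicits._

val df = Seq(
  """{"brandedCustomers":[{"customerNumber":"customer1"},{"customerNumber":"customer2"}]}""",
  """{"brandedCustomers":[{"customerNumber":"only-one"}]}"""
).toDF("brandedCustomerInfoAggregate")

val schema = new StructType()
  .add("brandedCustomers", ArrayType(new StructType().add("customerNumber", StringType)))

df.select(
    explode(
      from_json($"brandedCustomerInfoAggregate", schema)
        .getField("brandedCustomers")
        .getField("customerNumber")).as("brandedcustomernumber"))
  .show()
{code}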

> Return of String instead of array in function get_json_object
> -
>
> Key: SPARK-31686
> URL: https://issues.apache.org/jira/browse/SPARK-31686
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.5
> Environment: {code:json}
> {
>   "customer": {
>     "addresses": [
>       { "location": "arizona" }
>     ]
>   }
> }
> {code}
> get_json_object(string(customer),'$.addresses[*].location')
> returns "arizona"
> the expected result should be
> ["arizona"]
>Reporter: Touopi Touopi
>Priority: Major
>
> When we select a node of a JSON object that is an array and the array contains 
> one element, get_json_object returns a String with " characters instead of an 
> array of one element.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-31686) Return of String instead of array in function get_json_object

2020-05-14 Thread Touopi Touopi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Touopi Touopi updated SPARK-31686:
--
Comment: was deleted

(was: I don't really understand the purpose to change the return type.
{code:sql}
select
v1.brandedcustomernumber as brandedcustomernumber
from
uniquecustomer.UniqueCustomer
lateral view 
explode(from_json(get_json_object(string(brandedCustomerInfoAggregate), 
'$.brandedCustomers[*].customerNumber'), 'array')) v1 as 
brandedcustomernumber
{code}
Look this example,
 Since i am using the wilcard [*] it means that i can have 0..n elements 
returned.
 Lucky my brandedCustomerInfoAggregate object has more than one 
brandedCustomers elements so the result of the get_json_object function will be 
["customer1","customer2"] for instance.


 So now the function explode is waiting an array,what will happens if in any 
case i have just one brandedCustomers filled ?

the Object like String (actually i discover the " characters added on the 
chain) will be return liked this "customer1" an the function from_json will 
break.


I am expecting that during the parsing and selection of node if we have [*] we 
should return an array.
 Actually when One element is returned for another query,i am converting to 
array and cast to string
 
(from_json(cast(array(get_json_object(string(customer),'$.addresses[*].location'))
 as string),'array'))

But the result are not good when more elements are returned)

> Return of String instead of array in function get_json_object
> -
>
> Key: SPARK-31686
> URL: https://issues.apache.org/jira/browse/SPARK-31686
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.5
> Environment: {code:json}
> {
>   "customer": {
>     "addresses": [
>       { "location": "arizona" }
>     ]
>   }
> }
> {code}
> get_json_object(string(customer),'$.addresses[*].location')
> returns "arizona"
> the expected result should be
> ["arizona"]
>Reporter: Touopi Touopi
>Priority: Major
>
> When we select a node of a JSON object that is an array and the array contains 
> one element, get_json_object returns a String with " characters instead of an 
> array of one element.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31686) Return of String instead of array in function get_json_object

2020-05-14 Thread Touopi Touopi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107344#comment-17107344
 ] 

Touopi Touopi commented on SPARK-31686:
---

I don't really understand the purpose of changing the return type.
{code:sql}
select
    v1.brandedcustomernumber as brandedcustomernumber
from
    uniquecustomer.UniqueCustomer
lateral view
    explode(from_json(get_json_object(string(brandedCustomerInfoAggregate),
        '$.brandedCustomers[*].customerNumber'), 'array<string>')) v1 as brandedcustomernumber
{code}
Look at this example. Since I am using the wildcard [*], it means that I can have 
0..n elements returned. Luckily, my brandedCustomerInfoAggregate object has more 
than one brandedCustomers element, so the result of the get_json_object function 
will be ["customer1","customer2"], for instance.

So now the explode function expects an array; what happens if in some case only 
one brandedCustomers element is filled?

The object is returned as a string (this is actually where I discovered the " 
characters added around the value), like "customer1", and the from_json function 
will break.

I am expecting that, during the parsing and selection of nodes, if we have [*] we 
should return an array.
Actually, when one element is returned for another query, I am converting to an 
array and casting to a string:

(from_json(cast(array(get_json_object(string(customer),'$.addresses[*].location'))
 as string),'array<string>'))

But the results are not good when more elements are returned

> Return of String instead of array in function get_json_object
> -
>
> Key: SPARK-31686
> URL: https://issues.apache.org/jira/browse/SPARK-31686
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.5
> Environment: {code:json}
> {
>   "customer": {
>     "addresses": [
>       { "location": "arizona" }
>     ]
>   }
> }
> {code}
> get_json_object(string(customer),'$.addresses[*].location')
> returns "arizona"
> the expected result should be
> ["arizona"]
>Reporter: Touopi Touopi
>Priority: Major
>
> When we select a node of a JSON object that is an array and the array contains 
> one element, get_json_object returns a String with " characters instead of an 
> array of one element.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17351) Refactor JDBCRDD to expose JDBC -> SparkSQL conversion functionality

2020-05-14 Thread Rafael (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-17351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107338#comment-17107338
 ] 

Rafael commented on SPARK-17351:


Hey guys, 
I know that it is a very old ticket, but I encountered an issue related to these 
changes, so let me ask my question here.

Now the code expects that, for the Decimal type, the JDBC metadata provides 
precision and scale. 

[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L402-L414]

 

I found out that in Oracle it is valid to define a Decimal column without this 
data. When I query the metadata for such a column, I get DATA_PRECISION = NULL 
and DATA_SCALE = NULL.

Then when I run `spark-sql`, I get this error:
{code:java}
java.lang.IllegalArgumentException: requirement failed: Decimal precision 45 
exceeds max precision 38
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.sql.types.Decimal.set(Decimal.scala:114)
at org.apache.spark.sql.types.Decimal$.apply(Decimal.scala:465)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$3$$anonfun$12.apply(JdbcUtils.scala:407)
{code}
Do you have a workaround so that spark-sql can handle such cases?
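One possible workaround (a sketch only, not verified against Oracle here; the connection 
details, table, and column names are made up, and an existing SparkSession `spark` is 
assumed) is to override the problematic column's type explicitly via the JDBC reader's 
`customSchema` option, so Spark does not have to derive precision/scale from the NULL 
Oracle metadata:
{code:scala}
import java.util.Properties

val props = new Properties()
props.setProperty("user", "scott")
props.setProperty("password", "tiger")

// "AMOUNT" is a hypothetical NUMBER column defined without precision/scale in Oracle.
// customSchema makes Spark read it with an explicit, representable decimal type.
val df = spark.read
  .option("customSchema", "AMOUNT DECIMAL(38, 10)")
  .jdbc("jdbc:oracle:thin:@//dbhost:1521/ORCL", "SOME_SCHEMA.SOME_TABLE", props)
{code}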

> Refactor JDBCRDD to expose JDBC -> SparkSQL conversion functionality
> 
>
> Key: SPARK-17351
> URL: https://issues.apache.org/jira/browse/SPARK-17351
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Major
> Fix For: 2.1.0
>
>
> It would be useful if more of JDBCRDD's JDBC -> Spark SQL functionality was 
> usable from outside of JDBCRDD; this would make it easier to write test 
> harnesses comparing Spark output against other JDBC databases. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31711) Register the executor source with the metrics system when running in local mode.

2020-05-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31711:


Assignee: Apache Spark

> Register the executor source with the metrics system when running in local 
> mode.
> 
>
> Key: SPARK-31711
> URL: https://issues.apache.org/jira/browse/SPARK-31711
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Luca Canali
>Assignee: Apache Spark
>Priority: Minor
>
> The Apache Spark metrics system provides many useful insights on the Spark 
> workload. In particular, the executor source metrics 
> (https://github.com/apache/spark/blob/master/docs/monitoring.md#component-instance--executor)
>  provide detailed info, including the number of active tasks, some I/O 
> metrics, and task metrics details. Executor source metrics, contrary to other 
> sources (for example ExecutorMetrics source), are not yet available when 
> running in local mode.
> This JIRA proposes to register the executor source with the Spark metrics 
> system when running in local mode, as this can be very useful when testing 
> and troubleshooting Spark workloads.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31711) Register the executor source with the metrics system when running in local mode.

2020-05-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31711:


Assignee: (was: Apache Spark)

> Register the executor source with the metrics system when running in local 
> mode.
> 
>
> Key: SPARK-31711
> URL: https://issues.apache.org/jira/browse/SPARK-31711
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Luca Canali
>Priority: Minor
>
> The Apache Spark metrics system provides many useful insights on the Spark 
> workload. In particular, the executor source metrics 
> (https://github.com/apache/spark/blob/master/docs/monitoring.md#component-instance--executor)
>  provide detailed info, including the number of active tasks, some I/O 
> metrics, and task metrics details. Executor source metrics, contrary to other 
> sources (for example ExecutorMetrics source), are not yet available when 
> running in local mode.
> This JIRA proposes to register the executor source with the Spark metrics 
> system when running in local mode, as this can be very useful when testing 
> and troubleshooting Spark workloads.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31711) Register the executor source with the metrics system when running in local mode.

2020-05-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107336#comment-17107336
 ] 

Apache Spark commented on SPARK-31711:
--

User 'LucaCanali' has created a pull request for this issue:
https://github.com/apache/spark/pull/28528

> Register the executor source with the metrics system when running in local 
> mode.
> 
>
> Key: SPARK-31711
> URL: https://issues.apache.org/jira/browse/SPARK-31711
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Luca Canali
>Priority: Minor
>
> The Apache Spark metrics system provides many useful insights on the Spark 
> workload. In particular, the executor source metrics 
> (https://github.com/apache/spark/blob/master/docs/monitoring.md#component-instance--executor)
>  provide detailed info, including the number of active tasks, some I/O 
> metrics, and task metrics details. Executor source metrics, contrary to other 
> sources (for example ExecutorMetrics source), are not yet available when 
> running in local mode.
> This JIRA proposes to register the executor source with the Spark metrics 
> system when running in local mode, as this can be very useful when testing 
> and troubleshooting Spark workloads.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31711) Register the executor source with the metrics system when running in local mode.

2020-05-14 Thread Luca Canali (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luca Canali updated SPARK-31711:

Description: 
The Apache Spark metrics system provides many useful insights on the Spark 
workload. In particular, the executor source metrics 
(https://github.com/apache/spark/blob/master/docs/monitoring.md#component-instance--executor)
 provide detailed info, including the number of active tasks, some I/O metrics, 
and task metrics details. Executor source metrics, contrary to other sources 
(for example ExecutorMetrics source), are not yet available when running in 
local mode.

This JIRA proposes to register the executor source with the Spark metrics 
system when running in local mode, as this can be very useful when testing and 
troubleshooting Spark workloads.


  was:
The Apache Spark metrics system provides many useful insights on the Spark 
workload. In particular, the [executor source 
metrics](https://github.com/apache/spark/blob/master/docs/monitoring.md#component-instance--executor)
 provide detailed info, including number of active tasks, some I/O metrics, and 
task metrics details. Executor source, contrary to other sources (for example 
ExecutorMetrics source), are not yet available when running in local mode.

This JIRA proposes to register the executor source with the Spark metrics 
system when running in local mode, as this can be very useful when testing and 
troubleshooting Spark workloads.



> Register the executor source with the metrics system when running in local 
> mode.
> 
>
> Key: SPARK-31711
> URL: https://issues.apache.org/jira/browse/SPARK-31711
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Luca Canali
>Priority: Minor
>
> The Apache Spark metrics system provides many useful insights on the Spark 
> workload. In particular, the executor source metrics 
> (https://github.com/apache/spark/blob/master/docs/monitoring.md#component-instance--executor)
>  provide detailed info, including the number of active tasks, some I/O 
> metrics, and task metrics details. Executor source metrics, contrary to other 
> sources (for example ExecutorMetrics source), are not yet available when 
> running in local mode.
> This JIRA proposes to register the executor source with the Spark metrics 
> system when running in local mode, as this can be very useful when testing 
> and troubleshooting Spark workloads.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-30973) ScriptTransformationExec should wait for the termination of process when scriptOutputReader hasNext return false

2020-05-14 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-30973:
---

Assignee: Sun Ke

> ScriptTransformationExec should wait for the termination of process when 
> scriptOutputReader hasNext return false
> 
>
> Key: SPARK-30973
> URL: https://issues.apache.org/jira/browse/SPARK-30973
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4, 2.4.5
>Reporter: Sun Ke
>Assignee: Sun Ke
>Priority: Minor
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-30973) ScriptTransformationExec should wait for the termination of process when scriptOutputReader hasNext return false

2020-05-14 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-30973.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 27724
[https://github.com/apache/spark/pull/27724]

> ScriptTransformationExec should wait for the termination of process when 
> scriptOutputReader hasNext return false
> 
>
> Key: SPARK-30973
> URL: https://issues.apache.org/jira/browse/SPARK-30973
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4, 2.4.5
>Reporter: Sun Ke
>Priority: Minor
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31711) Register the executor source with the metrics system when running in local mode.

2020-05-14 Thread Luca Canali (Jira)
Luca Canali created SPARK-31711:
---

 Summary: Register the executor source with the metrics system when 
running in local mode.
 Key: SPARK-31711
 URL: https://issues.apache.org/jira/browse/SPARK-31711
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.0.0
Reporter: Luca Canali


The Apache Spark metrics system provides many useful insights on the Spark 
workload. In particular, the [executor source 
metrics](https://github.com/apache/spark/blob/master/docs/monitoring.md#component-instance--executor)
 provide detailed info, including the number of active tasks, some I/O metrics, and 
task metrics details. The executor source, contrary to other sources (for example 
the ExecutorMetrics source), is not yet available when running in local mode.

This JIRA proposes to register the executor source with the Spark metrics 
system when running in local mode, as this can be very useful when testing and 
troubleshooting Spark workloads.
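As an illustration of how these metrics could be consumed once the executor source is 
registered in local mode (a sketch only; it relies on the standard `spark.metrics.conf.*` 
properties and the built-in ConsoleSink, and today it only surfaces the other sources):
{code:scala}
import org.apache.spark.sql.SparkSession

// Local-mode session that routes every registered metrics source to a console sink
// every 10 seconds; with this proposal, the executor source would show up here too.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("executor-source-local-metrics")
  .config("spark.metrics.conf.*.sink.console.class",
          "org.apache.spark.metrics.sink.ConsoleSink")
  .config("spark.metrics.conf.*.sink.console.period", "10")
  .config("spark.metrics.conf.*.sink.console.unit", "seconds")
  .getOrCreate()
{code}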




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31338) Spark SQL JDBC Data Source partitioned read : Spark SQL does not honor for NOT NULL table definition of partition key.

2020-05-14 Thread Oleg Kuznetsov (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107242#comment-17107242
 ] 

Oleg Kuznetsov commented on SPARK-31338:


[~minfa] query = “table where ...”

Will generate “select * from table where ...”

> Spark SQL JDBC Data Source partitioned read : Spark SQL does not honor for 
> NOT NULL table definition of partition key.
> --
>
> Key: SPARK-31338
> URL: https://issues.apache.org/jira/browse/SPARK-31338
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.5
>Reporter: Mohit Dave
>Priority: Major
>
> h2. *Our Use-case Details:*
> While reading from a JDBC source using Spark SQL, we are using the read 
> format below:
> jdbc(url: String, table: String, columnName: String, lowerBound: Long, 
> upperBound: Long, numPartitions: Int, connectionProperties: Properties).
> *Table definition:* 
>  postgres=> \d lineitem_sf1000
>  Table "public.lineitem_sf1000"
>  Column | Type | Modifiers
>  -++--
>  *l_orderkey | bigint | not null*
>  l_partkey | bigint | not null
>  l_suppkey | bigint | not null
>  l_linenumber | bigint | not null
>  l_quantity | numeric(10,2) | not null
>  l_extendedprice | numeric(10,2) | not null
>  l_discount | numeric(10,2) | not null
>  l_tax | numeric(10,2) | not null
>  l_returnflag | character varying(1) | not null
>  l_linestatus | character varying(1) | not null
>  l_shipdate | character varying(29) | not null
>  l_commitdate | character varying(29) | not null
>  l_receiptdate | character varying(29) | not null
>  l_shipinstruct | character varying(25) | not null
>  l_shipmode | character varying(10) | not null
>  l_comment | character varying(44) | not null
>  Indexes:
>  "l_order_sf1000_idx" btree (l_orderkey)
>  
> *Partition column* : l_orderkey 
> *numPartitions* : 16 
> h2. *Problem details :* 
>  
> {code:java}
> SELECT 
> "l_orderkey","l_shipinstruct","l_quantity","l_partkey","l_discount","l_commitdate","l_receiptdate","l_comment","l_shipmode","l_linestatus","l_suppkey","l_shipdate","l_tax","l_extendedprice","l_linenumber","l_returnflag"
>  FROM (SELECT 
> l_orderkey,l_partkey,l_suppkey,l_linenumber,l_quantity,l_extendedprice,l_discount,l_tax,l_returnflag,l_linestatus,l_shipdate,l_commitdate,l_receiptdate,l_shipinstruct,l_shipmode,l_comment
>  FROM public.lineitem_sf1000) query_alias WHERE l_orderkey >= 150001 AND 
> l_orderkey < 187501 {code}
> 15 queries are generated with the above BETWEEN clauses. The last query looks 
> like this below:
> {code:java}
> SELECT 
> "l_orderkey","l_shipinstruct","l_quantity","l_partkey","l_discount","l_commitdate","l_receiptdate","l_comment","l_shipmode","l_linestatus","l_suppkey","l_shipdate","l_tax","l_extendedprice","l_linenumber","l_returnflag"
>  FROM (SELECT 
> l_orderkey,l_partkey,l_suppkey,l_linenumber,l_quantity,l_extendedprice,l_discount,l_tax,l_returnflag,l_linestatus,l_shipdate,l_commitdate,l_receiptdate,l_shipinstruct,l_shipmode,l_comment
>  FROM public.lineitem_sf1000) query_alias WHERE l_orderkey < 37501 or 
> l_orderkey is null {code}
> *In the last query, we are trying to get the remaining records, along with 
> any data in the table for which the partition key has NULL values.*
> This hurts performance badly. While the first 15 SQLs took approximately 10 
> minutes to execute, the last SQL with the NULL check takes 45 minutes because 
> it has to evaluate a second scan (OR clause) of the table for NULL values of 
> the partition key.
> *Note that I have defined the partition key of the table to be NOT NULL at 
> the database. Therefore, the SQL for the last partition need not have this 
> NULL check; Spark SQL should be able to avoid such a condition, and this Jira is 
> intended to fix this behavior.*
> {code:java}
>  {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31338) Spark SQL JDBC Data Source partitioned read : Spark SQL does not honor for NOT NULL table definition of partition key.

2020-05-14 Thread Mohit Dave (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107203#comment-17107203
 ] 

Mohit Dave commented on SPARK-31338:


[~olkuznsmith] the queries are generated by the Spark framework for the read:

val jdbcRead = spark.read
  .option("fetchsize", fetchSize)
  .jdbc(
    url = s"${connectionURL}",
    table = s"${query}",
    columnName = s"${partKey}",
    lowerBound = lBound,
    upperBound = hBound,
    numPartitions = numParts,
    connectionProperties = connProps)

 

So we don't have control over which queries get executed; this Jira was raised to fix 
the way the query is generated for the last partition, as mentioned in the description.
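For reference, the `predicates` overload of `DataFrameReader.jdbc` does let the caller 
supply the per-partition WHERE clauses explicitly. A sketch only, reusing the values from 
the snippet above (`connectionURL`, `query`, `fetchSize`, `lBound`, `hBound`, `numParts`, 
`connProps`) and assuming an existing SparkSession `spark`; whether this overload is 
acceptable for this job is an assumption on my part:
{code:scala}
// One predicate per partition; no "OR l_orderkey IS NULL" clause is ever generated.
val stride = (hBound - lBound) / numParts
val predicates = (0 until numParts).map { i =>
  val lo = lBound + i * stride
  if (i == numParts - 1) s"l_orderkey >= $lo"                      // last bucket, open-ended
  else s"l_orderkey >= $lo AND l_orderkey < ${lo + stride}"
}.toArray

val jdbcRead = spark.read
  .option("fetchsize", fetchSize.toString)
  .jdbc(connectionURL, query, predicates, connProps)
{code}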

> Spark SQL JDBC Data Source partitioned read : Spark SQL does not honor for 
> NOT NULL table definition of partition key.
> --
>
> Key: SPARK-31338
> URL: https://issues.apache.org/jira/browse/SPARK-31338
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.5
>Reporter: Mohit Dave
>Priority: Major
>
> h2. *Our Use-case Details:*
> While reading from a JDBC source using Spark SQL, we are using the read 
> format below:
> jdbc(url: String, table: String, columnName: String, lowerBound: Long, 
> upperBound: Long, numPartitions: Int, connectionProperties: Properties).
> *Table definition:* 
>  postgres=> \d lineitem_sf1000
>  Table "public.lineitem_sf1000"
>  Column | Type | Modifiers
>  -++--
>  *l_orderkey | bigint | not null*
>  l_partkey | bigint | not null
>  l_suppkey | bigint | not null
>  l_linenumber | bigint | not null
>  l_quantity | numeric(10,2) | not null
>  l_extendedprice | numeric(10,2) | not null
>  l_discount | numeric(10,2) | not null
>  l_tax | numeric(10,2) | not null
>  l_returnflag | character varying(1) | not null
>  l_linestatus | character varying(1) | not null
>  l_shipdate | character varying(29) | not null
>  l_commitdate | character varying(29) | not null
>  l_receiptdate | character varying(29) | not null
>  l_shipinstruct | character varying(25) | not null
>  l_shipmode | character varying(10) | not null
>  l_comment | character varying(44) | not null
>  Indexes:
>  "l_order_sf1000_idx" btree (l_orderkey)
>  
> *Partition column* : l_orderkey 
> *numPartitions* : 16 
> h2. *Problem details :* 
>  
> {code:java}
> SELECT 
> "l_orderkey","l_shipinstruct","l_quantity","l_partkey","l_discount","l_commitdate","l_receiptdate","l_comment","l_shipmode","l_linestatus","l_suppkey","l_shipdate","l_tax","l_extendedprice","l_linenumber","l_returnflag"
>  FROM (SELECT 
> l_orderkey,l_partkey,l_suppkey,l_linenumber,l_quantity,l_extendedprice,l_discount,l_tax,l_returnflag,l_linestatus,l_shipdate,l_commitdate,l_receiptdate,l_shipinstruct,l_shipmode,l_comment
>  FROM public.lineitem_sf1000) query_alias WHERE l_orderkey >= 150001 AND 
> l_orderkey < 187501 {code}
> 15 queries are generated with the above BETWEEN clauses. The last query looks 
> like this below:
> {code:java}
> SELECT 
> "l_orderkey","l_shipinstruct","l_quantity","l_partkey","l_discount","l_commitdate","l_receiptdate","l_comment","l_shipmode","l_linestatus","l_suppkey","l_shipdate","l_tax","l_extendedprice","l_linenumber","l_returnflag"
>  FROM (SELECT 
> l_orderkey,l_partkey,l_suppkey,l_linenumber,l_quantity,l_extendedprice,l_discount,l_tax,l_returnflag,l_linestatus,l_shipdate,l_commitdate,l_receiptdate,l_shipinstruct,l_shipmode,l_comment
>  FROM public.lineitem_sf1000) query_alias WHERE l_orderkey < 37501 or 
> l_orderkey is null {code}
> *In the last query, we are trying to get the remaining records, along with 
> any data in the table for which the partition key has NULL values.*
> This hurts performance badly. While the first 15 SQLs took approximately 10 
> minutes to execute, the last SQL with the NULL check takes 45 minutes because 
> it has to evaluate a second scan (OR clause) of the table for NULL values of 
> the partition key.
> *Note that I have defined the partition key of the table to be NOT NULL at 
> the database. Therefore, the SQL for the last partition need not have this 
> NULL check; Spark SQL should be able to avoid such a condition, and this Jira is 
> intended to fix this behavior.*
> {code:java}
>  {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31338) Spark SQL JDBC Data Source partitioned read : Spark SQL does not honor for NOT NULL table definition of partition key.

2020-05-14 Thread Mohit Dave (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107198#comment-17107198
 ] 

Mohit Dave commented on SPARK-31338:


[~hyukjin.kwon] it can be reproduced at will with the given details. Let me know 
what is missing from the provided info and I can help you get the details.

> Spark SQL JDBC Data Source partitioned read : Spark SQL does not honor for 
> NOT NULL table definition of partition key.
> --
>
> Key: SPARK-31338
> URL: https://issues.apache.org/jira/browse/SPARK-31338
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.5
>Reporter: Mohit Dave
>Priority: Major
>
> h2. *Our Use-case Details:*
> While reading from a JDBC source using Spark SQL, we are using the read 
> format below:
> jdbc(url: String, table: String, columnName: String, lowerBound: Long, 
> upperBound: Long, numPartitions: Int, connectionProperties: Properties).
> *Table definition:* 
>  postgres=> \d lineitem_sf1000
>  Table "public.lineitem_sf1000"
>  Column | Type | Modifiers
>  -++--
>  *l_orderkey | bigint | not null*
>  l_partkey | bigint | not null
>  l_suppkey | bigint | not null
>  l_linenumber | bigint | not null
>  l_quantity | numeric(10,2) | not null
>  l_extendedprice | numeric(10,2) | not null
>  l_discount | numeric(10,2) | not null
>  l_tax | numeric(10,2) | not null
>  l_returnflag | character varying(1) | not null
>  l_linestatus | character varying(1) | not null
>  l_shipdate | character varying(29) | not null
>  l_commitdate | character varying(29) | not null
>  l_receiptdate | character varying(29) | not null
>  l_shipinstruct | character varying(25) | not null
>  l_shipmode | character varying(10) | not null
>  l_comment | character varying(44) | not null
>  Indexes:
>  "l_order_sf1000_idx" btree (l_orderkey)
>  
> *Partition column* : l_orderkey 
> *numPartitions* : 16 
> h2. *Problem details :* 
>  
> {code:java}
> SELECT 
> "l_orderkey","l_shipinstruct","l_quantity","l_partkey","l_discount","l_commitdate","l_receiptdate","l_comment","l_shipmode","l_linestatus","l_suppkey","l_shipdate","l_tax","l_extendedprice","l_linenumber","l_returnflag"
>  FROM (SELECT 
> l_orderkey,l_partkey,l_suppkey,l_linenumber,l_quantity,l_extendedprice,l_discount,l_tax,l_returnflag,l_linestatus,l_shipdate,l_commitdate,l_receiptdate,l_shipinstruct,l_shipmode,l_comment
>  FROM public.lineitem_sf1000) query_alias WHERE l_orderkey >= 150001 AND 
> l_orderkey < 187501 {code}
> 15 queries are generated with the above BETWEEN clauses. The last query looks 
> like this below:
> {code:java}
> SELECT 
> "l_orderkey","l_shipinstruct","l_quantity","l_partkey","l_discount","l_commitdate","l_receiptdate","l_comment","l_shipmode","l_linestatus","l_suppkey","l_shipdate","l_tax","l_extendedprice","l_linenumber","l_returnflag"
>  FROM (SELECT 
> l_orderkey,l_partkey,l_suppkey,l_linenumber,l_quantity,l_extendedprice,l_discount,l_tax,l_returnflag,l_linestatus,l_shipdate,l_commitdate,l_receiptdate,l_shipinstruct,l_shipmode,l_comment
>  FROM public.lineitem_sf1000) query_alias WHERE l_orderkey < 37501 or 
> l_orderkey is null {code}
> *In the last query, we are trying to get the remaining records, along with 
> any data in the table for which the partition key has NULL values.*
> This hurts performance badly. While the first 15 SQLs took approximately 10 
> minutes to execute, the last SQL with the NULL check takes 45 minutes because 
> it has to evaluate a second scan (OR clause) of the table for NULL values of 
> the partition key.
> *Note that I have defined the partition key of the table to be NOT NULL at 
> the database. Therefore, the SQL for the last partition need not have this 
> NULL check; Spark SQL should be able to avoid such a condition, and this Jira is 
> intended to fix this behavior.*
> {code:java}
>  {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31710) result is the not the same when query and execute jobs

2020-05-14 Thread philipse (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

philipse updated SPARK-31710:
-
Description: 
Hi Team

Steps to reproduce.
{code:java}
create table test(id bigint);
insert into test select 1586318188000;
create table test1(id bigint) partitioned by (year string);
insert overwrite table test1 partition(year) select 234,cast(id as TIMESTAMP) 
from test;
{code}
let's check the result. 

Case 1:

*select * from test1;*

234 | 52238-06-04 13:06:400.0

--the result is wrong

Case 2:

*select 234,cast(id as TIMESTAMP) from test;*

 

java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd 
hh:mm:ss[.fffffffff]
 at java.sql.Timestamp.valueOf(Timestamp.java:237)
 at org.apache.hive.jdbc.HiveBaseResultSet.evaluate(HiveBaseResultSet.java:441)
 at 
org.apache.hive.jdbc.HiveBaseResultSet.getColumnValue(HiveBaseResultSet.java:421)
 at org.apache.hive.jdbc.HiveBaseResultSet.getString(HiveBaseResultSet.java:530)
 at org.apache.hive.beeline.Rows$Row.(Rows.java:166)
 at org.apache.hive.beeline.BufferedRows.(BufferedRows.java:43)
 at org.apache.hive.beeline.BeeLine.print(BeeLine.java:1756)
 at org.apache.hive.beeline.Commands.execute(Commands.java:826)
 at org.apache.hive.beeline.Commands.sql(Commands.java:670)
 at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:974)
 at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:810)
 at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:767)
 at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:480)
 at org.apache.hive.beeline.BeeLine.main(BeeLine.java:463)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at org.apache.hadoop.util.RunJar.run(RunJar.java:226)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:141)
 Error: Unrecognized column type:TIMESTAMP_TYPE (state=,code=0)

 

I tried Hive; it works well, and the conversion is correct:
{code:java}
select 234,cast(id as TIMESTAMP) from test;
 234   2020-04-08 11:56:28
{code}
Two questions:

q1:

if we forbid this conversion, should we keep all cases consistent?

q2:

if we allow the conversion in some cases, should we take the length of the long 
value into account? The code seems to force the conversion to microseconds by 
multiplying by 1,000,000 no matter how long the value is; if the value does not 
convert to a plausible timestamp, we could raise an error.
{code:java}
// converting seconds to us
private[this] def longToTimestamp(t: Long): Long = t * 1000000L{code}
 

Thanks!

 

  was:
Hi Team 

Steps to reproduce.
{code:java}
create table test(id bigint);
insert into test select 1586318188000;
create table test1(id bigint) partitioned by (year string);
insert overwrite table test1 partition(year) select 234,cast(id as TIMESTAMP) 
from test;
{code}
let's check the result. 

Case 1:

*select * from test1;*

234 | 52238-06-04 13:06:400.0

Case 2:

*select 234,cast(id as TIMESTAMP) from test;*

java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd 
hh:mm:ss[.fffffffff]
 at java.sql.Timestamp.valueOf(Timestamp.java:237)
 at org.apache.hive.jdbc.HiveBaseResultSet.evaluate(HiveBaseResultSet.java:441)
 at 
org.apache.hive.jdbc.HiveBaseResultSet.getColumnValue(HiveBaseResultSet.java:421)
 at org.apache.hive.jdbc.HiveBaseResultSet.getString(HiveBaseResultSet.java:530)
 at org.apache.hive.beeline.Rows$Row.(Rows.java:166)
 at org.apache.hive.beeline.BufferedRows.(BufferedRows.java:43)
 at org.apache.hive.beeline.BeeLine.print(BeeLine.java:1756)
 at org.apache.hive.beeline.Commands.execute(Commands.java:826)
 at org.apache.hive.beeline.Commands.sql(Commands.java:670)
 at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:974)
 at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:810)
 at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:767)
 at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:480)
 at org.apache.hive.beeline.BeeLine.main(BeeLine.java:463)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at org.apache.hadoop.util.RunJar.run(RunJar.java:226)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:141)
Error: Unrecognized column type:TIMESTAMP_TYPE (state=,code=0)

 

I try hive,it works well,and the convert is correct

Two questions:

q1:

if we forbid this convert,should we keep all cases the same?

q2:

if we allow the convert in some cases, should we decide the long length, for 
the code seems to force to convert to ns with times*100 nomatter how long 
the data is,if it convert to timestamp with 

[jira] [Created] (SPARK-31710) result is the not the same when query and execute jobs

2020-05-14 Thread philipse (Jira)
philipse created SPARK-31710:


 Summary: result is the not the same when query and execute jobs
 Key: SPARK-31710
 URL: https://issues.apache.org/jira/browse/SPARK-31710
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.5
 Environment: hdp:2.7.7

spark:2.4.5
Reporter: philipse


Hi Team 

Steps to reproduce.
{code:java}
create table test(id bigint);
insert into test select 1586318188000;
create table test1(id bigint) partitioned by (year string);
insert overwrite table test1 partition(year) select 234,cast(id as TIMESTAMP) 
from test;
{code}
let's check the result. 

Case 1:

*select * from test1;*

234 | 52238-06-04 13:06:400.0

Case 2:

*select 234,cast(id as TIMESTAMP) from test;*

java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd 
hh:mm:ss[.fffffffff]
 at java.sql.Timestamp.valueOf(Timestamp.java:237)
 at org.apache.hive.jdbc.HiveBaseResultSet.evaluate(HiveBaseResultSet.java:441)
 at 
org.apache.hive.jdbc.HiveBaseResultSet.getColumnValue(HiveBaseResultSet.java:421)
 at org.apache.hive.jdbc.HiveBaseResultSet.getString(HiveBaseResultSet.java:530)
 at org.apache.hive.beeline.Rows$Row.(Rows.java:166)
 at org.apache.hive.beeline.BufferedRows.(BufferedRows.java:43)
 at org.apache.hive.beeline.BeeLine.print(BeeLine.java:1756)
 at org.apache.hive.beeline.Commands.execute(Commands.java:826)
 at org.apache.hive.beeline.Commands.sql(Commands.java:670)
 at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:974)
 at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:810)
 at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:767)
 at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:480)
 at org.apache.hive.beeline.BeeLine.main(BeeLine.java:463)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at org.apache.hadoop.util.RunJar.run(RunJar.java:226)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:141)
Error: Unrecognized column type:TIMESTAMP_TYPE (state=,code=0)

 

I tried Hive; it works well, and the conversion is correct.

Two questions:

q1:

if we forbid this conversion, should we keep all cases consistent?

q2:

if we allow the conversion in some cases, should we take the length of the long 
value into account? The code seems to force the conversion to microseconds by 
multiplying by 1,000,000 no matter how long the value is; if the value does not 
convert to a plausible timestamp, we could raise an error.
{code:java}
// converting seconds to us
private[this] def longToTimestamp(t: Long): Long = t * 1000000L{code}
 

Thanks!
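To illustrate the interpretation difference (a sketch only; it assumes Spark 2.4.x cast 
semantics, where a numeric cast to TIMESTAMP is interpreted as seconds since the epoch, 
and an existing SparkSession `spark`):
{code:scala}
// 1586318188000 interpreted as *seconds* since the epoch lands in the year 52238,
// which matches the wrong result reported above.
spark.sql("SELECT CAST(1586318188000 AS TIMESTAMP)").show(false)

// The stored value is really epoch *milliseconds*; dividing by 1000 first yields the
// expected 2020-04-08 timestamp.
spark.sql("SELECT CAST(1586318188000 / 1000 AS TIMESTAMP)").show(false)
{code}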

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31709) Proper base path for location when it is a relative path

2020-05-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107144#comment-17107144
 ] 

Apache Spark commented on SPARK-31709:
--

User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/28527

> Proper base path for location when it is a relative path
> 
>
> Key: SPARK-31709
> URL: https://issues.apache.org/jira/browse/SPARK-31709
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.5, 3.0.0, 3.1.0
>Reporter: Kent Yao
>Priority: Major
>
> Currently, the user home directory is used as the base path for the database 
> and table locations when their location is specified with a relative path, 
> e.g.
> {code:sql}
> > set spark.sql.warehouse.dir;
> spark.sql.warehouse.dir   
> file:/Users/kentyao/Downloads/spark/spark-3.1.0-SNAPSHOT-bin-20200512/spark-warehouse/
> spark-sql> create database loctest location 'loctestdbdir';
> spark-sql> desc database loctest;
> Database Name loctest
> Comment
> Location  
> file:/Users/kentyao/Downloads/spark/spark-3.1.0-SNAPSHOT-bin-20200512/loctestdbdir
> Owner kentyao
> spark-sql> create table loctest(id int) location 'loctestdbdir';
> spark-sql> desc formatted loctest;
> idint NULL
> # Detailed Table Information
> Database  default
> Table loctest
> Owner kentyao
> Created Time  Thu May 14 16:29:05 CST 2020
> Last Access   UNKNOWN
> Created BySpark 3.1.0-SNAPSHOT
> Type  EXTERNAL
> Provider  parquet
> Location  
> file:/Users/kentyao/Downloads/spark/spark-3.1.0-SNAPSHOT-bin-20200512/loctestdbdir
> Serde Library org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
> InputFormat   org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
> OutputFormat  org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
> {code}
> The user home is not always warehouse-related, unchangeable in runtime, and 
> shared both by database and table as the parent directory. Meanwhile, we use 
> the table path as the parent directory for relative partition locations.
> the config `spark.sql.warehouse.dir` represents the default location for 
> managed databases and tables. For databases, the case above seems not to 
> follow its semantics. For tables it is right but here I suggest enriching its 
> meaning that is also for external tables with relative paths for locations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31709) Proper base path for location when it is a relative path

2020-05-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31709:


Assignee: Apache Spark

> Proper base path for location when it is a relative path
> 
>
> Key: SPARK-31709
> URL: https://issues.apache.org/jira/browse/SPARK-31709
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.5, 3.0.0, 3.1.0
>Reporter: Kent Yao
>Assignee: Apache Spark
>Priority: Major
>
> Currently, the user home directory is used as the base path for the database 
> and table locations when their location is specified with a relative path, 
> e.g.
> {code:sql}
> > set spark.sql.warehouse.dir;
> spark.sql.warehouse.dir   
> file:/Users/kentyao/Downloads/spark/spark-3.1.0-SNAPSHOT-bin-20200512/spark-warehouse/
> spark-sql> create database loctest location 'loctestdbdir';
> spark-sql> desc database loctest;
> Database Name loctest
> Comment
> Location  
> file:/Users/kentyao/Downloads/spark/spark-3.1.0-SNAPSHOT-bin-20200512/loctestdbdir
> Owner kentyao
> spark-sql> create table loctest(id int) location 'loctestdbdir';
> spark-sql> desc formatted loctest;
> idint NULL
> # Detailed Table Information
> Database  default
> Table loctest
> Owner kentyao
> Created Time  Thu May 14 16:29:05 CST 2020
> Last Access   UNKNOWN
> Created BySpark 3.1.0-SNAPSHOT
> Type  EXTERNAL
> Provider  parquet
> Location  
> file:/Users/kentyao/Downloads/spark/spark-3.1.0-SNAPSHOT-bin-20200512/loctestdbdir
> Serde Library org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
> InputFormat   org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
> OutputFormat  org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
> {code}
> The user home is not always warehouse-related, unchangeable in runtime, and 
> shared both by database and table as the parent directory. Meanwhile, we use 
> the table path as the parent directory for relative partition locations.
> the config `spark.sql.warehouse.dir` represents the default location for 
> managed databases and tables. For databases, the case above seems not to 
> follow its semantics. For tables it is right but here I suggest enriching its 
> meaning that is also for external tables with relative paths for locations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31709) Proper base path for location when it is a relative path

2020-05-14 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31709:


Assignee: (was: Apache Spark)

> Proper base path for location when it is a relative path
> 
>
> Key: SPARK-31709
> URL: https://issues.apache.org/jira/browse/SPARK-31709
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.5, 3.0.0, 3.1.0
>Reporter: Kent Yao
>Priority: Major
>
> Currently, the user home directory is used as the base path for the database 
> and table locations when their location is specified with a relative path, 
> e.g.
> {code:sql}
> > set spark.sql.warehouse.dir;
> spark.sql.warehouse.dir   
> file:/Users/kentyao/Downloads/spark/spark-3.1.0-SNAPSHOT-bin-20200512/spark-warehouse/
> spark-sql> create database loctest location 'loctestdbdir';
> spark-sql> desc database loctest;
> Database Name loctest
> Comment
> Location  
> file:/Users/kentyao/Downloads/spark/spark-3.1.0-SNAPSHOT-bin-20200512/loctestdbdir
> Owner kentyao
> spark-sql> create table loctest(id int) location 'loctestdbdir';
> spark-sql> desc formatted loctest;
> idint NULL
> # Detailed Table Information
> Database  default
> Table loctest
> Owner kentyao
> Created Time  Thu May 14 16:29:05 CST 2020
> Last Access   UNKNOWN
> Created BySpark 3.1.0-SNAPSHOT
> Type  EXTERNAL
> Provider  parquet
> Location  
> file:/Users/kentyao/Downloads/spark/spark-3.1.0-SNAPSHOT-bin-20200512/loctestdbdir
> Serde Library org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
> InputFormat   org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
> OutputFormat  org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
> {code}
> The user home is not always warehouse-related, unchangeable in runtime, and 
> shared both by database and table as the parent directory. Meanwhile, we use 
> the table path as the parent directory for relative partition locations.
> the config `spark.sql.warehouse.dir` represents the default location for 
> managed databases and tables. For databases, the case above seems not to 
> follow its semantics. For tables it is right but here I suggest enriching its 
> meaning that is also for external tables with relative paths for locations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31709) Proper base path for location when it is a relative path

2020-05-14 Thread Kent Yao (Jira)
Kent Yao created SPARK-31709:


 Summary: Proper base path for location when it is a relative path
 Key: SPARK-31709
 URL: https://issues.apache.org/jira/browse/SPARK-31709
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.5, 3.0.0, 3.1.0
Reporter: Kent Yao


Currently, the user home directory is used as the base path for the database 
and table locations when their location is specified with a relative path, e.g.
{code:sql}
> set spark.sql.warehouse.dir;
spark.sql.warehouse.dir 
file:/Users/kentyao/Downloads/spark/spark-3.1.0-SNAPSHOT-bin-20200512/spark-warehouse/
spark-sql> create database loctest location 'loctestdbdir';

spark-sql> desc database loctest;
Database Name   loctest
Comment
Location
file:/Users/kentyao/Downloads/spark/spark-3.1.0-SNAPSHOT-bin-20200512/loctestdbdir
Owner   kentyao

spark-sql> create table loctest(id int) location 'loctestdbdir';
spark-sql> desc formatted loctest;
id  int NULL

# Detailed Table Information
Databasedefault
Table   loctest
Owner   kentyao
Created TimeThu May 14 16:29:05 CST 2020
Last Access UNKNOWN
Created By  Spark 3.1.0-SNAPSHOT
TypeEXTERNAL
Providerparquet
Location
file:/Users/kentyao/Downloads/spark/spark-3.1.0-SNAPSHOT-bin-20200512/loctestdbdir
Serde Library   org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormatorg.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
{code}


The user home directory is not necessarily warehouse-related, cannot be changed at 
runtime, and is shared by both databases and tables as the parent directory. Meanwhile, 
we use the table path as the parent directory for relative partition locations.

The config `spark.sql.warehouse.dir` represents the default location for 
managed databases and tables. For databases, the case above does not seem to follow 
these semantics. For tables it is correct, but I suggest enriching its meaning so that 
it also serves as the base path for external tables whose locations are relative paths.
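To illustrate how I read the proposal (this shows the suggested behaviour, not what any 
released Spark version does; paths are abbreviated from the example above and an existing 
SparkSession `spark` is assumed):
{code:scala}
// With spark.sql.warehouse.dir = file:/.../spark-3.1.0-SNAPSHOT-bin-20200512/spark-warehouse/
spark.sql("CREATE DATABASE loctest LOCATION 'loctestdbdir'")
spark.sql("DESC DATABASE loctest").show(false)
// today    : Location = file:/.../spark-3.1.0-SNAPSHOT-bin-20200512/loctestdbdir
// proposed : Location = file:/.../spark-3.1.0-SNAPSHOT-bin-20200512/spark-warehouse/loctestdbdir
{code}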






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29436) Support executor for selecting scheduler through scheduler name in the case of k8s multi-scheduler scenario.

2020-05-14 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-29436:

Issue Type: Improvement  (was: New Feature)

> Support executor for selecting scheduler through scheduler name in the case 
> of k8s multi-scheduler scenario.
> 
>
> Key: SPARK-29436
> URL: https://issues.apache.org/jira/browse/SPARK-29436
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: merrily01
>Assignee: merrily01
>Priority: Minor
> Fix For: 3.0.0
>
>
> In the case of k8s multi-scheduler, support executor for selecting scheduler 
> through scheduler name.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25876) Simplify configuration types in k8s backend

2020-05-14 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-25876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reassigned SPARK-25876:
---

Assignee: Marcelo Masiero Vanzin

> Simplify configuration types in k8s backend
> ---
>
> Key: SPARK-25876
> URL: https://issues.apache.org/jira/browse/SPARK-25876
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Marcelo Masiero Vanzin
>Assignee: Marcelo Masiero Vanzin
>Priority: Major
> Fix For: 3.0.0
>
>
> This is a child of SPARK-25874 to deal with the current issues with the 
> different configuration objects used in the k8s backend. Please refer to the 
> parent for further discussion of what this means.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31405) fail by default when read/write datetime values and not sure if they need rebase or not

2020-05-14 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17106958#comment-17106958
 ] 

Apache Spark commented on SPARK-31405:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/28526

> fail by default when read/write datetime values and not sure if they need 
> rebase or not
> ---
>
> Key: SPARK-31405
> URL: https://issues.apache.org/jira/browse/SPARK-31405
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31692) Hadoop confs passed via spark config are not set in URLStream Handler Factory

2020-05-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-31692.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

This is resolved via https://github.com/apache/spark/pull/28516

> Hadoop confs passed via spark config are not set in URLStream Handler Factory
> -
>
> Key: SPARK-31692
> URL: https://issues.apache.org/jira/browse/SPARK-31692
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Karuppayya
>Priority: Major
> Fix For: 3.0.0
>
>
> Hadoop conf passed via spark config(as "spark.hadoop.*") are not set in 
> URLStreamHandlerFactory



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31692) Hadoop confs passed via spark config are not set in URLStream Handler Factory

2020-05-14 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-31692:
-

Assignee: Karuppayya

> Hadoop confs passed via spark config are not set in URLStream Handler Factory
> -
>
> Key: SPARK-31692
> URL: https://issues.apache.org/jira/browse/SPARK-31692
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Karuppayya
>Assignee: Karuppayya
>Priority: Major
> Fix For: 3.0.0
>
>
> Hadoop conf passed via spark config(as "spark.hadoop.*") are not set in 
> URLStreamHandlerFactory



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org