[jira] [Updated] (SPARK-35011) Avoid Block Manager registrations when StopExecutor msg is in-flight.

2021-08-20 Thread Sumeet (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumeet updated SPARK-35011:
---
Fix Version/s: 3.0.4

> Avoid Block Manager registrations when StopExecutor msg is in-flight.
> --
>
> Key: SPARK-35011
> URL: https://issues.apache.org/jira/browse/SPARK-35011
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.1, 3.2.0
>Reporter: Sumeet
>Assignee: Sumeet
>Priority: Major
>  Labels: BlockManager, core
> Fix For: 3.2.0, 3.1.3, 3.0.4
>
>
> *Note:* This is a follow-up to SPARK-34949; even after the heartbeat fix, the 
> driver still reports dead executors as alive.
> *Problem:*
> I was testing Dynamic Allocation on K8s with about 300 executors. When the 
> executors were torn down due to "spark.dynamicAllocation.executorIdleTimeout", 
> all of the executor pods were removed from K8s; however, under the "Executors" 
> tab in the Spark UI, some executors were still listed as alive. 
> [spark.sparkContext.statusTracker.getExecutorInfos.length|https://github.com/apache/spark/blob/65da9287bc5112564836a555cd2967fc6b05856f/core/src/main/scala/org/apache/spark/SparkStatusTracker.scala#L100]
>  also returned a value greater than 1. 
>  
> *Cause:*
>  * "CoarseGrainedSchedulerBackend" issues async "StopExecutor" on 
> executorEndpoint
>  * "CoarseGrainedSchedulerBackend" removes that executor from Driver's 
> internal data structures and publishes "SparkListenerExecutorRemoved" on the 
> "listenerBus".
>  * Executor has still not processed "StopExecutor" from the Driver
>  * Driver receives heartbeat from the Executor, since it cannot find the 
> "executorId" in its data structures, it responds with 
> "HeartbeatResponse(reregisterBlockManager = true)"
>  * "BlockManager" on the Executor reregisters with the "BlockManagerMaster" 
> and "SparkListenerBlockManagerAdded" is published on the "listenerBus"
>  * Executor starts processing the "StopExecutor" and exits
>  * "AppStatusListener" picks the "SparkListenerBlockManagerAdded" event and 
> updates "AppStatusStore"
>  * "statusTracker.getExecutorInfos" refers "AppStatusStore" to get the list 
> of executors which returns the dead executor as alive.
>  
> *Proposed Solution:*
> Maintain a cache of recently removed executors on the Driver. During 
> registration in BlockManagerMasterEndpoint, if the BlockManager belongs to a 
> recently removed executor, return None to indicate that the registration is 
> ignored, since the executor will be shutting down soon.
> On BlockManagerHeartbeat, if the BlockManager belongs to a recently removed 
> executor, return true to indicate that the driver already knows about it, 
> thereby preventing re-registration.
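A minimal sketch of the proposed checks, assuming a hypothetical time-bounded map of recently removed executor IDs; none of the names below are Spark's actual internals:

{code:scala}
import scala.collection.mutable

object RecentlyRemovedExecutorsSketch {
  // Illustrative cache: executor ID -> removal timestamp (ms), kept for a TTL.
  private val removedAt = mutable.Map.empty[String, Long]
  private val ttlMs = 10 * 60 * 1000L

  def markRemoved(executorId: String): Unit =
    removedAt(executorId) = System.currentTimeMillis()

  private def recentlyRemoved(executorId: String): Boolean =
    removedAt.get(executorId).exists(System.currentTimeMillis() - _ < ttlMs)

  // Hypothetical registration path: return None so the caller knows the
  // registration was ignored because the executor is about to shut down.
  def registerBlockManager(executorId: String): Option[String] =
    if (recentlyRemoved(executorId)) None else Some(executorId)

  // Hypothetical heartbeat path: answer true (the driver knows this block
  // manager) so the executor is not asked to re-register.
  def blockManagerHeartbeat(executorId: String): Boolean =
    if (recentlyRemoved(executorId)) true
    else false // a real implementation would fall back to the normal lookup
}
{code}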



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34415) Use randomization as a possibly better technique than grid search in optimizing hyperparameters

2021-08-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17402436#comment-17402436
 ] 

Apache Spark commented on SPARK-34415:
--

User 'srowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/33800

> Use randomization as a possibly better technique than grid search in 
> optimizing hyperparameters
> ---
>
> Key: SPARK-34415
> URL: https://issues.apache.org/jira/browse/SPARK-34415
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, MLlib
>Affects Versions: 3.0.1
>Reporter: Phillip Henry
>Assignee: Phillip Henry
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.2.0
>
>
> Randomization can be a more effective technique than a grid search for 
> finding optimal hyperparameters, since min/max points can fall between the 
> grid lines and never be found. Randomisation is not so restricted, although 
> the probability of finding minima/maxima depends on the number of 
> attempts.
> Alice Zheng has an accessible description of how this technique works at 
> [https://www.oreilly.com/library/view/evaluating-machine-learning/9781492048756/ch04.html]
> (Note that I have a PR for this work outstanding at 
> [https://github.com/apache/spark/pull/31535] )
>  
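For illustration only (this is not the API added by the pull request above), a random search over a two-dimensional hyperparameter space might look like the sketch below; the objective function and parameter ranges are made up:

{code:scala}
import scala.util.Random

object RandomSearchSketch {
  // Hypothetical objective: lower is better (e.g. validation RMSE).
  def evaluate(regParam: Double, stepSize: Double): Double =
    math.pow(regParam - 0.03, 2) + math.pow(stepSize - 0.2, 2)

  def main(args: Array[String]): Unit = {
    val rng = new Random(42)

    // Sample uniformly from each range instead of walking a fixed grid,
    // so optima that fall between grid lines can still be found.
    val best = (1 to 30).map { _ =>
      val regParam = rng.nextDouble() * 0.1          // [0, 0.1)
      val stepSize = 0.01 + rng.nextDouble() * 0.49  // [0.01, 0.5)
      (regParam, stepSize, evaluate(regParam, stepSize))
    }.minBy(_._3)

    println(s"best regParam=${best._1}, stepSize=${best._2}, loss=${best._3}")
  }
}
{code}

With a budget of N evaluations, a grid is limited to the N combinations fixed up front, while random search draws N distinct values across the whole range of each parameter.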



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-36548) Throwing NoClassDefFoundError for Logging$class

2021-08-20 Thread sadagopan kalyanaraman (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17402394#comment-17402394
 ] 

sadagopan kalyanaraman edited comment on SPARK-36548 at 8/20/21, 7:37 PM:
--

[~sarutak]  I agree on the documentation, but what about the NoClassDefFoundError?


was (Author: sadagopan):
[~sarutak]  i agree on the doc but the no class def error?

> Throwing NoClassDefFoundError for Logging$class
> ---
>
> Key: SPARK-36548
> URL: https://issues.apache.org/jira/browse/SPARK-36548
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Core
>Affects Versions: 3.1.1
>Reporter: sadagopan kalyanaraman
>Priority: Major
>
> I'm getting a NoClassDefFoundError. Was it removed in 3.1.1?
>  
> https://spark.apache.org/docs/3.1.1/api/java/org/apache/spark/internal/Logging.html
>  
> *https://spark.apache.org/docs/2.4.7/api/java/org/apache/spark/internal/Logging.html*
>  
> Exception in thread "main" java.util.ServiceConfigurationError: 
> org.apache.spark.sql.sources.DataSourceRegister: Provider 
> com.google.cloud.spark.bigquery.BigQueryRelationProvider could not be 
> instantiatedException in thread "main" java.util.ServiceConfigurationError: 
> org.apache.spark.sql.sources.DataSourceRegister: Provider 
> com.google.cloud.spark.bigquery.BigQueryRelationProvider could not be 
> instantiated at java.base/java.util.ServiceLoader.fail(Unknown Source) at 
> java.base/java.util.ServiceLoader$ProviderImpl.newInstance(Unknown Source) at 
> java.base/java.util.ServiceLoader$ProviderImpl.get(Unknown Source) at 
> java.base/java.util.ServiceLoader$3.next(Unknown Source) at 
> scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:44) at 
> scala.collection.Iterator.foreach(Iterator.scala:941) at 
> scala.collection.Iterator.foreach$(Iterator.scala:941) at 
> scala.collection.AbstractIterator.foreach(Iterator.scala:1429) at 
> scala.collection.IterableLike.foreach(IterableLike.scala:74) at 
> scala.collection.IterableLike.foreach$(IterableLike.scala:73) at 
> scala.collection.AbstractIterable.foreach(Iterable.scala:56) at 
> scala.collection.TraversableLike.filterImpl(TraversableLike.scala:255) at 
> scala.collection.TraversableLike.filterImpl$(TraversableLike.scala:249) at 
> scala.collection.AbstractTraversable.filterImpl(Traversable.scala:108) at 
> scala.collection.TraversableLike.filter(TraversableLike.scala:347) at 
> scala.collection.TraversableLike.filter$(TraversableLike.scala:347) at 
> scala.collection.AbstractTraversable.filter(Traversable.scala:108) at 
> org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:659)
>  at 
> org.apache.spark.sql.streaming.DataStreamReader.loadInternal(DataStreamReader.scala:209)
>  at 
> org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:195)
>  at 
> com.xyz.xyz.xyz.AbstractCdcMessageToSqJob.runJob(AbstractCdcMessageToSqJob.java:94)
>  at 
> com.xyz.xyz.xyz.CdcMessageToSqlBillerJob.main(CdcMessageToSqlBillerJob.java:30)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown 
> Source) at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown 
> Source) at java.base/java.lang.reflect.Method.invoke(Unknown Source) at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) 
> at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
>  at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) at 
> org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) at 
> org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030) 
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039) at 
> org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)Caused by: 
> java.lang.NoClassDefFoundError: org/apache/spark/internal/Logging$class at 
> com.google.cloud.spark.bigquery.BigQueryUtilScala$.(BigQueryUtil.scala:34)
>  at 
> com.google.cloud.spark.bigquery.BigQueryUtilScala$.(BigQueryUtil.scala)
>  at 
> com.google.cloud.spark.bigquery.BigQueryRelationProvider.(BigQueryRelationProvider.scala:43)
>  at 
> com.google.cloud.spark.bigquery.BigQueryRelationProvider.(BigQueryRelationProvider.scala:50)
>  at 
> java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native
>  Method) at 
> java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown
>  Source) at 
> java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown
>  Source) at 

[jira] [Commented] (SPARK-36548) Throwing NoClassDefFoundError for Logging$class

2021-08-20 Thread sadagopan kalyanaraman (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17402394#comment-17402394
 ] 

sadagopan kalyanaraman commented on SPARK-36548:


[~sarutak]  I agree on the doc, but what about the NoClassDefFoundError?

> Throwing NoClassDefFoundError for Logging$class
> ---
>
> Key: SPARK-36548
> URL: https://issues.apache.org/jira/browse/SPARK-36548
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Core
>Affects Versions: 3.1.1
>Reporter: sadagopan kalyanaraman
>Priority: Major
>
> I'm getting a NoClassDefFoundError. Was it removed in 3.1.1?
>  
> https://spark.apache.org/docs/3.1.1/api/java/org/apache/spark/internal/Logging.html
>  
> *https://spark.apache.org/docs/2.4.7/api/java/org/apache/spark/internal/Logging.html*
>  
> Exception in thread "main" java.util.ServiceConfigurationError: 
> org.apache.spark.sql.sources.DataSourceRegister: Provider 
> com.google.cloud.spark.bigquery.BigQueryRelationProvider could not be 
> instantiatedException in thread "main" java.util.ServiceConfigurationError: 
> org.apache.spark.sql.sources.DataSourceRegister: Provider 
> com.google.cloud.spark.bigquery.BigQueryRelationProvider could not be 
> instantiated at java.base/java.util.ServiceLoader.fail(Unknown Source) at 
> java.base/java.util.ServiceLoader$ProviderImpl.newInstance(Unknown Source) at 
> java.base/java.util.ServiceLoader$ProviderImpl.get(Unknown Source) at 
> java.base/java.util.ServiceLoader$3.next(Unknown Source) at 
> scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:44) at 
> scala.collection.Iterator.foreach(Iterator.scala:941) at 
> scala.collection.Iterator.foreach$(Iterator.scala:941) at 
> scala.collection.AbstractIterator.foreach(Iterator.scala:1429) at 
> scala.collection.IterableLike.foreach(IterableLike.scala:74) at 
> scala.collection.IterableLike.foreach$(IterableLike.scala:73) at 
> scala.collection.AbstractIterable.foreach(Iterable.scala:56) at 
> scala.collection.TraversableLike.filterImpl(TraversableLike.scala:255) at 
> scala.collection.TraversableLike.filterImpl$(TraversableLike.scala:249) at 
> scala.collection.AbstractTraversable.filterImpl(Traversable.scala:108) at 
> scala.collection.TraversableLike.filter(TraversableLike.scala:347) at 
> scala.collection.TraversableLike.filter$(TraversableLike.scala:347) at 
> scala.collection.AbstractTraversable.filter(Traversable.scala:108) at 
> org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:659)
>  at 
> org.apache.spark.sql.streaming.DataStreamReader.loadInternal(DataStreamReader.scala:209)
>  at 
> org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:195)
>  at 
> com.xyz.xyz.xyz.AbstractCdcMessageToSqJob.runJob(AbstractCdcMessageToSqJob.java:94)
>  at 
> com.xyz.xyz.xyz.CdcMessageToSqlBillerJob.main(CdcMessageToSqlBillerJob.java:30)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown 
> Source) at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown 
> Source) at java.base/java.lang.reflect.Method.invoke(Unknown Source) at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) 
> at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
>  at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) at 
> org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) at 
> org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030) 
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039) at 
> org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)Caused by: 
> java.lang.NoClassDefFoundError: org/apache/spark/internal/Logging$class at 
> com.google.cloud.spark.bigquery.BigQueryUtilScala$.(BigQueryUtil.scala:34)
>  at 
> com.google.cloud.spark.bigquery.BigQueryUtilScala$.(BigQueryUtil.scala)
>  at 
> com.google.cloud.spark.bigquery.BigQueryRelationProvider.(BigQueryRelationProvider.scala:43)
>  at 
> com.google.cloud.spark.bigquery.BigQueryRelationProvider.(BigQueryRelationProvider.scala:50)
>  at 
> java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native
>  Method) at 
> java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown
>  Source) at 
> java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown
>  Source) at java.base/java.lang.reflect.Constructor.newInstance(Unknown 
> Source) ... 33 moreCaused by: java.lang.ClassNotFoundException: 
> org.apache.spark.internal.Logging$class at 
> 

[jira] [Commented] (SPARK-36374) Push-based shuffle documentation

2021-08-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17402393#comment-17402393
 ] 

Apache Spark commented on SPARK-36374:
--

User 'venkata91' has created a pull request for this issue:
https://github.com/apache/spark/pull/33799

> Push-based shuffle documentation
> 
>
> Key: SPARK-36374
> URL: https://issues.apache.org/jira/browse/SPARK-36374
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 3.2.0
>Reporter: Venkata krishnan Sowrirajan
>Assignee: Venkata krishnan Sowrirajan
>Priority: Major
> Fix For: 3.2.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36544) Special strings for dates are no longer automatically converted without the "date" keyword

2021-08-20 Thread Kousuke Saruta (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17402366#comment-17402366
 ] 

Kousuke Saruta commented on SPARK-36544:


[~laurikoobas]
The behavior was changed. See SPARK-35581.

> Special strings for dates are no longer automatically converted without the 
> "date" keyword
> --
>
> Key: SPARK-36544
> URL: https://issues.apache.org/jira/browse/SPARK-36544
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
> Environment: Databricks Runtime 9.0
>Reporter: Lauri Koobas
>Priority: Major
>
> Comparing a date type with a special string fails with DBR 9.0.
> In earlier versions this works and returns "false":
>  {{select current_date > 'today'}}
> With DBR 9.0 the same query returns "null".
> Version that works in DBR 9.0:
> {{select current_date > *date* 'today'}}
> It's especially bad because people have written a lot of queries without 
> using this keyword, and now all of those queries silently fail because the 
> comparison returns NULL.
> The same behavior (or rather failure) applies to all comparison operations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36548) Throwing NoClassDefFoundError for Logging$class

2021-08-20 Thread Kousuke Saruta (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17402364#comment-17402364
 ] 

Kousuke Saruta commented on SPARK-36548:


[~sadagopan]
It was wrongly exposed before. Logging is for internal use.
As of 3.0.0, internally used classes are not in the doc.
See SPARK-30779.

> Throwing NoClassDefFoundError for Logging$class
> ---
>
> Key: SPARK-36548
> URL: https://issues.apache.org/jira/browse/SPARK-36548
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Core
>Affects Versions: 3.1.1
>Reporter: sadagopan kalyanaraman
>Priority: Major
>
> I'm getting a NoClassDefFoundError. Was it removed in 3.1.1?
>  
> https://spark.apache.org/docs/3.1.1/api/java/org/apache/spark/internal/Logging.html
>  
> *https://spark.apache.org/docs/2.4.7/api/java/org/apache/spark/internal/Logging.html*
>  
> Exception in thread "main" java.util.ServiceConfigurationError: 
> org.apache.spark.sql.sources.DataSourceRegister: Provider 
> com.google.cloud.spark.bigquery.BigQueryRelationProvider could not be 
> instantiatedException in thread "main" java.util.ServiceConfigurationError: 
> org.apache.spark.sql.sources.DataSourceRegister: Provider 
> com.google.cloud.spark.bigquery.BigQueryRelationProvider could not be 
> instantiated at java.base/java.util.ServiceLoader.fail(Unknown Source) at 
> java.base/java.util.ServiceLoader$ProviderImpl.newInstance(Unknown Source) at 
> java.base/java.util.ServiceLoader$ProviderImpl.get(Unknown Source) at 
> java.base/java.util.ServiceLoader$3.next(Unknown Source) at 
> scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:44) at 
> scala.collection.Iterator.foreach(Iterator.scala:941) at 
> scala.collection.Iterator.foreach$(Iterator.scala:941) at 
> scala.collection.AbstractIterator.foreach(Iterator.scala:1429) at 
> scala.collection.IterableLike.foreach(IterableLike.scala:74) at 
> scala.collection.IterableLike.foreach$(IterableLike.scala:73) at 
> scala.collection.AbstractIterable.foreach(Iterable.scala:56) at 
> scala.collection.TraversableLike.filterImpl(TraversableLike.scala:255) at 
> scala.collection.TraversableLike.filterImpl$(TraversableLike.scala:249) at 
> scala.collection.AbstractTraversable.filterImpl(Traversable.scala:108) at 
> scala.collection.TraversableLike.filter(TraversableLike.scala:347) at 
> scala.collection.TraversableLike.filter$(TraversableLike.scala:347) at 
> scala.collection.AbstractTraversable.filter(Traversable.scala:108) at 
> org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:659)
>  at 
> org.apache.spark.sql.streaming.DataStreamReader.loadInternal(DataStreamReader.scala:209)
>  at 
> org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:195)
>  at 
> com.xyz.xyz.xyz.AbstractCdcMessageToSqJob.runJob(AbstractCdcMessageToSqJob.java:94)
>  at 
> com.xyz.xyz.xyz.CdcMessageToSqlBillerJob.main(CdcMessageToSqlBillerJob.java:30)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown 
> Source) at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown 
> Source) at java.base/java.lang.reflect.Method.invoke(Unknown Source) at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) 
> at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
>  at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) at 
> org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) at 
> org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030) 
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039) at 
> org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)Caused by: 
> java.lang.NoClassDefFoundError: org/apache/spark/internal/Logging$class at 
> com.google.cloud.spark.bigquery.BigQueryUtilScala$.(BigQueryUtil.scala:34)
>  at 
> com.google.cloud.spark.bigquery.BigQueryUtilScala$.(BigQueryUtil.scala)
>  at 
> com.google.cloud.spark.bigquery.BigQueryRelationProvider.(BigQueryRelationProvider.scala:43)
>  at 
> com.google.cloud.spark.bigquery.BigQueryRelationProvider.(BigQueryRelationProvider.scala:50)
>  at 
> java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native
>  Method) at 
> java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown
>  Source) at 
> java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown
>  Source) at java.base/java.lang.reflect.Constructor.newInstance(Unknown 
> Source) ... 33 moreCaused by: 

[jira] [Commented] (SPARK-36552) varchar datatype behave differently on hive table and datasource table

2021-08-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1740#comment-1740
 ] 

Apache Spark commented on SPARK-36552:
--

User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/33798

> varchar datatype behave differently on  hive table  and datasource table
> 
>
> Key: SPARK-36552
> URL: https://issues.apache.org/jira/browse/SPARK-36552
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1, 3.1.1, 3.1.2
>Reporter: ocean
>Priority: Major
>
> In Spark 3.1.x, with spark.sql.hive.convertMetastoreOrc=false and 
> spark.sql.legacy.charVarcharAsString=true, execute the following SQL:
> CREATE TABLE t (col varchar(2)) stored as orc;
> INSERT INTO t SELECT 'aaa';
> select * from t;
> The result is "aa".
>  
> But with spark.sql.hive.convertMetastoreOrc=true and 
> spark.sql.legacy.charVarcharAsString=true, the same SQL 
> returns "aaa".
>  
>  
>  
>  
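A spark-shell sketch of the repro above, assuming a Hive-enabled SparkSession named spark and assuming both configs can be toggled per session; the table names t_hive and t_ds are illustrative, and the expected outputs in the comments come from the description, not from running this exact snippet:

{code:scala}
// Run 1: Hive read/write path (convertMetastoreOrc=false).
spark.conf.set("spark.sql.legacy.charVarcharAsString", "true")
spark.conf.set("spark.sql.hive.convertMetastoreOrc", "false")
spark.sql("CREATE TABLE t_hive (col varchar(2)) STORED AS ORC")
spark.sql("INSERT INTO t_hive SELECT 'aaa'")
spark.sql("SELECT * FROM t_hive").show()  // reported result: "aa" (truncated to varchar length)

// Run 2: native ORC data source path (convertMetastoreOrc=true).
spark.conf.set("spark.sql.hive.convertMetastoreOrc", "true")
spark.sql("CREATE TABLE t_ds (col varchar(2)) STORED AS ORC")
spark.sql("INSERT INTO t_ds SELECT 'aaa'")
spark.sql("SELECT * FROM t_ds").show()    // reported result: "aaa" (no truncation)
{code}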



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36552) varchar datatype behave differently on hive table and datasource table

2021-08-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36552:


Assignee: (was: Apache Spark)

> varchar datatype behave differently on  hive table  and datasource table
> 
>
> Key: SPARK-36552
> URL: https://issues.apache.org/jira/browse/SPARK-36552
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1, 3.1.1, 3.1.2
>Reporter: ocean
>Priority: Major
>
> In Spark 3.1.x, with spark.sql.hive.convertMetastoreOrc=false and 
> spark.sql.legacy.charVarcharAsString=true, execute the following SQL:
> CREATE TABLE t (col varchar(2)) stored as orc;
> INSERT INTO t SELECT 'aaa';
> select * from t;
> The result is "aa".
>  
> But with spark.sql.hive.convertMetastoreOrc=true and 
> spark.sql.legacy.charVarcharAsString=true, the same SQL 
> returns "aaa".
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36552) varchar datatype behave differently on hive table and datasource table

2021-08-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36552:


Assignee: Apache Spark

> varchar datatype behave differently on  hive table  and datasource table
> 
>
> Key: SPARK-36552
> URL: https://issues.apache.org/jira/browse/SPARK-36552
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1, 3.1.1, 3.1.2
>Reporter: ocean
>Assignee: Apache Spark
>Priority: Major
>
> In Spark 3.1.x, with spark.sql.hive.convertMetastoreOrc=false and 
> spark.sql.legacy.charVarcharAsString=true, execute the following SQL:
> CREATE TABLE t (col varchar(2)) stored as orc;
> INSERT INTO t SELECT 'aaa';
> select * from t;
> The result is "aa".
>  
> But with spark.sql.hive.convertMetastoreOrc=true and 
> spark.sql.legacy.charVarcharAsString=true, the same SQL 
> returns "aaa".
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-30641) Project Matrix: Linear Models revisit and refactor

2021-08-20 Thread zhengruifeng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengruifeng reassigned SPARK-30641:


Assignee: zhengruifeng

> Project Matrix: Linear Models revisit and refactor
> --
>
> Key: SPARK-30641
> URL: https://issues.apache.org/jira/browse/SPARK-30641
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, PySpark
>Affects Versions: 3.1.0, 3.2.0
>Reporter: zhengruifeng
>Assignee: zhengruifeng
>Priority: Major
>
> We have been refactoring linear models for a long time, and there is still 
> more work ahead. After some discussions among [~huaxingao] [~srowen] 
> [~weichenxu123] [~mengxr] [~podongfeng], we decided to gather the related work 
> under a sub-project, Matrix. It includes:
>  # *Blockification (vectorization of vectors)*
>  ** Vectors are stacked into matrices so that high-level BLAS can be used 
> for better performance (about ~3x faster on sparse datasets, up to ~18x 
> faster on dense datasets; see SPARK-31783 for details).
>  ** Since 3.1.1, LoR/SVC/LiR/AFT support blockification, and we still need to 
> blockify KMeans in the future.
>  # *Standardization (virtual centering)*
>  ** The existing implementation of standardization in linear models does NOT 
> center the vectors by removing the means, in order to keep the dataset 
> _*sparse*_. However, this causes feature values with small variance to be 
> scaled to large values, and an underlying solver like LBFGS cannot handle 
> this case efficiently; see SPARK-34448 for details.
>  ** If the internal vectors are centered (as in the well-known GLMNET), 
> convergence is better. In the case in SPARK-34448, the number of iterations 
> to convergence is reduced from 93 to 6. Moreover, the final solution is much 
> closer to the one from GLMNET.
>  ** Luckily, we found a way to _*virtually*_ center the vectors without 
> densifying the dataset. Good results have been observed in LoR, and we will 
> apply it to the other linear models (see the sketch after this description).
>  # _*Initialization (To be discussed)*_
>  ** Initializing model coefficients from a given model should be beneficial 
> to: 1) convergence (it should reduce the number of iterations); 2) model 
> stability (it may yield a new solution closer to the previous one).
>  # _*Early Stopping* *(To be discussed)*_
>  ** We can compute the test error during training (as tree models do) and 
> stop the procedure if the test error begins to increase.
>  
> If you want to add other features to these models, please comment on 
> the ticket.
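As an illustration only of the virtual-centering idea above (not Spark's actual implementation): the centered dot product w · (x − μ) expands to w · x − w · μ, so the sparse vectors never need to be densified. All names in the sketch are hypothetical:

{code:scala}
object VirtualCenteringSketch {
  // Sparse vector as (index -> value); absent indices are zero.
  type SparseVec = Map[Int, Double]

  // Naively centering x as (x - mean) would densify it. Instead use
  //   w . (x - mean) = w . x - w . mean
  // where w . mean is computed once per coefficient vector.
  def centeredDot(w: Array[Double], x: SparseVec, mean: Array[Double]): Double = {
    val wDotMean = w.indices.map(i => w(i) * mean(i)).sum          // dense, but done once per w
    val wDotX    = x.iterator.map { case (i, v) => w(i) * v }.sum  // touches only the non-zeros
    wDotX - wDotMean
  }

  def main(args: Array[String]): Unit = {
    val w    = Array(0.5, -1.0, 2.0)
    val mean = Array(0.1, 0.2, 0.3)
    val x: SparseVec = Map(2 -> 4.0)  // a mostly-zero vector
    println(centeredDot(w, x, mean))  // 7.55, identical to the dense (x - mean) dot w
  }
}
{code}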



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36448) Exceptions in NoSuchItemException.scala have to be case classes to preserve specific exceptions

2021-08-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-36448:
---

Assignee: Yesheng Ma

> Exceptions in NoSuchItemException.scala have to be case classes to preserve 
> specific exceptions
> ---
>
> Key: SPARK-36448
> URL: https://issues.apache.org/jira/browse/SPARK-36448
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: Yesheng Ma
>Assignee: Yesheng Ma
>Priority: Major
>
> Exceptions in NoSuchItemException.scala are not case classes. This causes 
> issues because Analyzer's 
> [executeAndCheck|https://github.com/apache/spark/blob/888f8f03c89ea7ee8997171eadf64c87e17c4efe/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L196-L199]
>  method always calls the `copy` method on the exception. However, since these 
> exceptions are not case classes, the `copy` call was always delegated to 
> `AnalysisException::copy`, which does not return the specialized exception type.
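For illustration only of why `copy` on a non-case-class exception loses the specific type (the class names below are made up, not Spark's actual exception hierarchy):

{code:scala}
object CopyPreservesTypeSketch {
  // Plain class: no compiler-generated copy, so the call resolves to the
  // parent's copy and the result is the parent type.
  class BaseError(val message: String) extends Exception(message) {
    def copy(message: String = this.message): BaseError = new BaseError(message)
  }
  class NoSuchThingError(message: String) extends BaseError(message)

  // Case class: the compiler generates a copy that returns the same type.
  case class NoSuchThingCaseError(message: String) extends Exception(message)

  def main(args: Array[String]): Unit = {
    val plain = new NoSuchThingError("t").copy(message = "t2")
    println(plain.getClass.getSimpleName)  // BaseError -- the specific type is lost

    val cased = NoSuchThingCaseError("t").copy(message = "t2")
    println(cased.getClass.getSimpleName)  // NoSuchThingCaseError -- type preserved
  }
}
{code}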



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36448) Exceptions in NoSuchItemException.scala have to be case classes to preserve specific exceptions

2021-08-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-36448.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 33673
[https://github.com/apache/spark/pull/33673]

> Exceptions in NoSuchItemException.scala have to be case classes to preserve 
> specific exceptions
> ---
>
> Key: SPARK-36448
> URL: https://issues.apache.org/jira/browse/SPARK-36448
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: Yesheng Ma
>Assignee: Yesheng Ma
>Priority: Major
> Fix For: 3.3.0
>
>
> Exceptions in NoSuchItemException.scala are not case classes. This causes 
> issues because Analyzer's 
> [executeAndCheck|https://github.com/apache/spark/blob/888f8f03c89ea7ee8997171eadf64c87e17c4efe/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L196-L199]
>  method always calls the `copy` method on the exception. However, since these 
> exceptions are not case classes, the `copy` call was always delegated to 
> `AnalysisException::copy`, which does not return the specialized exception type.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36532) Deadlock in CoarseGrainedExecutorBackend.onDisconnected

2021-08-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-36532:

Fix Version/s: 3.0.4
   3.1.3

> Deadlock in CoarseGrainedExecutorBackend.onDisconnected
> ---
>
> Key: SPARK-36532
> URL: https://issues.apache.org/jira/browse/SPARK-36532
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0
>Reporter: wuyi
>Assignee: wuyi
>Priority: Major
> Fix For: 3.2.0, 3.1.3, 3.0.4
>
>
> The deadlock has exactly the same root cause as SPARK-14180 but happens 
> in a different code path.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36551) Add sphinx-plotly-directive in Spark release Dockerfile

2021-08-20 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-36551.

Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33797
[https://github.com/apache/spark/pull/33797]

> Add sphinx-plotly-directive in Spark release Dockerfile
> ---
>
> Key: SPARK-36551
> URL: https://issues.apache.org/jira/browse/SPARK-36551
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.2.0
>
>
> After https://github.com/apache/spark/pull/32726, the Python doc build requires 
> sphinx-plotly-directive.
> We should install it in spark-rm/Dockerfile to make sure 
> do-release-docker.sh can run successfully. 
> Also, we should mention it in the docs README.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36553) KMeans fails with NegativeArraySizeException for K = 50000 after issue #27758 was introduced

2021-08-20 Thread Anders Rydbirk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anders Rydbirk updated SPARK-36553:
---
Description: 
We are running KMeans on approximately 350M rows of x, y, z coordinates using 
the following configuration:
{code:java}
KMeans(
  featuresCol='features',
  predictionCol='centroid_id',
  k=5,
  initMode='k-means||',
  initSteps=2,
  tol=0.5,
  maxIter=20,
  seed=SEED,
  distanceMeasure='euclidean'
)
{code}
When using Spark 3.0.0 this worked fine, but  when upgrading to 3.1.1 we are 
consistently getting errors unless we reduce K.

Stacktrace:

 
{code:java}
An error occurred while calling o167.fit.An error occurred while calling 
o167.fit.: java.lang.NegativeArraySizeException: -897458648 at 
scala.reflect.ManifestFactory$DoubleManifest.newArray(Manifest.scala:194) at 
scala.reflect.ManifestFactory$DoubleManifest.newArray(Manifest.scala:191) at 
scala.Array$.ofDim(Array.scala:221) at 
org.apache.spark.mllib.clustering.DistanceMeasure.computeStatistics(DistanceMeasure.scala:52)
 at 
org.apache.spark.mllib.clustering.KMeans.runAlgorithmWithWeight(KMeans.scala:280)
 at org.apache.spark.mllib.clustering.KMeans.runWithWeight(KMeans.scala:231) at 
org.apache.spark.ml.clustering.KMeans.$anonfun$fit$1(KMeans.scala:354) at 
org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191)
 at scala.util.Try$.apply(Try.scala:213) at 
org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:191)
 at org.apache.spark.ml.clustering.KMeans.fit(KMeans.scala:329) at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown 
Source) at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown 
Source) at java.base/java.lang.reflect.Method.invoke(Unknown Source) at 
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at 
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at 
py4j.Gateway.invoke(Gateway.java:282) at 
py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at 
py4j.commands.CallCommand.execute(CallCommand.java:79) at 
py4j.GatewayConnection.run(GatewayConnection.java:238) at 
java.base/java.lang.Thread.run(Unknown Source)
{code}
 

The issue is introduced by 
[#27758|https://github.com/apache/spark/pull/27758/files#diff-725d4624ddf4db9cc51721c2ddaef50a1bc30e7b471e0439da28c5b5582efdfdR52], 
which significantly reduces the maximum value of K. Snippet of the line that 
throws the error, from 
[DistanceMeasure.scala|https://github.com/zhengruifeng/spark/blob/d31d488e0e48a82fd5b43c406f07b8c7d27dd53c/mllib/src/main/scala/org/apache/spark/mllib/clustering/DistanceMeasure.scala#L52]:
{code:java}
val packedValues = Array.ofDim[Double](k * (k + 1) / 2)
{code}
 

*What we have tried:*
 * Reducing iterations
 * Reducing input volume
 * Reducing K

Only reducing K has yielded success.

 

*Possible workaround:*
 # Roll back to Spark 3.0.0 since a KMeansModel generated with 3.0.0 cannot be 
loaded in 3.1.1.
 # Reduce K. Currently trying with 45000.

 

*What we don't understand*:

Given the line of code above, we do not understand why we would get an integer 
overflow.

For K=50,000, packedValues should be allocated with the size of 1,250,025,000 < 
(2^31) and not result in a negative array size.
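One plausible explanation, stated as an assumption rather than a confirmed diagnosis: although the final length 1,250,025,000 fits in an Int, the intermediate product k * (k + 1) is evaluated in Int arithmetic first and overflows before the division by 2, which matches the reported value exactly. A minimal sketch:

{code:scala}
object IntOverflowSketch {
  def main(args: Array[String]): Unit = {
    val k = 50000
    // k * (k + 1) = 2,500,050,000 exceeds Int.MaxValue (2,147,483,647),
    // so it wraps around before the division by 2.
    println(k * (k + 1) / 2)         // -897458648, the value in the exception above
    println(k.toLong * (k + 1) / 2)  // 1250025000 when the product is computed as a Long
  }
}
{code}

Under that assumption, the error would appear for any K above roughly 46,340 (the largest K for which k * (k + 1) still fits in an Int), which is consistent with the observation that only reducing K helped.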

 

*Suggested resolution:*

I'm not strong in the inner workings of KMeans, but my immediate thought would 
be to add a fallback to the previous logic for K larger than a set threshold if 
the optimisation is to stay in place, as it breaks compatibility from 3.0.0 to 
3.1.1 for edge cases.

 

Please let me know if more information is needed; this is my first time raising 
a bug for an open-source project.

  was:
We are running KMeans on approximately 350M rows of x, y, z coordinates using 
the following configuration:
{code:java}
KMeans(
  featuresCol='features',
  predictionCol='centroid_id',
  k=5,
  initMode='k-means||',
  initSteps=2,
  tol=0.5,
  maxIter=20,
  seed=SEED,
  distanceMeasure='euclidean'
)
{code}
When using Spark 3.0.0 this worked fine, but  when upgrading to 3.1.1 we are 
consistently getting errors unless we reduce K.

Stacktrace:

 
{code:java}
An error occurred while calling o167.fit.An error occurred while calling 
o167.fit.: java.lang.NegativeArraySizeException: -897458648 at 
scala.reflect.ManifestFactory$DoubleManifest.newArray(Manifest.scala:194) at 
scala.reflect.ManifestFactory$DoubleManifest.newArray(Manifest.scala:191) at 
scala.Array$.ofDim(Array.scala:221) at 
org.apache.spark.mllib.clustering.DistanceMeasure.computeStatistics(DistanceMeasure.scala:52)
 at 
org.apache.spark.mllib.clustering.KMeans.runAlgorithmWithWeight(KMeans.scala:280)
 at org.apache.spark.mllib.clustering.KMeans.runWithWeight(KMeans.scala:231) at 
org.apache.spark.ml.clustering.KMeans.$anonfun$fit$1(KMeans.scala:354) at 
org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191)
 at scala.util.Try$.apply(Try.scala:213) at 

[jira] [Updated] (SPARK-36553) KMeans fails with NegativeArraySizeException for K = 50000 after issue #27758 was introduced

2021-08-20 Thread Anders Rydbirk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anders Rydbirk updated SPARK-36553:
---
Description: 
We are running KMeans on approximately 350M rows of x, y, z coordinates using 
the following configuration:
{code:java}
KMeans(
  featuresCol='features',
  predictionCol='centroid_id',
  k=5,
  initMode='k-means||',
  initSteps=2,
  tol=0.5,
  maxIter=20,
  seed=SEED,
  distanceMeasure='euclidean'
)
{code}
When using Spark 3.0.0 this worked fine, but  when upgrading to 3.1.1 we are 
consistently getting errors unless we reduce K.

Stacktrace:

 
{code:java}
An error occurred while calling o167.fit.An error occurred while calling 
o167.fit.: java.lang.NegativeArraySizeException: -897458648 at 
scala.reflect.ManifestFactory$DoubleManifest.newArray(Manifest.scala:194) at 
scala.reflect.ManifestFactory$DoubleManifest.newArray(Manifest.scala:191) at 
scala.Array$.ofDim(Array.scala:221) at 
org.apache.spark.mllib.clustering.DistanceMeasure.computeStatistics(DistanceMeasure.scala:52)
 at 
org.apache.spark.mllib.clustering.KMeans.runAlgorithmWithWeight(KMeans.scala:280)
 at org.apache.spark.mllib.clustering.KMeans.runWithWeight(KMeans.scala:231) at 
org.apache.spark.ml.clustering.KMeans.$anonfun$fit$1(KMeans.scala:354) at 
org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191)
 at scala.util.Try$.apply(Try.scala:213) at 
org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:191)
 at org.apache.spark.ml.clustering.KMeans.fit(KMeans.scala:329) at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown 
Source) at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown 
Source) at java.base/java.lang.reflect.Method.invoke(Unknown Source) at 
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at 
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at 
py4j.Gateway.invoke(Gateway.java:282) at 
py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at 
py4j.commands.CallCommand.execute(CallCommand.java:79) at 
py4j.GatewayConnection.run(GatewayConnection.java:238) at 
java.base/java.lang.Thread.run(Unknown Source)
{code}
 

The issue is introduced by 
[#27758|https://github.com/apache/spark/pull/27758/files#diff-725d4624ddf4db9cc51721c2ddaef50a1bc30e7b471e0439da28c5b5582efdfdR52], 
which significantly reduces the maximum value of K. Snippet of the line that 
throws the error, from 
[DistanceMeasure.scala|https://github.com/zhengruifeng/spark/blob/d31d488e0e48a82fd5b43c406f07b8c7d27dd53c/mllib/src/main/scala/org/apache/spark/mllib/clustering/DistanceMeasure.scala#L52]:
{code:java}
val packedValues = Array.ofDim[Double](k * (k + 1) / 2)
{code}
 

*What we have tried:*
 * Reducing iterations
 * Reducing input volume
 * Reducing K

Only reducing K has yielded success.

 

*Possible workaround:*

Roll back to Spark 3.0.0 since a KMeansModel generated with 3.0.0 cannot be 
loaded in 3.1.1.

 

*What we don't understand*:

Given the line of code above, we do not understand why we would get an integer 
overflow:

For K=50,000, packedValues should be allocated with the size of 1,250,025,000 < 
(2^31) and not result in a negative array size.

Please let me know if more information is needed; this is my first time raising 
a bug for an open-source project.

 

*Suggested resolution:*

I'm not strong in the inner workings of KMeans, but my immediate thought would 
be to add a fallback to the previous logic for K larger than a set threshold if 
the optimisation is to stay in place, as it breaks compatibility from 3.0.0 to 
3.1.1 for edge cases.

  was:
We are running KMeans on approximately 350M rows of x, y, z coordinates using 
the following configuration:
{code:java}
KMeans(
  featuresCol='features',
  predictionCol='centroid_id',
  k=5,
  initMode='k-means||',
  initSteps=2,
  tol=0.5,
  maxIter=20,
  seed=SEED,
  distanceMeasure='euclidean'
)
{code}
When using Spark 3.0.0 this worked fine, but  when upgrading to 3.1.1 we are 
consistently getting errors unless we reduce K.

Stacktrace:

 
{code:java}
An error occurred while calling o167.fit.An error occurred while calling 
o167.fit.: java.lang.NegativeArraySizeException: -897458648 at 
scala.reflect.ManifestFactory$DoubleManifest.newArray(Manifest.scala:194) at 
scala.reflect.ManifestFactory$DoubleManifest.newArray(Manifest.scala:191) at 
scala.Array$.ofDim(Array.scala:221) at 
org.apache.spark.mllib.clustering.DistanceMeasure.computeStatistics(DistanceMeasure.scala:52)
 at 
org.apache.spark.mllib.clustering.KMeans.runAlgorithmWithWeight(KMeans.scala:280)
 at org.apache.spark.mllib.clustering.KMeans.runWithWeight(KMeans.scala:231) at 
org.apache.spark.ml.clustering.KMeans.$anonfun$fit$1(KMeans.scala:354) at 
org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191)
 at scala.util.Try$.apply(Try.scala:213) at 
org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:191)
 at 

[jira] [Updated] (SPARK-36553) KMeans fails with NegativeArraySizeException for K = 50000 after issue #27758 was introduced

2021-08-20 Thread Anders Rydbirk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anders Rydbirk updated SPARK-36553:
---
Description: 
We are running KMeans on approximately 350M rows of x, y, z coordinates using 
the following configuration:
{code:java}
KMeans(
  featuresCol='features',
  predictionCol='centroid_id',
  k=5,
  initMode='k-means||',
  initSteps=2,
  tol=0.5,
  maxIter=20,
  seed=SEED,
  distanceMeasure='euclidean'
)
{code}
When using Spark 3.0.0 this worked fine, but  when upgrading to 3.1.1 we are 
consistently getting errors unless we reduce K.

Stacktrace:

 
{code:java}
An error occurred while calling o167.fit.An error occurred while calling 
o167.fit.: java.lang.NegativeArraySizeException: -897458648 at 
scala.reflect.ManifestFactory$DoubleManifest.newArray(Manifest.scala:194) at 
scala.reflect.ManifestFactory$DoubleManifest.newArray(Manifest.scala:191) at 
scala.Array$.ofDim(Array.scala:221) at 
org.apache.spark.mllib.clustering.DistanceMeasure.computeStatistics(DistanceMeasure.scala:52)
 at 
org.apache.spark.mllib.clustering.KMeans.runAlgorithmWithWeight(KMeans.scala:280)
 at org.apache.spark.mllib.clustering.KMeans.runWithWeight(KMeans.scala:231) at 
org.apache.spark.ml.clustering.KMeans.$anonfun$fit$1(KMeans.scala:354) at 
org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191)
 at scala.util.Try$.apply(Try.scala:213) at 
org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:191)
 at org.apache.spark.ml.clustering.KMeans.fit(KMeans.scala:329) at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown 
Source) at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown 
Source) at java.base/java.lang.reflect.Method.invoke(Unknown Source) at 
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at 
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at 
py4j.Gateway.invoke(Gateway.java:282) at 
py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at 
py4j.commands.CallCommand.execute(CallCommand.java:79) at 
py4j.GatewayConnection.run(GatewayConnection.java:238) at 
java.base/java.lang.Thread.run(Unknown Source)
{code}
 

The issue is introduced by 
[#27758|https://github.com/apache/spark/pull/27758/files#diff-725d4624ddf4db9cc51721c2ddaef50a1bc30e7b471e0439da28c5b5582efdfdR52], 
which significantly reduces the maximum value of K. Snippet of the line that 
throws the error, from 
[DistanceMeasure.scala|https://github.com/zhengruifeng/spark/blob/d31d488e0e48a82fd5b43c406f07b8c7d27dd53c/mllib/src/main/scala/org/apache/spark/mllib/clustering/DistanceMeasure.scala#L52]:
{code:java}
val packedValues = Array.ofDim[Double](k * (k + 1) / 2)
{code}
 

*What we have tried:*
 * Reducing iterations
 * Reducing input volume
 * Reducing K

Only reducing K has yielded success.

 

*What we don't understand*:

Given the line of code above, we do not understand why we would get an integer 
overflow:

For K=50,000, packedValues should be allocated with the size of 1,250,025,000 < 
(2^31) and not result in a negative array size.

Please let me know if more information is needed; this is my first time raising 
a bug for an open-source project.

 

*Suggested resolution:*

I'm not strong in the inner workings of KMeans, but my immediate thought would 
be to add a fallback to the previous logic for K larger than a set threshold if 
the optimisation is to stay in place, as it breaks compatibility from 3.0.0 to 
3.1.1 for edge cases.

  was:
We are running KMeans on approximately 350M rows of x, y, z coordinates using 
the following configuration:
{code:java}
KMeans(
  featuresCol='features',
  predictionCol='centroid_id',
  k=5,
  initMode='k-means||',
  initSteps=2,
  tol=0.5,
  maxIter=20,
  seed=SEED,
  distanceMeasure='euclidean'
)
{code}
When using Spark 3.0.0 this worked fine, but  when upgrading to 3.1.1 we are 
consistently getting errors unless we reduce K.

Stacktrace:

 
{code:java}
An error occurred while calling o167.fit.An error occurred while calling 
o167.fit.: java.lang.NegativeArraySizeException: -897458648 at 
scala.reflect.ManifestFactory$DoubleManifest.newArray(Manifest.scala:194) at 
scala.reflect.ManifestFactory$DoubleManifest.newArray(Manifest.scala:191) at 
scala.Array$.ofDim(Array.scala:221) at 
org.apache.spark.mllib.clustering.DistanceMeasure.computeStatistics(DistanceMeasure.scala:52)
 at 
org.apache.spark.mllib.clustering.KMeans.runAlgorithmWithWeight(KMeans.scala:280)
 at org.apache.spark.mllib.clustering.KMeans.runWithWeight(KMeans.scala:231) at 
org.apache.spark.ml.clustering.KMeans.$anonfun$fit$1(KMeans.scala:354) at 
org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191)
 at scala.util.Try$.apply(Try.scala:213) at 
org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:191)
 at org.apache.spark.ml.clustering.KMeans.fit(KMeans.scala:329) at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
at 

[jira] [Updated] (SPARK-36553) KMeans fails with NegativeArraySizeException for K = 50000 after issue #27758 was introduced

2021-08-20 Thread Anders Rydbirk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anders Rydbirk updated SPARK-36553:
---
Description: 
We are running KMeans on approximately 350M rows of x, y, z coordinates using 
the following configuration:
{code:java}
KMeans(
  featuresCol='features',
  predictionCol='centroid_id',
  k=5,
  initMode='k-means||',
  initSteps=2,
  tol=0.5,
  maxIter=20,
  seed=SEED,
  distanceMeasure='euclidean'
)
{code}
When using Spark 3.0.0 this worked fine, but  when upgrading to 3.1.1 we are 
consistently getting errors unless we reduce K.

Stacktrace:

 
{code:java}
An error occurred while calling o167.fit.An error occurred while calling 
o167.fit.: java.lang.NegativeArraySizeException: -897458648 at 
scala.reflect.ManifestFactory$DoubleManifest.newArray(Manifest.scala:194) at 
scala.reflect.ManifestFactory$DoubleManifest.newArray(Manifest.scala:191) at 
scala.Array$.ofDim(Array.scala:221) at 
org.apache.spark.mllib.clustering.DistanceMeasure.computeStatistics(DistanceMeasure.scala:52)
 at 
org.apache.spark.mllib.clustering.KMeans.runAlgorithmWithWeight(KMeans.scala:280)
 at org.apache.spark.mllib.clustering.KMeans.runWithWeight(KMeans.scala:231) at 
org.apache.spark.ml.clustering.KMeans.$anonfun$fit$1(KMeans.scala:354) at 
org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191)
 at scala.util.Try$.apply(Try.scala:213) at 
org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:191)
 at org.apache.spark.ml.clustering.KMeans.fit(KMeans.scala:329) at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown 
Source) at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown 
Source) at java.base/java.lang.reflect.Method.invoke(Unknown Source) at 
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at 
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at 
py4j.Gateway.invoke(Gateway.java:282) at 
py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at 
py4j.commands.CallCommand.execute(CallCommand.java:79) at 
py4j.GatewayConnection.run(GatewayConnection.java:238) at 
java.base/java.lang.Thread.run(Unknown Source)
{code}
 

The issue is introduced by 
[#27758|https://github.com/apache/spark/pull/27758/files#diff-725d4624ddf4db9cc51721c2ddaef50a1bc30e7b471e0439da28c5b5582efdfdR52], 
which significantly reduces the maximum value of K. Snippet of the line that 
throws the error, from 
[DistanceMeasure.scala|https://github.com/zhengruifeng/spark/blob/d31d488e0e48a82fd5b43c406f07b8c7d27dd53c/mllib/src/main/scala/org/apache/spark/mllib/clustering/DistanceMeasure.scala#L52]:
{code:java}
val packedValues = Array.ofDim[Double](k * (k + 1) / 2)
{code}
 

*What we have tried:*
 * Reducing iterations
 * Reducing input volume
 * Reducing K

Only reducing K has yielded success.

 

*What we don't understand*:

Given the line of code above, we do not understand why we would get an integer 
overflow:

For K=50,000, packedValues should be allocated with the size of 1,250,025,000 < 
(2^31) and not result in a negative array size.

Please let me know if more information is needed; this is my first time raising 
a bug for an open-source project.

  was:
We are running KMeans on approximately 350M rows of x, y, z coordinates using 
the following configuration:
{code:java}
KMeans(
  featuresCol='features',
  predictionCol='centroid_id',
  k=5,
  initMode='k-means||',
  initSteps=2,
  tol=0.5,
  maxIter=20,
  seed=SEED,
  distanceMeasure='euclidean'
)
{code}
When using Spark 3.0.0 this worked fine, but  when upgrading to 3.1.1 we are 
consistently getting errors unless we reduce K.

Stacktrace:

 
{code:java}
An error occurred while calling o167.fit.An error occurred while calling 
o167.fit.: java.lang.NegativeArraySizeException: -897458648 at 
scala.reflect.ManifestFactory$DoubleManifest.newArray(Manifest.scala:194) at 
scala.reflect.ManifestFactory$DoubleManifest.newArray(Manifest.scala:191) at 
scala.Array$.ofDim(Array.scala:221) at 
org.apache.spark.mllib.clustering.DistanceMeasure.computeStatistics(DistanceMeasure.scala:52)
 at 
org.apache.spark.mllib.clustering.KMeans.runAlgorithmWithWeight(KMeans.scala:280)
 at org.apache.spark.mllib.clustering.KMeans.runWithWeight(KMeans.scala:231) at 
org.apache.spark.ml.clustering.KMeans.$anonfun$fit$1(KMeans.scala:354) at 
org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191)
 at scala.util.Try$.apply(Try.scala:213) at 
org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:191)
 at org.apache.spark.ml.clustering.KMeans.fit(KMeans.scala:329) at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown 
Source) at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown 
Source) at java.base/java.lang.reflect.Method.invoke(Unknown Source) at 
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at 

[jira] [Created] (SPARK-36553) KMeans fails with NegativeArraySizeException for K = 50000 after issue #27758 was introduced

2021-08-20 Thread Anders Rydbirk (Jira)
Anders Rydbirk created SPARK-36553:
--

 Summary: KMeans fails with NegativeArraySizeException for K = 
50000 after issue #27758 was introduced
 Key: SPARK-36553
 URL: https://issues.apache.org/jira/browse/SPARK-36553
 Project: Spark
  Issue Type: Bug
  Components: ML, MLlib, PySpark
Affects Versions: 3.1.1
Reporter: Anders Rydbirk


We are running KMeans on approximately 350M rows of x, y, z coordinates using 
the following configuration:
{code:java}
KMeans(
  featuresCol='features',
  predictionCol='centroid_id',
  k=5,
  initMode='k-means||',
  initSteps=2,
  tol=0.5,
  maxIter=20,
  seed=SEED,
  distanceMeasure='euclidean'
)
{code}
When using Spark 3.0.0 this worked fine, but  when upgrading to 3.1.1 we are 
consistently getting errors unless we reduce K.

Stacktrace:

 
{code:java}
An error occurred while calling o167.fit.An error occurred while calling 
o167.fit.: java.lang.NegativeArraySizeException: -897458648 at 
scala.reflect.ManifestFactory$DoubleManifest.newArray(Manifest.scala:194) at 
scala.reflect.ManifestFactory$DoubleManifest.newArray(Manifest.scala:191) at 
scala.Array$.ofDim(Array.scala:221) at 
org.apache.spark.mllib.clustering.DistanceMeasure.computeStatistics(DistanceMeasure.scala:52)
 at 
org.apache.spark.mllib.clustering.KMeans.runAlgorithmWithWeight(KMeans.scala:280)
 at org.apache.spark.mllib.clustering.KMeans.runWithWeight(KMeans.scala:231) at 
org.apache.spark.ml.clustering.KMeans.$anonfun$fit$1(KMeans.scala:354) at 
org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191)
 at scala.util.Try$.apply(Try.scala:213) at 
org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:191)
 at org.apache.spark.ml.clustering.KMeans.fit(KMeans.scala:329) at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown 
Source) at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown 
Source) at java.base/java.lang.reflect.Method.invoke(Unknown Source) at 
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at 
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at 
py4j.Gateway.invoke(Gateway.java:282) at 
py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at 
py4j.commands.CallCommand.execute(CallCommand.java:79) at 
py4j.GatewayConnection.run(GatewayConnection.java:238) at 
java.base/java.lang.Thread.run(Unknown Source)
{code}
 

The issue is introduced by 
[#27758|https://github.com/apache/spark/pull/27758/files#diff-725d4624ddf4db9cc51721c2ddaef50a1bc30e7b471e0439da28c5b5582efdfdR52], 
which significantly reduces the maximum value of K:

[DistanceMeasure.scala|https://github.com/zhengruifeng/spark/blob/d31d488e0e48a82fd5b43c406f07b8c7d27dd53c/mllib/src/main/scala/org/apache/spark/mllib/clustering/DistanceMeasure.scala#L52]
{code:java}
val packedValues = Array.ofDim[Double](k * (k + 1) / 2)
{code}
 

*What we have tried:*
 * Reducing iterations
 * Reducing input volume
 * Reducing K

Only reducing K has yielded success.

 

*What we don't understand:*

Given the line of code above, we do not understand why we would get an 
integer overflow:

For K=50,000, packedValues should be allocated with a size of 1,250,025,000 < 
2^31, and should therefore not result in a negative array size.
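A minimal Scala sketch of one possible explanation (our guess, not a confirmed 
diagnosis): the intermediate product k * (k + 1) is evaluated in Int before the 
division by 2, and overflowing there would produce exactly the -897458648 seen in 
the stacktrace above.

{code:java}
// Illustrative only: compares Int vs Long arithmetic for the allocation size.
val k = 50000
val intSize  = k * (k + 1) / 2          // Int arithmetic overflows: -897458648
val longSize = k.toLong * (k + 1) / 2   // Long arithmetic: 1250025000
println(s"Int: $intSize, Long: $longSize")
{code}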


Please let me know if more information is needed; this is my first time raising 
a bug for an open-source project.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-36434) Implement DataFrame.lookup

2021-08-20 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17402151#comment-17402151
 ] 

dgd_contributor edited comment on SPARK-36434 at 8/20/21, 10:44 AM:


Should we work on this? These docs show that DataFrame.lookup is deprecated: 
[https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.lookup.html]

 


was (Author: dc-heros):
should we work on this? this docs show dataframe.lookup is deprecated 
[https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.lookup.html]

 

> Implement DataFrame.lookup
> --
>
> Key: SPARK-36434
> URL: https://issues.apache.org/jira/browse/SPARK-36434
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36434) Implement DataFrame.lookup

2021-08-20 Thread dgd_contributor (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17402151#comment-17402151
 ] 

dgd_contributor commented on SPARK-36434:
-

Should we work on this? These docs show that DataFrame.lookup is deprecated: 
[https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.lookup.html]

 

> Implement DataFrame.lookup
> --
>
> Key: SPARK-36434
> URL: https://issues.apache.org/jira/browse/SPARK-36434
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31336) Support Oracle Kerberos login in JDBC connector

2021-08-20 Thread ABHAYRAJ (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17402134#comment-17402134
 ] 

ABHAYRAJ  commented on SPARK-31336:
---

Hi [~avsek], have you found a fix for the above? We are also facing the same 
issue.

> Support Oracle Kerberos login in JDBC connector
> ---
>
> Key: SPARK-31336
> URL: https://issues.apache.org/jira/browse/SPARK-31336
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Major
> Fix For: 3.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36552) varchar datatype behaves differently on hive table and datasource table

2021-08-20 Thread ocean (Jira)
ocean created SPARK-36552:
-

 Summary: varchar datatype behaves differently on hive table and 
datasource table
 Key: SPARK-36552
 URL: https://issues.apache.org/jira/browse/SPARK-36552
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.1, 3.1.0, 2.3.1
Reporter: ocean


In Spark 3.1.x, with spark.sql.hive.convertMetastoreOrc=false and 
spark.sql.legacy.charVarcharAsString=true, execute the following SQL:

CREATE TABLE t (col varchar(2)) stored as orc;
INSERT INTO t SELECT 'aaa';
select * from t;

The result is "aa".

But with spark.sql.hive.convertMetastoreOrc=true and 
spark.sql.legacy.charVarcharAsString=true, the same SQL returns "aaa".
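For reference, the same steps in spark-shell form, as a sketch assuming both 
configs can be set at the session level and Hive support is enabled:

{code:java}
// Sketch of the reproduction described above; the configs must take effect
// before the table is created and read.
spark.sql("SET spark.sql.legacy.charVarcharAsString=true")
spark.sql("SET spark.sql.hive.convertMetastoreOrc=false")
spark.sql("CREATE TABLE t (col varchar(2)) STORED AS ORC")
spark.sql("INSERT INTO t SELECT 'aaa'")
spark.sql("SELECT * FROM t").show()   // reported: 'aa' (Hive ORC reader path)
// With spark.sql.hive.convertMetastoreOrc=true instead, the report sees 'aaa'.
{code}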

 

 

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36552) varchar datatype behaves differently on hive table and datasource table

2021-08-20 Thread ocean (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ocean updated SPARK-36552:
--
Affects Version/s: (was: 3.1.0)
   3.1.2

> varchar datatype behaves differently on hive table and datasource table
> 
>
> Key: SPARK-36552
> URL: https://issues.apache.org/jira/browse/SPARK-36552
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1, 3.1.1, 3.1.2
>Reporter: ocean
>Priority: Major
>
> In Spark 3.1.x, with spark.sql.hive.convertMetastoreOrc=false and 
> spark.sql.legacy.charVarcharAsString=true, execute the following SQL:
> CREATE TABLE t (col varchar(2)) stored as orc;
> INSERT INTO t SELECT 'aaa';
> select * from t;
> The result is "aa".
> But with spark.sql.hive.convertMetastoreOrc=true and 
> spark.sql.legacy.charVarcharAsString=true, the same SQL returns "aaa".
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36551) Add sphinx-plotly-directive in Spark release Dockerfile

2021-08-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36551:


Assignee: Apache Spark  (was: Gengliang Wang)

> Add sphinx-plotly-directive in Spark release Dockerfile
> ---
>
> Key: SPARK-36551
> URL: https://issues.apache.org/jira/browse/SPARK-36551
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>
> After https://github.com/apache/spark/pull/32726, Python doc build requires 
> sphinx-plotly-directive.
> We should install it from spark-rm/Dockerfile to make sure 
> do-release-docker.sh can run successfully. 
> Also, we should mention it in the README of docs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36551) Add sphinx-plotly-directive in Spark release Dockerfile

2021-08-20 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-36551:
---
Summary: Add sphinx-plotly-directive in Spark release Dockerfile  (was: Add 
sphinx-plotly-directive in Spark release docker script)

> Add sphinx-plotly-directive in Spark release Dockerfile
> ---
>
> Key: SPARK-36551
> URL: https://issues.apache.org/jira/browse/SPARK-36551
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> After https://github.com/apache/spark/pull/32726, Python doc build requires 
> sphinx-plotly-directive.
> We should install it from spark-rm/Dockerfile to make sure 
> do-release-docker.sh can run successfully. 
> Also, we should mention it in the README of docs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36551) Add sphinx-plotly-directive in Spark release docker script

2021-08-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36551:


Assignee: Gengliang Wang  (was: Apache Spark)

> Add sphinx-plotly-directive in Spark release docker script
> --
>
> Key: SPARK-36551
> URL: https://issues.apache.org/jira/browse/SPARK-36551
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> After https://github.com/apache/spark/pull/32726, Python doc build requires 
> sphinx-plotly-directive.
> We should install it from spark-rm/Dockerfile to make sure 
> do-release-docker.sh can run successfully. 
> Also, we should mention it in the README of docs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36551) Add sphinx-plotly-directive in Spark release docker script

2021-08-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36551:


Assignee: Apache Spark  (was: Gengliang Wang)

> Add sphinx-plotly-directive in Spark release docker script
> --
>
> Key: SPARK-36551
> URL: https://issues.apache.org/jira/browse/SPARK-36551
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>
> After https://github.com/apache/spark/pull/32726, Python doc build requires 
> sphinx-plotly-directive.
> We should install it from spark-rm/Dockerfile to make sure 
> do-release-docker.sh can run successfully. 
> Also, we should mention it in the README of docs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36551) Add sphinx-plotly-directive in Spark release docker script

2021-08-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17402126#comment-17402126
 ] 

Apache Spark commented on SPARK-36551:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/33797

> Add sphinx-plotly-directive in Spark release docker script
> --
>
> Key: SPARK-36551
> URL: https://issues.apache.org/jira/browse/SPARK-36551
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> After https://github.com/apache/spark/pull/32726, Python doc build requires 
> sphinx-plotly-directive.
> We should install it from spark-rm/Dockerfile to make sure 
> do-release-docker.sh can run successfully. 
> Also, we should mention it in the README of docs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36551) Add sphinx-plotly-directive in Spark release docker script

2021-08-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17402125#comment-17402125
 ] 

Apache Spark commented on SPARK-36551:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/33797

> Add sphinx-plotly-directive in Spark release docker script
> --
>
> Key: SPARK-36551
> URL: https://issues.apache.org/jira/browse/SPARK-36551
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> After https://github.com/apache/spark/pull/32726, Python doc build requires 
> sphinx-plotly-directive.
> We should install it from spark-rm/Dockerfile to make sure 
> do-release-docker.sh can run successfully. 
> Also, we should mention it in the README of docs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36551) Add sphinx-plotly-directive in Spark release docker script

2021-08-20 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-36551:
--

 Summary: Add sphinx-plotly-directive in Spark release docker script
 Key: SPARK-36551
 URL: https://issues.apache.org/jira/browse/SPARK-36551
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 3.2.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


After https://github.com/apache/spark/pull/32726, Python doc build requires 
sphinx-plotly-directive.
We should install it from spark-rm/Dockerfile to make sure do-release-docker.sh 
can run successfully. 
Also, we should mention it in the README of docs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36550) Propagation cause when UDF reflection fails

2021-08-20 Thread dzcxzl (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-36550:
---
Description: 
Now when UDF reflection fails, InvocationTargetException is thrown, but it is 
not a specific exception.
{code:java}
Error in query: No handler for Hive UDF 'XXX': 
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
{code}

  was:Now when UDF reflection fails, InvocationTargetException is thrown, but 
it is not a specific exception.


> Propagation cause when UDF reflection fails
> ---
>
> Key: SPARK-36550
> URL: https://issues.apache.org/jira/browse/SPARK-36550
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: dzcxzl
>Priority: Trivial
>
> Now when UDF reflection fails, InvocationTargetException is thrown, but it is 
> not a specific exception.
> {code:java}
> Error in query: No handler for Hive UDF 'XXX': 
> java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36550) Propagation cause when UDF reflection fails

2021-08-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36550:


Assignee: (was: Apache Spark)

> Propagation cause when UDF reflection fails
> ---
>
> Key: SPARK-36550
> URL: https://issues.apache.org/jira/browse/SPARK-36550
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: dzcxzl
>Priority: Trivial
>
> Now when UDF reflection fails, InvocationTargetException is thrown, but it is 
> not a specific exception.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36550) Propagation cause when UDF reflection fails

2021-08-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17402081#comment-17402081
 ] 

Apache Spark commented on SPARK-36550:
--

User 'cxzl25' has created a pull request for this issue:
https://github.com/apache/spark/pull/33796

> Propagation cause when UDF reflection fails
> ---
>
> Key: SPARK-36550
> URL: https://issues.apache.org/jira/browse/SPARK-36550
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: dzcxzl
>Priority: Trivial
>
> Now when UDF reflection fails, InvocationTargetException is thrown, but it is 
> not a specific exception.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36550) Propagation cause when UDF reflection fails

2021-08-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36550:


Assignee: Apache Spark

> Propagation cause when UDF reflection fails
> ---
>
> Key: SPARK-36550
> URL: https://issues.apache.org/jira/browse/SPARK-36550
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: dzcxzl
>Assignee: Apache Spark
>Priority: Trivial
>
> Now when UDF reflection fails, InvocationTargetException is thrown, but it is 
> not a specific exception.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36550) Propagation cause when UDF reflection fails

2021-08-20 Thread dzcxzl (Jira)
dzcxzl created SPARK-36550:
--

 Summary: Propagation cause when UDF reflection fails
 Key: SPARK-36550
 URL: https://issues.apache.org/jira/browse/SPARK-36550
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.2
Reporter: dzcxzl


Now when UDF reflection fails, InvocationTargetException is thrown, but it is 
not a specific exception.
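A small Scala sketch of the kind of change this asks for (illustrative only; the 
helper below is not Spark's actual code): unwrap the InvocationTargetException so 
the underlying cause reaches the error message instead of the opaque wrapper.

{code:java}
import java.lang.reflect.InvocationTargetException

// Hypothetical helper: instantiate a UDF class reflectively and surface the
// real cause when construction fails.
def instantiate(cls: Class[_]): Any =
  try {
    cls.getDeclaredConstructor().newInstance()
  } catch {
    case e: InvocationTargetException if e.getCause != null =>
      throw new IllegalStateException(
        s"Failed to instantiate ${cls.getName}: ${e.getCause.getMessage}",
        e.getCause)
  }
{code}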



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36532) Deadlock in CoarseGrainedExecutorBackend.onDisconnected

2021-08-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17402071#comment-17402071
 ] 

Apache Spark commented on SPARK-36532:
--

User 'Ngone51' has created a pull request for this issue:
https://github.com/apache/spark/pull/33795

> Deadlock in CoarseGrainedExecutorBackend.onDisconnected
> ---
>
> Key: SPARK-36532
> URL: https://issues.apache.org/jira/browse/SPARK-36532
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0
>Reporter: wuyi
>Assignee: wuyi
>Priority: Major
> Fix For: 3.2.0
>
>
> The deadlock has exactly the same root cause as SPARK-14180 but just happens 
> in a different code path.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org