[jira] [Commented] (SPARK-27122) YARN test failures in Java 9+

2019-03-13 Thread Ajith S (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792355#comment-16792355
 ] 

Ajith S commented on SPARK-27122:
-

ping [~srowen] [~dongjoon] [~Gengliang.Wang]

> YARN test failures in Java 9+
> -
>
> Key: SPARK-27122
> URL: https://issues.apache.org/jira/browse/SPARK-27122
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 3.0.0
>Reporter: Sean Owen
>Priority: Major
> Attachments: image-2019-03-14-09-34-20-592.png, 
> image-2019-03-14-09-35-23-046.png
>
>
> Currently on Java 11:
> {code}
> YarnSchedulerBackendSuite:
> - RequestExecutors reflects node blacklist and is serializable
> - Respect user filters when adding AM IP filter *** FAILED ***
>   java.lang.ClassCastException: 
> org.spark_project.jetty.servlet.ServletContextHandler cannot be cast to 
> org.eclipse.jetty.servlet.ServletContextHandler
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237)
>   at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>   at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:237)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:230)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>   at 
> org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2(YarnSchedulerBackend.scala:183)
>   at 
> org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2$adapted(YarnSchedulerBackend.scala:174)
>   at scala.Option.foreach(Option.scala:274)
>   ...
> {code}
> This looks like a classpath issue, probably ultimately related to the same 
> classloader issues in https://issues.apache.org/jira/browse/SPARK-26839 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27156) why is the "http://:18080/static" browse able?

2019-03-13 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792331#comment-16792331
 ] 

Shivu Sondur commented on SPARK-27156:
--

I am working on it.

> why is the "http://:18080/static" browse able?
> 
>
> Key: SPARK-27156
> URL: https://issues.apache.org/jira/browse/SPARK-27156
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core, Web UI
>Affects Versions: 1.6.2
>Reporter: Jerry Garcia
>Priority: Minor
> Attachments: Screen Shot 2019-03-14 at 11.46.31 AM.png
>
>
> I would like to know whether there is a way to disable the Spark history server's 
> /static folder. Please refer to the attachment provided. The reason for asking 
> is security.






[jira] [Commented] (SPARK-27157) Add Executor metrics to monitoring doc

2019-03-13 Thread Lantao Jin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792327#comment-16792327
 ] 

Lantao Jin commented on SPARK-27157:


I am going to prepare a PR

> Add Executor metrics to monitoring doc
> --
>
> Key: SPARK-27157
> URL: https://issues.apache.org/jira/browse/SPARK-27157
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 2.4.0
>Reporter: Lantao Jin
>Priority: Minor
>
> {{Executor Task Metrics}} already exists in the Spark docs; we should add an 
> {{Executor Metrics}} section to monitoring.md.






[jira] [Updated] (SPARK-27157) Add Executor metrics to monitoring doc

2019-03-13 Thread Lantao Jin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lantao Jin updated SPARK-27157:
---
Issue Type: Sub-task  (was: Documentation)
Parent: SPARK-23206

> Add Executor metrics to monitoring doc
> --
>
> Key: SPARK-27157
> URL: https://issues.apache.org/jira/browse/SPARK-27157
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 2.4.0
>Reporter: Lantao Jin
>Priority: Minor
>
> {{Executor Task Metrics}} already exists in the Spark docs; we should add an 
> {{Executor Metrics}} section to monitoring.md.






[jira] [Created] (SPARK-27157) Add Executor metrics to monitoring doc

2019-03-13 Thread Lantao Jin (JIRA)
Lantao Jin created SPARK-27157:
--

 Summary: Add Executor metrics to monitoring doc
 Key: SPARK-27157
 URL: https://issues.apache.org/jira/browse/SPARK-27157
 Project: Spark
  Issue Type: Documentation
  Components: Documentation
Affects Versions: 2.4.0
Reporter: Lantao Jin


{{Executor Task Metrics}} already exists in the Spark docs; we should add an 
{{Executor Metrics}} section to monitoring.md.






[jira] [Comment Edited] (SPARK-27122) YARN test failures in Java 9+

2019-03-13 Thread Ajith S (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792318#comment-16792318
 ] 

Ajith S edited comment on SPARK-27122 at 3/14/19 4:15 AM:
--

The problem seems to be the shading of the Jetty package.

When we run the tests, the classpath appears to be built from the classes folder (resource-managers/yarn/target/scala-2.12/classes) instead of the jar (resource-managers/yarn/target/spark-yarn_2.12-3.0.0-SNAPSHOT.jar). See the attachment.

Here in org.apache.spark.scheduler.cluster.YarnSchedulerBackend, which is not shaded, it expects org.eclipse.jetty.servlet.ServletContextHandler:
{code:java}
ui.getHandlers.map(_.getServletHandler()).foreach { h =>
  val holder = new FilterHolder(){code}
ui.getHandlers is in spark-core and is loaded from spark-core.jar, which is shaded, and hence refers to org.spark_project.jetty.servlet.ServletContextHandler.
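
A minimal diagnostic sketch (assuming it runs on the same test classpath) to confirm where each copy is loaded from:
{code:scala}
// Print where YarnSchedulerBackend was loaded from (classes/ directory vs. shaded jar).
val backendClass = Class.forName("org.apache.spark.scheduler.cluster.YarnSchedulerBackend")
println(backendClass.getProtectionDomain.getCodeSource.getLocation)

// Likewise, the code source of the relocated handler class that spark-core
// actually returns reveals whether the shaded Jetty is in use:
val handlerClass = Class.forName("org.spark_project.jetty.servlet.ServletContextHandler")
println(handlerClass.getProtectionDomain.getCodeSource.getLocation)
{code}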

And here is the javap output, which shows the difference between the org.apache.spark.scheduler.cluster.YarnSchedulerBackend in the jar and the one in the classes folder:

!image-2019-03-14-09-35-23-046.png!


was (Author: ajithshetty):
The problem seems to be the shading of the Jetty package.

When we run the tests, the classpath appears to be built from the classes folder (resource-managers/yarn/target/scala-2.12/classes) instead of the jar (resource-managers/yarn/target/spark-yarn_2.12-3.0.0-SNAPSHOT.jar). See the attachment.

Here in org.apache.spark.scheduler.cluster.YarnSchedulerBackend, which is not shaded, it expects org.eclipse.jetty.servlet.ServletContextHandler:
{code:java}
ui.getHandlers.map(_.getServletHandler()).foreach { h =>
  val holder = new FilterHolder(){code}
ui.getHandlers is in spark-core and is loaded from spark-core.jar, which is shaded, and hence refers to org.spark_project.jetty.servlet.ServletContextHandler.

And here is the javap output, which shows the difference between the org.apache.spark.scheduler.cluster.YarnSchedulerBackend in the jar and the one in the classes folder:

!image-2019-03-14-09-35-23-046.png!

> YARN test failures in Java 9+
> -
>
> Key: SPARK-27122
> URL: https://issues.apache.org/jira/browse/SPARK-27122
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 3.0.0
>Reporter: Sean Owen
>Priority: Major
> Attachments: image-2019-03-14-09-34-20-592.png, 
> image-2019-03-14-09-35-23-046.png
>
>
> Currently on Java 11:
> {code}
> YarnSchedulerBackendSuite:
> - RequestExecutors reflects node blacklist and is serializable
> - Respect user filters when adding AM IP filter *** FAILED ***
>   java.lang.ClassCastException: 
> org.spark_project.jetty.servlet.ServletContextHandler cannot be cast to 
> org.eclipse.jetty.servlet.ServletContextHandler
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237)
>   at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>   at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:237)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:230)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>   at 
> org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2(YarnSchedulerBackend.scala:183)
>   at 
> org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2$adapted(YarnSchedulerBackend.scala:174)
>   at scala.Option.foreach(Option.scala:274)
>   ...
> {code}
> This looks like a classpath issue, probably ultimately related to the same 
> classloader issues in https://issues.apache.org/jira/browse/SPARK-26839 






[jira] [Comment Edited] (SPARK-27122) YARN test failures in Java 9+

2019-03-13 Thread Ajith S (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792318#comment-16792318
 ] 

Ajith S edited comment on SPARK-27122 at 3/14/19 4:14 AM:
--

The problem seems to be the shading of the Jetty package.

When we run the tests, the classpath appears to be built from the classes folder (resource-managers/yarn/target/scala-2.12/classes) instead of the jar (resource-managers/yarn/target/spark-yarn_2.12-3.0.0-SNAPSHOT.jar). See the attachment.

Here in org.apache.spark.scheduler.cluster.YarnSchedulerBackend, which is not shaded, it expects org.eclipse.jetty.servlet.ServletContextHandler:
{code:java}
ui.getHandlers.map(_.getServletHandler()).foreach { h =>
  val holder = new FilterHolder(){code}
ui.getHandlers is in spark-core and is loaded from spark-core.jar, which is shaded, and hence refers to org.spark_project.jetty.servlet.ServletContextHandler.

And here is the javap output, which shows the difference between the org.apache.spark.scheduler.cluster.YarnSchedulerBackend in the jar and the one in the classes folder:

!image-2019-03-14-09-35-23-046.png!


was (Author: ajithshetty):
The problem seems to be the shading of the Jetty package.

When we run the tests, the classpath appears to be built from the classes folder (resource-managers/yarn/target/scala-2.12/classes) instead of the jar (resource-managers/yarn/target/spark-yarn_2.12-3.0.0-SNAPSHOT.jar). See the attachment.

And here is the javap output, which shows the difference between the org.apache.spark.scheduler.cluster.YarnSchedulerBackend in the jar and the one in the classes folder:

!image-2019-03-14-09-35-23-046.png!

> YARN test failures in Java 9+
> -
>
> Key: SPARK-27122
> URL: https://issues.apache.org/jira/browse/SPARK-27122
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 3.0.0
>Reporter: Sean Owen
>Priority: Major
> Attachments: image-2019-03-14-09-34-20-592.png, 
> image-2019-03-14-09-35-23-046.png
>
>
> Currently on Java 11:
> {code}
> YarnSchedulerBackendSuite:
> - RequestExecutors reflects node blacklist and is serializable
> - Respect user filters when adding AM IP filter *** FAILED ***
>   java.lang.ClassCastException: 
> org.spark_project.jetty.servlet.ServletContextHandler cannot be cast to 
> org.eclipse.jetty.servlet.ServletContextHandler
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237)
>   at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>   at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:237)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:230)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>   at 
> org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2(YarnSchedulerBackend.scala:183)
>   at 
> org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2$adapted(YarnSchedulerBackend.scala:174)
>   at scala.Option.foreach(Option.scala:274)
>   ...
> {code}
> This looks like a classpath issue, probably ultimately related to the same 
> classloader issues in https://issues.apache.org/jira/browse/SPARK-26839 






[jira] [Comment Edited] (SPARK-27122) YARN test failures in Java 9+

2019-03-13 Thread Ajith S (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792318#comment-16792318
 ] 

Ajith S edited comment on SPARK-27122 at 3/14/19 4:07 AM:
--

The problem seems to be the shading of the Jetty package.

When we run the tests, the classpath appears to be built from the classes folder (resource-managers/yarn/target/scala-2.12/classes) instead of the jar (resource-managers/yarn/target/spark-yarn_2.12-3.0.0-SNAPSHOT.jar). See the attachment.

And here is the javap output, which shows the difference between the org.apache.spark.scheduler.cluster.YarnSchedulerBackend in the jar and the one in the classes folder:

!image-2019-03-14-09-35-23-046.png!


was (Author: ajithshetty):
The problem seems to be the shading of the Jetty package.

When we run the tests, the classpath appears to be built from the classes folder (resource-managers/yarn/target/scala-2.12/classes) instead of the jar (resource-managers/yarn/target/spark-yarn_2.12-3.0.0-SNAPSHOT.jar).

And here is the javap output, which shows the difference between the org.apache.spark.scheduler.cluster.YarnSchedulerBackend in the jar and the one in the classes folder:

!image-2019-03-14-09-35-23-046.png!

> YARN test failures in Java 9+
> -
>
> Key: SPARK-27122
> URL: https://issues.apache.org/jira/browse/SPARK-27122
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 3.0.0
>Reporter: Sean Owen
>Priority: Major
> Attachments: image-2019-03-14-09-34-20-592.png, 
> image-2019-03-14-09-35-23-046.png
>
>
> Currently on Java 11:
> {code}
> YarnSchedulerBackendSuite:
> - RequestExecutors reflects node blacklist and is serializable
> - Respect user filters when adding AM IP filter *** FAILED ***
>   java.lang.ClassCastException: 
> org.spark_project.jetty.servlet.ServletContextHandler cannot be cast to 
> org.eclipse.jetty.servlet.ServletContextHandler
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237)
>   at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>   at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:237)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:230)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>   at 
> org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2(YarnSchedulerBackend.scala:183)
>   at 
> org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2$adapted(YarnSchedulerBackend.scala:174)
>   at scala.Option.foreach(Option.scala:274)
>   ...
> {code}
> This looks like a classpath issue, probably ultimately related to the same 
> classloader issues in https://issues.apache.org/jira/browse/SPARK-26839 






[jira] [Comment Edited] (SPARK-27122) YARN test failures in Java 9+

2019-03-13 Thread Ajith S (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792318#comment-16792318
 ] 

Ajith S edited comment on SPARK-27122 at 3/14/19 4:06 AM:
--

The problem seems to be the shading of the Jetty package.

When we run the tests, the classpath appears to be built from the classes folder (resource-managers/yarn/target/scala-2.12/classes) instead of the jar (resource-managers/yarn/target/spark-yarn_2.12-3.0.0-SNAPSHOT.jar).

And here is the javap output, which shows the difference between the org.apache.spark.scheduler.cluster.YarnSchedulerBackend in the jar and the one in the classes folder:

!image-2019-03-14-09-35-23-046.png!


was (Author: ajithshetty):
The problem seems to be the shading of the Jetty package.

When we run the tests, the classpath appears to be built from the classes folder (resource-managers/yarn/target/scala-2.12/classes) instead of the jar (resource-managers/yarn/target/spark-yarn_2.12-3.0.0-SNAPSHOT.jar).

Here is the test classpath info:

!image-2019-03-14-09-34-20-592.png!

And here is the javap output, which shows the difference between the org.apache.spark.scheduler.cluster.YarnSchedulerBackend in the jar and the one in the classes folder:

!image-2019-03-14-09-35-23-046.png!

> YARN test failures in Java 9+
> -
>
> Key: SPARK-27122
> URL: https://issues.apache.org/jira/browse/SPARK-27122
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 3.0.0
>Reporter: Sean Owen
>Priority: Major
> Attachments: image-2019-03-14-09-34-20-592.png, 
> image-2019-03-14-09-35-23-046.png
>
>
> Currently on Java 11:
> {code}
> YarnSchedulerBackendSuite:
> - RequestExecutors reflects node blacklist and is serializable
> - Respect user filters when adding AM IP filter *** FAILED ***
>   java.lang.ClassCastException: 
> org.spark_project.jetty.servlet.ServletContextHandler cannot be cast to 
> org.eclipse.jetty.servlet.ServletContextHandler
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237)
>   at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>   at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:237)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:230)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>   at 
> org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2(YarnSchedulerBackend.scala:183)
>   at 
> org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2$adapted(YarnSchedulerBackend.scala:174)
>   at scala.Option.foreach(Option.scala:274)
>   ...
> {code}
> This looks like a classpath issue, probably ultimately related to the same 
> classloader issues in https://issues.apache.org/jira/browse/SPARK-26839 






[jira] [Updated] (SPARK-27122) YARN test failures in Java 9+

2019-03-13 Thread Ajith S (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated SPARK-27122:

Attachment: image-2019-03-14-09-35-23-046.png

> YARN test failures in Java 9+
> -
>
> Key: SPARK-27122
> URL: https://issues.apache.org/jira/browse/SPARK-27122
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 3.0.0
>Reporter: Sean Owen
>Priority: Major
> Attachments: image-2019-03-14-09-34-20-592.png, 
> image-2019-03-14-09-35-23-046.png
>
>
> Currently on Java 11:
> {code}
> YarnSchedulerBackendSuite:
> - RequestExecutors reflects node blacklist and is serializable
> - Respect user filters when adding AM IP filter *** FAILED ***
>   java.lang.ClassCastException: 
> org.spark_project.jetty.servlet.ServletContextHandler cannot be cast to 
> org.eclipse.jetty.servlet.ServletContextHandler
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237)
>   at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>   at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:237)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:230)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>   at 
> org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2(YarnSchedulerBackend.scala:183)
>   at 
> org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2$adapted(YarnSchedulerBackend.scala:174)
>   at scala.Option.foreach(Option.scala:274)
>   ...
> {code}
> This looks like a classpath issue, probably ultimately related to the same 
> classloader issues in https://issues.apache.org/jira/browse/SPARK-26839 






[jira] [Commented] (SPARK-27122) YARN test failures in Java 9+

2019-03-13 Thread Ajith S (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792318#comment-16792318
 ] 

Ajith S commented on SPARK-27122:
-

The problem seems to be the shading of the Jetty package.

When we run the tests, the classpath appears to be built from the classes folder (resource-managers/yarn/target/scala-2.12/classes) instead of the jar (resource-managers/yarn/target/spark-yarn_2.12-3.0.0-SNAPSHOT.jar).

Here is the test classpath info:

!image-2019-03-14-09-34-20-592.png!

And here is the javap output, which shows the difference between the org.apache.spark.scheduler.cluster.YarnSchedulerBackend in the jar and the one in the classes folder:

!image-2019-03-14-09-35-23-046.png!

> YARN test failures in Java 9+
> -
>
> Key: SPARK-27122
> URL: https://issues.apache.org/jira/browse/SPARK-27122
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 3.0.0
>Reporter: Sean Owen
>Priority: Major
> Attachments: image-2019-03-14-09-34-20-592.png, 
> image-2019-03-14-09-35-23-046.png
>
>
> Currently on Java 11:
> {code}
> YarnSchedulerBackendSuite:
> - RequestExecutors reflects node blacklist and is serializable
> - Respect user filters when adding AM IP filter *** FAILED ***
>   java.lang.ClassCastException: 
> org.spark_project.jetty.servlet.ServletContextHandler cannot be cast to 
> org.eclipse.jetty.servlet.ServletContextHandler
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237)
>   at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>   at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:237)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:230)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>   at 
> org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2(YarnSchedulerBackend.scala:183)
>   at 
> org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2$adapted(YarnSchedulerBackend.scala:174)
>   at scala.Option.foreach(Option.scala:274)
>   ...
> {code}
> This looks like a classpath issue, probably ultimately related to the same 
> classloader issues in https://issues.apache.org/jira/browse/SPARK-26839 






[jira] [Updated] (SPARK-27122) YARN test failures in Java 9+

2019-03-13 Thread Ajith S (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated SPARK-27122:

Attachment: image-2019-03-14-09-34-20-592.png

> YARN test failures in Java 9+
> -
>
> Key: SPARK-27122
> URL: https://issues.apache.org/jira/browse/SPARK-27122
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 3.0.0
>Reporter: Sean Owen
>Priority: Major
> Attachments: image-2019-03-14-09-34-20-592.png
>
>
> Currently on Java 11:
> {code}
> YarnSchedulerBackendSuite:
> - RequestExecutors reflects node blacklist and is serializable
> - Respect user filters when adding AM IP filter *** FAILED ***
>   java.lang.ClassCastException: 
> org.spark_project.jetty.servlet.ServletContextHandler cannot be cast to 
> org.eclipse.jetty.servlet.ServletContextHandler
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237)
>   at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>   at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:237)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:230)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>   at 
> org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2(YarnSchedulerBackend.scala:183)
>   at 
> org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2$adapted(YarnSchedulerBackend.scala:174)
>   at scala.Option.foreach(Option.scala:274)
>   ...
> {code}
> This looks like a classpath issue, probably ultimately related to the same 
> classloader issues in https://issues.apache.org/jira/browse/SPARK-26839 






[jira] [Updated] (SPARK-27156) why is the "http://:18080/static" browse able?

2019-03-13 Thread Jerry Garcia (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerry Garcia updated SPARK-27156:
-
Description: I would like to know is there a way to disable spark history 
server /static folder ? Please do refer on the attachment provide.   (was: I 
would like to know is there a way to disable spark history server /static 
folder ? Please do refer on the attachment provide. 
!image-2019-03-14-11-47-15-300.png!)

> why is the "http://:18080/static" browse able?
> 
>
> Key: SPARK-27156
> URL: https://issues.apache.org/jira/browse/SPARK-27156
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core, Web UI
>Affects Versions: 1.6.2
>Reporter: Jerry Garcia
>Priority: Minor
> Attachments: Screen Shot 2019-03-14 at 11.46.31 AM.png
>
>
> I would like to know is there a way to disable spark history server /static 
> folder ? Please do refer on the attachment provide. 






[jira] [Updated] (SPARK-27156) why is the "http://:18080/static" browse able?

2019-03-13 Thread Jerry Garcia (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerry Garcia updated SPARK-27156:
-
Attachment: Screen Shot 2019-03-14 at 11.46.31 AM.png

> why is the "http://:18080/static" browse able?
> 
>
> Key: SPARK-27156
> URL: https://issues.apache.org/jira/browse/SPARK-27156
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core, Web UI
>Affects Versions: 1.6.2
>Reporter: Jerry Garcia
>Priority: Minor
> Attachments: Screen Shot 2019-03-14 at 11.46.31 AM.png
>
>
> I would like to know is there a way to disable spark history server /static 
> folder ? Please do refer on the attachment provide. 
> !image-2019-03-14-11-47-15-300.png!






[jira] [Updated] (SPARK-27156) why is the "http://:18080/static" browse able?

2019-03-13 Thread Jerry Garcia (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerry Garcia updated SPARK-27156:
-
Description: I would like to know is there a way to disable spark history 
server /static folder ? Please do refer on the attachment provided.   (was: I 
would like to know is there a way to disable spark history server /static 
folder ? Please do refer on the attachment provide. )

> why is the "http://:18080/static" browse able?
> 
>
> Key: SPARK-27156
> URL: https://issues.apache.org/jira/browse/SPARK-27156
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core, Web UI
>Affects Versions: 1.6.2
>Reporter: Jerry Garcia
>Priority: Minor
> Attachments: Screen Shot 2019-03-14 at 11.46.31 AM.png
>
>
> I would like to know is there a way to disable spark history server /static 
> folder ? Please do refer on the attachment provided. 






[jira] [Updated] (SPARK-27156) why is the "http://:18080/static" browse able?

2019-03-13 Thread Jerry Garcia (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerry Garcia updated SPARK-27156:
-
Description: I would like to know is there a way to disable spark history 
server /static folder ? Please do refer on the attachment provided. Reason for 
asking is for security purposes.  (was: I would like to know is there a way to 
disable spark history server /static folder ? Please do refer on the attachment 
provided. )

> why is the "http://:18080/static" browse able?
> 
>
> Key: SPARK-27156
> URL: https://issues.apache.org/jira/browse/SPARK-27156
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core, Web UI
>Affects Versions: 1.6.2
>Reporter: Jerry Garcia
>Priority: Minor
> Attachments: Screen Shot 2019-03-14 at 11.46.31 AM.png
>
>
> I would like to know is there a way to disable spark history server /static 
> folder ? Please do refer on the attachment provided. Reason for asking is for 
> security purposes.






[jira] [Created] (SPARK-27156) why is the "http://:18080/static" browse able?

2019-03-13 Thread Jerry Garcia (JIRA)
Jerry Garcia created SPARK-27156:


 Summary: why is the "http://:18080/static" browse 
able?
 Key: SPARK-27156
 URL: https://issues.apache.org/jira/browse/SPARK-27156
 Project: Spark
  Issue Type: Question
  Components: Spark Core, Web UI
Affects Versions: 1.6.2
Reporter: Jerry Garcia


I would like to know whether there is a way to disable the Spark history server's 
/static folder. Please refer to the attachment provided. 
!image-2019-03-14-11-47-15-300.png!






[jira] [Commented] (SPARK-27122) YARN test failures in Java 9+

2019-03-13 Thread Ajith S (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792299#comment-16792299
 ] 

Ajith S commented on SPARK-27122:
-

I can reproduce this issue even on Java 8. I would like to work on this.

 

> YARN test failures in Java 9+
> -
>
> Key: SPARK-27122
> URL: https://issues.apache.org/jira/browse/SPARK-27122
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 3.0.0
>Reporter: Sean Owen
>Priority: Major
>
> Currently on Java 11:
> {code}
> YarnSchedulerBackendSuite:
> - RequestExecutors reflects node blacklist and is serializable
> - Respect user filters when adding AM IP filter *** FAILED ***
>   java.lang.ClassCastException: 
> org.spark_project.jetty.servlet.ServletContextHandler cannot be cast to 
> org.eclipse.jetty.servlet.ServletContextHandler
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237)
>   at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>   at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:237)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:230)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>   at 
> org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2(YarnSchedulerBackend.scala:183)
>   at 
> org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2$adapted(YarnSchedulerBackend.scala:174)
>   at scala.Option.foreach(Option.scala:274)
>   ...
> {code}
> This looks like a classpath issue, probably ultimately related to the same 
> classloader issues in https://issues.apache.org/jira/browse/SPARK-26839 






[jira] [Assigned] (SPARK-27155) Docker Oracle XE image docker image has been removed by DockerHub

2019-03-13 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-27155:


Assignee: (was: Apache Spark)

> Docker Oracle XE image docker image has been removed by DockerHub 
> --
>
> Key: SPARK-27155
> URL: https://issues.apache.org/jira/browse/SPARK-27155
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.0
>Reporter: Zhu, Lipeng
>Priority: Major
> Attachments: image-2019-03-14-11-00-05-498.png
>
>
> Since 2019-Feb-13 (Valentine's Day eve), this Docker image has been removed 
> by DockerHub due to a DMCA takedown notice from the copyright owner, Oracle.
>  
> !image-2019-03-14-11-00-05-498.png!






[jira] [Updated] (SPARK-27155) Docker Oracle XE image docker image has been removed by DockerHub

2019-03-13 Thread Zhu, Lipeng (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu, Lipeng updated SPARK-27155:

Description: 
Since 2019-Feb-13 (Valentine's Day eve), this Docker image has been removed 
by DockerHub due to a DMCA takedown notice from the copyright owner, Oracle.

 

!image-2019-03-14-11-00-05-498.png!

  was:
Since 2019-Feb-13 (Valentine's Day eve), this Docker image has been removed 
by DockerHub due to a DMCA takedown notice from the copyright owner, Oracle.

 

!image-2019-03-14-10-59-31-099.png!


> Docker Oracle XE image docker image has been removed by DockerHub 
> --
>
> Key: SPARK-27155
> URL: https://issues.apache.org/jira/browse/SPARK-27155
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.0
>Reporter: Zhu, Lipeng
>Priority: Major
> Attachments: image-2019-03-14-11-00-05-498.png
>
>
> Since 2019-Feb-13 (Valentine's Day eve), this Docker image has been removed 
> by DockerHub due to a DMCA takedown notice from the copyright owner, Oracle.
>  
> !image-2019-03-14-11-00-05-498.png!






[jira] [Commented] (SPARK-27155) Docker Oracle XE image docker image has been removed by DockerHub

2019-03-13 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792292#comment-16792292
 ] 

Apache Spark commented on SPARK-27155:
--

User 'lipzhu' has created a pull request for this issue:
https://github.com/apache/spark/pull/24086

> Docker Oracle XE image docker image has been removed by DockerHub 
> --
>
> Key: SPARK-27155
> URL: https://issues.apache.org/jira/browse/SPARK-27155
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.0
>Reporter: Zhu, Lipeng
>Priority: Major
> Attachments: image-2019-03-14-11-00-05-498.png
>
>
> Since 2019-Feb-13 (Valentine's Day eve), this Docker image has been removed 
> by DockerHub due to a DMCA takedown notice from the copyright owner, Oracle.
>  
> !image-2019-03-14-11-00-05-498.png!






[jira] [Assigned] (SPARK-27155) Docker Oracle XE image docker image has been removed by DockerHub

2019-03-13 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-27155:


Assignee: Apache Spark

> Docker Oracle XE image docker image has been removed by DockerHub 
> --
>
> Key: SPARK-27155
> URL: https://issues.apache.org/jira/browse/SPARK-27155
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.0
>Reporter: Zhu, Lipeng
>Assignee: Apache Spark
>Priority: Major
> Attachments: image-2019-03-14-11-00-05-498.png
>
>
> Since 2019-Feb-13 (Valentine's Day eve), this Docker image has been removed 
> by DockerHub due to a DMCA takedown notice from the copyright owner, Oracle.
>  
> !image-2019-03-14-11-00-05-498.png!






[jira] [Commented] (SPARK-27155) Docker Oracle XE image docker image has been removed by DockerHub

2019-03-13 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792293#comment-16792293
 ] 

Apache Spark commented on SPARK-27155:
--

User 'lipzhu' has created a pull request for this issue:
https://github.com/apache/spark/pull/24086

> Docker Oracle XE image docker image has been removed by DockerHub 
> --
>
> Key: SPARK-27155
> URL: https://issues.apache.org/jira/browse/SPARK-27155
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.0
>Reporter: Zhu, Lipeng
>Priority: Major
> Attachments: image-2019-03-14-11-00-05-498.png
>
>
> Since 2019-Feb-13 (Valentine's Day eve), this Docker image has been removed 
> by DockerHub due to a DMCA takedown notice from the copyright owner, Oracle.
>  
> !image-2019-03-14-11-00-05-498.png!






[jira] [Updated] (SPARK-27155) Docker Oracle XE image docker image has been removed by DockerHub

2019-03-13 Thread Zhu, Lipeng (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu, Lipeng updated SPARK-27155:

Attachment: image-2019-03-14-11-00-05-498.png

> Docker Oracle XE image docker image has been removed by DockerHub 
> --
>
> Key: SPARK-27155
> URL: https://issues.apache.org/jira/browse/SPARK-27155
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.0
>Reporter: Zhu, Lipeng
>Priority: Major
> Attachments: image-2019-03-14-11-00-05-498.png
>
>
> Since 2019-Feb-13 (Valentine's Day eve), this Docker image has been removed 
> by DockerHub due to a DMCA takedown notice from the copyright owner, Oracle.
>  
> !image-2019-03-14-11-00-05-498.png!






[jira] [Created] (SPARK-27155) Docker Oracle XE image docker image has been removed by DockerHub

2019-03-13 Thread Zhu, Lipeng (JIRA)
Zhu, Lipeng created SPARK-27155:
---

 Summary: Docker Oracle XE image docker image has been removed by 
DockerHub 
 Key: SPARK-27155
 URL: https://issues.apache.org/jira/browse/SPARK-27155
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 2.4.0
Reporter: Zhu, Lipeng


Since 2019-Feb-13 (Valentine's Day eve), this Docker image has been removed 
by DockerHub due to a DMCA takedown notice from the copyright owner, Oracle.

 

!image-2019-03-14-10-59-31-099.png!






[jira] [Assigned] (SPARK-26555) Thread safety issue causes createDataset to fail with misleading errors

2019-03-13 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26555:


Assignee: Apache Spark

> Thread safety issue causes createDataset to fail with misleading errors
> ---
>
> Key: SPARK-26555
> URL: https://issues.apache.org/jira/browse/SPARK-26555
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Martin Loncaric
>Assignee: Apache Spark
>Priority: Major
>
> This can be replicated (~2% of the time) with
> {code:scala}
> import java.sql.Timestamp
> import java.util.concurrent.{Executors, Future}
> import org.apache.spark.sql.SparkSession
> import scala.collection.mutable.ListBuffer
> import scala.concurrent.ExecutionContext
> import scala.util.Random
> object Main {
>   def main(args: Array[String]): Unit = {
> val sparkSession = SparkSession.builder
>   .getOrCreate()
> import sparkSession.implicits._
> val executor = Executors.newFixedThreadPool(1)
> try {
>   implicit val xc: ExecutionContext = 
> ExecutionContext.fromExecutorService(executor)
>   val futures = new ListBuffer[Future[_]]()
>   for (i <- 1 to 3) {
> futures += executor.submit(new Runnable {
>   override def run(): Unit = {
> val d = if (Random.nextInt(2) == 0) Some("d value") else None
> val e = if (Random.nextInt(2) == 0) Some(5.0) else None
> val f = if (Random.nextInt(2) == 0) Some(6.0) else None
> println("DEBUG", d, e, f)
> sparkSession.createDataset(Seq(
>   MyClass(new Timestamp(1L), "b", "c", d, e, f)
> ))
>   }
> })
>   }
>   futures.foreach(_.get())
> } finally {
>   println("SHUTDOWN")
>   executor.shutdown()
>   sparkSession.stop()
> }
>   }
>   case class MyClass(
> a: Timestamp,
> b: String,
> c: String,
> d: Option[String],
> e: Option[Double],
> f: Option[Double]
>   )
> }
> {code}
> So it will usually come up during
> {code:bash}
> for i in $(seq 1 200); do
>   echo $i
>   spark-submit --master local[4] target/scala-2.11/spark-test_2.11-0.1.jar
> done
> {code}
> causing a variety of possible errors, such as
> {code}Exception in thread "main" java.util.concurrent.ExecutionException: 
> scala.MatchError: scala.Option[String] (of class 
> scala.reflect.internal.Types$ClassArgsTypeRef)
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> Caused by: scala.MatchError: scala.Option[String] (of class 
> scala.reflect.internal.Types$ClassArgsTypeRef)
>   at 
> org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$deserializerFor$1.apply(ScalaReflection.scala:210){code}
> or
> {code}Exception in thread "main" java.util.concurrent.ExecutionException: 
> java.lang.UnsupportedOperationException: Schema for type 
> scala.Option[scala.Double] is not supported
>   at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> Caused by: java.lang.UnsupportedOperationException: Schema for type 
> scala.Option[scala.Double] is not supported
>   at 
> org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:789){code}






[jira] [Assigned] (SPARK-26555) Thread safety issue causes createDataset to fail with misleading errors

2019-03-13 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26555:


Assignee: (was: Apache Spark)

> Thread safety issue causes createDataset to fail with misleading errors
> ---
>
> Key: SPARK-26555
> URL: https://issues.apache.org/jira/browse/SPARK-26555
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Martin Loncaric
>Priority: Major
>
> This can be replicated (~2% of the time) with
> {code:scala}
> import java.sql.Timestamp
> import java.util.concurrent.{Executors, Future}
> import org.apache.spark.sql.SparkSession
> import scala.collection.mutable.ListBuffer
> import scala.concurrent.ExecutionContext
> import scala.util.Random
> object Main {
>   def main(args: Array[String]): Unit = {
> val sparkSession = SparkSession.builder
>   .getOrCreate()
> import sparkSession.implicits._
> val executor = Executors.newFixedThreadPool(1)
> try {
>   implicit val xc: ExecutionContext = 
> ExecutionContext.fromExecutorService(executor)
>   val futures = new ListBuffer[Future[_]]()
>   for (i <- 1 to 3) {
> futures += executor.submit(new Runnable {
>   override def run(): Unit = {
> val d = if (Random.nextInt(2) == 0) Some("d value") else None
> val e = if (Random.nextInt(2) == 0) Some(5.0) else None
> val f = if (Random.nextInt(2) == 0) Some(6.0) else None
> println("DEBUG", d, e, f)
> sparkSession.createDataset(Seq(
>   MyClass(new Timestamp(1L), "b", "c", d, e, f)
> ))
>   }
> })
>   }
>   futures.foreach(_.get())
> } finally {
>   println("SHUTDOWN")
>   executor.shutdown()
>   sparkSession.stop()
> }
>   }
>   case class MyClass(
> a: Timestamp,
> b: String,
> c: String,
> d: Option[String],
> e: Option[Double],
> f: Option[Double]
>   )
> }
> {code}
> So it will usually come up during
> {code:bash}
> for i in $(seq 1 200); do
>   echo $i
>   spark-submit --master local[4] target/scala-2.11/spark-test_2.11-0.1.jar
> done
> {code}
> causing a variety of possible errors, such as
> {code}Exception in thread "main" java.util.concurrent.ExecutionException: 
> scala.MatchError: scala.Option[String] (of class 
> scala.reflect.internal.Types$ClassArgsTypeRef)
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> Caused by: scala.MatchError: scala.Option[String] (of class 
> scala.reflect.internal.Types$ClassArgsTypeRef)
>   at 
> org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$deserializerFor$1.apply(ScalaReflection.scala:210){code}
> or
> {code}Exception in thread "main" java.util.concurrent.ExecutionException: 
> java.lang.UnsupportedOperationException: Schema for type 
> scala.Option[scala.Double] is not supported
>   at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> Caused by: java.lang.UnsupportedOperationException: Schema for type 
> scala.Option[scala.Double] is not supported
>   at 
> org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:789){code}






[jira] [Updated] (SPARK-27151) ClearCacheCommand extends IgnoreCahedData to avoid plan node copys

2019-03-13 Thread Takeshi Yamamuro (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro updated SPARK-27151:
-
Description: 
In SPARK-27011, we introduced IgnoreCachedData to avoid plan node copies in 
CacheManager.
Since ClearCacheCommand has no arguments, it can also extend IgnoreCachedData.

  was:To avoid unnecessary copies, `ClearCacheCommand` should be a `case object`.


> ClearCacheCommand extends IgnoreCahedData to avoid plan node copys
> --
>
> Key: SPARK-27151
> URL: https://issues.apache.org/jira/browse/SPARK-27151
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Takeshi Yamamuro
>Priority: Trivial
>
> In SPARK-27011, we introduced IgnoreCachedData to avoid plan node copies in 
> CacheManager.
> Since ClearCacheCommand has no arguments, it can also extend IgnoreCachedData.
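
For illustration, a minimal sketch of what this could look like, assuming the RunnableCommand signature and the IgnoreCachedData marker trait introduced by SPARK-27011 (not the actual patch):
{code:scala}
// Sketch only: a zero-argument command that mixes in IgnoreCachedData so
// CacheManager can skip copying/re-wrapping the plan node for it.
case class ClearCacheCommand() extends RunnableCommand with IgnoreCachedData {
  override def run(sparkSession: SparkSession): Seq[Row] = {
    sparkSession.catalog.clearCache()
    Seq.empty[Row]
  }
}
{code}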






[jira] [Updated] (SPARK-27151) ClearCacheCommand extends IgnoreCahedData to avoid plan node copys

2019-03-13 Thread Takeshi Yamamuro (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro updated SPARK-27151:
-
Summary: ClearCacheCommand extends IgnoreCahedData to avoid plan node copys 
 (was: ClearCacheCommand should be case-object to avoid copys)

> ClearCacheCommand extends IgnoreCahedData to avoid plan node copys
> --
>
> Key: SPARK-27151
> URL: https://issues.apache.org/jira/browse/SPARK-27151
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Takeshi Yamamuro
>Priority: Trivial
>
> To avoid unnecessary copies, `ClearCacheCommand` should be a `case object`.






[jira] [Commented] (SPARK-27152) Column equality does not work for aliased columns.

2019-03-13 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792248#comment-16792248
 ] 

Hyukjin Kwon commented on SPARK-27152:
--

What does this matter?

> Column equality does not work for aliased columns.
> --
>
> Key: SPARK-27152
> URL: https://issues.apache.org/jira/browse/SPARK-27152
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Ryan Radtke
>Priority: Major
>
> assert($"zip".as("zip_code") equals $"zip".as("zip_code")) will return false
> assert($"zip" equals $"zip") will return true.






[jira] [Created] (SPARK-27154) Incomplete Execution for Spark/dev/run-test-jenkins.py

2019-03-13 Thread Vaibhavd (JIRA)
Vaibhavd created SPARK-27154:


 Summary: Incomplete Execution for Spark/dev/run-test-jenkins.py
 Key: SPARK-27154
 URL: https://issues.apache.org/jira/browse/SPARK-27154
 Project: Spark
  Issue Type: Question
  Components: jenkins
Affects Versions: 2.4.0
 Environment: 
{code:java}
JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64" BUILD_DISPLAY_NAME="Jenkins build" BUILD_URL="xxx" 
GITHUB_PROJECT_URL="https://github.com/apache/spark" 
GITHUB_OAUTH_KEY="xxx" 
GITHUB_API_ENDPOINT="https://api.github.com/repos/apache/spark" 
AMPLAB_JENKINS_BUILD_TOOL="maven" AMPLAB_JENKINS="True" 
sha1="origin/pr/23560/merge" 
ghprbActualCommit="d73cfb51941f99516b7878acace26db35ea72076" 
ghprbActualCommitAuthor="jiafu.zh...@intel.com" 
ghprbActualCommitAuthorEmail="jiafu.zh...@intel.com" 
ghprbTriggerAuthor="Marcelo Vanzin" ghprbPullId=23560 
ghprbTargetBranch="master" ghprbSourceBranch="thread_conf_separation" 
GIT_BRANCH="thread_conf_separation" 
ghprbPullAuthorEmail="jiafu.zh...@intel.com" 
ghprbPullDescription="GitHub pull request #23560 of commit d73cfb51941f99516b7878acace26db35ea72076 automatically merged." 
ghprbPullTitle="[SPARK-26632][Core] Separate Thread Configurations of Driver and Executor" 
ghprbPullLink=https://api.github.com/repos/apache/spark/pulls/23560
{code}
Reporter: Vaibhavd


When I run `Spark/dev/run-test-jenkins.py` with the following environment variables set (ref: 
[https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103464/parameters/]), 
the execution gets stuck at this point (the build step):
{code:java}
[INFO] --- scala-maven-plugin:3.4.4:compile (scala-compile-first) @ 
spark-tags_2.12 ---
[INFO] Using zinc server for incremental compilation
[INFO] Toolchain in scala-maven-plugin: /usr/lib/jvm/java-8-openjdk-amd64
{code}
I am not sure what's going wrong. Am I missing some environment variable?

When I run `/dev/run-tests.py` there is no problem.






[jira] [Resolved] (SPARK-23264) Support interval values without INTERVAL clauses

2019-03-13 Thread Takeshi Yamamuro (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro resolved SPARK-23264.
--
   Resolution: Fixed
Fix Version/s: 3.0.0

Resolved by https://github.com/apache/spark/pull/20433#

> Support interval values without INTERVAL clauses
> 
>
> Key: SPARK-23264
> URL: https://issues.apache.org/jira/browse/SPARK-23264
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.2.1
>Reporter: Takeshi Yamamuro
>Assignee: Takeshi Yamamuro
>Priority: Minor
> Fix For: 3.0.0
>
>
> The master branch currently cannot parse the SQL query below:
> {code:java}
> SELECT cast('2017-08-04' as date) + 1 days;
> {code}
> Since other DBMS-like systems support this syntax (e.g., Hive and MySQL), it 
> might help to support it in Spark.
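
For releases that predate the fix, a hedged sketch of an equivalent that already parses ({{spark}} is an assumed active SparkSession):
{code:scala}
// date_add achieves the same result as the unsupported "+ 1 days" syntax.
spark.sql("SELECT date_add(cast('2017-08-04' as date), 1)").show()
{code}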






[jira] [Commented] (SPARK-27093) Honor ParseMode in AvroFileFormat

2019-03-13 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792246#comment-16792246
 ] 

Hyukjin Kwon commented on SPARK-27093:
--

Yea, for the second case, it sounds like {{ignoreCorruptFiles}} should cover it.
For the third case, it looks like the file should not be readable anyway. If 
the schema is known to be incompatible, that schema should not be used to 
write those Avro files if they are intended to be read via Spark.
I think the ORC and Parquet file formats don't support those cases and parse 
modes either, since it's quite unlikely to have just a few malformed rows in the 
entire dataset, as with CSV and JSON.

> Honor ParseMode in AvroFileFormat
> -
>
> Key: SPARK-27093
> URL: https://issues.apache.org/jira/browse/SPARK-27093
> Project: Spark
>  Issue Type: Improvement
>  Components: Input/Output
>Affects Versions: 2.4.0
>Reporter: Tim Cerexhe
>Priority: Major
>
> The Avro reader is missing the ability to handle malformed or truncated files 
> like the JSON reader. Currently it throws exceptions when it encounters any 
> bad or truncated record in an Avro file, causing the entire Spark job to fail 
> from a single dodgy file. 
> Ideally the AvroFileFormat would accept a Permissive or DropMalformed 
> ParseMode like Spark's JSON format. This would enable the Avro reader to 
> drop bad records and continue processing the good records rather than abort 
> the entire job. 
> Obviously the default could remain as FailFastMode, which is the current 
> effective behavior, so this wouldn’t break any existing users.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26555) Thread safety issue causes createDataset to fail with misleading errors

2019-03-13 Thread Martin Loncaric (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792240#comment-16792240
 ] 

Martin Loncaric commented on SPARK-26555:
-

This is an existing issue with scala: https://github.com/scala/bug/issues/10766
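
A possible mitigation sketch until that scala bug is addressed (an assumption on my side, not a confirmed fix): funnel the concurrent createDataset calls from the repro below through one global lock, so the encoder derivation in ScalaReflection never runs from two threads at once. ReflectionLock is a name introduced here purely for illustration; the other names come from the repro.
{code:scala}
object ReflectionLock

// Inside each Runnable from the repro below:
val ds = ReflectionLock.synchronized {
  sparkSession.createDataset(Seq(MyClass(new Timestamp(1L), "b", "c", d, e, f)))
}
{code}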

> Thread safety issue causes createDataset to fail with misleading errors
> ---
>
> Key: SPARK-26555
> URL: https://issues.apache.org/jira/browse/SPARK-26555
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Martin Loncaric
>Priority: Major
>
> This can be replicated (~2% of the time) with
> {code:scala}
> import java.sql.Timestamp
> import java.util.concurrent.{Executors, Future}
> import org.apache.spark.sql.SparkSession
> import scala.collection.mutable.ListBuffer
> import scala.concurrent.ExecutionContext
> import scala.util.Random
> object Main {
>   def main(args: Array[String]): Unit = {
> val sparkSession = SparkSession.builder
>   .getOrCreate()
> import sparkSession.implicits._
> val executor = Executors.newFixedThreadPool(1)
> try {
>   implicit val xc: ExecutionContext = 
> ExecutionContext.fromExecutorService(executor)
>   val futures = new ListBuffer[Future[_]]()
>   for (i <- 1 to 3) {
> futures += executor.submit(new Runnable {
>   override def run(): Unit = {
> val d = if (Random.nextInt(2) == 0) Some("d value") else None
> val e = if (Random.nextInt(2) == 0) Some(5.0) else None
> val f = if (Random.nextInt(2) == 0) Some(6.0) else None
> println("DEBUG", d, e, f)
> sparkSession.createDataset(Seq(
>   MyClass(new Timestamp(1L), "b", "c", d, e, f)
> ))
>   }
> })
>   }
>   futures.foreach(_.get())
> } finally {
>   println("SHUTDOWN")
>   executor.shutdown()
>   sparkSession.stop()
> }
>   }
>   case class MyClass(
> a: Timestamp,
> b: String,
> c: String,
> d: Option[String],
> e: Option[Double],
> f: Option[Double]
>   )
> }
> {code}
> So it will usually come up during
> {code:bash}
> for i in $(seq 1 200); do
>   echo $i
>   spark-submit --master local[4] target/scala-2.11/spark-test_2.11-0.1.jar
> done
> {code}
> causing a variety of possible errors, such as
> {code}Exception in thread "main" java.util.concurrent.ExecutionException: 
> scala.MatchError: scala.Option[String] (of class 
> scala.reflect.internal.Types$ClassArgsTypeRef)
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> Caused by: scala.MatchError: scala.Option[String] (of class 
> scala.reflect.internal.Types$ClassArgsTypeRef)
>   at 
> org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$deserializerFor$1.apply(ScalaReflection.scala:210){code}
> or
> {code}Exception in thread "main" java.util.concurrent.ExecutionException: 
> java.lang.UnsupportedOperationException: Schema for type 
> scala.Option[scala.Double] is not supported
>   at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> Caused by: java.lang.UnsupportedOperationException: Schema for type 
> scala.Option[scala.Double] is not supported
>   at 
> org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:789){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25694) URL.setURLStreamHandlerFactory causing incompatible HttpURLConnection issue

2019-03-13 Thread Eugene (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792230#comment-16792230
 ] 

Eugene commented on SPARK-25694:


[~howardatwork],

 

I have hit this problem as well; there seems to be no workaround without a change in 
httpclient. I used to use scalaj.http, but have replaced it with HttpComponents.
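
For anyone else hitting this, a hedged sketch of the HttpComponents-based workaround mentioned above; it builds its connections directly rather than through java.net.URL, so it is unaffected by the JVM-wide URLStreamHandlerFactory that SharedState installs (the URL is illustrative):
{code:scala}
import org.apache.http.client.methods.HttpGet
import org.apache.http.impl.client.HttpClients
import org.apache.http.util.EntityUtils

val client = HttpClients.createDefault()
try {
  val response = client.execute(new HttpGet("http://example.com/"))
  val body = EntityUtils.toString(response.getEntity)
  println(body)
} finally {
  client.close()
}
{code}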

 

> URL.setURLStreamHandlerFactory causing incompatible HttpURLConnection issue
> ---
>
> Key: SPARK-25694
> URL: https://issues.apache.org/jira/browse/SPARK-25694
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 2.3.0, 2.3.1, 2.3.2
>Reporter: Bo Yang
>Priority: Minor
>
> URL.setURLStreamHandlerFactory() in SharedState causes URL.openConnection() 
> returns FsUrlConnection object, which is not compatible with 
> HttpURLConnection. This will cause exception when using some third party http 
> library (e.g. scalaj.http).
> The following code in Spark 2.3.0 introduced the issue: 
> sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala:
> {code}
> object SharedState extends Logging  {   ...   
>   URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory())   ...
> }
> {code}
> Here is the example exception when using scalaj.http in Spark:
> {code}
>  StackTrace: scala.MatchError: 
> org.apache.hadoop.fs.FsUrlConnection:http://.example.com
>  (of class org.apache.hadoop.fs.FsUrlConnection)
>  at 
> scalaj.http.HttpRequest.scalaj$http$HttpRequest$$doConnection(Http.scala:343)
>  at scalaj.http.HttpRequest.exec(Http.scala:335)
>  at scalaj.http.HttpRequest.asString(Http.scala:455)
> {code}
>   
> One option to fix the issue is to return null in 
> URLStreamHandlerFactory.createURLStreamHandler when the protocol is 
> http/https, so it will use the default behavior and be compatible with 
> scalaj.http. Following is the code example:
> {code}
> class SparkUrlStreamHandlerFactory extends URLStreamHandlerFactory with 
> Logging {
>   private val fsUrlStreamHandlerFactory = new FsUrlStreamHandlerFactory()
>   override def createURLStreamHandler(protocol: String): URLStreamHandler = {
> val handler = fsUrlStreamHandlerFactory.createURLStreamHandler(protocol)
> if (handler == null) {
>   return null
> }
> if (protocol != null &&
>   (protocol.equalsIgnoreCase("http")
>   || protocol.equalsIgnoreCase("https"))) {
>   // return null to use system default URLStreamHandler
>   null
> } else {
>   handler
> }
>   }
> }
> {code}
> I would like to get some discussion here before submitting a pull request.
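
A hedged usage sketch for the snippet above: the wrapping factory would be installed once, JVM-wide, in place of the raw FsUrlStreamHandlerFactory (the class name comes from the proposal above, not from Spark itself):
{code:scala}
// setURLStreamHandlerFactory can only be called once per JVM; a second call throws an Error.
java.net.URL.setURLStreamHandlerFactory(new SparkUrlStreamHandlerFactory())
{code}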



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27141) Use ConfigEntry for hardcoded configs Yarn

2019-03-13 Thread wangjiaochun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792223#comment-16792223
 ] 

wangjiaochun commented on SPARK-27141:
--

I'm sorry, I closed this Jira by mistake and will reopen it. I have already solved 
this problem and opened a pull request against master.
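
For context, a hedged sketch of the ConfigEntry pattern this ticket asks for (the key and default here are illustrative; a similar entry may already exist in the YARN module's config.scala):
{code:scala}
import org.apache.spark.internal.config.ConfigBuilder

// Define the key once; call sites then reference the typed entry instead of a raw string.
val DIST_FILES = ConfigBuilder("spark.yarn.dist.files")
  .stringConf
  .toSequence
  .createWithDefault(Nil)
{code}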

> Use ConfigEntry for hardcoded configs Yarn
> --
>
> Key: SPARK-27141
> URL: https://issues.apache.org/jira/browse/SPARK-27141
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 3.0.0
>Reporter: wangjiaochun
>Priority: Major
> Fix For: 3.0.0
>
>
> Some of the YARN-related configs in the following files do not use ConfigEntry 
> values; try to replace them. 
> ApplicationMaster
> YarnAllocatorSuite
> ApplicationMasterSuite
> BaseYarnClusterSuite
> YarnClusterSuite



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2

2019-03-13 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-26742.

Resolution: Fixed

Issue resolved by pull request 24002
[https://github.com/apache/spark/pull/24002]

> Bump Kubernetes Client Version to 4.1.2
> ---
>
> Key: SPARK-26742
> URL: https://issues.apache.org/jira/browse/SPARK-26742
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Steve Davids
>Assignee: Jiaxin Shan
>Priority: Major
>  Labels: easyfix
> Fix For: 3.0.0
>
>
> Spark 2.x is using Kubernetes Client 3.x, which is pretty old, while the master 
> branch has 4.0. The client should be upgraded to 4.1.1 to have the broadest 
> Kubernetes compatibility support for newer clusters: 
> https://github.com/fabric8io/kubernetes-client#compatibility-matrix



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2

2019-03-13 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-26742:
--

Assignee: Jiaxin Shan

> Bump Kubernetes Client Version to 4.1.2
> ---
>
> Key: SPARK-26742
> URL: https://issues.apache.org/jira/browse/SPARK-26742
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Steve Davids
>Assignee: Jiaxin Shan
>Priority: Major
>  Labels: easyfix
> Fix For: 3.0.0
>
>
> Spark 2.x is using Kubernetes Client 3.x, which is pretty old, while the master 
> branch has 4.0. The client should be upgraded to 4.1.1 to have the broadest 
> Kubernetes compatibility support for newer clusters: 
> https://github.com/fabric8io/kubernetes-client#compatibility-matrix



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-15251) Cannot apply PythonUDF to aggregated column

2019-03-13 Thread Matthew Livesey (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-15251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthew Livesey closed SPARK-15251.
---

Cannot be reproduced on master, and is now years old. Closing as no longer 
relevant.

> Cannot apply PythonUDF to aggregated column
> ---
>
> Key: SPARK-15251
> URL: https://issues.apache.org/jira/browse/SPARK-15251
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.6.1
>Reporter: Matthew Livesey
>Priority: Major
>
> In Scala it is possible to define a UDF and apply it to an aggregated value in 
> an expression, for example:
> {code}
> def timesTwo(x: Int): Int = x * 2
> sqlContext.udf.register("timesTwo", timesTwo _)
> sqlContext.sql("SELECT timesTwo(Sum(x)) t FROM my_data").show()
> case class Data(x: Int, y: String)
> val data = List(Data(1, "a"), Data(2, "b"))
> val rdd = sc.parallelize(data)
> val df = rdd.toDF
> df.registerTempTable("my_data")
> sqlContext.sql("SELECT timesTwo(Sum(x)) t FROM my_data").show() 
> +---+
> |  t|
> +---+
> |  6|
> +---+
> {code}
> Performing the same computation in pyspark:
> {code}
> def timesTwo(x):
> return x * 2
> sqlContext.udf.register("timesTwo", timesTwo)
> data = [(1, 'a'), (2, 'b')]
> rdd = sc.parallelize(data)
> df = sqlContext.createDataFrame(rdd, ["x", "y"])
> df.registerTempTable("my_data")
> sqlContext.sql("SELECT timesTwo(Sum(x)) t FROM my_data").show()
> {code}
> Gives the following:
> {code}
> AnalysisException: u"expression 'pythonUDF' is neither present in the group 
> by, nor is it an aggregate function. Add to group by or wrap in first() (or 
> first_value) if you don't care which value you get.;"
> {code}
> Using a lambda rather than a named function gives the same error.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26910) Re-release SparkR to CRAN

2019-03-13 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792120#comment-16792120
 ] 

Felix Cheung commented on SPARK-26910:
--

2.3.3 failed. We are waiting for 2.4.1 to be released.

> Re-release SparkR to CRAN
> -
>
> Key: SPARK-26910
> URL: https://issues.apache.org/jira/browse/SPARK-26910
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Michael Chirico
>Assignee: Felix Cheung
>Priority: Major
>
> The logical successor to https://issues.apache.org/jira/browse/SPARK-15799
> I don't see anything specifically tracking re-release in the Jira list. It 
> would be helpful to have an issue tracking this to refer to as an outsider, 
> as well as to document what the blockers are in case some outside help could 
> be useful.
>  * Is there a plan to re-release SparkR to CRAN?
>  * What are the major blockers to doing so at the moment?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-26555) Thread safety issue causes createDataset to fail with misleading errors

2019-03-13 Thread Martin Loncaric (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792086#comment-16792086
 ] 

Martin Loncaric edited comment on SPARK-26555 at 3/13/19 8:26 PM:
--

Update: I have proved that the issue lies in reflection thread safety issues in 
org.apache.spark.sql.catalyst.ScalaReflection: 
https://stackoverflow.com/questions/55150590/thread-safety-in-scala-reflection-with-type-matching

Investigating whether this can be fixed with different usage of the reflection 
library, or whether this is a scala issue.


was (Author: mwlon):
Update: I have been able to replicate this without Spark at all, using snippets 
from org.apache.spark.sql.catalyst.ScalaReflection: 
https://stackoverflow.com/questions/55150590/thread-safety-in-scala-reflection-with-type-matching

Investigating whether this can be fixed with different usage of the reflection 
library, or whether this is a scala issue.

> Thread safety issue causes createDataset to fail with misleading errors
> ---
>
> Key: SPARK-26555
> URL: https://issues.apache.org/jira/browse/SPARK-26555
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Martin Loncaric
>Priority: Major
>
> This can be replicated (~2% of the time) with
> {code:scala}
> import java.sql.Timestamp
> import java.util.concurrent.{Executors, Future}
> import org.apache.spark.sql.SparkSession
> import scala.collection.mutable.ListBuffer
> import scala.concurrent.ExecutionContext
> import scala.util.Random
> object Main {
>   def main(args: Array[String]): Unit = {
> val sparkSession = SparkSession.builder
>   .getOrCreate()
> import sparkSession.implicits._
> val executor = Executors.newFixedThreadPool(1)
> try {
>   implicit val xc: ExecutionContext = 
> ExecutionContext.fromExecutorService(executor)
>   val futures = new ListBuffer[Future[_]]()
>   for (i <- 1 to 3) {
> futures += executor.submit(new Runnable {
>   override def run(): Unit = {
> val d = if (Random.nextInt(2) == 0) Some("d value") else None
> val e = if (Random.nextInt(2) == 0) Some(5.0) else None
> val f = if (Random.nextInt(2) == 0) Some(6.0) else None
> println("DEBUG", d, e, f)
> sparkSession.createDataset(Seq(
>   MyClass(new Timestamp(1L), "b", "c", d, e, f)
> ))
>   }
> })
>   }
>   futures.foreach(_.get())
> } finally {
>   println("SHUTDOWN")
>   executor.shutdown()
>   sparkSession.stop()
> }
>   }
>   case class MyClass(
> a: Timestamp,
> b: String,
> c: String,
> d: Option[String],
> e: Option[Double],
> f: Option[Double]
>   )
> }
> {code}
> So it will usually come up during
> {code:bash}
> for i in $(seq 1 200); do
>   echo $i
>   spark-submit --master local[4] target/scala-2.11/spark-test_2.11-0.1.jar
> done
> {code}
> causing a variety of possible errors, such as
> {code}Exception in thread "main" java.util.concurrent.ExecutionException: 
> scala.MatchError: scala.Option[String] (of class 
> scala.reflect.internal.Types$ClassArgsTypeRef)
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> Caused by: scala.MatchError: scala.Option[String] (of class 
> scala.reflect.internal.Types$ClassArgsTypeRef)
>   at 
> org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$deserializerFor$1.apply(ScalaReflection.scala:210){code}
> or
> {code}Exception in thread "main" java.util.concurrent.ExecutionException: 
> java.lang.UnsupportedOperationException: Schema for type 
> scala.Option[scala.Double] is not supported
>   at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> Caused by: java.lang.UnsupportedOperationException: Schema for type 
> scala.Option[scala.Double] is not supported
>   at 
> org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:789){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26555) Thread safety issue causes createDataset to fail with misleading errors

2019-03-13 Thread Martin Loncaric (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792086#comment-16792086
 ] 

Martin Loncaric commented on SPARK-26555:
-

Update: I have been able to replicate this without Spark at all, using snippets 
from org.apache.spark.sql.catalyst.ScalaReflection: 
https://stackoverflow.com/questions/55150590/thread-safety-in-scala-reflection-with-type-matching

Investigating whether this can be fixed with different usage of the reflection 
library, or whether this is a scala issue.

> Thread safety issue causes createDataset to fail with misleading errors
> ---
>
> Key: SPARK-26555
> URL: https://issues.apache.org/jira/browse/SPARK-26555
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Martin Loncaric
>Priority: Major
>
> This can be replicated (~2% of the time) with
> {code:scala}
> import java.sql.Timestamp
> import java.util.concurrent.{Executors, Future}
> import org.apache.spark.sql.SparkSession
> import scala.collection.mutable.ListBuffer
> import scala.concurrent.ExecutionContext
> import scala.util.Random
> object Main {
>   def main(args: Array[String]): Unit = {
> val sparkSession = SparkSession.builder
>   .getOrCreate()
> import sparkSession.implicits._
> val executor = Executors.newFixedThreadPool(1)
> try {
>   implicit val xc: ExecutionContext = 
> ExecutionContext.fromExecutorService(executor)
>   val futures = new ListBuffer[Future[_]]()
>   for (i <- 1 to 3) {
> futures += executor.submit(new Runnable {
>   override def run(): Unit = {
> val d = if (Random.nextInt(2) == 0) Some("d value") else None
> val e = if (Random.nextInt(2) == 0) Some(5.0) else None
> val f = if (Random.nextInt(2) == 0) Some(6.0) else None
> println("DEBUG", d, e, f)
> sparkSession.createDataset(Seq(
>   MyClass(new Timestamp(1L), "b", "c", d, e, f)
> ))
>   }
> })
>   }
>   futures.foreach(_.get())
> } finally {
>   println("SHUTDOWN")
>   executor.shutdown()
>   sparkSession.stop()
> }
>   }
>   case class MyClass(
> a: Timestamp,
> b: String,
> c: String,
> d: Option[String],
> e: Option[Double],
> f: Option[Double]
>   )
> }
> {code}
> So it will usually come up during
> {code:bash}
> for i in $(seq 1 200); do
>   echo $i
>   spark-submit --master local[4] target/scala-2.11/spark-test_2.11-0.1.jar
> done
> {code}
> causing a variety of possible errors, such as
> {code}Exception in thread "main" java.util.concurrent.ExecutionException: 
> scala.MatchError: scala.Option[String] (of class 
> scala.reflect.internal.Types$ClassArgsTypeRef)
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> Caused by: scala.MatchError: scala.Option[String] (of class 
> scala.reflect.internal.Types$ClassArgsTypeRef)
>   at 
> org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$deserializerFor$1.apply(ScalaReflection.scala:210){code}
> or
> {code}Exception in thread "main" java.util.concurrent.ExecutionException: 
> java.lang.UnsupportedOperationException: Schema for type 
> scala.Option[scala.Double] is not supported
>   at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> Caused by: java.lang.UnsupportedOperationException: Schema for type 
> scala.Option[scala.Double] is not supported
>   at 
> org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:789){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27123) Improve CollapseProject to handle projects cross limit/repartition/sample

2019-03-13 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791991#comment-16791991
 ] 

Apache Spark commented on SPARK-27123:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/24078

> Improve CollapseProject to handle projects cross limit/repartition/sample
> -
>
> Key: SPARK-27123
> URL: https://issues.apache.org/jira/browse/SPARK-27123
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.0
>
>
> `CollapseProject` optimizer simplifies the plan by merging the adjacent 
> projects and performing alias substitution.
> {code:java}
> scala> sql("SELECT b c FROM (SELECT a b FROM t)").explain
> == Physical Plan ==
> *(1) Project [a#5 AS c#1]
> +- Scan hive default.t [a#5], HiveTableRelation `default`.`t`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#5]
> {code}
> We can do that for more complex cases like the following.
> *BEFORE*
> {code:java}
> scala> sql("SELECT b c FROM (SELECT /*+ REPARTITION(1) */ a b FROM 
> t)").explain
> == Physical Plan ==
> *(2) Project [b#0 AS c#1]
> +- Exchange RoundRobinPartitioning(1)
>+- *(1) Project [a#5 AS b#0]
>   +- Scan hive default.t [a#5], HiveTableRelation `default`.`t`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#5]
> {code}
> *AFTER*
> {code:java}
> scala> sql("SELECT b c FROM (SELECT /*+ REPARTITION(1) */ a b FROM 
> t)").explain
> == Physical Plan ==
> Exchange RoundRobinPartitioning(1)
> +- *(1) Project [a#11 AS c#7]
>+- Scan hive default.t [a#11], HiveTableRelation `default`.`t`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#11]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25449) Don't send zero accumulators in heartbeats

2019-03-13 Thread Shixiong Zhu (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791960#comment-16791960
 ] 

Shixiong Zhu commented on SPARK-25449:
--

I think this patch actually fixed a bug introduced by 
https://github.com/apache/spark/commit/0514e8d4b69615ba8918649e7e3c46b5713b6540 
It didn't use the correct default timeout. Before this patch, using 
`spark.executor.heartbeatInterval 30` would send a heartbeat every 30 ms, but 
each heartbeat RPC message timeout was 30 seconds.

This patch just unifies the default time unit in all usages of 
"spark.executor.heartbeatInterval".

> Don't send zero accumulators in heartbeats
> --
>
> Key: SPARK-25449
> URL: https://issues.apache.org/jira/browse/SPARK-25449
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Mukul Murthy
>Assignee: Mukul Murthy
>Priority: Major
>  Labels: release-notes
> Fix For: 3.0.0
>
>
> Heartbeats sent from executors to the driver every 10 seconds contain metrics 
> and are generally on the order of a few KBs. However, for large jobs with 
> lots of tasks, heartbeats can be on the order of tens of MBs, causing tasks 
> to die with heartbeat failures. We can mitigate this by not sending zero 
> metrics to the driver.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-27153) add weightCol in python RegressionEvaluator

2019-03-13 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-27153:


Assignee: Apache Spark

> add weightCol in python RegressionEvaluator
> ---
>
> Key: SPARK-27153
> URL: https://issues.apache.org/jira/browse/SPARK-27153
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib, PySpark
>Affects Versions: 3.0.0
>Reporter: Huaxin Gao
>Assignee: Apache Spark
>Priority: Minor
>
> https://issues.apache.org/jira/browse/SPARK-24102 added weightCol in 
> RegressionEvaluator.scala. This Jira will add weightCol to the Python version of 
> RegressionEvaluator.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-27153) add weightCol in python RegressionEvaluator

2019-03-13 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-27153:


Assignee: (was: Apache Spark)

> add weightCol in python RegressionEvaluator
> ---
>
> Key: SPARK-27153
> URL: https://issues.apache.org/jira/browse/SPARK-27153
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib, PySpark
>Affects Versions: 3.0.0
>Reporter: Huaxin Gao
>Priority: Minor
>
> https://issues.apache.org/jira/browse/SPARK-24102 added weightCol in 
> RegressionEvaluator.scala. This Jira will add weightCol to the Python version of 
> RegressionEvaluator.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20082) Incremental update of LDA model, by adding initialModel as start point

2019-03-13 Thread yuhao yang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-20082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791938#comment-16791938
 ] 

yuhao yang commented on SPARK-20082:


Yuhao is taking family bonding leave from March 7th to April 19th. Please expect a 
delayed email response. Contact +86 13738085700 for anything urgent.

Thanks,
Yuhao


> Incremental update of LDA model, by adding initialModel as start point
> --
>
> Key: SPARK-20082
> URL: https://issues.apache.org/jira/browse/SPARK-20082
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Affects Versions: 2.1.0
>Reporter: Mathieu DESPRIEE
>Priority: Major
>
> Some mllib models support an initialModel to start from and update it 
> incrementally with new data.
> From what I understand of OnlineLDAOptimizer, it is possible to incrementally 
> update an existing model with batches of new documents.
> I suggest adding an initialModel as a starting point for LDA.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-27153) add weightCol in python RegressionEvaluator

2019-03-13 Thread Huaxin Gao (JIRA)
Huaxin Gao created SPARK-27153:
--

 Summary: add weightCol in python RegressionEvaluator
 Key: SPARK-27153
 URL: https://issues.apache.org/jira/browse/SPARK-27153
 Project: Spark
  Issue Type: Improvement
  Components: ML, MLlib, PySpark
Affects Versions: 3.0.0
Reporter: Huaxin Gao


https://issues.apache.org/jira/browse/SPARK-24102 added weightCol in 
RegressionEvaluator.scala. This Jira will add weightCol to the Python version of 
RegressionEvaluator.
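
For reference, a hedged sketch of the Scala-side API this ticket would mirror (assumes a DataFrame named predictions with prediction, label and weight columns):
{code:scala}
import org.apache.spark.ml.evaluation.RegressionEvaluator

val evaluator = new RegressionEvaluator()
  .setPredictionCol("prediction")
  .setLabelCol("label")
  .setWeightCol("weight")   // the Scala-side setter added by SPARK-24102
  .setMetricName("rmse")
val rmse = evaluator.evaluate(predictions)
{code}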



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-24432) Add support for dynamic resource allocation

2019-03-13 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24432:


Assignee: (was: Apache Spark)

> Add support for dynamic resource allocation
> ---
>
> Key: SPARK-24432
> URL: https://issues.apache.org/jira/browse/SPARK-24432
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Yinan Li
>Priority: Major
>
> This is an umbrella ticket for work on adding support for dynamic resource 
> allocation into the Kubernetes mode. This requires a Kubernetes-specific 
> external shuffle service. The feature is available in our fork at 
> github.com/apache-spark-on-k8s/spark.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-24432) Add support for dynamic resource allocation

2019-03-13 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24432:


Assignee: Apache Spark

> Add support for dynamic resource allocation
> ---
>
> Key: SPARK-24432
> URL: https://issues.apache.org/jira/browse/SPARK-24432
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Yinan Li
>Assignee: Apache Spark
>Priority: Major
>
> This is an umbrella ticket for work on adding support for dynamic resource 
> allocation into the Kubernetes mode. This requires a Kubernetes-specific 
> external shuffle service. The feature is available in our fork at 
> github.com/apache-spark-on-k8s/spark.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20082) Incremental update of LDA model, by adding initialModel as start point

2019-03-13 Thread Marcellus de Castro Tavares (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-20082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791919#comment-16791919
 ] 

Marcellus de Castro Tavares commented on SPARK-20082:
-

Hi, is this feature still on the roadmap? It's been in progress for a while.

Thanks

> Incremental update of LDA model, by adding initialModel as start point
> --
>
> Key: SPARK-20082
> URL: https://issues.apache.org/jira/browse/SPARK-20082
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Affects Versions: 2.1.0
>Reporter: Mathieu DESPRIEE
>Priority: Major
>
> Some mllib models support an initialModel to start from and update it 
> incrementally with new data.
> From what I understand of OnlineLDAOptimizer, it is possible to incrementally 
> update an existing model with batches of new documents.
> I suggest adding an initialModel as a starting point for LDA.
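
A purely hypothetical usage sketch of the requested feature (setInitialModel does not exist on ml.clustering.LDA today; the path and DataFrame name are illustrative):
{code:scala}
import org.apache.spark.ml.clustering.{LDA, LocalLDAModel}

val previous = LocalLDAModel.load("/models/lda-previous")   // an earlier model, hypothetical path
val lda = new LDA()
  .setK(previous.getK)
  .setOptimizer("online")
  .setInitialModel(previous)   // proposed in this ticket; not implemented
val updated = lda.fit(newDocuments)                          // incremental update on a new batch
{code}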



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-27106) merge CaseInsensitiveStringMap and DataSourceOptions

2019-03-13 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-27106.
-
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 24025
[https://github.com/apache/spark/pull/24025]

> merge CaseInsensitiveStringMap and DataSourceOptions
> 
>
> Key: SPARK-27106
> URL: https://issues.apache.org/jira/browse/SPARK-27106
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27060) DDL Commands are accepting Keywords like create, drop as tableName

2019-03-13 Thread Thincrs (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791907#comment-16791907
 ] 

Thincrs commented on SPARK-27060:
-

A user of thincrs has selected this issue. Deadline: Wed, Mar 20, 2019 5:23 PM

> DDL Commands are accepting Keywords like create, drop as tableName
> --
>
> Key: SPARK-27060
> URL: https://issues.apache.org/jira/browse/SPARK-27060
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Sachin Ramachandra Setty
>Priority: Minor
>
> This seems to be a compatibility issue compared to other components such as Hive 
> and MySQL. 
> DDL commands succeed even though the tableName is the same as a keyword. 
> Tested with columnNames as well, and the issue exists there too. 
> Hive-Beeline, by contrast, throws a ParseException and does not accept keywords 
> as tableName or columnName, and MySQL accepts keywords only as columnName.
> Spark-Behaviour :
> {code}
> Connected to: Spark SQL (version 2.3.2.0101)
> CLI_DBMS_APPID
> Beeline version 1.2.1.spark_2.3.2.0101 by Apache Hive
> 0: jdbc:hive2://10.18.3.XXX:23040/default> create table create(id int);
> +-+--+
> | Result  |
> +-+--+
> +-+--+
> No rows selected (0.255 seconds)
> 0: jdbc:hive2://10.18.3.XXX:23040/default> create table drop(int int);
> +-+--+
> | Result  |
> +-+--+
> +-+--+
> No rows selected (0.257 seconds)
> 0: jdbc:hive2://10.18.3.XXX:23040/default> drop table drop;
> +-+--+
> | Result  |
> +-+--+
> +-+--+
> No rows selected (0.236 seconds)
> 0: jdbc:hive2://10.18.3.XXX:23040/default> drop table create;
> +-+--+
> | Result  |
> +-+--+
> +-+--+
> No rows selected (0.168 seconds)
> 0: jdbc:hive2://10.18.3.XXX:23040/default> create table tab1(float float);
> +-+--+
> | Result  |
> +-+--+
> +-+--+
> No rows selected (0.111 seconds)
> 0: jdbc:hive2://10.18.XXX:23040/default> create table double(double float);
> +-+--+
> | Result  |
> +-+--+
> +-+--+
> No rows selected (0.093 seconds)
> {code}
> Hive-Behaviour :
> {code}
> Connected to: Apache Hive (version 3.1.0)
> Driver: Hive JDBC (version 3.1.0)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 3.1.0 by Apache Hive
> 0: jdbc:hive2://10.18.XXX:21066/> create table create(id int);
> Error: Error while compiling statement: FAILED: ParseException line 1:13 
> cannot recognize input near 'create' '(' 'id' in table name 
> (state=42000,code=4)
> 0: jdbc:hive2://10.18.XXX:21066/> create table drop(id int);
> Error: Error while compiling statement: FAILED: ParseException line 1:13 
> cannot recognize input near 'drop' '(' 'id' in table name 
> (state=42000,code=4)
> 0: jdbc:hive2://10.18XXX:21066/> create table tab1(float float);
> Error: Error while compiling statement: FAILED: ParseException line 1:18 
> cannot recognize input near 'float' 'float' ')' in column name or constraint 
> (state=42000,code=4)
> 0: jdbc:hive2://10.18XXX:21066/> drop table create(id int);
> Error: Error while compiling statement: FAILED: ParseException line 1:11 
> cannot recognize input near 'create' '(' 'id' in table name 
> (state=42000,code=4)
> 0: jdbc:hive2://10.18.XXX:21066/> drop table drop(id int);
> Error: Error while compiling statement: FAILED: ParseException line 1:11 
> cannot recognize input near 'drop' '(' 'id' in table name 
> (state=42000,code=4)
> mySql :
> CREATE TABLE CREATE(ID integer);
> Error: near "CREATE": syntax error
> CREATE TABLE DROP(ID integer);
> Error: near "DROP": syntax error
> CREATE TABLE TAB1(FLOAT FLOAT);
> Success
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26103) OutOfMemory error with large query plans

2019-03-13 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-26103.

   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 23169
[https://github.com/apache/spark/pull/23169]
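
A hedged sketch of how the mitigation can be applied once this is in (assuming the config introduced by the linked PR is named spark.sql.maxPlanStringLength; the value is illustrative):
{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  // Cap how large a plan string the driver will build and retain.
  .config("spark.sql.maxPlanStringLength", "100000")
  .getOrCreate()
{code}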

> OutOfMemory error with large query plans
> 
>
> Key: SPARK-26103
> URL: https://issues.apache.org/jira/browse/SPARK-26103
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0, 2.3.1, 2.3.2
> Environment: Amazon EMR 5.19
> 1 c5.4xlarge master instance
> 1 c5.4xlarge core instance
> 2 c5.4xlarge task instances
>Reporter: Dave DeCaprio
>Assignee: Dave DeCaprio
>Priority: Major
> Fix For: 3.0.0
>
>
> Large query plans can cause OutOfMemory errors in the Spark driver.
> We are creating data frames that are not extremely large but contain lots of 
> nested joins.  These plans execute efficiently because of caching and 
> partitioning, but the text version of the query plans generated can be 
> hundreds of megabytes.  Running many of these in parallel causes our driver 
> process to fail.
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space at 
> java.util.Arrays.copyOfRange(Arrays.java:2694) at 
> java.lang.String.(String.java:203) at 
> java.lang.StringBuilder.toString(StringBuilder.java:405) at 
> scala.StringContext.standardInterpolator(StringContext.scala:125) at 
> scala.StringContext.s(StringContext.scala:90) at 
> org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:70)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:52)
>  at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation.run(InsertIntoHadoopFsRelation.scala:108)
>  
>  
> A similar error is reported in 
> [https://stackoverflow.com/questions/38307258/out-of-memory-error-when-writing-out-spark-dataframes-to-parquet-format]
>  
> Code exists to truncate the string if the number of output columns is larger 
> than 25, but not if the rest of the query plan is huge.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26103) OutOfMemory error with large query plans

2019-03-13 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-26103:
--

Assignee: Dave DeCaprio

> OutOfMemory error with large query plans
> 
>
> Key: SPARK-26103
> URL: https://issues.apache.org/jira/browse/SPARK-26103
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0, 2.3.1, 2.3.2
> Environment: Amazon EMR 5.19
> 1 c5.4xlarge master instance
> 1 c5.4xlarge core instance
> 2 c5.4xlarge task instances
>Reporter: Dave DeCaprio
>Assignee: Dave DeCaprio
>Priority: Major
>
> Large query plans can cause OutOfMemory errors in the Spark driver.
> We are creating data frames that are not extremely large but contain lots of 
> nested joins.  These plans execute efficiently because of caching and 
> partitioning, but the text version of the query plans generated can be 
> hundreds of megabytes.  Running many of these in parallel causes our driver 
> process to fail.
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space at 
> java.util.Arrays.copyOfRange(Arrays.java:2694) at 
> java.lang.String.<init>(String.java:203) at 
> java.lang.StringBuilder.toString(StringBuilder.java:405) at 
> scala.StringContext.standardInterpolator(StringContext.scala:125) at 
> scala.StringContext.s(StringContext.scala:90) at 
> org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:70)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:52)
>  at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation.run(InsertIntoHadoopFsRelation.scala:108)
>  
>  
> A similar error is reported in 
> [https://stackoverflow.com/questions/38307258/out-of-memory-error-when-writing-out-spark-dataframes-to-parquet-format]
>  
> Code exists to truncate the string if the number of output columns is larger 
> than 25, but not if the rest of the query plan is huge.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-27152) Column equality does not work for aliased columns.

2019-03-13 Thread Ryan Radtke (JIRA)
Ryan Radtke created SPARK-27152:
---

 Summary: Column equality does not work for aliased columns.
 Key: SPARK-27152
 URL: https://issues.apache.org/jira/browse/SPARK-27152
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.0
Reporter: Ryan Radtke


assert($"zip".as("zip_code") equals $"zip".as("zip_code")) will return false

assert($"zip" equals $"zip") will return true.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-26961) Found Java-level deadlock in Spark Driver

2019-03-13 Thread Ajith S (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791751#comment-16791751
 ] 

Ajith S edited comment on SPARK-26961 at 3/13/19 2:37 PM:
--

1) Yes, the registerAsParallelCapable will return true, but if you inspect the 
classloader instance, parallelLockMap is still null, as it was already 
initialized via the super class constructor, so *it has no effect for an already 
created instance*.

!image-2019-03-13-19-53-52-390.png!

 

2) URLClassLoader is parallel capable as it does the registration in a static block, 
which runs before the parent (ClassLoader) constructor is called. Also, as per the javadoc:

[https://docs.oracle.com/javase/8/docs/api/java/lang/ClassLoader.html]
{code:java}
Note that the ClassLoader class is registered as parallel capable by default. 
However, its subclasses still need to register themselves if they are parallel 
capable. {code}
Hence MutableURLClassLoader, unlike URLClassLoader, lost its parallel capability 
by failing to register itself.

 


was (Author: ajithshetty):
1) Yes, the registerAsParallelCapable will return true, but if you inspect the 
classloader instance, parallelLockMap is still null as it was already 
initalized via super class constructor. so it has no effect

!image-2019-03-13-19-53-52-390.png!

 

2) URLClassLoader is parallel capable as it does registration in static block 
which is before calling parent(ClassLoader) constructor. Also as per javadoc

[https://docs.oracle.com/javase/8/docs/api/java/lang/ClassLoader.html]
{code:java}
Note that the ClassLoader class is registered as parallel capable by default. 
However, its subclasses still need to register themselves if they are parallel 
capable. {code}
Hence MutableURLClassLoader lost its parallel capability by failing to register 
unlike URLClassLoader

 

> Found Java-level deadlock in Spark Driver
> -
>
> Key: SPARK-26961
> URL: https://issues.apache.org/jira/browse/SPARK-26961
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.3.0
>Reporter: Rong Jialei
>Priority: Major
> Attachments: image-2019-03-13-19-53-52-390.png
>
>
> Our spark job usually finishes in minutes; however, we recently found it can 
> take days to run, and we can only kill it when this happens.
> An investigation showed all worker containers could not connect to the driver after 
> start, and the driver was hanging; using jstack, we found a Java-level deadlock.
>  
> *Jstack output for deadlock part is showing below:*
>  
> Found one Java-level deadlock:
> =
> "SparkUI-907":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> "ForkJoinPool-1-worker-57":
>  waiting to lock monitor 0x7f3860574298 (object 0x0005b7991168, a 
> org.apache.spark.util.MutableURLClassLoader),
>  which is held by "ForkJoinPool-1-worker-7"
> "ForkJoinPool-1-worker-7":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> Java stack information for the threads listed above:
> ===
> "SparkUI-907":
>  at org.apache.hadoop.conf.Configuration.getOverlay(Configuration.java:1328)
>  - waiting to lock <0x0005c0c1e5e0> (a 
> org.apache.hadoop.conf.Configuration)
>  at 
> org.apache.hadoop.conf.Configuration.handleDeprecation(Configuration.java:684)
>  at org.apache.hadoop.conf.Configuration.get(Configuration.java:1088)
>  at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1145)
>  at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2363)
>  at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2840)
>  at 
> org.apache.hadoop.fs.FsUrlStreamHandlerFactory.createURLStreamHandler(FsUrlStreamHandlerFactory.java:74)
>  at java.net.URL.getURLStreamHandler(URL.java:1142)
>  at java.net.URL.<init>(URL.java:599)
>  at java.net.URL.<init>(URL.java:490)
>  at java.net.URL.<init>(URL.java:439)
>  at org.apache.spark.ui.JettyUtils$$anon$4.doRequest(JettyUtils.scala:176)
>  at org.apache.spark.ui.JettyUtils$$anon$4.doGet(JettyUtils.scala:161)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>  at 
> org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>  at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>  at 
> org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:171)
>  at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>  at 
> 

[jira] [Updated] (SPARK-27142) Provide REST API for SQL level information

2019-03-13 Thread Ajith S (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated SPARK-27142:

Description: 
Currently, for monitoring a Spark application, SQL information is not available 
from REST but only via the UI. REST provides only 
applications, jobs, stages and environment. This Jira is targeted at providing a REST 
API so that SQL-level information can be found.

 

Details: 
https://issues.apache.org/jira/browse/SPARK-27142?focusedCommentId=16791728=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16791728

  was:Currently for Monitoring Spark application SQL information is not 
available from REST but only via UI. REST provides only 
applications,jobs,stages,environment. This Jira is targeted to provide a REST 
API so that SQL level information can be found


> Provide REST API for SQL level information
> --
>
> Key: SPARK-27142
> URL: https://issues.apache.org/jira/browse/SPARK-27142
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Ajith S
>Priority: Minor
> Attachments: image-2019-03-13-19-29-26-896.png
>
>
> Currently, for monitoring a Spark application, SQL information is not available 
> from REST but only via the UI. REST provides only 
> applications, jobs, stages and environment. This Jira is targeted at providing a REST 
> API so that SQL-level information can be found.
>  
> Details: 
> https://issues.apache.org/jira/browse/SPARK-27142?focusedCommentId=16791728=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16791728



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-26961) Found Java-level deadlock in Spark Driver

2019-03-13 Thread Ajith S (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791751#comment-16791751
 ] 

Ajith S edited comment on SPARK-26961 at 3/13/19 2:32 PM:
--

1) Yes, the registerAsParallelCapable will return true, but if you inspect the 
classloader instance, parallelLockMap is still null, as it was already 
initialized via the super class constructor, so it has no effect.

!image-2019-03-13-19-53-52-390.png!

 

2) URLClassLoader is parallel capable as it does the registration in a static block, 
which runs before the parent (ClassLoader) constructor is called. Also, as per the javadoc:

[https://docs.oracle.com/javase/8/docs/api/java/lang/ClassLoader.html]
{code:java}
Note that the ClassLoader class is registered as parallel capable by default. 
However, its subclasses still need to register themselves if they are parallel 
capable. {code}
Hence MutableURLClassLoader, unlike URLClassLoader, lost its parallel capability 
by failing to register itself.

 


was (Author: ajithshetty):
Yes, the registerAsParallelCapable will return true, but if you inspect the 
classloader instance, parallelLockMap is still null as it was already 
initalized via super class constructor. so it has no effect

!image-2019-03-13-19-53-52-390.png!

 

URLClassLoader is parallel capable as it does registration in static block 
which is before calling parent(ClassLoader) constructor

> Found Java-level deadlock in Spark Driver
> -
>
> Key: SPARK-26961
> URL: https://issues.apache.org/jira/browse/SPARK-26961
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.3.0
>Reporter: Rong Jialei
>Priority: Major
> Attachments: image-2019-03-13-19-53-52-390.png
>
>
> Our spark job usually finishes in minutes; however, we recently found it can 
> take days to run, and we can only kill it when this happens.
> An investigation showed all worker containers could not connect to the driver after 
> start, and the driver was hanging; using jstack, we found a Java-level deadlock.
>  
> *Jstack output for deadlock part is showing below:*
>  
> Found one Java-level deadlock:
> =
> "SparkUI-907":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> "ForkJoinPool-1-worker-57":
>  waiting to lock monitor 0x7f3860574298 (object 0x0005b7991168, a 
> org.apache.spark.util.MutableURLClassLoader),
>  which is held by "ForkJoinPool-1-worker-7"
> "ForkJoinPool-1-worker-7":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> Java stack information for the threads listed above:
> ===
> "SparkUI-907":
>  at org.apache.hadoop.conf.Configuration.getOverlay(Configuration.java:1328)
>  - waiting to lock <0x0005c0c1e5e0> (a 
> org.apache.hadoop.conf.Configuration)
>  at 
> org.apache.hadoop.conf.Configuration.handleDeprecation(Configuration.java:684)
>  at org.apache.hadoop.conf.Configuration.get(Configuration.java:1088)
>  at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1145)
>  at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2363)
>  at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2840)
>  at 
> org.apache.hadoop.fs.FsUrlStreamHandlerFactory.createURLStreamHandler(FsUrlStreamHandlerFactory.java:74)
>  at java.net.URL.getURLStreamHandler(URL.java:1142)
>  at java.net.URL.<init>(URL.java:599)
>  at java.net.URL.<init>(URL.java:490)
>  at java.net.URL.<init>(URL.java:439)
>  at org.apache.spark.ui.JettyUtils$$anon$4.doRequest(JettyUtils.scala:176)
>  at org.apache.spark.ui.JettyUtils$$anon$4.doGet(JettyUtils.scala:161)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>  at 
> org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>  at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>  at 
> org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:171)
>  at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>  at 
> org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>  at 
> org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>  at 
> org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>  at 
> org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>  at 
> 

[jira] [Commented] (SPARK-26961) Found Java-level deadlock in Spark Driver

2019-03-13 Thread Sean Owen (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791739#comment-16791739
 ] 

Sean Owen commented on SPARK-26961:
---

When I run your class and print the result of registerAsParallelCapable, it 
returns true. Yes, parent initialization happens first, but URLClassLoader is 
also parallel capable.

> Found Java-level deadlock in Spark Driver
> -
>
> Key: SPARK-26961
> URL: https://issues.apache.org/jira/browse/SPARK-26961
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.3.0
>Reporter: Rong Jialei
>Priority: Major
>
> Our spark job usually finishes in minutes; however, we recently found it can 
> take days to run, and we can only kill it when this happens.
> An investigation showed all worker containers could not connect to the driver after 
> start, and the driver was hanging; using jstack, we found a Java-level deadlock.
>  
> *Jstack output for deadlock part is showing below:*
>  
> Found one Java-level deadlock:
> =
> "SparkUI-907":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> "ForkJoinPool-1-worker-57":
>  waiting to lock monitor 0x7f3860574298 (object 0x0005b7991168, a 
> org.apache.spark.util.MutableURLClassLoader),
>  which is held by "ForkJoinPool-1-worker-7"
> "ForkJoinPool-1-worker-7":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> Java stack information for the threads listed above:
> ===
> "SparkUI-907":
>  at org.apache.hadoop.conf.Configuration.getOverlay(Configuration.java:1328)
>  - waiting to lock <0x0005c0c1e5e0> (a 
> org.apache.hadoop.conf.Configuration)
>  at 
> org.apache.hadoop.conf.Configuration.handleDeprecation(Configuration.java:684)
>  at org.apache.hadoop.conf.Configuration.get(Configuration.java:1088)
>  at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1145)
>  at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2363)
>  at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2840)
>  at 
> org.apache.hadoop.fs.FsUrlStreamHandlerFactory.createURLStreamHandler(FsUrlStreamHandlerFactory.java:74)
>  at java.net.URL.getURLStreamHandler(URL.java:1142)
>  at java.net.URL.<init>(URL.java:599)
>  at java.net.URL.<init>(URL.java:490)
>  at java.net.URL.<init>(URL.java:439)
>  at org.apache.spark.ui.JettyUtils$$anon$4.doRequest(JettyUtils.scala:176)
>  at org.apache.spark.ui.JettyUtils$$anon$4.doGet(JettyUtils.scala:161)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>  at 
> org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>  at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>  at 
> org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:171)
>  at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>  at 
> org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>  at 
> org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>  at 
> org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>  at 
> org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>  at 
> org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>  at 
> org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
>  at 
> org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>  at 
> org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>  at org.spark_project.jetty.server.Server.handle(Server.java:534)
>  at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:320)
>  at 
> org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>  at 
> org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>  at org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:108)
>  at 
> org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>  at 
> org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>  at 
> org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>  at java.lang.Thread.run(Thread.java:748)
> "ForkJoinPool-1-worker-57":
>  at 

[jira] [Commented] (SPARK-26961) Found Java-level deadlock in Spark Driver

2019-03-13 Thread Ajith S (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791751#comment-16791751
 ] 

Ajith S commented on SPARK-26961:
-

Yes, registerAsParallelCapable will return true, but if you inspect the 
classloader instance, parallelLockMap is still null because it was already 
initialized by the superclass constructor, so the call has no effect.

!image-2019-03-13-19-53-52-390.png!

 

URLClassLoader itself is parallel capable because it performs the registration in a 
static block, which runs before the parent ClassLoader constructor is called.
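
A quick way to reproduce that observation is to inspect the loader instance directly. The sketch below is illustrative only (parallelLockMap is a private JDK 8 implementation detail, not public API); it reports whether a given ClassLoader actually got a per-name lock map, i.e. whether it was constructed as parallel capable:
{code:java}
// Diagnostic sketch (JDK 8): the ClassLoader constructor sets parallelLockMap
// to a ConcurrentHashMap when the concrete loader class is registered as
// parallel capable, and leaves it null otherwise.
def hasParallelLockMap(cl: ClassLoader): Boolean = {
  val field = classOf[ClassLoader].getDeclaredField("parallelLockMap")
  field.setAccessible(true)
  field.get(cl) != null
}
{code}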

> Found Java-level deadlock in Spark Driver
> -
>
> Key: SPARK-26961
> URL: https://issues.apache.org/jira/browse/SPARK-26961
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.3.0
>Reporter: Rong Jialei
>Priority: Major
> Attachments: image-2019-03-13-19-53-52-390.png
>
>
> Our Spark job usually finishes in minutes; however, we recently found it 
> taking days to run, and we could only kill it when this happened.
> An investigation showed that all worker containers could not connect to the 
> driver after start and the driver was hanging; using jstack, we found a Java-level deadlock.
>  
> *Jstack output for deadlock part is showing below:*
>  
> Found one Java-level deadlock:
> =
> "SparkUI-907":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> "ForkJoinPool-1-worker-57":
>  waiting to lock monitor 0x7f3860574298 (object 0x0005b7991168, a 
> org.apache.spark.util.MutableURLClassLoader),
>  which is held by "ForkJoinPool-1-worker-7"
> "ForkJoinPool-1-worker-7":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> Java stack information for the threads listed above:
> ===
> "SparkUI-907":
>  at org.apache.hadoop.conf.Configuration.getOverlay(Configuration.java:1328)
>  - waiting to lock <0x0005c0c1e5e0> (a 
> org.apache.hadoop.conf.Configuration)
>  at 
> org.apache.hadoop.conf.Configuration.handleDeprecation(Configuration.java:684)
>  at org.apache.hadoop.conf.Configuration.get(Configuration.java:1088)
>  at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1145)
>  at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2363)
>  at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2840)
>  at 
> org.apache.hadoop.fs.FsUrlStreamHandlerFactory.createURLStreamHandler(FsUrlStreamHandlerFactory.java:74)
>  at java.net.URL.getURLStreamHandler(URL.java:1142)
>  at java.net.URL.<init>(URL.java:599)
>  at java.net.URL.<init>(URL.java:490)
>  at java.net.URL.<init>(URL.java:439)
>  at org.apache.spark.ui.JettyUtils$$anon$4.doRequest(JettyUtils.scala:176)
>  at org.apache.spark.ui.JettyUtils$$anon$4.doGet(JettyUtils.scala:161)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>  at 
> org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>  at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>  at 
> org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:171)
>  at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>  at 
> org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>  at 
> org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>  at 
> org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>  at 
> org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>  at 
> org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>  at 
> org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
>  at 
> org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>  at 
> org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>  at org.spark_project.jetty.server.Server.handle(Server.java:534)
>  at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:320)
>  at 
> org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>  at 
> org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>  at org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:108)
>  at 
> org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>  at 
> 

[jira] [Updated] (SPARK-26961) Found Java-level deadlock in Spark Driver

2019-03-13 Thread Ajith S (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated SPARK-26961:

Attachment: image-2019-03-13-19-53-52-390.png

> Found Java-level deadlock in Spark Driver
> -
>
> Key: SPARK-26961
> URL: https://issues.apache.org/jira/browse/SPARK-26961
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.3.0
>Reporter: Rong Jialei
>Priority: Major
> Attachments: image-2019-03-13-19-53-52-390.png
>
>
> Our Spark job usually finishes in minutes; however, we recently found it 
> taking days to run, and we could only kill it when this happened.
> An investigation showed that all worker containers could not connect to the 
> driver after start and the driver was hanging; using jstack, we found a Java-level deadlock.
>  
> *Jstack output for deadlock part is showing below:*
>  
> Found one Java-level deadlock:
> =
> "SparkUI-907":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> "ForkJoinPool-1-worker-57":
>  waiting to lock monitor 0x7f3860574298 (object 0x0005b7991168, a 
> org.apache.spark.util.MutableURLClassLoader),
>  which is held by "ForkJoinPool-1-worker-7"
> "ForkJoinPool-1-worker-7":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> Java stack information for the threads listed above:
> ===
> "SparkUI-907":
>  at org.apache.hadoop.conf.Configuration.getOverlay(Configuration.java:1328)
>  - waiting to lock <0x0005c0c1e5e0> (a 
> org.apache.hadoop.conf.Configuration)
>  at 
> org.apache.hadoop.conf.Configuration.handleDeprecation(Configuration.java:684)
>  at org.apache.hadoop.conf.Configuration.get(Configuration.java:1088)
>  at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1145)
>  at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2363)
>  at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2840)
>  at 
> org.apache.hadoop.fs.FsUrlStreamHandlerFactory.createURLStreamHandler(FsUrlStreamHandlerFactory.java:74)
>  at java.net.URL.getURLStreamHandler(URL.java:1142)
>  at java.net.URL.<init>(URL.java:599)
>  at java.net.URL.<init>(URL.java:490)
>  at java.net.URL.<init>(URL.java:439)
>  at org.apache.spark.ui.JettyUtils$$anon$4.doRequest(JettyUtils.scala:176)
>  at org.apache.spark.ui.JettyUtils$$anon$4.doGet(JettyUtils.scala:161)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>  at 
> org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>  at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>  at 
> org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:171)
>  at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>  at 
> org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>  at 
> org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>  at 
> org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>  at 
> org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>  at 
> org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>  at 
> org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
>  at 
> org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>  at 
> org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>  at org.spark_project.jetty.server.Server.handle(Server.java:534)
>  at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:320)
>  at 
> org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>  at 
> org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>  at org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:108)
>  at 
> org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>  at 
> org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>  at 
> org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>  at java.lang.Thread.run(Thread.java:748)
> "ForkJoinPool-1-worker-57":
>  at java.lang.ClassLoader.loadClass(ClassLoader.java:404)
>  - waiting to lock <0x0005b7991168> (a 
> 

[jira] [Commented] (SPARK-27137) Spark captured variable is null if the code is pasted via :paste

2019-03-13 Thread Osira Ben (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791744#comment-16791744
 ] 

Osira Ben commented on SPARK-27137:
---

Indeed it works in the current master! Thank you!

> Spark captured variable is null if the code is pasted via :paste
> 
>
> Key: SPARK-27137
> URL: https://issues.apache.org/jira/browse/SPARK-27137
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Osira Ben
>Priority: Major
>
> If I execute this piece of code
> {code:java}
> val foo = "foo"
> def f(arg: Any): Unit = {
>   Option(42).foreach(_ => java.util.Objects.requireNonNull(foo, "foo"))
> }
> sc.parallelize(Seq(1, 2), 2).foreach(f)
> {code}
> {{in spark2-shell via :paste it throws}}
> {code:java}
> scala> :paste
> // Entering paste mode (ctrl-D to finish)
> val foo = "foo"
> def f(arg: Any): Unit = {
>   Option(42).foreach(_ => java.util.Objects.requireNonNull(foo, "foo"))
> }
> sc.parallelize(Seq(1, 2), 2).foreach(f)
> // Exiting paste mode, now interpreting.
> 19/03/11 15:02:06 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 
> (TID 2, hadoop.company.com, executor 1): java.lang.NullPointerException: foo
> at java.util.Objects.requireNonNull(Objects.java:228)
> {code}
> However if I execute it pasting without :paste or via spark2-shell -i it 
> doesn't.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26961) Found Java-level deadlock in Spark Driver

2019-03-13 Thread Ajith S (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated SPARK-26961:

Attachment: (was: image-2019-03-13-19-51-38-708.png)

> Found Java-level deadlock in Spark Driver
> -
>
> Key: SPARK-26961
> URL: https://issues.apache.org/jira/browse/SPARK-26961
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.3.0
>Reporter: Rong Jialei
>Priority: Major
>
> Our Spark job usually finishes in minutes; however, we recently found it 
> taking days to run, and we could only kill it when this happened.
> An investigation showed that all worker containers could not connect to the 
> driver after start and the driver was hanging; using jstack, we found a Java-level deadlock.
>  
> *Jstack output for deadlock part is showing below:*
>  
> Found one Java-level deadlock:
> =
> "SparkUI-907":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> "ForkJoinPool-1-worker-57":
>  waiting to lock monitor 0x7f3860574298 (object 0x0005b7991168, a 
> org.apache.spark.util.MutableURLClassLoader),
>  which is held by "ForkJoinPool-1-worker-7"
> "ForkJoinPool-1-worker-7":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> Java stack information for the threads listed above:
> ===
> "SparkUI-907":
>  at org.apache.hadoop.conf.Configuration.getOverlay(Configuration.java:1328)
>  - waiting to lock <0x0005c0c1e5e0> (a 
> org.apache.hadoop.conf.Configuration)
>  at 
> org.apache.hadoop.conf.Configuration.handleDeprecation(Configuration.java:684)
>  at org.apache.hadoop.conf.Configuration.get(Configuration.java:1088)
>  at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1145)
>  at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2363)
>  at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2840)
>  at 
> org.apache.hadoop.fs.FsUrlStreamHandlerFactory.createURLStreamHandler(FsUrlStreamHandlerFactory.java:74)
>  at java.net.URL.getURLStreamHandler(URL.java:1142)
>  at java.net.URL.<init>(URL.java:599)
>  at java.net.URL.<init>(URL.java:490)
>  at java.net.URL.<init>(URL.java:439)
>  at org.apache.spark.ui.JettyUtils$$anon$4.doRequest(JettyUtils.scala:176)
>  at org.apache.spark.ui.JettyUtils$$anon$4.doGet(JettyUtils.scala:161)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>  at 
> org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>  at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>  at 
> org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:171)
>  at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>  at 
> org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>  at 
> org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>  at 
> org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>  at 
> org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>  at 
> org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>  at 
> org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
>  at 
> org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>  at 
> org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>  at org.spark_project.jetty.server.Server.handle(Server.java:534)
>  at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:320)
>  at 
> org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>  at 
> org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>  at org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:108)
>  at 
> org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>  at 
> org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>  at 
> org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>  at java.lang.Thread.run(Thread.java:748)
> "ForkJoinPool-1-worker-57":
>  at java.lang.ClassLoader.loadClass(ClassLoader.java:404)
>  - waiting to lock <0x0005b7991168> (a 
> org.apache.spark.util.MutableURLClassLoader)
>  at 

[jira] [Updated] (SPARK-26961) Found Java-level deadlock in Spark Driver

2019-03-13 Thread Ajith S (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated SPARK-26961:

Attachment: image-2019-03-13-19-51-38-708.png

> Found Java-level deadlock in Spark Driver
> -
>
> Key: SPARK-26961
> URL: https://issues.apache.org/jira/browse/SPARK-26961
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.3.0
>Reporter: Rong Jialei
>Priority: Major
> Attachments: image-2019-03-13-19-51-38-708.png
>
>
> Our Spark job usually finishes in minutes; however, we recently found it 
> taking days to run, and we could only kill it when this happened.
> An investigation showed that all worker containers could not connect to the 
> driver after start and the driver was hanging; using jstack, we found a Java-level deadlock.
>  
> *Jstack output for deadlock part is showing below:*
>  
> Found one Java-level deadlock:
> =
> "SparkUI-907":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> "ForkJoinPool-1-worker-57":
>  waiting to lock monitor 0x7f3860574298 (object 0x0005b7991168, a 
> org.apache.spark.util.MutableURLClassLoader),
>  which is held by "ForkJoinPool-1-worker-7"
> "ForkJoinPool-1-worker-7":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> Java stack information for the threads listed above:
> ===
> "SparkUI-907":
>  at org.apache.hadoop.conf.Configuration.getOverlay(Configuration.java:1328)
>  - waiting to lock <0x0005c0c1e5e0> (a 
> org.apache.hadoop.conf.Configuration)
>  at 
> org.apache.hadoop.conf.Configuration.handleDeprecation(Configuration.java:684)
>  at org.apache.hadoop.conf.Configuration.get(Configuration.java:1088)
>  at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1145)
>  at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2363)
>  at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2840)
>  at 
> org.apache.hadoop.fs.FsUrlStreamHandlerFactory.createURLStreamHandler(FsUrlStreamHandlerFactory.java:74)
>  at java.net.URL.getURLStreamHandler(URL.java:1142)
>  at java.net.URL.<init>(URL.java:599)
>  at java.net.URL.<init>(URL.java:490)
>  at java.net.URL.<init>(URL.java:439)
>  at org.apache.spark.ui.JettyUtils$$anon$4.doRequest(JettyUtils.scala:176)
>  at org.apache.spark.ui.JettyUtils$$anon$4.doGet(JettyUtils.scala:161)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>  at 
> org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>  at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>  at 
> org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:171)
>  at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>  at 
> org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>  at 
> org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>  at 
> org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>  at 
> org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>  at 
> org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>  at 
> org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
>  at 
> org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>  at 
> org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>  at org.spark_project.jetty.server.Server.handle(Server.java:534)
>  at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:320)
>  at 
> org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>  at 
> org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>  at org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:108)
>  at 
> org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>  at 
> org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>  at 
> org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>  at java.lang.Thread.run(Thread.java:748)
> "ForkJoinPool-1-worker-57":
>  at java.lang.ClassLoader.loadClass(ClassLoader.java:404)
>  - waiting to lock <0x0005b7991168> (a 
> 

[jira] [Updated] (SPARK-27142) Provide REST API for SQL level information

2019-03-13 Thread Ajith S (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated SPARK-27142:

Attachment: image-2019-03-13-19-29-26-896.png

> Provide REST API for SQL level information
> --
>
> Key: SPARK-27142
> URL: https://issues.apache.org/jira/browse/SPARK-27142
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Ajith S
>Priority: Minor
> Attachments: image-2019-03-13-19-29-26-896.png
>
>
> Currently, SQL information for monitoring a Spark application is not available 
> through REST but only via the UI. The REST API provides only 
> applications, jobs, stages, and environment. This Jira is targeted at providing a REST 
> API so that SQL-level information can be retrieved.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27142) Provide REST API for SQL level information

2019-03-13 Thread Ajith S (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791728#comment-16791728
 ] 

Ajith S commented on SPARK-27142:
-

Ok, apologies for being abstract about this requirement. Let me explain. A 
single SQL query can result in multiple jobs. So for an end user who is using 
STS or spark-sql, the intended highest level of probe is the SQL statement they have 
executed. This information can be seen in the SQL tab. Attaching a sample.

!image-2019-03-13-19-29-26-896.png!

But the same information cannot be accessed through the REST API exposed by Spark, and 
users always have to rely on the jobs API, which may be difficult. So I intend to 
expose the information seen in the SQL tab of the UI via a REST API; a sketch of the 
corresponding payload shape follows the field list below.

Mainly:
 # executionId :  long
 # status : string - possible values COMPLETED/RUNNING/FAILED
 # description : string - executed SQL string
 # submissionTime : formatted time of SQL submission
 # duration : string - total run time
 # runningJobIds : Seq[Int] - sequence of running job ids
 # failedJobIds : Seq[Int] - sequence of failed job ids
 # successJobIds : Seq[Int] - sequence of success job ids
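
For illustration only, the fields above could map onto a payload shape roughly like the following sketch (hypothetical names, not an existing Spark REST API class):
{code:java}
// Hypothetical sketch of the proposed response shape; the fields simply mirror
// the list above. Not an actual Spark class.
case class SqlExecutionSummary(
    executionId: Long,
    status: String,            // COMPLETED / RUNNING / FAILED
    description: String,       // the executed SQL text
    submissionTime: String,    // formatted time of SQL submission
    duration: String,          // total run time
    runningJobIds: Seq[Int],
    failedJobIds: Seq[Int],
    successJobIds: Seq[Int])
{code}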

 

> Provide REST API for SQL level information
> --
>
> Key: SPARK-27142
> URL: https://issues.apache.org/jira/browse/SPARK-27142
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Ajith S
>Priority: Minor
> Attachments: image-2019-03-13-19-29-26-896.png
>
>
> Currently, SQL information for monitoring a Spark application is not available 
> through REST but only via the UI. The REST API provides only 
> applications, jobs, stages, and environment. This Jira is targeted at providing a REST 
> API so that SQL-level information can be retrieved.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27142) Provide REST API for SQL level information

2019-03-13 Thread Ajith S (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated SPARK-27142:

Attachment: (was: image-2019-03-13-19-19-27-831.png)

> Provide REST API for SQL level information
> --
>
> Key: SPARK-27142
> URL: https://issues.apache.org/jira/browse/SPARK-27142
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Ajith S
>Priority: Minor
>
> Currently, SQL information for monitoring a Spark application is not available 
> through REST but only via the UI. The REST API provides only 
> applications, jobs, stages, and environment. This Jira is targeted at providing a REST 
> API so that SQL-level information can be retrieved.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27142) Provide REST API for SQL level information

2019-03-13 Thread Ajith S (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated SPARK-27142:

Attachment: (was: image-2019-03-13-19-19-24-951.png)

> Provide REST API for SQL level information
> --
>
> Key: SPARK-27142
> URL: https://issues.apache.org/jira/browse/SPARK-27142
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Ajith S
>Priority: Minor
>
> Currently, SQL information for monitoring a Spark application is not available 
> through REST but only via the UI. The REST API provides only 
> applications, jobs, stages, and environment. This Jira is targeted at providing a REST 
> API so that SQL-level information can be retrieved.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27142) Provide REST API for SQL level information

2019-03-13 Thread Ajith S (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated SPARK-27142:

Attachment: image-2019-03-13-19-19-24-951.png

> Provide REST API for SQL level information
> --
>
> Key: SPARK-27142
> URL: https://issues.apache.org/jira/browse/SPARK-27142
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Ajith S
>Priority: Minor
> Attachments: image-2019-03-13-19-19-24-951.png, 
> image-2019-03-13-19-19-27-831.png
>
>
> Currently, SQL information for monitoring a Spark application is not available 
> through REST but only via the UI. The REST API provides only 
> applications, jobs, stages, and environment. This Jira is targeted at providing a REST 
> API so that SQL-level information can be retrieved.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27142) Provide REST API for SQL level information

2019-03-13 Thread Ajith S (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated SPARK-27142:

Attachment: image-2019-03-13-19-19-27-831.png

> Provide REST API for SQL level information
> --
>
> Key: SPARK-27142
> URL: https://issues.apache.org/jira/browse/SPARK-27142
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Ajith S
>Priority: Minor
> Attachments: image-2019-03-13-19-19-24-951.png, 
> image-2019-03-13-19-19-27-831.png
>
>
> Currently, SQL information for monitoring a Spark application is not available 
> through REST but only via the UI. The REST API provides only 
> applications, jobs, stages, and environment. This Jira is targeted at providing a REST 
> API so that SQL-level information can be retrieved.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27150) Scheduling Within an Application : Spark SQL randomly failed on UDF

2019-03-13 Thread Josh Sean (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Sean updated SPARK-27150:
--
Description: 
I run the following (reduced) code multiple times on the same exact input 
files:
{code:java}
def myUdf(input : java.lang.String) : Option[Long] = {
  None
}

...

val sparkMain = ... .getOrCreate()
val d = inputPaths.toList.par
val p = new scala.concurrent.forkjoin.ForkJoinPool(12)

try {

   d.tasksupport = new scala.collection.parallel.ForkJoinTaskSupport(p)
   d.foreach {
case (inputPath) => {
  val spark = sparkMain.newSession()
  
  spark.udf.register("myUdf",udf(myUdf _)) 

  val df = spark.read.format("csv").option("inferSchema", 
"false").option("mode", "DROPMALFORMED").schema(mySchema).load(inputPath) 

  df.createOrReplaceTempView("mytable")

  val sql = spark.sql(""" SELECT CAST( myUdf(updated_date) as long) FROM 
mytable """)

  sql.write.parquet( ... ) 
   }
 }
} finally {
  p.shutdown()
}{code}
Roughly once in every ten spark-submit runs of the application, the driver fails with an 
exception related to Spark SQL and the UDF. However, as you can see, I have reduced the 
UDF to the minimum; it now returns None every time, and the problem still occurs. 
So I think the problem is more likely related to having the driver submit 
multiple jobs in parallel, aka "scheduling within an application".

The exception is as follows:
{code:java}
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:65)
at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: org.apache.spark.sql.AnalysisException: cannot resolve 
'CAST(UDF(updated_date) AS BIGINT)' due to data type mismatch: cannot cast 
struct<> to bigint; line 5 pos 10;
...
at 
org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:93)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:85)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:106)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:118)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1$1.apply(QueryPlan.scala:122)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:285)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:122)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:127)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at 

[jira] [Commented] (SPARK-26961) Found Java-level deadlock in Spark Driver

2019-03-13 Thread Ajith S (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791712#comment-16791712
 ] 

Ajith S commented on SPARK-26961:
-

[~srowen] That too will not work

Here is my custom classloader
{code:java}
class MYClassLoader(urls: Array[URL], parent: ClassLoader)
  extends URLClassLoader(urls, parent) {

  ClassLoader.registerAsParallelCapable()

  override def loadClass(name: String): Class[_] = {
super.loadClass(name)
  }
}
{code}
If we look at the class initialization flow, we see that the super constructor is called 
before the ClassLoader.registerAsParallelCapable() line is hit, hence it doesn't 
take effect:
{code:java}
<init>:280, ClassLoader (java.lang)
<init>:316, ClassLoader (java.lang)
<init>:76, SecureClassLoader (java.security)
<init>:100, URLClassLoader (java.net)
<init>:23, MYClassLoader (org.apache.spark.util.ajith)
{code}
As per [https://github.com/scala/bug/issues/11429], Scala 2.x does not have pure 
static support yet, so moving the classloader to a Java-based implementation may be 
the only option we have.

> Found Java-level deadlock in Spark Driver
> -
>
> Key: SPARK-26961
> URL: https://issues.apache.org/jira/browse/SPARK-26961
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.3.0
>Reporter: Rong Jialei
>Priority: Major
>
> Our Spark job usually finishes in minutes; however, we recently found it 
> taking days to run, and we could only kill it when this happened.
> An investigation showed that all worker containers could not connect to the 
> driver after start and the driver was hanging; using jstack, we found a Java-level deadlock.
>  
> *Jstack output for deadlock part is showing below:*
>  
> Found one Java-level deadlock:
> =
> "SparkUI-907":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> "ForkJoinPool-1-worker-57":
>  waiting to lock monitor 0x7f3860574298 (object 0x0005b7991168, a 
> org.apache.spark.util.MutableURLClassLoader),
>  which is held by "ForkJoinPool-1-worker-7"
> "ForkJoinPool-1-worker-7":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> Java stack information for the threads listed above:
> ===
> "SparkUI-907":
>  at org.apache.hadoop.conf.Configuration.getOverlay(Configuration.java:1328)
>  - waiting to lock <0x0005c0c1e5e0> (a 
> org.apache.hadoop.conf.Configuration)
>  at 
> org.apache.hadoop.conf.Configuration.handleDeprecation(Configuration.java:684)
>  at org.apache.hadoop.conf.Configuration.get(Configuration.java:1088)
>  at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1145)
>  at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2363)
>  at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2840)
>  at 
> org.apache.hadoop.fs.FsUrlStreamHandlerFactory.createURLStreamHandler(FsUrlStreamHandlerFactory.java:74)
>  at java.net.URL.getURLStreamHandler(URL.java:1142)
>  at java.net.URL.<init>(URL.java:599)
>  at java.net.URL.<init>(URL.java:490)
>  at java.net.URL.<init>(URL.java:439)
>  at org.apache.spark.ui.JettyUtils$$anon$4.doRequest(JettyUtils.scala:176)
>  at org.apache.spark.ui.JettyUtils$$anon$4.doGet(JettyUtils.scala:161)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>  at 
> org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>  at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>  at 
> org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:171)
>  at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>  at 
> org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>  at 
> org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>  at 
> org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>  at 
> org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>  at 
> org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>  at 
> org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
>  at 
> org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>  at 
> org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>  at org.spark_project.jetty.server.Server.handle(Server.java:534)
>  at 

[jira] [Updated] (SPARK-27150) Scheduling Within an Application : Spark SQL randomly failed on UDF

2019-03-13 Thread Josh Sean (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Sean updated SPARK-27150:
--
Description: 
I run the following (reduced) code multiple times on the same exact input 
files:
{code:java}
def myUdf(input : java.lang.String) : Option[Long] = {
  None
}

...

val sparkMain = ... .getOrCreate()
val d = inputPaths.toList.par
val p = new scala.concurrent.forkjoin.ForkJoinPool(12)

try {

   d.tasksupport = new scala.collection.parallel.ForkJoinTaskSupport(p)
   d.foreach {
case (inputPath) => {
  val spark = sparkMain.newSession()
  
  spark.udf.register("myUdf",udf(myUdf _)) 

  val df = spark.read.format("csv").option("inferSchema", 
"false").option("mode", "DROPMALFORMED").schema(mySchema).load(inputPath) 

  df.createOrReplaceTempView("mytable")

  val sql = spark.sql(""" SELECT CAST( myUdf(updated_date) as long) FROM 
mytable """)

  sql.write.parquet( ... ) 
   }
 }
} finally {
  p.shutdown()
}{code}
Roughly once in every ten spark-submit runs of the application, the driver fails with an 
exception related to Spark SQL and the UDF. However, as you can see, I have reduced the 
UDF to the minimum; it now returns None every time, and the problem still occurs. 
So I think the problem is more likely related to having the driver submit 
multiple jobs in parallel, aka "scheduling within an application".

The exception is as follows:
{code:java}
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:65)
at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: org.apache.spark.sql.AnalysisException: cannot resolve 
'CAST(UDF(updated_date) AS BIGINT)' due to data type mismatch: cannot cast 
struct<> to bigint; line 5 pos 10;
...
at 
org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:93)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:85)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:106)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:118)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1$1.apply(QueryPlan.scala:122)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:285)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:122)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:127)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at 

[jira] [Updated] (SPARK-27151) ClearCacheCommand should be case-object to avoid copys

2019-03-13 Thread Takeshi Yamamuro (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro updated SPARK-27151:
-
Priority: Trivial  (was: Major)

> ClearCacheCommand should be case-object to avoid copys
> --
>
> Key: SPARK-27151
> URL: https://issues.apache.org/jira/browse/SPARK-27151
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Takeshi Yamamuro
>Priority: Trivial
>
> To avoid unnecessary copies, `ClearCacheCommand` should be `case-object`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-27151) ClearCacheCommand should be case-object to avoid copys

2019-03-13 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-27151:


Assignee: (was: Apache Spark)

> ClearCacheCommand should be case-object to avoid copys
> --
>
> Key: SPARK-27151
> URL: https://issues.apache.org/jira/browse/SPARK-27151
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Takeshi Yamamuro
>Priority: Major
>
> To avoid unnecessary copies, `ClearCacheCommand` should be `case-object`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-27151) ClearCacheCommand should be case-object to avoid copys

2019-03-13 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-27151:


Assignee: Apache Spark

> ClearCacheCommand should be case-object to avoid copys
> --
>
> Key: SPARK-27151
> URL: https://issues.apache.org/jira/browse/SPARK-27151
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Takeshi Yamamuro
>Assignee: Apache Spark
>Priority: Major
>
> To avoid unnecessary copies, `ClearCacheCommand` should be `case-object`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27151) ClearCacheCommand should be case-object to avoid copys

2019-03-13 Thread Takeshi Yamamuro (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro updated SPARK-27151:
-
Summary: ClearCacheCommand should be case-object to avoid copys  (was: 
Makes ClearCacheCommand case-object)

> ClearCacheCommand should be case-object to avoid copys
> --
>
> Key: SPARK-27151
> URL: https://issues.apache.org/jira/browse/SPARK-27151
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Takeshi Yamamuro
>Priority: Major
>
> To avoid unnecessary copies, `ClearCacheCommand` should be `case-object`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-27151) Makes ClearCacheCommand case-object

2019-03-13 Thread Takeshi Yamamuro (JIRA)
Takeshi Yamamuro created SPARK-27151:


 Summary: Makes ClearCacheCommand case-object
 Key: SPARK-27151
 URL: https://issues.apache.org/jira/browse/SPARK-27151
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: Takeshi Yamamuro


To avoid unnecessary copies, `ClearCacheCommand` should be `case-object`.
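
As a minimal sketch of why the change avoids copies (illustrative names only, not the actual Spark source): a zero-argument case class still allocates fresh, identical instances and exposes copy(), whereas a case object is a single shared value.
{code:java}
// Illustrative only -- not the real ClearCacheCommand definition.
// A zero-arg case class can be instantiated and copied into new, equal-but-distinct objects:
case class ClearCacheAsCaseClass()
val a = ClearCacheAsCaseClass()
val b = a.copy()               // another allocation

// A case object is one shared singleton value; there is nothing to construct or copy:
case object ClearCacheAsCaseObject
{code}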



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-11284) ALS produces predictions as floats and should be double

2019-03-13 Thread Alex Combessie (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-11284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791671#comment-16791671
 ] 

Alex Combessie edited comment on SPARK-11284 at 3/13/19 1:00 PM:
-

Hello everyone, [~ddahlem], [~mengxr] 

I am still getting a type error when evaluating an ALS model in a 
pipeline. I have tested it on Spark 2.2.0.2.6.4.0-91. It is strange, as the issue seems 
to be closed.

Here is the error message:

_Caused by: java.lang.ClassCastException: java.lang.Float cannot be cast to 
java.lang.Double_

Happy to provide more details. I am running the classic MovieLens example on a 
100k dataset. Any views on this?

Thanks,

Alex


was (Author: alex_combessie):
Hello everyone,

I am still getting a type error when evaluating an ALS model in a 
pipeline. I have tested it on Spark 2.2.0.2.6.4.0-91. It is strange, as the issue seems 
to be closed.

Here is the error message:

_Caused by: java.lang.ClassCastException: java.lang.Float cannot be cast to 
java.lang.Double_

Happy to provide more details. I am running the classic MovieLens example on a 
100k dataset. Any views on this?

Thanks,

Alex

 

> ALS produces predictions as floats and should be double
> ---
>
> Key: SPARK-11284
> URL: https://issues.apache.org/jira/browse/SPARK-11284
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 1.5.1
> Environment: All
>Reporter: Dominik Dahlem
>Priority: Major
>  Labels: ml, recommender
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Using pyspark.ml and DataFrames, the ALS recommender cannot be evaluated 
> using the RegressionEvaluator because of a type mismatch between the model 
> transformation and the evaluation APIs. One can work around this by casting 
> the prediction column to double before passing it into the evaluator. 
> However, this does not work with pipelines and cross-validation.
> Code and traceback below:
> {code}
> als = ALS(rank=10, maxIter=30, regParam=0.1, userCol='userID', 
> itemCol='movieID', ratingCol='rating')
> model = als.fit(training)
> predictions = model.transform(validation)
> evaluator = RegressionEvaluator(predictionCol='prediction', 
> labelCol='rating')
> validationRmse = evaluator.evaluate(predictions, 
> {evaluator.metricName: 'rmse'})
> {code}
> Traceback:
> validationRmse = evaluator.evaluate(predictions, {evaluator.metricName: 
> 'rmse'})
>   File 
> "/Users/dominikdahlem/software/spark-1.6.0-SNAPSHOT-bin-custom-spark/python/lib/pyspark.zip/pyspark/ml/evaluation.py",
>  line 63, in evaluate
>   File 
> "/Users/dominikdahlem/software/spark-1.6.0-SNAPSHOT-bin-custom-spark/python/lib/pyspark.zip/pyspark/ml/evaluation.py",
>  line 94, in _evaluate
>   File 
> "/Users/dominikdahlem/software/spark-1.6.0-SNAPSHOT-bin-custom-spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py",
>  line 813, in __call__
>   File 
> "/Users/dominikdahlem/projects/repositories/spark/python/pyspark/sql/utils.py",
>  line 42, in deco
> raise IllegalArgumentException(s.split(': ', 1)[1])
> pyspark.sql.utils.IllegalArgumentException: requirement failed: Column 
> prediction must be of type DoubleType but was actually FloatType.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11284) ALS produces predictions as floats and should be double

2019-03-13 Thread Alex Combessie (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-11284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791671#comment-16791671
 ] 

Alex Combessie commented on SPARK-11284:


Hello everyone,

I am still getting a type error when evaluating an ALS model in a 
pipeline. I have tested it on Spark 2.2.0.2.6.4.0-91. It is strange, as the issue seems 
to be closed.

Here is the error message:

_Caused by: java.lang.ClassCastException: java.lang.Float cannot be cast to 
java.lang.Double_

Happy to provide more details. I am running the classic MovieLens example on a 
100k dataset. Any views on this?

Thanks,

Alex
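
For reference, a minimal sketch of the cast workaround mentioned in the issue description below, written against the Scala DataFrame/ML API rather than pyspark. It assumes a predictions DataFrame like the one produced by model.transform(validation) in the quoted snippet:
{code:java}
import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.sql.functions.col

// Sketch: cast the FloatType prediction column to double before evaluation.
// `predictions` is assumed to be the DataFrame returned by model.transform(validation).
val castPredictions =
  predictions.withColumn("prediction", col("prediction").cast("double"))

val evaluator = new RegressionEvaluator()
  .setPredictionCol("prediction")
  .setLabelCol("rating")
  .setMetricName("rmse")

val validationRmse = evaluator.evaluate(castPredictions)
{code}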

 

> ALS produces predictions as floats and should be double
> ---
>
> Key: SPARK-11284
> URL: https://issues.apache.org/jira/browse/SPARK-11284
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 1.5.1
> Environment: All
>Reporter: Dominik Dahlem
>Priority: Major
>  Labels: ml, recommender
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Using pyspark.ml and DataFrames, the ALS recommender cannot be evaluated 
> using the RegressionEvaluator because of a type mismatch between the model 
> transformation and the evaluation APIs. One can work around this by casting 
> the prediction column to double before passing it into the evaluator. 
> However, this does not work with pipelines and cross-validation.
> Code and traceback below:
> {code}
> als = ALS(rank=10, maxIter=30, regParam=0.1, userCol='userID', 
> itemCol='movieID', ratingCol='rating')
> model = als.fit(training)
> predictions = model.transform(validation)
> evaluator = RegressionEvaluator(predictionCol='prediction', 
> labelCol='rating')
> validationRmse = evaluator.evaluate(predictions, 
> {evaluator.metricName: 'rmse'})
> {code}
> Traceback:
> validationRmse = evaluator.evaluate(predictions, {evaluator.metricName: 
> 'rmse'})
>   File 
> "/Users/dominikdahlem/software/spark-1.6.0-SNAPSHOT-bin-custom-spark/python/lib/pyspark.zip/pyspark/ml/evaluation.py",
>  line 63, in evaluate
>   File 
> "/Users/dominikdahlem/software/spark-1.6.0-SNAPSHOT-bin-custom-spark/python/lib/pyspark.zip/pyspark/ml/evaluation.py",
>  line 94, in _evaluate
>   File 
> "/Users/dominikdahlem/software/spark-1.6.0-SNAPSHOT-bin-custom-spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py",
>  line 813, in __call__
>   File 
> "/Users/dominikdahlem/projects/repositories/spark/python/pyspark/sql/utils.py",
>  line 42, in deco
> raise IllegalArgumentException(s.split(': ', 1)[1])
> pyspark.sql.utils.IllegalArgumentException: requirement failed: Column 
> prediction must be of type DoubleType but was actually FloatType.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27150) Scheduling Within an Application : Spark SQL randomly failed on UDF

2019-03-13 Thread Josh Sean (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Sean updated SPARK-27150:
--
Affects Version/s: 2.3.2
   2.3.3

> Scheduling Within an Application : Spark SQL randomly failed on UDF
> ---
>
> Key: SPARK-27150
> URL: https://issues.apache.org/jira/browse/SPARK-27150
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 2.3.1, 2.3.2, 2.3.3, 2.4.0
>Reporter: Josh Sean
>Priority: Major
>
> I run the following (reduced) code multiple times on the same exact 
> input files:
> {code:java}
> def myUdf(input : java.lang.String) : Option[Long] = {
>   None
> }
> ...
> val sparkMain = ... .getOrCreate()
> val d = inputPaths.toList.par
> val p = new scala.concurrent.forkjoin.ForkJoinPool(12)
> try {
>d.tasksupport = new scala.collection.parallel.ForkJoinTaskSupport(p)
>    d.foreach {
> case (inputPath) => {
>   val spark = sparkMain.newSession()
>   
>   spark.udf.register("myUdf",udf(myUdf _)) 
>   val df = spark.read.format("csv").option("inferSchema", 
> "false").option("mode", "DROPMALFORMED").schema(mySchema).load(inputPath) 
>   df.createOrReplaceTempView("mytable")
>   val sql = spark.sql(""" SELECT CAST( myUdf(updated_date) as long) FROM 
> mytable """)
>   sql.write.parquet( ... ) 
>}
>  }
> } finally {
>   p.shutdown()
> }{code}
> Roughly one spark-submit of the application in ten, the driver fails with an 
> exception related to Spark SQL and the UDF. However, as you can see, I have 
> reduced the UDF to a minimum: it now returns None every time, and the problem 
> still occurs. So I think the problem is more likely related to having the 
> driver submit multiple jobs in parallel, aka "scheduling within apps".
> The exception is as follows:
> {code:java}
> Exception in thread "main" java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:65)
> at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
> Caused by: org.apache.spark.sql.AnalysisException: cannot resolve 
> 'CAST(UDF(updated_date) AS BIGINT)' due to data type mismatch: cannot cast 
> struct<> to bigint; line 5 pos 10;
> ...
> at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
> at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:93)
> at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:85)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
> at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
> at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95)
> at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95)
> at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107)
> at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107)
> at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
> at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:106)
> at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:118)
> at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1$1.apply(QueryPlan.scala:122)
> at 
> 

[jira] [Updated] (SPARK-27150) Scheduling Within an Application : Spark SQL randomly failed on UDF

2019-03-13 Thread Josh Sean (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Sean updated SPARK-27150:
--
Affects Version/s: 2.4.0

> Scheduling Within an Application : Spark SQL randomly failed on UDF
> ---
>
> Key: SPARK-27150
> URL: https://issues.apache.org/jira/browse/SPARK-27150
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 2.3.1, 2.4.0
>Reporter: Josh Sean
>Priority: Major
>
> I run the following (reduced) code multiple times on the same exact input 
> files:
> {code:java}
> def myUdf(input : java.lang.String) : Option[Long] = {
>   None
> }
> ...
> val sparkMain = ... .getOrCreate()
> val d = inputPaths.toList.par
> val p = new scala.concurrent.forkjoin.ForkJoinPool(12)
> try {
>d.tasksupport = new scala.collection.parallel.ForkJoinTaskSupport(p)
>    d.foreach {
> case (inputPath) => {
>   val spark = sparkMain.newSession()
>   
>   spark.udf.register("myUdf",udf(myUdf _)) 
>   val df = spark.read.format("csv").option("inferSchema", 
> "false").option("mode", "DROPMALFORMED").schema(mySchema).load(inputPath) 
>   df.createOrReplaceTempView("mytable")
>   val sql = spark.sql(""" SELECT CAST( myUdf(updated_date) as long) FROM 
> mytable """)
>   sql.write.parquet( ... ) 
>}
>  }
> } finally {
>   p.shutdown()
> }{code}
> Roughly one spark-submit of the application in ten, the driver fails with an 
> exception related to Spark SQL and the UDF. However, as you can see, I have 
> reduced the UDF to a minimum: it now returns None every time, and the problem 
> still occurs. So I think the problem is more likely related to having the 
> driver submit multiple jobs in parallel, aka "scheduling within apps".
> The exception is as follows:
> {code:java}
> Exception in thread "main" java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:65)
> at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
> Caused by: org.apache.spark.sql.AnalysisException: cannot resolve 
> 'CAST(UDF(updated_date) AS BIGINT)' due to data type mismatch: cannot cast 
> struct<> to bigint; line 5 pos 10;
> ...
> at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
> at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:93)
> at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:85)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
> at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
> at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95)
> at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95)
> at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107)
> at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107)
> at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
> at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:106)
> at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:118)
> at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1$1.apply(QueryPlan.scala:122)
> at 
> 

[jira] [Commented] (SPARK-26961) Found Java-level deadlock in Spark Driver

2019-03-13 Thread Sean Owen (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791634#comment-16791634
 ] 

Sean Owen commented on SPARK-26961:
---

[~ajithshetty] I see, OK. It could happen in the class itself, in the 
constructor. The calls after the first would do nothing. 
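
A minimal sketch of that idea (illustrative only, not the actual Spark change; 
the object name is made up): perform the Hadoop URL stream handler registration 
exactly once, so any call after the first is a no-op, which is the behaviour the 
comment describes.
{code:java}
import java.net.URL
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory

object UrlHandlerRegistration {
  @volatile private var registered = false

  // URL.setURLStreamHandlerFactory may only be called once per JVM anyway,
  // so guard it; every call after the first simply returns.
  def registerIfNeeded(): Unit = synchronized {
    if (!registered) {
      URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory())
      registered = true
    }
  }
}
{code}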

> Found Java-level deadlock in Spark Driver
> -
>
> Key: SPARK-26961
> URL: https://issues.apache.org/jira/browse/SPARK-26961
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.3.0
>Reporter: Rong Jialei
>Priority: Major
>
> Our spark job usually will finish in minutes, however, we recently found it 
> take days to run, and we can only kill it when this happened.
> An investigation show all worker container could not connect drive after 
> start, and driver is hanging, using jstack, we found a Java-level deadlock.
>  
> *Jstack output for deadlock part is showing below:*
>  
> Found one Java-level deadlock:
> =
> "SparkUI-907":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> "ForkJoinPool-1-worker-57":
>  waiting to lock monitor 0x7f3860574298 (object 0x0005b7991168, a 
> org.apache.spark.util.MutableURLClassLoader),
>  which is held by "ForkJoinPool-1-worker-7"
> "ForkJoinPool-1-worker-7":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> Java stack information for the threads listed above:
> ===
> "SparkUI-907":
>  at org.apache.hadoop.conf.Configuration.getOverlay(Configuration.java:1328)
>  - waiting to lock <0x0005c0c1e5e0> (a 
> org.apache.hadoop.conf.Configuration)
>  at 
> org.apache.hadoop.conf.Configuration.handleDeprecation(Configuration.java:684)
>  at org.apache.hadoop.conf.Configuration.get(Configuration.java:1088)
>  at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1145)
>  at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2363)
>  at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2840)
>  at 
> org.apache.hadoop.fs.FsUrlStreamHandlerFactory.createURLStreamHandler(FsUrlStreamHandlerFactory.java:74)
>  at java.net.URL.getURLStreamHandler(URL.java:1142)
>  at java.net.URL.(URL.java:599)
>  at java.net.URL.(URL.java:490)
>  at java.net.URL.(URL.java:439)
>  at org.apache.spark.ui.JettyUtils$$anon$4.doRequest(JettyUtils.scala:176)
>  at org.apache.spark.ui.JettyUtils$$anon$4.doGet(JettyUtils.scala:161)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>  at 
> org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>  at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>  at 
> org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:171)
>  at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>  at 
> org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>  at 
> org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>  at 
> org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>  at 
> org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>  at 
> org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>  at 
> org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
>  at 
> org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>  at 
> org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>  at org.spark_project.jetty.server.Server.handle(Server.java:534)
>  at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:320)
>  at 
> org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>  at 
> org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>  at org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:108)
>  at 
> org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>  at 
> org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>  at 
> org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>  at java.lang.Thread.run(Thread.java:748)
> "ForkJoinPool-1-worker-57":
>  at java.lang.ClassLoader.loadClass(ClassLoader.java:404)
>  - 

[jira] [Updated] (SPARK-27145) Close store after test, in the SQLAppStatusListenerSuite

2019-03-13 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-27145:
--
Issue Type: Improvement  (was: Bug)

> Close store after test, in the SQLAppStatusListenerSuite
> 
>
> Key: SPARK-27145
> URL: https://issues.apache.org/jira/browse/SPARK-27145
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 2.3.3, 2.4.0, 3.0.0
>Reporter: shahid
>Priority: Minor
>
> We create many stores in the SQLAppStatusListenerSuite, but we need to close 
> the store after each test.
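
A minimal sketch of the cleanup being asked for (illustrative only; the suite 
name and the use of an InMemoryStore are assumptions, not the suite's actual 
code):
{code:java}
import org.apache.spark.util.kvstore.{InMemoryStore, KVStore}
import org.scalatest.{BeforeAndAfterEach, FunSuite}

class ExampleStoreCleanupSuite extends FunSuite with BeforeAndAfterEach {
  private var store: KVStore = _

  override def beforeEach(): Unit = {
    super.beforeEach()
    store = new InMemoryStore()   // fresh store per test
  }

  // The point of the JIRA: always release the store once a test finishes.
  override def afterEach(): Unit = {
    try {
      if (store != null) store.close()
    } finally {
      super.afterEach()
    }
  }

  test("store is available inside a test") {
    assert(store != null)
  }
}
{code}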



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-27064) create StreamingWrite at the begining of streaming execution

2019-03-13 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-27064.
-
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 23981
[https://github.com/apache/spark/pull/23981]

> create StreamingWrite at the begining of streaming execution
> 
>
> Key: SPARK-27064
> URL: https://issues.apache.org/jira/browse/SPARK-27064
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27143) Provide REST API for JDBC/ODBC level information

2019-03-13 Thread Sean Owen (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791615#comment-16791615
 ] 

Sean Owen commented on SPARK-27143:
---

Likewise here, we already have a metrics system. This doesn't say what you 
intend to expose. I also think a REST API opens new questions about security, 
especially around SQL queries.
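
For context, a small sketch of the REST surface that already exists alongside 
the metrics system (host, port and the application id below are placeholders, 
not values from this thread):
{code:java}
import scala.io.Source

// A running application serves this under its UI port (4040 by default);
// the history server serves the same API under its own port (18080 by default).
val base   = "http://localhost:4040/api/v1"
val appId  = "app-00000000000000-0000"   // placeholder
val apps   = Source.fromURL(s"$base/applications").mkString
val jobs   = Source.fromURL(s"$base/applications/$appId/jobs").mkString
val stages = Source.fromURL(s"$base/applications/$appId/stages").mkString
println(apps)
{code}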

> Provide REST API for JDBC/ODBC level information
> 
>
> Key: SPARK-27143
> URL: https://issues.apache.org/jira/browse/SPARK-27143
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Ajith S
>Priority: Minor
>
> Currently, for monitoring a Spark application, JDBC/ODBC information is not 
> available from REST but only via the UI. REST provides only applications, 
> jobs, stages, and environment. This JIRA is targeted at providing a REST API 
> so that JDBC/ODBC level information like session statistics and SQL 
> statistics can be provided.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27142) Provide REST API for SQL level information

2019-03-13 Thread Sean Owen (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791614#comment-16791614
 ] 

Sean Owen commented on SPARK-27142:
---

We have a metrics system for metrics and a SQL tab for SQL queries. I don't 
imagine we need a new REST API?
This doesn't say what info you are trying to expose?

> Provide REST API for SQL level information
> --
>
> Key: SPARK-27142
> URL: https://issues.apache.org/jira/browse/SPARK-27142
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Ajith S
>Priority: Minor
>
> Currently, for monitoring a Spark application, SQL information is not 
> available from REST but only via the UI. REST provides only applications, 
> jobs, stages, and environment. This JIRA is targeted at providing a REST API 
> so that SQL level information can be found.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27150) Scheduling Within an Application : Spark SQL randomly failed on UDF

2019-03-13 Thread Josh Sean (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Sean updated SPARK-27150:
--
Description: 
I run the following (reduced) code multiple times on the same exact input 
files:
{code:java}
def myUdf(input : java.lang.String) : Option[Long] = {
  None
}

...

val sparkMain = ... .getOrCreate()
val d = inputPaths.toList.par
val p = new scala.concurrent.forkjoin.ForkJoinPool(12)

try {

   d.tasksupport = new scala.collection.parallel.ForkJoinTaskSupport(p)
   d.foreach {
case (inputPath) => {
  val spark = sparkMain.newSession()
  
  spark.udf.register("myUdf",udf(myUdf _)) 

  val df = spark.read.format("csv").option("inferSchema", 
"false").option("mode", "DROPMALFORMED").schema(mySchema).load(inputPath) 

  df.createOrReplaceTempView("mytable")

  val sql = spark.sql(""" SELECT CAST( myUdf(updated_date) as long) FROM 
mytable """)

  sql.write.parquet( ... ) 
   }
 }
} finally {
  p.shutdown()
}{code}
Roughly one spark-submit of the application in ten, the driver fails with an 
exception related to Spark SQL and the UDF. However, as you can see, I have 
reduced the UDF to a minimum: it now returns None every time, and the problem 
still occurs. So I think the problem is more likely related to having the 
driver submit multiple jobs in parallel, aka "scheduling within apps".

The exception is as follows:
{code:java}
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:65)
at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: org.apache.spark.sql.AnalysisException: cannot resolve 
'CAST(UDF(updated_date) AS BIGINT)' due to data type mismatch: cannot cast 
struct<> to bigint; line 5 pos 10;
...
at 
org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:93)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:85)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:106)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:118)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1$1.apply(QueryPlan.scala:122)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:285)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:122)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:127)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at 

[jira] [Updated] (SPARK-27150) Scheduling Within an Application : Spark SQL randomly failed on UDF

2019-03-13 Thread Josh Sean (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Sean updated SPARK-27150:
--
Description: 
I run the following (reduced) code multiple times on the same exact input 
files:
{code:java}
def myUdf(input : java.lang.String) : Option[Long] = {
  None
}

...

val sparkMain = ... .getOrCreate()
val d = inputPaths.toList.par
val p = new scala.concurrent.forkjoin.ForkJoinPool(12)

try {

   d.tasksupport = new scala.collection.parallel.ForkJoinTaskSupport(p)
   d.foreach {
case (inputPath) => {
  val spark = sparkMain.newSession()
  
  spark.udf.register("myUdf",udf(myUdf _)) 

  val df = spark.read.format("csv").option("inferSchema", 
"false").option("mode", "DROPMALFORMED").schema(mySchema).load(inputPath) 

  df.createOrReplaceTempView("mytable")

  val sql = spark.sql(""" SELECT CAST( myUdf(updated_date) as long) FROM 
mytable """)

  sql.write.parquet( ... ) 
   }
 }
} finally {
  p.shutdown()
}{code}
Roughly one spark-submit of the application in ten, the driver fails with an 
exception related to Spark SQL and the UDF. However, as you can see, I have 
reduced the UDF to a minimum: it now returns None every time, and the problem 
still occurs. So I think the problem is more likely related to having the 
driver submit multiple jobs in parallel, aka "scheduling within apps".

The exception is as follows:
{code:java}
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:65)
at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: org.apache.spark.sql.AnalysisException: cannot resolve 
'CAST(UDF(updated_date) AS BIGINT)' due to data type mismatch: cannot cast 
struct<> to bigint; line 5 pos 10;
...
at 
org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:93)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:85)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:106)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:118)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1$1.apply(QueryPlan.scala:122)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:285)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:122)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:127)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at 

[jira] [Updated] (SPARK-27150) Scheduling Within an Application : Spark SQL randomly failed on UDF

2019-03-13 Thread Josh Sean (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Sean updated SPARK-27150:
--
Description: 
I run the following (reduced) code multiple times on the same exact input 
files:
{code:java}
def myUdf(input : java.lang.String) : Option[Long] = {
  None
}

...

val sparkMain = ... .getOrCreate()
val d = inputPaths.toList.par
val p = new scala.concurrent.forkjoin.ForkJoinPool(12)

try {

   d.tasksupport = new scala.collection.parallel.ForkJoinTaskSupport(p)
   d.foreach {
case (inputPath) => {
  val spark = sparkMain.newSession()
  
  spark.udf.register("myUdf",udf(myUdf _)) 

  val df = spark.read.format("csv").option("inferSchema", 
"false").option("mode", "DROPMALFORMED").schema(mySchema).load(inputPath) 

  df.createOrReplaceTempView("mytable")

  val sql = spark.sql(""" SELECT CAST( myUdf(updated_date) as long) FROM 
mytable """)

  sql.write.parquet( ... ) 
   }
 }
} finally {
  p.shutdown()
}{code}
Roughly one spark-submit of the application in ten, the driver fails with an 
exception related to Spark SQL and the UDF. However, as you can see, I have 
reduced the UDF to a minimum: it now returns None every time, and the problem 
still occurs. So I think the problem is more likely related to having the 
driver submit multiple jobs in parallel, aka "scheduling within apps".

The exception is as follows:
{code:java}
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:65)
at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: org.apache.spark.sql.AnalysisException: cannot resolve 
'CAST(UDF(updated_date) AS BIGINT)' due to data type mismatch: cannot cast 
struct<> to bigint; line 5 pos 10;
...
at 
org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:93)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:85)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:106)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:118)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1$1.apply(QueryPlan.scala:122)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:285)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:122)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:127)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at 

[jira] [Updated] (SPARK-27144) Explode with structType may throw NPE when the first column's nullable is false while the second column's nullable is true

2019-03-13 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-27144:
-
Description: 
Create a DataFrame containing two columns named [weight, animal]; weight's 
nullable flag is false while animal's is true.

Put a null value in the animal column,

then construct a new column with 
{code:java}
explode(
array(
  struct(lit("weight").alias("key"), 
col("weight").cast(StringType).alias("value")),
  struct(lit("animal").alias("key"), 
col("animal").cast(StringType).alias("value"))
  )
  )
{code}
then select the struct with .*; Spark will throw an NPE:
{code:java}
19/03/13 14:39:10 INFO DAGScheduler: ResultStage 3 (show at SparkTest.scala:74) 
failed in 0.043 s due to Job aborted due to stage failure: Task 3 in stage 3.0 
failed 1 times, most recent failure: Lost task 3.0 in stage 3.0 (TID 9, 
localhost, executor driver): java.lang.NullPointerException
at 
org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(UnsafeRowWriter.java:194)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.project_doConsume$(Unknown
 Source)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

{code}
 

Codes for reproduce: 
{code:java}
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{array, col, explode, lit, struct}
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

val data = Seq(Row(20.0, "dog","a"), Row(3.5, "cat","b"), Row(0.06, 
null,"c"))

val schema = StructType(List(
  StructField("weight", DoubleType, false),
  StructField("animal", StringType, true),
  StructField("extra", StringType, true)))

val col1 = "weight"
val col2 = "animal"

val originalDF = spark.createDataFrame(spark.sparkContext.parallelize(data), 
schema)

// This should fail in select(test.*)
val df1 = originalDF.withColumn(
  "test",
  explode(array(
struct(lit(col1).alias("key"), col(col1).cast(StringType).alias("value")),
struct(lit(col2).alias("key"), col(col2).cast(StringType).alias("value"))))
df1.printSchema()
df1.select("test.*").show()


// This should succeed in select(test.*)
val df2 = originalDF.withColumn(
  "test",
  explode(array(
   struct(lit(col2).alias("key"), col(col2).cast(StringType).alias("value")),
   struct(lit(col1).alias("key"), col(col1).cast(StringType).alias("value"))))
df2.printSchema()
df2.select("test.*").show()
{code}

  was:
 Create a dataFrame containing two columns names [weight, animal], the weight's 
nullable is false while the animal' nullable is true.

Give null value in the col animal,

then construct a new column with 
{code:java}
explode(
array(
  struct(lit("weight").alias("key"), 
col("weight").cast(StringType).alias("value")),
  struct(lit("animal").alias("key"), 
col("animal").cast(StringType).alias("value"))
  )
  )
{code}
 then select the struct with .*,  Spark will throw NPE
{code:java}
19/03/13 14:39:10 INFO DAGScheduler: ResultStage 3 (show at SparkTest.scala:74) 
failed in 0.043 s due to Job aborted due to stage failure: Task 3 in stage 3.0 
failed 1 times, most recent failure: Lost task 3.0 in stage 3.0 (TID 9, 
localhost, executor driver): java.lang.NullPointerException
at 
org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(UnsafeRowWriter.java:194)
at 

[jira] [Updated] (SPARK-27150) Scheduling Within an Application : Spark SQL randomly failed on UDF

2019-03-13 Thread Josh Sean (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Sean updated SPARK-27150:
--
Description: 
 

I run the following (reduced) code multiple times on the same exact input 
files (about 100):

 

 
{code:java}
def myUdf(input : java.lang.String) : Option[Long] = {
  None
}

...

val sparkMain = ... .getOrCreate()
val d = inputPaths.toList.par
val p = new scala.concurrent.forkjoin.ForkJoinPool(12)

try {

   d.tasksupport = new scala.collection.parallel.ForkJoinTaskSupport(p)
   d.foreach {
case (inputPath) => {
  val spark = sparkMain.newSession()
  
  spark.udf.register("myUdf",udf(myUdf _)) 

  val df = spark.read.format("csv").option("inferSchema", 
"false").option("mode", "DROPMALFORMED").schema(mySchema).load(inputPath) 

  df.createOrReplaceTempView("mytable")

  val sql = spark.sql(""" SELECT CAST( myUdf(updated_date) as long) FROM 
mytable """)

  sql.write.parquet( ... ) 
   }
 }
} finally {
  p.shutdown()
}{code}
Roughly one spark-submit of the application in ten, the driver fails with an 
exception related to Spark SQL and the UDF. However, as you can see, I have 
reduced the UDF to a minimum: it returns None every time, and the problem 
still occurs. I think the problem is related to having the driver submit 
multiple jobs, aka "scheduling within apps".

The exception is as follows:
{code:java}
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:65)
at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: org.apache.spark.sql.AnalysisException: cannot resolve 
'CAST(UDF(updated_date) AS BIGINT)' due to data type mismatch: cannot cast 
struct<> to bigint; line 5 pos 10;
...
at 
org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:93)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:85)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:106)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:118)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1$1.apply(QueryPlan.scala:122)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:285)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:122)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:127)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at 

[jira] [Commented] (SPARK-27146) Add two Yarn Configs according to spark home page configuration Instructions document

2019-03-13 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791587#comment-16791587
 ] 

Hyukjin Kwon commented on SPARK-27146:
--

It doesn't necessarily document all the configurations. I'd explain why this 
is important to document on the site. Otherwise, let's not add them.

> Add two Yarn Configs according to spark home page configuration Instructions 
> document
> -
>
> Key: SPARK-27146
> URL: https://issues.apache.org/jira/browse/SPARK-27146
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 3.0.0
>Reporter: wangjiaochun
>Priority: Minor
> Fix For: 3.0.0
>
>
> On the web page http://spark.apache.org/docs/latest/running-on-yarn.html, 
> there are two configuration options (spark.yarn.dist.forceDownloadSchemes and 
> spark.blacklist.application.maxFailedExecutorsPerNode) that are not 
> implemented in the YARN config file.
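
Both options named above are ordinary Spark configuration keys, so a minimal 
sketch of setting them looks like any other conf (the values here are 
placeholders, not recommendations):
{code:java}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.yarn.dist.forceDownloadSchemes", "http,https")
  .set("spark.blacklist.application.maxFailedExecutorsPerNode", "2")
// equivalently on the command line:
//   spark-submit --conf spark.yarn.dist.forceDownloadSchemes=http,https ...
{code}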



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27150) Scheduling Within an Application : Spark SQL randomly failed on UDF

2019-03-13 Thread Josh Sean (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Sean updated SPARK-27150:
--
Description: 
 

I run the following (reduced) code multiple times on the same exact input 
files (about 100):

 

 
{code:java}
def myUdf(input : java.lang.String) : Option[Long] = {
  None
}

...

val sparkMain = ... .getOrCreate()
val d = inputPaths.toList.par
val p = new scala.concurrent.forkjoin.ForkJoinPool(12)

try {

   d.tasksupport = new scala.collection.parallel.ForkJoinTaskSupport(p)
   d.foreach {
case (inputPath) => {
  val spark = sparkMain.newSession()
  
  spark.udf.register("myUdf",udf(myUdf _)) 

  val df = spark.read.format("csv").option("inferSchema", 
"false").option("mode", "DROPMALFORMED").schema(mySchema).load(inputPath) 

  df.createOrReplaceTempView("mytable")

  val sql = spark.sql(""" SELECT CAST( myUdf(updated_date) as long) FROM 
mytable """)

  sql.write.parquet( ... ) 
   }
 }
} finally {
  p.shutdown()
}{code}
Roughly one spark-submit of the application in ten, the driver fails with an 
exception related to Spark SQL and the UDF. However, as you can see, I have 
reduced the UDF to a minimum: it returns None every time, and the problem 
still occurs.

The exception is as follows:
{code:java}
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:65)
at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: org.apache.spark.sql.AnalysisException: cannot resolve 
'CAST(UDF(updated_date) AS BIGINT)' due to data type mismatch: cannot cast 
struct<> to bigint; line 5 pos 10;
...
at 
org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:93)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:85)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:106)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:118)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1$1.apply(QueryPlan.scala:122)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:285)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:122)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:127)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:127)
at 

[jira] [Resolved] (SPARK-27144) Explode with structType may throw NPE when the first column's nullable is false while the second column's nullable is true

2019-03-13 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-27144.
--
Resolution: Cannot Reproduce

> Explode with structType may throw NPE when the first column's nullable is 
> false while the second column's nullable is true
> --
>
> Key: SPARK-27144
> URL: https://issues.apache.org/jira/browse/SPARK-27144
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.0
> Environment: Spark 2.3.0, local mode.
>Reporter: Yoga
>Priority: Major
>
> Create a DataFrame containing two columns named [weight, animal]; weight's 
> nullable flag is false while animal's is true.
> Put a null value in the animal column,
> then construct a new column with 
> {code:java}
> explode(
> array(
>   struct(lit("weight").alias("key"), 
> col("weight").cast(StringType).alias("value")),
>   struct(lit("animal").alias("key"), 
> col("animal").cast(StringType).alias("value"))
>   )
>   )
> {code}
> then select the struct with .*; Spark will throw an NPE:
> {code:java}
> 19/03/13 14:39:10 INFO DAGScheduler: ResultStage 3 (show at 
> SparkTest.scala:74) failed in 0.043 s due to Job aborted due to stage 
> failure: Task 3 in stage 3.0 failed 1 times, most recent failure: Lost task 
> 3.0 in stage 3.0 (TID 9, localhost, executor driver): 
> java.lang.NullPointerException
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(UnsafeRowWriter.java:194)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.project_doConsume$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>   at org.apache.spark.scheduler.Task.run(Task.scala:109)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
>  
> Codes for reproduce: 
> {code:java}
> val data = Seq(
>   Row(20.0, "dog","a"),
>   Row(3.5, "cat","b"),
>   Row(0.06, null,"c")
> )
>  val schema = StructType(List(
> StructField("weight", DoubleType, false),
> StructField("animal", StringType, true),
> StructField("extra", StringType, true)
>   )
> )
>  val col1 = "weight"
>  val col2 = "animal"
> //this should fail in select(test.*)
> val df1 = originalDF.withColumn("test",
>   explode(
> array(
>   struct(lit(col1).alias("key"), 
> col(col1).cast(StringType).alias("value")),
>   struct(lit(col2).alias("key"), 
> col(col2).cast(StringType).alias("value"))
>   )
>   )
> )
> df1.printSchema()
> df1.select("test.*").show()
> // this should succeed in select(test.*)
> val df2 = originalDF.withColumn("test",
>   explode(
> array(
>   struct(lit(col2).alias("key"), 
> col(col2).cast(StringType).alias("value")),
>   struct(lit(col1).alias("key"), 
> col(col1).cast(StringType).alias("value"))
> )
>   )
> )
> df2.printSchema()
> df2.select("test.*").show()
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For 

[jira] [Commented] (SPARK-27144) Explode with structType may throw NPE when the first column's nullable is false while the second column's nullable is true

2019-03-13 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791580#comment-16791580
 ] 

Hyukjin Kwon commented on SPARK-27144:
--

It works in the current master:

{code}

scala> df1.select("test.*").show()
+--+--+
|   key| value|
+--+--+
|weight|  20.0|
|animal|   dog|
|weight|   3.5|
|animal|   cat|
|weight|6.0E-6|
|animal|  null|
+--+--+
{code}

{code}
scala> df2.select("test.*").show()
+--+--+
|   key| value|
+--+--+
|animal|   dog|
|weight|  20.0|
|animal|   cat|
|weight|   3.5|
|animal|  null|
|weight|6.0E-6|
+--+--+
{code}

It'd be great if the JIRA that fixed this can be identified and backported if 
applicable. I am resolving this for now anyway.


> Explode with structType may throw NPE when the first column's nullable is 
> false while the second column's nullable is true
> --
>
> Key: SPARK-27144
> URL: https://issues.apache.org/jira/browse/SPARK-27144
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.0
> Environment: Spark 2.3.0, local mode.
>Reporter: Yoga
>Priority: Major
>
> Create a DataFrame containing two columns named [weight, animal]; weight's 
> nullable flag is false while animal's is true.
> Put a null value in the animal column,
> then construct a new column with 
> {code:java}
> explode(
> array(
>   struct(lit("weight").alias("key"), 
> col("weight").cast(StringType).alias("value")),
>   struct(lit("animal").alias("key"), 
> col("animal").cast(StringType).alias("value"))
>   )
>   )
> {code}
> then select the struct with .*; Spark will throw an NPE:
> {code:java}
> 19/03/13 14:39:10 INFO DAGScheduler: ResultStage 3 (show at 
> SparkTest.scala:74) failed in 0.043 s due to Job aborted due to stage 
> failure: Task 3 in stage 3.0 failed 1 times, most recent failure: Lost task 
> 3.0 in stage 3.0 (TID 9, localhost, executor driver): 
> java.lang.NullPointerException
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(UnsafeRowWriter.java:194)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.project_doConsume$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>   at org.apache.spark.scheduler.Task.run(Task.scala:109)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
>  
> Codes for reproduce: 
> {code:java}
> val data = Seq(
>   Row(20.0, "dog","a"),
>   Row(3.5, "cat","b"),
>   Row(0.06, null,"c")
> )
>  val schema = StructType(List(
> StructField("weight", DoubleType, false),
> StructField("animal", StringType, true),
> StructField("extra", StringType, true)
>   )
> )
>  val col1 = "weight"
>  val col2 = "animal"
> //this should fail in select(test.*)
> val df1 = originalDF.withColumn("test",
>   explode(
> array(
>   struct(lit(col1).alias("key"), 
> col(col1).cast(StringType).alias("value")),
>   struct(lit(col2).alias("key"), 
> col(col2).cast(StringType).alias("value"))
>   )
>   )
> )
> df1.printSchema()
> df1.select("test.*").show()
> // this should 

[jira] [Updated] (SPARK-27150) Scheduling Within an Application : Spark SQL randomly failed on UDF

2019-03-13 Thread Josh Sean (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Sean updated SPARK-27150:
--
Description: 
 

I run the following (reduced) code multiple times on the exact same input files 
(about 100):

 

 
{code:java}
def myUdf(input : java.lang.String) : Option[Long] = {
  None
}

...

val sparkMain = ... .getOrCreate()
val d = inputPaths.toList.par
val p = new scala.concurrent.forkjoin.ForkJoinPool(12)

try {

   d.tasksupport = new scala.collection.parallel.ForkJoinTaskSupport(p)
   d.foreach {
     case (inputPath) => {
       val spark = sparkMain.newSession()

       spark.udf.register("myUdf", udf(myUdf _))

       val df = spark.read
         .format("csv")
         .option("inferSchema", "false")
         .option("mode", "DROPMALFORMED")
         .schema(mySchema)
         .load(inputPath)

       df.createOrReplaceTempView("mytable")

       val sql = spark.sql(""" SELECT CAST( myUdf(updated_date) as long) FROM mytable """)

       sql.write.parquet( ... )
     }
   }
}{code}
About one in ten application submits, the driver fails with an exception related to 
Spark SQL and the UDF. However, as you can see, I have reduced the UDF to a minimum; 
it only returns None every time, and the problem still occurs.
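
Before the trace below, a workaround sketch I would try, assuming the failure comes 
from the per-session spark.udf.register call racing with SQL analysis: apply the 
UserDefinedFunction directly through the DataFrame API, so no SQL function registry 
lookup is involved. Names (d, sparkMain, mySchema, myUdf) mirror the snippet above; 
outputPathFor is a hypothetical stand-in for the elided output path:

{code:java}
// Workaround sketch, not a fix. Assumption: the problem is the per-session UDF
// registration; using the UserDefinedFunction as a Column bypasses registration.
import org.apache.spark.sql.functions.{col, udf}

val myUdfCol = udf(myUdf _)   // built once, reused for every input path

d.foreach {
  case (inputPath) => {
    val spark = sparkMain.newSession()

    val df = spark.read
      .format("csv")
      .option("inferSchema", "false")
      .option("mode", "DROPMALFORMED")
      .schema(mySchema)
      .load(inputPath)

    df.select(myUdfCol(col("updated_date")).cast("long"))
      .write
      .parquet(outputPathFor(inputPath))   // hypothetical helper for the elided path
  }
}
{code}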

The exception is as follows:
{code:java}
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:65)
at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: org.apache.spark.sql.AnalysisException: cannot resolve 
'CAST(UDF(updated_date) AS BIGINT)' due to data type mismatch: cannot cast 
struct<> to bigint; line 5 pos 10;
...
at 
org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:93)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:85)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:106)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:118)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1$1.apply(QueryPlan.scala:122)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:285)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:122)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:127)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:127)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:95)
at 

[jira] [Created] (SPARK-27150) Scheduling Within an Application : Spark SQL randomly failed on UDF

2019-03-13 Thread Josh Sean (JIRA)
Josh Sean created SPARK-27150:
-

 Summary: Scheduling Within an Application : Spark SQL randomly 
failed on UDF
 Key: SPARK-27150
 URL: https://issues.apache.org/jira/browse/SPARK-27150
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, SQL
Affects Versions: 2.3.1
Reporter: Josh Sean


 

I run the following (reduced) code multiple times on the exact same input files 
(about 100):

 

 
{code:java}
def myUdf(input : java.lang.String) : Option[Long] = {
  None
}

...

val sparkMain = ... .getOrCreate()
val d = inputPaths.toList.par
val p = new scala.concurrent.forkjoin.ForkJoinPool(12)

try {

   d.tasksupport = new scala.collection.parallel.ForkJoinTaskSupport(p)
   d.foreach {
     case (inputPath) => {
       val spark = sparkMain.newSession()

       spark.udf.register("myUdf", udf(myUdf _))

       val df = spark.read
         .format("csv")
         .option("inferSchema", "false")
         .option("mode", "DROPMALFORMED")
         .schema(mySchema)
         .load(inputPath)

       df.createOrReplaceTempView("mytable")

       val sql = spark.sql(""" SELECT CAST( myUdf(updated_date) as long) FROM mytable """)

       sql.write.parquet( ... )
     }
   }
}{code}
About one in ten application submits, the driver fails with an exception related to 
Spark SQL and the UDF. However, as you can see, I have reduced the UDF to a minimum; 
it only returns None every time, and the problem still occurs.

The exception is as follows:

 
{code:java}
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:65)
at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: org.apache.spark.sql.AnalysisException: cannot resolve 
'CAST(UDF(updated_date) AS BIGINT)' due to data type mismatch: cannot cast 
struct<> to bigint; line 5 pos 10;
...
at 
org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:93)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:85)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:106)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:118)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1$1.apply(QueryPlan.scala:122)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:285)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:122)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:127)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at 

[jira] [Commented] (SPARK-27141) Use ConfigEntry for hardcoded configs Yarn

2019-03-13 Thread Sandeep Katta (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791489#comment-16791489
 ] 

Sandeep Katta commented on SPARK-27141:
---

@[~wangjch] I still see some hardcoded configs present [Code 
here|https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L215]

Why is this Jira closed?
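
For reference, a sketch of the pattern the parent task asks for. This is illustrative 
only; it assumes the code lives inside the Spark source tree (ConfigBuilder and 
SparkConf.get(ConfigEntry) are Spark-internal APIs), and the entry name and default 
shown are examples, not necessarily the exact entries covered by SPARK-27141:

{code:java}
// Illustrative sketch of replacing a hardcoded config key with a ConfigEntry.
package org.apache.spark.deploy.yarn

import java.util.concurrent.TimeUnit

import org.apache.spark.internal.config.ConfigBuilder

object ConfigEntrySketch {
  // Before: sparkConf.getTimeAsMs("spark.yarn.am.waitTime", "100s")
  // After: define the key once as a ConfigEntry ...
  val AM_MAX_WAIT_TIME = ConfigBuilder("spark.yarn.am.waitTime")
    .timeConf(TimeUnit.MILLISECONDS)
    .createWithDefaultString("100s")

  // ... and read it through the entry instead of the raw string:
  //   val waitTime = sparkConf.get(AM_MAX_WAIT_TIME)
}
{code}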

> Use ConfigEntry for hardcoded configs Yarn
> --
>
> Key: SPARK-27141
> URL: https://issues.apache.org/jira/browse/SPARK-27141
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 3.0.0
>Reporter: wangjiaochun
>Priority: Major
> Fix For: 3.0.0
>
>
> Some configs in the following YARN-related files do not use ConfigEntry values; try 
> to replace them.
> ApplicationMaster
> YarnAllocatorSuite
> ApplicationMasterSuite
> BaseYarnClusterSuite
> YarnClusterSuite



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


