[jira] [Commented] (SPARK-27122) YARN test failures in Java 9+
[ https://issues.apache.org/jira/browse/SPARK-27122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792355#comment-16792355 ] Ajith S commented on SPARK-27122: - ping [~srowen] [~dongjoon] [~Gengliang.Wang] > YARN test failures in Java 9+ > - > > Key: SPARK-27122 > URL: https://issues.apache.org/jira/browse/SPARK-27122 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.0.0 >Reporter: Sean Owen >Priority: Major > Attachments: image-2019-03-14-09-34-20-592.png, > image-2019-03-14-09-35-23-046.png > > > Currently on Java 11: > {code} > YarnSchedulerBackendSuite: > - RequestExecutors reflects node blacklist and is serializable > - Respect user filters when adding AM IP filter *** FAILED *** > java.lang.ClassCastException: > org.spark_project.jetty.servlet.ServletContextHandler cannot be cast to > org.eclipse.jetty.servlet.ServletContextHandler > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237) > at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at scala.collection.TraversableLike.map(TraversableLike.scala:237) > at scala.collection.TraversableLike.map$(TraversableLike.scala:230) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2(YarnSchedulerBackend.scala:183) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2$adapted(YarnSchedulerBackend.scala:174) > at scala.Option.foreach(Option.scala:274) > ... > {code} > This looks like a classpath issue, probably ultimately related to the same > classloader issues in https://issues.apache.org/jira/browse/SPARK-26839 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27156) why is the "http://:18080/static" browsable?
[ https://issues.apache.org/jira/browse/SPARK-27156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792331#comment-16792331 ] Shivu Sondur commented on SPARK-27156: -- I am working on it. > why is the "http://:18080/static" browsable? > > > Key: SPARK-27156 > URL: https://issues.apache.org/jira/browse/SPARK-27156 > Project: Spark > Issue Type: Question > Components: Spark Core, Web UI >Affects Versions: 1.6.2 >Reporter: Jerry Garcia >Priority: Minor > Attachments: Screen Shot 2019-03-14 at 11.46.31 AM.png > > > I would like to know is there a way to disable spark history server /static > folder ? Please do refer on the attachment provided. Reason for asking is for > security purposes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
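For readers looking for a concrete mitigation while the question above is open: a minimal sketch below, under the assumption that the history server honors the existing spark.ui.filters mechanism in this deployment (worth verifying for 1.6.2). It registers a plain servlet filter that rejects the bare /static directory listing while still letting the UI fetch individual assets; the class name is hypothetical.
{code:scala}
// Hedged sketch, not a built-in Spark switch: a servlet filter that blocks
// the browsable /static directory listing but still serves individual files.
// Assumes the history server applies filter classes listed in spark.ui.filters.
import javax.servlet._
import javax.servlet.http.{HttpServletRequest, HttpServletResponse}

class BlockStaticListingFilter extends Filter {
  override def init(config: FilterConfig): Unit = ()
  override def destroy(): Unit = ()

  override def doFilter(req: ServletRequest, res: ServletResponse,
                        chain: FilterChain): Unit = {
    val path = req.asInstanceOf[HttpServletRequest].getRequestURI
    // "/static" or "/static/" is the directory listing; deeper paths are assets.
    if (path == "/static" || path == "/static/") {
      res.asInstanceOf[HttpServletResponse].sendError(HttpServletResponse.SC_FORBIDDEN)
    } else {
      chain.doFilter(req, res)
    }
  }
}
{code}
This would be wired in with something like spark.ui.filters=com.example.BlockStaticListingFilter (package name hypothetical) in the history server's configuration.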
[jira] [Commented] (SPARK-27157) Add Executor metrics to monitoring doc
[ https://issues.apache.org/jira/browse/SPARK-27157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792327#comment-16792327 ] Lantao Jin commented on SPARK-27157: I am going to prepare a PR > Add Executor metrics to monitoring doc > -- > > Key: SPARK-27157 > URL: https://issues.apache.org/jira/browse/SPARK-27157 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 2.4.0 >Reporter: Lantao Jin >Priority: Minor > > {{Executor Task Metrics}} exists in spark doc, we should add an {{Executor > Metrics}} section in monitoring.md. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27157) Add Executor metrics to monitoring doc
[ https://issues.apache.org/jira/browse/SPARK-27157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lantao Jin updated SPARK-27157: --- Issue Type: Sub-task (was: Documentation) Parent: SPARK-23206 > Add Executor metrics to monitoring doc > -- > > Key: SPARK-27157 > URL: https://issues.apache.org/jira/browse/SPARK-27157 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 2.4.0 >Reporter: Lantao Jin >Priority: Minor > > {{Executor Task Metrics}} exists in spark doc, we should add an {{Executor > Metrics}} section in monitoring.md. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-27157) Add Executor metrics to monitoring doc
Lantao Jin created SPARK-27157: -- Summary: Add Executor metrics to monitoring doc Key: SPARK-27157 URL: https://issues.apache.org/jira/browse/SPARK-27157 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 2.4.0 Reporter: Lantao Jin {{Executor Task Metrics}} exists in the Spark docs; we should add an {{Executor Metrics}} section in monitoring.md. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-27122) YARN test failures in Java 9+
[ https://issues.apache.org/jira/browse/SPARK-27122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792318#comment-16792318 ] Ajith S edited comment on SPARK-27122 at 3/14/19 4:15 AM: -- The problem seems to be shading of the jetty package. When we run the test, the classpath seems to be built from the classes folder (resource-managers/yarn/target/scala-2.12/classes) instead of the jar (resource-managers/yarn/target/spark-yarn_2.12-3.0.0-SNAPSHOT.jar). Check the attachment. Here in org.apache.spark.scheduler.cluster.YarnSchedulerBackend, as it's not shaded, it expects org.eclipse.jetty.servlet.ServletContextHandler {code:java} ui.getHandlers.map(_.getServletHandler()).foreach { h => val holder = new FilterHolder(){code} ui.getHandlers is in spark-core, and it's loaded from spark-core.jar, which is shaded and hence refers to org.spark_project.jetty.servlet.ServletContextHandler And here is the javap command which shows the difference between the org.apache.spark.scheduler.cluster.YarnSchedulerBackend present in the jar and in the classes folder !image-2019-03-14-09-35-23-046.png! was (Author: ajithshetty): The problem seems to be shading of jetty package. When we run test, the class path seems to be made from the classes folder(resource-managers/yarn/target/scala-2.12/classes) instead of jar (resource-managers/yarn/target/spark-yarn_2.12-3.0.0-SNAPSHOT.jar) . Check attachment. Here in org.apache.spark.scheduler.cluster.YarnSchedulerBackend, as its not shaded, it expects org.eclipse.jetty.servlet.ServletContextHandler {code:java} ui.getHandlers.map(_.getServletHandler()).foreach { h => val holder = new FilterHolder(){code} ui.getHandlers is in spark-core and its loaded from spark-core.jar which is shaded and hence refers to org.spark_project.jetty.servlet.ServletContextHandler And here is the javap command which shows the difference between org.apache.spark.scheduler.cluster.YarnSchedulerBackend present in jar folder and classes folder !image-2019-03-14-09-35-23-046.png! > YARN test failures in Java 9+ > - > > Key: SPARK-27122 > URL: https://issues.apache.org/jira/browse/SPARK-27122 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.0.0 >Reporter: Sean Owen >Priority: Major > Attachments: image-2019-03-14-09-34-20-592.png, > image-2019-03-14-09-35-23-046.png > > > Currently on Java 11: > {code} > YarnSchedulerBackendSuite: > - RequestExecutors reflects node blacklist and is serializable > - Respect user filters when adding AM IP filter *** FAILED *** > java.lang.ClassCastException: > org.spark_project.jetty.servlet.ServletContextHandler cannot be cast to > org.eclipse.jetty.servlet.ServletContextHandler > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237) > at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at scala.collection.TraversableLike.map(TraversableLike.scala:237) > at scala.collection.TraversableLike.map$(TraversableLike.scala:230) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2(YarnSchedulerBackend.scala:183) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2$adapted(YarnSchedulerBackend.scala:174) > at scala.Option.foreach(Option.scala:274) > ... > {code} > This looks like a classpath issue, probably ultimately related to the same > classloader issues in https://issues.apache.org/jira/browse/SPARK-26839 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
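To make the mismatch described above easy to confirm, here is a small standalone diagnostic (not part of Spark; the object name is mine) that prints where each jetty variant of ServletContextHandler resolves. Running it with the same classpath the test uses should show whether shaded and unshaded classes are being mixed.
{code:scala}
// Standalone diagnostic: shows which jar or classes directory supplies each
// jetty variant of ServletContextHandler on the current classpath.
object WhichJetty {
  private def locate(className: String): Unit = {
    val resource = className.replace('.', '/') + ".class"
    val url = Option(getClass.getClassLoader.getResource(resource))
    println(s"$className -> ${url.getOrElse("not on classpath")}")
  }

  def main(args: Array[String]): Unit = {
    // Unshaded name: what classes compiled into target/scala-2.12/classes expect.
    locate("org.eclipse.jetty.servlet.ServletContextHandler")
    // Shaded name: what the published spark-core jar references.
    locate("org.spark_project.jetty.servlet.ServletContextHandler")
  }
}
{code}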
[jira] [Comment Edited] (SPARK-27122) YARN test failures in Java 9+
[ https://issues.apache.org/jira/browse/SPARK-27122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792318#comment-16792318 ] Ajith S edited comment on SPARK-27122 at 3/14/19 4:14 AM: -- The problem seems to be shading of jetty package. When we run test, the class path seems to be made from the classes folder(resource-managers/yarn/target/scala-2.12/classes) instead of jar (resource-managers/yarn/target/spark-yarn_2.12-3.0.0-SNAPSHOT.jar) . Check attachment. Here in org.apache.spark.scheduler.cluster.YarnSchedulerBackend, as its not shaded, it expects org.eclipse.jetty.servlet.ServletContextHandler {code:java} ui.getHandlers.map(_.getServletHandler()).foreach { h => val holder = new FilterHolder(){code} ui.getHandlers is in spark-core and its loaded from spark-core.jar which is shaded and hence refers to org.spark_project.jetty.servlet.ServletContextHandler And here is the javap command which shows the difference between org.apache.spark.scheduler.cluster.YarnSchedulerBackend present in jar folder and classes folder !image-2019-03-14-09-35-23-046.png! was (Author: ajithshetty): The problem seems to be shading of jetty package. When we run test, the class path seems to be made from the classes folder(resource-managers/yarn/target/scala-2.12/classes) instead of jar (resource-managers/yarn/target/spark-yarn_2.12-3.0.0-SNAPSHOT.jar) . Check attachment. And here is the javap command which shows the difference between org.apache.spark.scheduler.cluster.YarnSchedulerBackend present in jar folder and classes folder !image-2019-03-14-09-35-23-046.png! > YARN test failures in Java 9+ > - > > Key: SPARK-27122 > URL: https://issues.apache.org/jira/browse/SPARK-27122 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.0.0 >Reporter: Sean Owen >Priority: Major > Attachments: image-2019-03-14-09-34-20-592.png, > image-2019-03-14-09-35-23-046.png > > > Currently on Java 11: > {code} > YarnSchedulerBackendSuite: > - RequestExecutors reflects node blacklist and is serializable > - Respect user filters when adding AM IP filter *** FAILED *** > java.lang.ClassCastException: > org.spark_project.jetty.servlet.ServletContextHandler cannot be cast to > org.eclipse.jetty.servlet.ServletContextHandler > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237) > at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at scala.collection.TraversableLike.map(TraversableLike.scala:237) > at scala.collection.TraversableLike.map$(TraversableLike.scala:230) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2(YarnSchedulerBackend.scala:183) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2$adapted(YarnSchedulerBackend.scala:174) > at scala.Option.foreach(Option.scala:274) > ... > {code} > This looks like a classpath issue, probably ultimately related to the same > classloader issues in https://issues.apache.org/jira/browse/SPARK-26839 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-27122) YARN test failures in Java 9+
[ https://issues.apache.org/jira/browse/SPARK-27122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792318#comment-16792318 ] Ajith S edited comment on SPARK-27122 at 3/14/19 4:07 AM: -- The problem seems to be shading of jetty package. When we run test, the class path seems to be made from the classes folder(resource-managers/yarn/target/scala-2.12/classes) instead of jar (resource-managers/yarn/target/spark-yarn_2.12-3.0.0-SNAPSHOT.jar) . Check attachment. And here is the javap command which shows the difference between org.apache.spark.scheduler.cluster.YarnSchedulerBackend present in jar folder and classes folder !image-2019-03-14-09-35-23-046.png! was (Author: ajithshetty): The problem seems to be shading of jetty package. When we run test, the class path seems to be made from the classes folder(resource-managers/yarn/target/scala-2.12/classes) instead of jar (resource-managers/yarn/target/spark-yarn_2.12-3.0.0-SNAPSHOT.jar) And here is the javap command which shows the difference between org.apache.spark.scheduler.cluster.YarnSchedulerBackend present in jar folder and classes folder !image-2019-03-14-09-35-23-046.png! > YARN test failures in Java 9+ > - > > Key: SPARK-27122 > URL: https://issues.apache.org/jira/browse/SPARK-27122 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.0.0 >Reporter: Sean Owen >Priority: Major > Attachments: image-2019-03-14-09-34-20-592.png, > image-2019-03-14-09-35-23-046.png > > > Currently on Java 11: > {code} > YarnSchedulerBackendSuite: > - RequestExecutors reflects node blacklist and is serializable > - Respect user filters when adding AM IP filter *** FAILED *** > java.lang.ClassCastException: > org.spark_project.jetty.servlet.ServletContextHandler cannot be cast to > org.eclipse.jetty.servlet.ServletContextHandler > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237) > at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at scala.collection.TraversableLike.map(TraversableLike.scala:237) > at scala.collection.TraversableLike.map$(TraversableLike.scala:230) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2(YarnSchedulerBackend.scala:183) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2$adapted(YarnSchedulerBackend.scala:174) > at scala.Option.foreach(Option.scala:274) > ... > {code} > This looks like a classpath issue, probably ultimately related to the same > classloader issues in https://issues.apache.org/jira/browse/SPARK-26839 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-27122) YARN test failures in Java 9+
[ https://issues.apache.org/jira/browse/SPARK-27122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792318#comment-16792318 ] Ajith S edited comment on SPARK-27122 at 3/14/19 4:06 AM: -- The problem seems to be shading of jetty package. When we run test, the class path seems to be made from the classes folder(resource-managers/yarn/target/scala-2.12/classes) instead of jar (resource-managers/yarn/target/spark-yarn_2.12-3.0.0-SNAPSHOT.jar) And here is the javap command which shows the difference between org.apache.spark.scheduler.cluster.YarnSchedulerBackend present in jar folder and classes folder !image-2019-03-14-09-35-23-046.png! was (Author: ajithshetty): The problem seems to be shading of jetty package. When we run test, the class path seems to be made from the classes folder(resource-managers/yarn/target/scala-2.12/classes) instead of jar (resource-managers/yarn/target/spark-yarn_2.12-3.0.0-SNAPSHOT.jar) Here is test classpath info: !image-2019-03-14-09-34-20-592.png! And here is the javap command which shows the difference between org.apache.spark.scheduler.cluster.YarnSchedulerBackend present in jar folder and classes folder !image-2019-03-14-09-35-23-046.png! > YARN test failures in Java 9+ > - > > Key: SPARK-27122 > URL: https://issues.apache.org/jira/browse/SPARK-27122 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.0.0 >Reporter: Sean Owen >Priority: Major > Attachments: image-2019-03-14-09-34-20-592.png, > image-2019-03-14-09-35-23-046.png > > > Currently on Java 11: > {code} > YarnSchedulerBackendSuite: > - RequestExecutors reflects node blacklist and is serializable > - Respect user filters when adding AM IP filter *** FAILED *** > java.lang.ClassCastException: > org.spark_project.jetty.servlet.ServletContextHandler cannot be cast to > org.eclipse.jetty.servlet.ServletContextHandler > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237) > at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at scala.collection.TraversableLike.map(TraversableLike.scala:237) > at scala.collection.TraversableLike.map$(TraversableLike.scala:230) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2(YarnSchedulerBackend.scala:183) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2$adapted(YarnSchedulerBackend.scala:174) > at scala.Option.foreach(Option.scala:274) > ... > {code} > This looks like a classpath issue, probably ultimately related to the same > classloader issues in https://issues.apache.org/jira/browse/SPARK-26839 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27122) YARN test failures in Java 9+
[ https://issues.apache.org/jira/browse/SPARK-27122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated SPARK-27122: Attachment: image-2019-03-14-09-35-23-046.png > YARN test failures in Java 9+ > - > > Key: SPARK-27122 > URL: https://issues.apache.org/jira/browse/SPARK-27122 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.0.0 >Reporter: Sean Owen >Priority: Major > Attachments: image-2019-03-14-09-34-20-592.png, > image-2019-03-14-09-35-23-046.png > > > Currently on Java 11: > {code} > YarnSchedulerBackendSuite: > - RequestExecutors reflects node blacklist and is serializable > - Respect user filters when adding AM IP filter *** FAILED *** > java.lang.ClassCastException: > org.spark_project.jetty.servlet.ServletContextHandler cannot be cast to > org.eclipse.jetty.servlet.ServletContextHandler > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237) > at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at scala.collection.TraversableLike.map(TraversableLike.scala:237) > at scala.collection.TraversableLike.map$(TraversableLike.scala:230) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2(YarnSchedulerBackend.scala:183) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2$adapted(YarnSchedulerBackend.scala:174) > at scala.Option.foreach(Option.scala:274) > ... > {code} > This looks like a classpath issue, probably ultimately related to the same > classloader issues in https://issues.apache.org/jira/browse/SPARK-26839 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27122) YARN test failures in Java 9+
[ https://issues.apache.org/jira/browse/SPARK-27122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792318#comment-16792318 ] Ajith S commented on SPARK-27122: - The problem seems to be shading of jetty package. When we run test, the class path seems to be made from the classes folder(resource-managers/yarn/target/scala-2.12/classes) instead of jar (resource-managers/yarn/target/spark-yarn_2.12-3.0.0-SNAPSHOT.jar) Here is test classpath info: !image-2019-03-14-09-34-20-592.png! And here is the javap command which shows the difference between org.apache.spark.scheduler.cluster.YarnSchedulerBackend present in jar folder and classes folder !image-2019-03-14-09-35-23-046.png! > YARN test failures in Java 9+ > - > > Key: SPARK-27122 > URL: https://issues.apache.org/jira/browse/SPARK-27122 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.0.0 >Reporter: Sean Owen >Priority: Major > Attachments: image-2019-03-14-09-34-20-592.png, > image-2019-03-14-09-35-23-046.png > > > Currently on Java 11: > {code} > YarnSchedulerBackendSuite: > - RequestExecutors reflects node blacklist and is serializable > - Respect user filters when adding AM IP filter *** FAILED *** > java.lang.ClassCastException: > org.spark_project.jetty.servlet.ServletContextHandler cannot be cast to > org.eclipse.jetty.servlet.ServletContextHandler > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237) > at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at scala.collection.TraversableLike.map(TraversableLike.scala:237) > at scala.collection.TraversableLike.map$(TraversableLike.scala:230) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2(YarnSchedulerBackend.scala:183) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2$adapted(YarnSchedulerBackend.scala:174) > at scala.Option.foreach(Option.scala:274) > ... > {code} > This looks like a classpath issue, probably ultimately related to the same > classloader issues in https://issues.apache.org/jira/browse/SPARK-26839 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27122) YARN test failures in Java 9+
[ https://issues.apache.org/jira/browse/SPARK-27122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated SPARK-27122: Attachment: image-2019-03-14-09-34-20-592.png > YARN test failures in Java 9+ > - > > Key: SPARK-27122 > URL: https://issues.apache.org/jira/browse/SPARK-27122 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.0.0 >Reporter: Sean Owen >Priority: Major > Attachments: image-2019-03-14-09-34-20-592.png > > > Currently on Java 11: > {code} > YarnSchedulerBackendSuite: > - RequestExecutors reflects node blacklist and is serializable > - Respect user filters when adding AM IP filter *** FAILED *** > java.lang.ClassCastException: > org.spark_project.jetty.servlet.ServletContextHandler cannot be cast to > org.eclipse.jetty.servlet.ServletContextHandler > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237) > at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at scala.collection.TraversableLike.map(TraversableLike.scala:237) > at scala.collection.TraversableLike.map$(TraversableLike.scala:230) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2(YarnSchedulerBackend.scala:183) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2$adapted(YarnSchedulerBackend.scala:174) > at scala.Option.foreach(Option.scala:274) > ... > {code} > This looks like a classpath issue, probably ultimately related to the same > classloader issues in https://issues.apache.org/jira/browse/SPARK-26839 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27156) why is the "http://:18080/static" browsable?
[ https://issues.apache.org/jira/browse/SPARK-27156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Garcia updated SPARK-27156: - Description: I would like to know is there a way to disable spark history server /static folder ? Please do refer on the attachment provide. (was: I would like to know is there a way to disable spark history server /static folder ? Please do refer on the attachment provide. !image-2019-03-14-11-47-15-300.png!) > why is the "http://:18080/static" browse able? > > > Key: SPARK-27156 > URL: https://issues.apache.org/jira/browse/SPARK-27156 > Project: Spark > Issue Type: Question > Components: Spark Core, Web UI >Affects Versions: 1.6.2 >Reporter: Jerry Garcia >Priority: Minor > Attachments: Screen Shot 2019-03-14 at 11.46.31 AM.png > > > I would like to know is there a way to disable spark history server /static > folder ? Please do refer on the attachment provide. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27156) why is the "http://:18080/static" browsable?
[ https://issues.apache.org/jira/browse/SPARK-27156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Garcia updated SPARK-27156: - Attachment: Screen Shot 2019-03-14 at 11.46.31 AM.png > why is the "http://:18080/static" browse able? > > > Key: SPARK-27156 > URL: https://issues.apache.org/jira/browse/SPARK-27156 > Project: Spark > Issue Type: Question > Components: Spark Core, Web UI >Affects Versions: 1.6.2 >Reporter: Jerry Garcia >Priority: Minor > Attachments: Screen Shot 2019-03-14 at 11.46.31 AM.png > > > I would like to know is there a way to disable spark history server /static > folder ? Please do refer on the attachment provide. > !image-2019-03-14-11-47-15-300.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27156) why is the "http://:18080/static" browsable?
[ https://issues.apache.org/jira/browse/SPARK-27156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Garcia updated SPARK-27156: - Description: I would like to know is there a way to disable spark history server /static folder ? Please do refer on the attachment provided. (was: I would like to know is there a way to disable spark history server /static folder ? Please do refer on the attachment provide. ) > why is the "http://:18080/static" browse able? > > > Key: SPARK-27156 > URL: https://issues.apache.org/jira/browse/SPARK-27156 > Project: Spark > Issue Type: Question > Components: Spark Core, Web UI >Affects Versions: 1.6.2 >Reporter: Jerry Garcia >Priority: Minor > Attachments: Screen Shot 2019-03-14 at 11.46.31 AM.png > > > I would like to know is there a way to disable spark history server /static > folder ? Please do refer on the attachment provided. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27156) why is the "http://:18080/static" browsable?
[ https://issues.apache.org/jira/browse/SPARK-27156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Garcia updated SPARK-27156: - Description: I would like to know is there a way to disable spark history server /static folder ? Please do refer on the attachment provided. Reason for asking is for security purposes. (was: I would like to know is there a way to disable spark history server /static folder ? Please do refer on the attachment provided. ) > why is the "http://:18080/static" browse able? > > > Key: SPARK-27156 > URL: https://issues.apache.org/jira/browse/SPARK-27156 > Project: Spark > Issue Type: Question > Components: Spark Core, Web UI >Affects Versions: 1.6.2 >Reporter: Jerry Garcia >Priority: Minor > Attachments: Screen Shot 2019-03-14 at 11.46.31 AM.png > > > I would like to know is there a way to disable spark history server /static > folder ? Please do refer on the attachment provided. Reason for asking is for > security purposes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-27156) why is the "http://:18080/static" browsable?
Jerry Garcia created SPARK-27156: Summary: why is the "http://:18080/static" browsable? Key: SPARK-27156 URL: https://issues.apache.org/jira/browse/SPARK-27156 Project: Spark Issue Type: Question Components: Spark Core, Web UI Affects Versions: 1.6.2 Reporter: Jerry Garcia I would like to know is there a way to disable spark history server /static folder ? Please do refer on the attachment provide. !image-2019-03-14-11-47-15-300.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27122) YARN test failures in Java 9+
[ https://issues.apache.org/jira/browse/SPARK-27122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792299#comment-16792299 ] Ajith S commented on SPARK-27122: - I can reproduce this issue even in Java8. I would like to work on this. > YARN test failures in Java 9+ > - > > Key: SPARK-27122 > URL: https://issues.apache.org/jira/browse/SPARK-27122 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.0.0 >Reporter: Sean Owen >Priority: Major > > Currently on Java 11: > {code} > YarnSchedulerBackendSuite: > - RequestExecutors reflects node blacklist and is serializable > - Respect user filters when adding AM IP filter *** FAILED *** > java.lang.ClassCastException: > org.spark_project.jetty.servlet.ServletContextHandler cannot be cast to > org.eclipse.jetty.servlet.ServletContextHandler > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237) > at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at scala.collection.TraversableLike.map(TraversableLike.scala:237) > at scala.collection.TraversableLike.map$(TraversableLike.scala:230) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2(YarnSchedulerBackend.scala:183) > at > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.$anonfun$addWebUIFilter$2$adapted(YarnSchedulerBackend.scala:174) > at scala.Option.foreach(Option.scala:274) > ... > {code} > This looks like a classpath issue, probably ultimately related to the same > classloader issues in https://issues.apache.org/jira/browse/SPARK-26839 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-27155) Docker Oracle XE docker image has been removed by DockerHub
[ https://issues.apache.org/jira/browse/SPARK-27155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-27155: Assignee: (was: Apache Spark) > Docker Oracle XE image docker image has been removed by DockerHub > -- > > Key: SPARK-27155 > URL: https://issues.apache.org/jira/browse/SPARK-27155 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.0 >Reporter: Zhu, Lipeng >Priority: Major > Attachments: image-2019-03-14-11-00-05-498.png > > > Since 2019-Feb-13(the Valentine's day eve) this docker image has been removed > by DockerHub due to the Docker DMCA Takedown Notice from the Copyright owner > which is the Oracle. > > !image-2019-03-14-11-00-05-498.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27155) Docker Oracle XE docker image has been removed by DockerHub
[ https://issues.apache.org/jira/browse/SPARK-27155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu, Lipeng updated SPARK-27155: Description: Since 2019-Feb-13(the Valentine's day eve) this docker image has been removed by DockerHub due to the Docker DMCA Takedown Notice from the Copyright owner which is the Oracle. !image-2019-03-14-11-00-05-498.png! was: Since 2019-Feb-13(the Valentine's day eve) this docker image has been removed by DockerHub due to the Docker DMCA Takedown Notice from the Copyright owner which is the Oracle. !image-2019-03-14-10-59-31-099.png! > Docker Oracle XE image docker image has been removed by DockerHub > -- > > Key: SPARK-27155 > URL: https://issues.apache.org/jira/browse/SPARK-27155 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.0 >Reporter: Zhu, Lipeng >Priority: Major > Attachments: image-2019-03-14-11-00-05-498.png > > > Since 2019-Feb-13(the Valentine's day eve) this docker image has been removed > by DockerHub due to the Docker DMCA Takedown Notice from the Copyright owner > which is the Oracle. > > !image-2019-03-14-11-00-05-498.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27155) Docker Oracle XE docker image has been removed by DockerHub
[ https://issues.apache.org/jira/browse/SPARK-27155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792292#comment-16792292 ] Apache Spark commented on SPARK-27155: -- User 'lipzhu' has created a pull request for this issue: https://github.com/apache/spark/pull/24086 > Docker Oracle XE image docker image has been removed by DockerHub > -- > > Key: SPARK-27155 > URL: https://issues.apache.org/jira/browse/SPARK-27155 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.0 >Reporter: Zhu, Lipeng >Priority: Major > Attachments: image-2019-03-14-11-00-05-498.png > > > Since 2019-Feb-13(the Valentine's day eve) this docker image has been removed > by DockerHub due to the Docker DMCA Takedown Notice from the Copyright owner > which is the Oracle. > > !image-2019-03-14-11-00-05-498.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-27155) Docker Oracle XE docker image has been removed by DockerHub
[ https://issues.apache.org/jira/browse/SPARK-27155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-27155: Assignee: Apache Spark > Docker Oracle XE image docker image has been removed by DockerHub > -- > > Key: SPARK-27155 > URL: https://issues.apache.org/jira/browse/SPARK-27155 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.0 >Reporter: Zhu, Lipeng >Assignee: Apache Spark >Priority: Major > Attachments: image-2019-03-14-11-00-05-498.png > > > Since 2019-Feb-13(the Valentine's day eve) this docker image has been removed > by DockerHub due to the Docker DMCA Takedown Notice from the Copyright owner > which is the Oracle. > > !image-2019-03-14-11-00-05-498.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27155) Docker Oracle XE docker image has been removed by DockerHub
[ https://issues.apache.org/jira/browse/SPARK-27155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792293#comment-16792293 ] Apache Spark commented on SPARK-27155: -- User 'lipzhu' has created a pull request for this issue: https://github.com/apache/spark/pull/24086 > Docker Oracle XE image docker image has been removed by DockerHub > -- > > Key: SPARK-27155 > URL: https://issues.apache.org/jira/browse/SPARK-27155 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.0 >Reporter: Zhu, Lipeng >Priority: Major > Attachments: image-2019-03-14-11-00-05-498.png > > > Since 2019-Feb-13(the Valentine's day eve) this docker image has been removed > by DockerHub due to the Docker DMCA Takedown Notice from the Copyright owner > which is the Oracle. > > !image-2019-03-14-11-00-05-498.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27155) Docker Oracle XE docker image has been removed by DockerHub
[ https://issues.apache.org/jira/browse/SPARK-27155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu, Lipeng updated SPARK-27155: Attachment: image-2019-03-14-11-00-05-498.png > Docker Oracle XE image docker image has been removed by DockerHub > -- > > Key: SPARK-27155 > URL: https://issues.apache.org/jira/browse/SPARK-27155 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.0 >Reporter: Zhu, Lipeng >Priority: Major > Attachments: image-2019-03-14-11-00-05-498.png > > > Since 2019-Feb-13(the Valentine's day eve) this docker image has been removed > by DockerHub due to the Docker DMCA Takedown Notice from the Copyright owner > which is the Oracle. > > !image-2019-03-14-11-00-05-498.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-27155) Docker Oracle XE docker image has been removed by DockerHub
Zhu, Lipeng created SPARK-27155: --- Summary: Docker Oracle XE docker image has been removed by DockerHub Key: SPARK-27155 URL: https://issues.apache.org/jira/browse/SPARK-27155 Project: Spark Issue Type: Bug Components: Build Affects Versions: 2.4.0 Reporter: Zhu, Lipeng Since 2019-Feb-13(the Valentine's day eve) this docker image has been removed by DockerHub due to the Docker DMCA Takedown Notice from the Copyright owner which is the Oracle. !image-2019-03-14-10-59-31-099.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26555) Thread safety issue causes createDataset to fail with misleading errors
[ https://issues.apache.org/jira/browse/SPARK-26555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26555: Assignee: Apache Spark > Thread safety issue causes createDataset to fail with misleading errors > --- > > Key: SPARK-26555 > URL: https://issues.apache.org/jira/browse/SPARK-26555 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Martin Loncaric >Assignee: Apache Spark >Priority: Major > > This can be replicated (~2% of the time) with > {code:scala} > import java.sql.Timestamp > import java.util.concurrent.{Executors, Future} > import org.apache.spark.sql.SparkSession > import scala.collection.mutable.ListBuffer > import scala.concurrent.ExecutionContext > import scala.util.Random > object Main { > def main(args: Array[String]): Unit = { > val sparkSession = SparkSession.builder > .getOrCreate() > import sparkSession.implicits._ > val executor = Executors.newFixedThreadPool(1) > try { > implicit val xc: ExecutionContext = > ExecutionContext.fromExecutorService(executor) > val futures = new ListBuffer[Future[_]]() > for (i <- 1 to 3) { > futures += executor.submit(new Runnable { > override def run(): Unit = { > val d = if (Random.nextInt(2) == 0) Some("d value") else None > val e = if (Random.nextInt(2) == 0) Some(5.0) else None > val f = if (Random.nextInt(2) == 0) Some(6.0) else None > println("DEBUG", d, e, f) > sparkSession.createDataset(Seq( > MyClass(new Timestamp(1L), "b", "c", d, e, f) > )) > } > }) > } > futures.foreach(_.get()) > } finally { > println("SHUTDOWN") > executor.shutdown() > sparkSession.stop() > } > } > case class MyClass( > a: Timestamp, > b: String, > c: String, > d: Option[String], > e: Option[Double], > f: Option[Double] > ) > } > {code} > So it will usually come up during > {code:bash} > for i in $(seq 1 200); do > echo $i > spark-submit --master local[4] target/scala-2.11/spark-test_2.11-0.1.jar > done > {code} > causing a variety of possible errors, such as > {code}Exception in thread "main" java.util.concurrent.ExecutionException: > scala.MatchError: scala.Option[String] (of class > scala.reflect.internal.Types$ClassArgsTypeRef) > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > Caused by: scala.MatchError: scala.Option[String] (of class > scala.reflect.internal.Types$ClassArgsTypeRef) > at > org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$deserializerFor$1.apply(ScalaReflection.scala:210){code} > or > {code}Exception in thread "main" java.util.concurrent.ExecutionException: > java.lang.UnsupportedOperationException: Schema for type > scala.Option[scala.Double] is not supported > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > Caused by: java.lang.UnsupportedOperationException: Schema for type > scala.Option[scala.Double] is not supported > at > org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:789){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26555) Thread safety issue causes createDataset to fail with misleading errors
[ https://issues.apache.org/jira/browse/SPARK-26555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26555: Assignee: (was: Apache Spark) > Thread safety issue causes createDataset to fail with misleading errors > --- > > Key: SPARK-26555 > URL: https://issues.apache.org/jira/browse/SPARK-26555 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Martin Loncaric >Priority: Major > > This can be replicated (~2% of the time) with > {code:scala} > import java.sql.Timestamp > import java.util.concurrent.{Executors, Future} > import org.apache.spark.sql.SparkSession > import scala.collection.mutable.ListBuffer > import scala.concurrent.ExecutionContext > import scala.util.Random > object Main { > def main(args: Array[String]): Unit = { > val sparkSession = SparkSession.builder > .getOrCreate() > import sparkSession.implicits._ > val executor = Executors.newFixedThreadPool(1) > try { > implicit val xc: ExecutionContext = > ExecutionContext.fromExecutorService(executor) > val futures = new ListBuffer[Future[_]]() > for (i <- 1 to 3) { > futures += executor.submit(new Runnable { > override def run(): Unit = { > val d = if (Random.nextInt(2) == 0) Some("d value") else None > val e = if (Random.nextInt(2) == 0) Some(5.0) else None > val f = if (Random.nextInt(2) == 0) Some(6.0) else None > println("DEBUG", d, e, f) > sparkSession.createDataset(Seq( > MyClass(new Timestamp(1L), "b", "c", d, e, f) > )) > } > }) > } > futures.foreach(_.get()) > } finally { > println("SHUTDOWN") > executor.shutdown() > sparkSession.stop() > } > } > case class MyClass( > a: Timestamp, > b: String, > c: String, > d: Option[String], > e: Option[Double], > f: Option[Double] > ) > } > {code} > So it will usually come up during > {code:bash} > for i in $(seq 1 200); do > echo $i > spark-submit --master local[4] target/scala-2.11/spark-test_2.11-0.1.jar > done > {code} > causing a variety of possible errors, such as > {code}Exception in thread "main" java.util.concurrent.ExecutionException: > scala.MatchError: scala.Option[String] (of class > scala.reflect.internal.Types$ClassArgsTypeRef) > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > Caused by: scala.MatchError: scala.Option[String] (of class > scala.reflect.internal.Types$ClassArgsTypeRef) > at > org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$deserializerFor$1.apply(ScalaReflection.scala:210){code} > or > {code}Exception in thread "main" java.util.concurrent.ExecutionException: > java.lang.UnsupportedOperationException: Schema for type > scala.Option[scala.Double] is not supported > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > Caused by: java.lang.UnsupportedOperationException: Schema for type > scala.Option[scala.Double] is not supported > at > org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:789){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27151) ClearCacheCommand extends IgnoreCachedData to avoid plan node copies
[ https://issues.apache.org/jira/browse/SPARK-27151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-27151: - Description: In SPARK-27011, we introduced IgnoreCachedData to avoid plan node copies in CacheManager. Since ClearCacheCommand has no argument, it can also extend IgnoreCachedData. was:To avoid unnecessary copies, `ClearCacheCommand` should be `case-object`. > ClearCacheCommand extends IgnoreCachedData to avoid plan node copies > -- > > Key: SPARK-27151 > URL: https://issues.apache.org/jira/browse/SPARK-27151 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: Takeshi Yamamuro >Priority: Trivial > > In SPARK-27011, we introduced IgnoreCachedData to avoid plan node copies in > CacheManager. > Since ClearCacheCommand has no argument, it can also extend IgnoreCachedData. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27151) ClearCacheCommand extends IgnoreCachedData to avoid plan node copies
[ https://issues.apache.org/jira/browse/SPARK-27151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-27151: - Summary: ClearCacheCommand extends IgnoreCachedData to avoid plan node copies (was: ClearCacheCommand should be case-object to avoid copies) > ClearCacheCommand extends IgnoreCachedData to avoid plan node copies > -- > > Key: SPARK-27151 > URL: https://issues.apache.org/jira/browse/SPARK-27151 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: Takeshi Yamamuro >Priority: Trivial > > To avoid unnecessary copies, `ClearCacheCommand` should be `case-object`. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
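For readers unfamiliar with the pattern behind this change, an illustrative sketch with simplified stand-ins (not Spark's actual internals): a marker trait lets a rewrite pass hand marked nodes back untouched, and a zero-argument command can be a singleton case object, so no per-use copies are ever allocated.
{code:scala}
// Simplified stand-ins, not Spark's real classes: the marker trait lets the
// cache-invalidation pass skip a node wholesale instead of copying it.
object CacheSkipSketch {
  sealed trait Plan
  trait IgnoreCachedData extends Plan
  case object ClearCacheCommand extends IgnoreCachedData
  case class OtherCommand(arg: String) extends Plan

  def replaceCachedSubtrees(plan: Plan): Plan = plan match {
    case skip: IgnoreCachedData => skip // returned as-is: no copy allocated
    case other => other                 // a real pass would rewrite/copy here
  }

  def main(args: Array[String]): Unit = {
    // The case object is a singleton: the pass hands back the same instance.
    assert(replaceCachedSubtrees(ClearCacheCommand) eq ClearCacheCommand)
  }
}
{code}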
[jira] [Commented] (SPARK-27152) Column equality does not work for aliased columns.
[ https://issues.apache.org/jira/browse/SPARK-27152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792248#comment-16792248 ] Hyukjin Kwon commented on SPARK-27152: -- What does this matter? > Column equality does not work for aliased columns. > -- > > Key: SPARK-27152 > URL: https://issues.apache.org/jira/browse/SPARK-27152 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Ryan Radtke >Priority: Major > > assert($"zip".as("zip_code") equals $"zip".as("zip_code")) will return false > assert($"zip" equals $"zip") will return true. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
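A hedged explanation of the report: Column equality appears to delegate to the underlying Catalyst expression, and .as() wraps the expression in an Alias that carries a freshly generated expression ID, so two separately constructed but syntactically identical aliases compare unequal. A minimal demonstration (no SparkSession needed):
{code:scala}
// Minimal demonstration: each .as() call allocates an Alias with a new
// expression ID, which participates in equality; bare attributes carry none.
import org.apache.spark.sql.functions.col

object AliasEqualitySketch {
  def main(args: Array[String]): Unit = {
    val a = col("zip").as("zip_code")
    val b = col("zip").as("zip_code")
    println(a equals b)                   // false: different ExprIds inside
    println(col("zip") equals col("zip")) // true: plain unresolved attributes
    println(a.expr)                       // e.g. zip AS zip_code#<id1>
    println(b.expr)                       // e.g. zip AS zip_code#<id2>
  }
}
{code}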
[jira] [Created] (SPARK-27154) Incomplete Execution for Spark/dev/run-test-jenkins.py
Vaibhavd created SPARK-27154: Summary: Incomplete Execution for Spark/dev/run-test-jenkins.py Key: SPARK-27154 URL: https://issues.apache.org/jira/browse/SPARK-27154 Project: Spark Issue Type: Question Components: jenkins Affects Versions: 2.4.0 Environment:
{code}
JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"
BUILD_DISPLAY_NAME="Jenkins build"
BUILD_URL="xxx "
GITHUB_PROJECT_URL="https://github.com/apache/spark"
GITHUB_OAUTH_KEY="xxx"
GITHUB_API_ENDPOINT="https://api.github.com/repos/apache/spark"
AMPLAB_JENKINS_BUILD_TOOL="maven"
AMPLAB_JENKINS="True"
sha1="origin/pr/23560/merge"
ghprbActualCommit="d73cfb51941f99516b7878acace26db35ea72076"
ghprbActualCommitAuthor="jiafu.zh...@intel.com"
ghprbActualCommitAuthorEmail="jiafu.zh...@intel.com"
ghprbTriggerAuthor="Marcelo Vanzin"
ghprbPullId=23560
ghprbTargetBranch="master"
ghprbSourceBranch="thread_conf_separation"
GIT_BRANCH="thread_conf_separation"
ghprbPullAuthorEmail="jiafu.zh...@intel.com"
ghprbPullDescription="GitHub pull request #23560 of commit d73cfb51941f99516b7878acace26db35ea72076 automatically merged."
ghprbPullTitle="[SPARK-26632][Core] Separate Thread Configurations of Driver and Executor"
ghprbPullLink=https://api.github.com/repos/apache/spark/pulls/23560
{code}
Reporter: Vaibhavd When I run `Spark/dev/run-test-jenkins.py` with the following env variables set (Ref: [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103464/parameters/]), the execution gets stuck at this point (build step):
{code:java}
[INFO] --- scala-maven-plugin:3.4.4:compile (scala-compile-first) @ spark-tags_2.12 ---
[INFO] Using zinc server for incremental compilation
[INFO] Toolchain in scala-maven-plugin: /usr/lib/jvm/java-8-openjdk-amd64
{code}
I am not sure what's going wrong. Am I missing some environment variable? When I run `/dev/run-tests.py`, there is no problem. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-23264) Support interval values without INTERVAL clauses
[ https://issues.apache.org/jira/browse/SPARK-23264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-23264. -- Resolution: Fixed Fix Version/s: 3.0.0 Resolved by https://github.com/apache/spark/pull/20433 > Support interval values without INTERVAL clauses > > > Key: SPARK-23264 > URL: https://issues.apache.org/jira/browse/SPARK-23264 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.2.1 >Reporter: Takeshi Yamamuro >Assignee: Takeshi Yamamuro >Priority: Minor > Fix For: 3.0.0 > > > The master currently cannot parse a SQL query below; > {code:java} > SELECT cast('2017-08-04' as date) + 1 days; > {code} > Since other dbms-like systems support this syntax (e.g., hive and mysql), it > might help to support in spark. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27093) Honor ParseMode in AvroFileFormat
[ https://issues.apache.org/jira/browse/SPARK-27093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792246#comment-16792246 ] Hyukjin Kwon commented on SPARK-27093: -- Yeah, for the second case, it sounds like {{ignoreCorruptFiles}} should cover it. For the third case, it looks like the file could not be read anyway. If the schema is known to be incompatible, that schema should not be used to write those Avro files in the first place, if they are intended to be read via Spark. I don't think the ORC or Parquet file formats support those cases and parse modes either, since it's quite unlikely to have a few malformed rows in the entire dataset, unlike CSV and JSON. > Honor ParseMode in AvroFileFormat > - > > Key: SPARK-27093 > URL: https://issues.apache.org/jira/browse/SPARK-27093 > Project: Spark > Issue Type: Improvement > Components: Input/Output >Affects Versions: 2.4.0 >Reporter: Tim Cerexhe >Priority: Major > > The Avro reader is missing the ability to handle malformed or truncated files > like the JSON reader. Currently it throws exceptions when it encounters any > bad or truncated record in an Avro file, causing the entire Spark job to fail > from a single dodgy file. > Ideally the AvroFileFormat would accept a Permissive or DropMalformed > ParseMode like Spark's JSON format. This would enable the Avro reader to > drop bad records and continue processing the good records rather than abort > the entire job. > Obviously the default could remain as FailFastMode, which is the current > effective behavior, so this wouldn’t break any existing users. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
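For the second case, a short usage sketch of the existing flag mentioned above (the input path is hypothetical); whether it fully covers truncated Avro files in 2.4 is the open question in this thread:
{code:scala}
// Hedged sketch: skip unreadable files instead of failing the whole job.
// spark.sql.files.ignoreCorruptFiles is an existing Spark SQL flag; Spark
// 2.4's "avro" short name requires the spark-avro module on the classpath.
import org.apache.spark.sql.SparkSession

object AvroIgnoreCorrupt {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("avro-ignore-corrupt").getOrCreate()
    spark.conf.set("spark.sql.files.ignoreCorruptFiles", "true")

    val df = spark.read.format("avro").load("/data/events/*.avro")
    df.show()
    spark.stop()
  }
}
{code}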
[jira] [Commented] (SPARK-26555) Thread safety issue causes createDataset to fail with misleading errors
[ https://issues.apache.org/jira/browse/SPARK-26555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792240#comment-16792240 ] Martin Loncaric commented on SPARK-26555: - This is an existing issue with scala: https://github.com/scala/bug/issues/10766 > Thread safety issue causes createDataset to fail with misleading errors > --- > > Key: SPARK-26555 > URL: https://issues.apache.org/jira/browse/SPARK-26555 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Martin Loncaric >Priority: Major > > This can be replicated (~2% of the time) with > {code:scala} > import java.sql.Timestamp > import java.util.concurrent.{Executors, Future} > import org.apache.spark.sql.SparkSession > import scala.collection.mutable.ListBuffer > import scala.concurrent.ExecutionContext > import scala.util.Random > object Main { > def main(args: Array[String]): Unit = { > val sparkSession = SparkSession.builder > .getOrCreate() > import sparkSession.implicits._ > val executor = Executors.newFixedThreadPool(1) > try { > implicit val xc: ExecutionContext = > ExecutionContext.fromExecutorService(executor) > val futures = new ListBuffer[Future[_]]() > for (i <- 1 to 3) { > futures += executor.submit(new Runnable { > override def run(): Unit = { > val d = if (Random.nextInt(2) == 0) Some("d value") else None > val e = if (Random.nextInt(2) == 0) Some(5.0) else None > val f = if (Random.nextInt(2) == 0) Some(6.0) else None > println("DEBUG", d, e, f) > sparkSession.createDataset(Seq( > MyClass(new Timestamp(1L), "b", "c", d, e, f) > )) > } > }) > } > futures.foreach(_.get()) > } finally { > println("SHUTDOWN") > executor.shutdown() > sparkSession.stop() > } > } > case class MyClass( > a: Timestamp, > b: String, > c: String, > d: Option[String], > e: Option[Double], > f: Option[Double] > ) > } > {code} > So it will usually come up during > {code:bash} > for i in $(seq 1 200); do > echo $i > spark-submit --master local[4] target/scala-2.11/spark-test_2.11-0.1.jar > done > {code} > causing a variety of possible errors, such as > {code}Exception in thread "main" java.util.concurrent.ExecutionException: > scala.MatchError: scala.Option[String] (of class > scala.reflect.internal.Types$ClassArgsTypeRef) > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > Caused by: scala.MatchError: scala.Option[String] (of class > scala.reflect.internal.Types$ClassArgsTypeRef) > at > org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$deserializerFor$1.apply(ScalaReflection.scala:210){code} > or > {code}Exception in thread "main" java.util.concurrent.ExecutionException: > java.lang.UnsupportedOperationException: Schema for type > scala.Option[scala.Double] is not supported > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > Caused by: java.lang.UnsupportedOperationException: Schema for type > scala.Option[scala.Double] is not supported > at > org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:789){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
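Given that root cause, a common workaround sketch, assuming the race is confined to the reflection-driven implicit encoder derivation: derive the encoder once on a single thread before any concurrency starts, then pass it explicitly so worker threads never trigger derivation themselves.
{code:scala}
// Hedged workaround sketch for the repro above: Encoders.product performs
// the TypeTag/reflection work once, up front; createDataset then receives
// the already-derived encoder explicitly.
import java.sql.Timestamp
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}

case class MyClass(a: Timestamp, b: String, c: String,
                   d: Option[String], e: Option[Double], f: Option[Double])

object EncodeOnceSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.getOrCreate()
    // Reflection happens here, once, before any worker threads exist.
    val enc: Encoder[MyClass] = Encoders.product[MyClass]

    val ds = spark.createDataset(Seq(
      MyClass(new Timestamp(1L), "b", "c", Some("d value"), None, Some(6.0))
    ))(enc)
    ds.show()
    spark.stop()
  }
}
{code}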
[jira] [Commented] (SPARK-25694) URL.setURLStreamHandlerFactory causing incompatible HttpURLConnection issue
[ https://issues.apache.org/jira/browse/SPARK-25694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792230#comment-16792230 ] Eugene commented on SPARK-25694: [~howardatwork], I have hit this problem as well; there seems to be no workaround without a change in httpclient. I used scalaj.http previously, but replaced it with httpcomponents. > URL.setURLStreamHandlerFactory causing incompatible HttpURLConnection issue > --- > > Key: SPARK-25694 > URL: https://issues.apache.org/jira/browse/SPARK-25694 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 2.3.0, 2.3.1, 2.3.2 >Reporter: Bo Yang >Priority: Minor > > URL.setURLStreamHandlerFactory() in SharedState causes URL.openConnection() > to return an FsUrlConnection object, which is not compatible with > HttpURLConnection. This will cause an exception when using some third-party http > libraries (e.g. scalaj.http). > The following code in Spark 2.3.0 introduced the issue: > sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala: > {code} > object SharedState extends Logging { ... > URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory()) ... > } > {code} > Here is an example exception when using scalaj.http in Spark: > {code} > StackTrace: scala.MatchError: > org.apache.hadoop.fs.FsUrlConnection:[http://.example.com|http://.example.com/] > (of class org.apache.hadoop.fs.FsUrlConnection) > at > scalaj.http.HttpRequest.scalaj$http$HttpRequest$$doConnection(Http.scala:343) > at scalaj.http.HttpRequest.exec(Http.scala:335) > at scalaj.http.HttpRequest.asString(Http.scala:455) > {code} > > One option to fix the issue is to return null in > URLStreamHandlerFactory.createURLStreamHandler when the protocol is > http/https, so it will use the default behavior and be compatible with > scalaj.http. The following is a code example: > {code} > class SparkUrlStreamHandlerFactory extends URLStreamHandlerFactory with > Logging { > private val fsUrlStreamHandlerFactory = new FsUrlStreamHandlerFactory() > override def createURLStreamHandler(protocol: String): URLStreamHandler = { > val handler = fsUrlStreamHandlerFactory.createURLStreamHandler(protocol) > if (handler == null) { > return null > } > if (protocol != null && > (protocol.equalsIgnoreCase("http") > || protocol.equalsIgnoreCase("https"))) { > // return null to use system default URLStreamHandler > null > } else { > handler > } > } > } > {code} > I would like to get some discussion here before submitting a pull request. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
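A sketch of the workaround the commenter describes: switch from scalaj.http to Apache HttpComponents, which builds its connections directly rather than going through java.net.URL.openConnection() and the registered stream-handler factory. The URL below is a placeholder.
{code:scala}
import org.apache.http.client.methods.HttpGet
import org.apache.http.impl.client.HttpClients
import org.apache.http.util.EntityUtils

val client = HttpClients.createDefault()
try {
  // HttpComponents manages its own sockets, so the FsUrlConnection
  // returned by the overridden factory never enters the picture.
  val response = client.execute(new HttpGet("http://example.com/api"))
  val body = EntityUtils.toString(response.getEntity)
  println(body)
} finally {
  client.close()
}
{code}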
[jira] [Commented] (SPARK-27141) Use ConfigEntry for hardcoded configs Yarn
[ https://issues.apache.org/jira/browse/SPARK-27141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792223#comment-16792223 ] wangjiaochun commented on SPARK-27141: -- I'm sorry, I closed this Jira by mistake. I will reopen it; I have already solved this problem and opened a pull request against master. > Use ConfigEntry for hardcoded configs Yarn > -- > > Key: SPARK-27141 > URL: https://issues.apache.org/jira/browse/SPARK-27141 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.0.0 >Reporter: wangjiaochun >Priority: Major > Fix For: 3.0.0 > > > Some of the following YARN file-related configs do not use ConfigEntry values; try > to replace them. > ApplicationMaster > YarnAllocatorSuite > ApplicationMasterSuite > BaseYarnClusterSuite > YarnClusterSuite -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
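For reference, a sketch of the ConfigEntry pattern these call sites would adopt; the entry shown is illustrative rather than one of the specific configs touched by the PR, and ConfigBuilder is private[spark], so this compiles only inside Spark's own modules:
{code:scala}
import org.apache.spark.internal.config.ConfigBuilder

// A hardcoded string lookup becomes a typed, documented entry:
val STAGING_DIR = ConfigBuilder("spark.yarn.stagingDir")
  .doc("Staging directory used while submitting applications.")
  .stringConf
  .createOptional
{code}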
[jira] [Resolved] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2
[ https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-26742. Resolution: Fixed Issue resolved by pull request 24002 [https://github.com/apache/spark/pull/24002] > Bump Kubernetes Client Version to 4.1.2 > --- > > Key: SPARK-26742 > URL: https://issues.apache.org/jira/browse/SPARK-26742 > Project: Spark > Issue Type: Dependency upgrade > Components: Kubernetes >Affects Versions: 2.4.0, 3.0.0 >Reporter: Steve Davids >Assignee: Jiaxin Shan >Priority: Major > Labels: easyfix > Fix For: 3.0.0 > > > Spark 2.x is using Kubernetes Client 3.x, which is pretty old; the master > branch has 4.0. The client should be upgraded to 4.1.1 to have the broadest > Kubernetes compatibility support for newer clusters: > https://github.com/fabric8io/kubernetes-client#compatibility-matrix -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2
[ https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin reassigned SPARK-26742: -- Assignee: Jiaxin Shan > Bump Kubernetes Client Version to 4.1.2 > --- > > Key: SPARK-26742 > URL: https://issues.apache.org/jira/browse/SPARK-26742 > Project: Spark > Issue Type: Dependency upgrade > Components: Kubernetes >Affects Versions: 2.4.0, 3.0.0 >Reporter: Steve Davids >Assignee: Jiaxin Shan >Priority: Major > Labels: easyfix > Fix For: 3.0.0 > > > Spark 2.x is using Kubernetes Client 3.x, which is pretty old; the master > branch has 4.0. The client should be upgraded to 4.1.1 to have the broadest > Kubernetes compatibility support for newer clusters: > https://github.com/fabric8io/kubernetes-client#compatibility-matrix -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-15251) Cannot apply PythonUDF to aggregated column
[ https://issues.apache.org/jira/browse/SPARK-15251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthew Livesey closed SPARK-15251. --- Cannot be reproduced on master, and is now years old. Closing as no longer relevant. > Cannot apply PythonUDF to aggregated column > --- > > Key: SPARK-15251 > URL: https://issues.apache.org/jira/browse/SPARK-15251 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 1.6.1 >Reporter: Matthew Livesey >Priority: Major > > In Scala it is possible to define a UDF and apply it to an aggregated value in > an expression, for example: > {code} > def timesTwo(x: Int): Int = x * 2 > sqlContext.udf.register("timesTwo", timesTwo _) > case class Data(x: Int, y: String) > val data = List(Data(1, "a"), Data(2, "b")) > val rdd = sc.parallelize(data) > val df = rdd.toDF > df.registerTempTable("my_data") > sqlContext.sql("SELECT timesTwo(Sum(x)) t FROM my_data").show() > +---+ > | t| > +---+ > | 6| > +---+ > {code} > Performing the same computation in pyspark: > {code} > def timesTwo(x): > return x * 2 > sqlContext.udf.register("timesTwo", timesTwo) > data = [(1, 'a'), (2, 'b')] > rdd = sc.parallelize(data) > df = sqlContext.createDataFrame(rdd, ["x", "y"]) > df.registerTempTable("my_data") > sqlContext.sql("SELECT timesTwo(Sum(x)) t FROM my_data").show() > {code} > Gives the following: > {code} > AnalysisException: u"expression 'pythonUDF' is neither present in the group > by, nor is it an aggregate function. Add to group by or wrap in first() (or > first_value) if you don't care which value you get.;" > {code} > Using a lambda rather than a named function gives the same error. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
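A workaround sketch for the 1.6-era limitation, reusing the example's table: aggregate in a subquery first, then apply the UDF to the already-aggregated column, so no PythonUDF ends up wrapped around an aggregate expression (the same SQL works from PySpark):
{code:scala}
// The inner query computes Sum(x); the outer query applies the UDF to a
// plain column of the subquery rather than to the aggregate itself.
sqlContext.sql(
  "SELECT timesTwo(t) AS t2 FROM (SELECT Sum(x) AS t FROM my_data) agg"
).show()
{code}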
[jira] [Commented] (SPARK-26910) Re-release SparkR to CRAN
[ https://issues.apache.org/jira/browse/SPARK-26910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792120#comment-16792120 ] Felix Cheung commented on SPARK-26910: -- 2.3.3 failed. We are waiting for 2.4.1 to be released. > Re-release SparkR to CRAN > - > > Key: SPARK-26910 > URL: https://issues.apache.org/jira/browse/SPARK-26910 > Project: Spark > Issue Type: New Feature > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Michael Chirico >Assignee: Felix Cheung >Priority: Major > > The logical successor to https://issues.apache.org/jira/browse/SPARK-15799 > I don't see anything specifically tracking re-release in the Jira list. It > would be helpful to have an issue tracking this to refer to as an outsider, > as well as to document what the blockers are in case some outside help could > be useful. > * Is there a plan to re-release SparkR to CRAN? > * What are the major blockers to doing so at the moment? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-26555) Thread safety issue causes createDataset to fail with misleading errors
[ https://issues.apache.org/jira/browse/SPARK-26555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792086#comment-16792086 ] Martin Loncaric edited comment on SPARK-26555 at 3/13/19 8:26 PM: -- Update: I have proved that the issue lies in reflection thread safety issues in org.apache.spark.sql.catalyst.ScalaReflection: https://stackoverflow.com/questions/55150590/thread-safety-in-scala-reflection-with-type-matching Investigating whether this can be fixed with different usage of the reflection library, or whether this is a scala issue. was (Author: mwlon): Update: I have been able to replicate this without Spark at all, using snippets from org.apache.spark.sql.catalyst.ScalaReflection: https://stackoverflow.com/questions/55150590/thread-safety-in-scala-reflection-with-type-matching Investigating whether this can be fixed with different usage of the reflection library, or whether this is a scala issue. > Thread safety issue causes createDataset to fail with misleading errors > --- > > Key: SPARK-26555 > URL: https://issues.apache.org/jira/browse/SPARK-26555 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Martin Loncaric >Priority: Major > > This can be replicated (~2% of the time) with > {code:scala} > import java.sql.Timestamp > import java.util.concurrent.{Executors, Future} > import org.apache.spark.sql.SparkSession > import scala.collection.mutable.ListBuffer > import scala.concurrent.ExecutionContext > import scala.util.Random > object Main { > def main(args: Array[String]): Unit = { > val sparkSession = SparkSession.builder > .getOrCreate() > import sparkSession.implicits._ > val executor = Executors.newFixedThreadPool(1) > try { > implicit val xc: ExecutionContext = > ExecutionContext.fromExecutorService(executor) > val futures = new ListBuffer[Future[_]]() > for (i <- 1 to 3) { > futures += executor.submit(new Runnable { > override def run(): Unit = { > val d = if (Random.nextInt(2) == 0) Some("d value") else None > val e = if (Random.nextInt(2) == 0) Some(5.0) else None > val f = if (Random.nextInt(2) == 0) Some(6.0) else None > println("DEBUG", d, e, f) > sparkSession.createDataset(Seq( > MyClass(new Timestamp(1L), "b", "c", d, e, f) > )) > } > }) > } > futures.foreach(_.get()) > } finally { > println("SHUTDOWN") > executor.shutdown() > sparkSession.stop() > } > } > case class MyClass( > a: Timestamp, > b: String, > c: String, > d: Option[String], > e: Option[Double], > f: Option[Double] > ) > } > {code} > So it will usually come up during > {code:bash} > for i in $(seq 1 200); do > echo $i > spark-submit --master local[4] target/scala-2.11/spark-test_2.11-0.1.jar > done > {code} > causing a variety of possible errors, such as > {code}Exception in thread "main" java.util.concurrent.ExecutionException: > scala.MatchError: scala.Option[String] (of class > scala.reflect.internal.Types$ClassArgsTypeRef) > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > Caused by: scala.MatchError: scala.Option[String] (of class > scala.reflect.internal.Types$ClassArgsTypeRef) > at > org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$deserializerFor$1.apply(ScalaReflection.scala:210){code} > or > {code}Exception in thread "main" java.util.concurrent.ExecutionException: > java.lang.UnsupportedOperationException: Schema for type > scala.Option[scala.Double] is not supported > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > Caused by: 
java.lang.UnsupportedOperationException: Schema for type > scala.Option[scala.Double] is not supported > at > org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:789){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26555) Thread safety issue causes createDataset to fail with misleading errors
[ https://issues.apache.org/jira/browse/SPARK-26555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792086#comment-16792086 ] Martin Loncaric commented on SPARK-26555: - Update: I have been able to replicate this without Spark at all, using snippets from org.apache.spark.sql.catalyst.ScalaReflection: https://stackoverflow.com/questions/55150590/thread-safety-in-scala-reflection-with-type-matching Investigating whether this can be fixed with different usage of the reflection library, or whether this is a scala issue. > Thread safety issue causes createDataset to fail with misleading errors > --- > > Key: SPARK-26555 > URL: https://issues.apache.org/jira/browse/SPARK-26555 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Martin Loncaric >Priority: Major > > This can be replicated (~2% of the time) with > {code:scala} > import java.sql.Timestamp > import java.util.concurrent.{Executors, Future} > import org.apache.spark.sql.SparkSession > import scala.collection.mutable.ListBuffer > import scala.concurrent.ExecutionContext > import scala.util.Random > object Main { > def main(args: Array[String]): Unit = { > val sparkSession = SparkSession.builder > .getOrCreate() > import sparkSession.implicits._ > val executor = Executors.newFixedThreadPool(1) > try { > implicit val xc: ExecutionContext = > ExecutionContext.fromExecutorService(executor) > val futures = new ListBuffer[Future[_]]() > for (i <- 1 to 3) { > futures += executor.submit(new Runnable { > override def run(): Unit = { > val d = if (Random.nextInt(2) == 0) Some("d value") else None > val e = if (Random.nextInt(2) == 0) Some(5.0) else None > val f = if (Random.nextInt(2) == 0) Some(6.0) else None > println("DEBUG", d, e, f) > sparkSession.createDataset(Seq( > MyClass(new Timestamp(1L), "b", "c", d, e, f) > )) > } > }) > } > futures.foreach(_.get()) > } finally { > println("SHUTDOWN") > executor.shutdown() > sparkSession.stop() > } > } > case class MyClass( > a: Timestamp, > b: String, > c: String, > d: Option[String], > e: Option[Double], > f: Option[Double] > ) > } > {code} > So it will usually come up during > {code:bash} > for i in $(seq 1 200); do > echo $i > spark-submit --master local[4] target/scala-2.11/spark-test_2.11-0.1.jar > done > {code} > causing a variety of possible errors, such as > {code}Exception in thread "main" java.util.concurrent.ExecutionException: > scala.MatchError: scala.Option[String] (of class > scala.reflect.internal.Types$ClassArgsTypeRef) > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > Caused by: scala.MatchError: scala.Option[String] (of class > scala.reflect.internal.Types$ClassArgsTypeRef) > at > org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$deserializerFor$1.apply(ScalaReflection.scala:210){code} > or > {code}Exception in thread "main" java.util.concurrent.ExecutionException: > java.lang.UnsupportedOperationException: Schema for type > scala.Option[scala.Double] is not supported > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > Caused by: java.lang.UnsupportedOperationException: Schema for type > scala.Option[scala.Double] is not supported > at > org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:789){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27123) Improve CollapseProject to handle projects cross limit/repartition/sample
[ https://issues.apache.org/jira/browse/SPARK-27123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791991#comment-16791991 ] Apache Spark commented on SPARK-27123: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/24078 > Improve CollapseProject to handle projects cross limit/repartition/sample > - > > Key: SPARK-27123 > URL: https://issues.apache.org/jira/browse/SPARK-27123 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.0.0 > > > The `CollapseProject` optimizer simplifies the plan by merging adjacent > projects and performing alias substitution. > {code:java} > scala> sql("SELECT b c FROM (SELECT a b FROM t)").explain > == Physical Plan == > *(1) Project [a#5 AS c#1] > +- Scan hive default.t [a#5], HiveTableRelation `default`.`t`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#5] > {code} > We can do that for more complex cases like the following. > *BEFORE* > {code:java} > scala> sql("SELECT b c FROM (SELECT /*+ REPARTITION(1) */ a b FROM > t)").explain > == Physical Plan == > *(2) Project [b#0 AS c#1] > +- Exchange RoundRobinPartitioning(1) >+- *(1) Project [a#5 AS b#0] > +- Scan hive default.t [a#5], HiveTableRelation `default`.`t`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#5] > {code} > *AFTER* > {code:java} > scala> sql("SELECT b c FROM (SELECT /*+ REPARTITION(1) */ a b FROM > t)").explain > == Physical Plan == > Exchange RoundRobinPartitioning(1) > +- *(1) Project [a#11 AS c#7] >+- Scan hive default.t [a#11], HiveTableRelation `default`.`t`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#11] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
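For adjacent projects this merge already works today; a quick sketch one can run in spark-shell (where {{spark}} is the session) to observe it, with illustrative plan output:
{code:scala}
import org.apache.spark.sql.functions.col

// Two adjacent projects over a simple range; CollapseProject merges them
// into a single Project during optimization.
val df = spark.range(10)
  .select(col("id").as("b"))
  .select(col("b").as("c"))

df.explain()
// Expected shape (expression ids are illustrative):
// == Physical Plan ==
// *(1) Project [id#0L AS c#4L]
// +- *(1) Range (0, 10, step=1, splits=8)
{code}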
[jira] [Commented] (SPARK-25449) Don't send zero accumulators in heartbeats
[ https://issues.apache.org/jira/browse/SPARK-25449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791960#comment-16791960 ] Shixiong Zhu commented on SPARK-25449: -- I think this patch actually fixed a bug introduced by https://github.com/apache/spark/commit/0514e8d4b69615ba8918649e7e3c46b5713b6540, which didn't use the correct default timeout. Before this patch, using `spark.executor.heartbeatInterval 30` would send a heartbeat every 30 ms, but each heartbeat RPC message timeout was 30 seconds. This patch just unifies the default time unit in all usages of "spark.executor.heartbeatInterval". > Don't send zero accumulators in heartbeats > -- > > Key: SPARK-25449 > URL: https://issues.apache.org/jira/browse/SPARK-25449 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Mukul Murthy >Assignee: Mukul Murthy >Priority: Major > Labels: release-notes > Fix For: 3.0.0 > > > Heartbeats sent from executors to the driver every 10 seconds contain metrics > and are generally on the order of a few KBs. However, for large jobs with > lots of tasks, heartbeats can be on the order of tens of MBs, causing tasks > to die with heartbeat failures. We can mitigate this by not sending zero > metrics to the driver. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
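A sketch of the unit-safe way to set the interval after this change; the value and app name are illustrative:
{code:scala}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("heartbeat-example")
  // Give the interval an explicit unit: a bare "30" was previously read as
  // 30 ms by the sender but as 30 s by the RPC timeout.
  .set("spark.executor.heartbeatInterval", "30s")
{code}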
[jira] [Assigned] (SPARK-27153) add weightCol in python RegressionEvaluator
[ https://issues.apache.org/jira/browse/SPARK-27153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-27153: Assignee: Apache Spark > add weightCol in python RegressionEvaluator > --- > > Key: SPARK-27153 > URL: https://issues.apache.org/jira/browse/SPARK-27153 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib, PySpark >Affects Versions: 3.0.0 >Reporter: Huaxin Gao >Assignee: Apache Spark >Priority: Minor > > -https://issues.apache.org/jira/browse/SPARK-24102- added weightCol in > RegressionEvaluator.scala. This Jira will add weightCol in python version of > RegressionEvaluator. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-27153) add weightCol in python RegressionEvaluator
[ https://issues.apache.org/jira/browse/SPARK-27153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-27153: Assignee: (was: Apache Spark) > add weightCol in python RegressionEvaluator > --- > > Key: SPARK-27153 > URL: https://issues.apache.org/jira/browse/SPARK-27153 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib, PySpark >Affects Versions: 3.0.0 >Reporter: Huaxin Gao >Priority: Minor > > -https://issues.apache.org/jira/browse/SPARK-24102- added weightCol in > RegressionEvaluator.scala. This Jira will add weightCol in python version of > RegressionEvaluator. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20082) Incremental update of LDA model, by adding initialModel as start point
[ https://issues.apache.org/jira/browse/SPARK-20082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791938#comment-16791938 ] yuhao yang commented on SPARK-20082: Yuhao is taking family bonding leave from March 7th to Apr 19th. Please expect a delayed email response. Contact +86 13738085700 for anything urgent. Thanks, Yuhao > Incremental update of LDA model, by adding initialModel as start point > -- > > Key: SPARK-20082 > URL: https://issues.apache.org/jira/browse/SPARK-20082 > Project: Spark > Issue Type: New Feature > Components: ML >Affects Versions: 2.1.0 >Reporter: Mathieu DESPRIEE >Priority: Major > > Some mllib models support an initialModel to start from and update it > incrementally with new data. > From what I understand of OnlineLDAOptimizer, it is possible to incrementally > update an existing model with batches of new documents. > I suggest to add an initialModel as a start point for LDA. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-27153) add weightCol in python RegressionEvaluator
Huaxin Gao created SPARK-27153: -- Summary: add weightCol in python RegressionEvaluator Key: SPARK-27153 URL: https://issues.apache.org/jira/browse/SPARK-27153 Project: Spark Issue Type: Improvement Components: ML, MLlib, PySpark Affects Versions: 3.0.0 Reporter: Huaxin Gao -https://issues.apache.org/jira/browse/SPARK-24102- added weightCol in RegressionEvaluator.scala. This Jira will add weightCol in python version of RegressionEvaluator. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
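For reference, a sketch of the Scala-side API this ticket mirrors (column names are illustrative; the Python change tracked here would expose the same parameter from pyspark.ml.evaluation):
{code:scala}
import org.apache.spark.ml.evaluation.RegressionEvaluator

// SPARK-24102 added weight support on the Scala evaluator.
val evaluator = new RegressionEvaluator()
  .setMetricName("rmse")
  .setLabelCol("label")
  .setPredictionCol("prediction")
  .setWeightCol("weight")
{code}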
[jira] [Assigned] (SPARK-24432) Add support for dynamic resource allocation
[ https://issues.apache.org/jira/browse/SPARK-24432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24432: Assignee: (was: Apache Spark) > Add support for dynamic resource allocation > --- > > Key: SPARK-24432 > URL: https://issues.apache.org/jira/browse/SPARK-24432 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 2.4.0, 3.0.0 >Reporter: Yinan Li >Priority: Major > > This is an umbrella ticket for work on adding support for dynamic resource > allocation into the Kubernetes mode. This requires a Kubernetes-specific > external shuffle service. The feature is available in our fork at > github.com/apache-spark-on-k8s/spark. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-24432) Add support for dynamic resource allocation
[ https://issues.apache.org/jira/browse/SPARK-24432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24432: Assignee: Apache Spark > Add support for dynamic resource allocation > --- > > Key: SPARK-24432 > URL: https://issues.apache.org/jira/browse/SPARK-24432 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 2.4.0, 3.0.0 >Reporter: Yinan Li >Assignee: Apache Spark >Priority: Major > > This is an umbrella ticket for work on adding support for dynamic resource > allocation into the Kubernetes mode. This requires a Kubernetes-specific > external shuffle service. The feature is available in our fork at > github.com/apache-spark-on-k8s/spark. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20082) Incremental update of LDA model, by adding initialModel as start point
[ https://issues.apache.org/jira/browse/SPARK-20082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791919#comment-16791919 ] Marcellus de Castro Tavares commented on SPARK-20082: - Hi, is this feature still on the roadmap? It's been in progress for a while. Thanks > Incremental update of LDA model, by adding initialModel as start point > -- > > Key: SPARK-20082 > URL: https://issues.apache.org/jira/browse/SPARK-20082 > Project: Spark > Issue Type: New Feature > Components: ML >Affects Versions: 2.1.0 >Reporter: Mathieu DESPRIEE >Priority: Major > > Some mllib models support an initialModel to start from and update it > incrementally with new data. > From what I understand of OnlineLDAOptimizer, it is possible to incrementally > update an existing model with batches of new documents. > I suggest to add an initialModel as a start point for LDA. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-27106) merge CaseInsensitiveStringMap and DataSourceOptions
[ https://issues.apache.org/jira/browse/SPARK-27106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-27106. - Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 24025 [https://github.com/apache/spark/pull/24025] > merge CaseInsensitiveStringMap and DataSourceOptions > > > Key: SPARK-27106 > URL: https://issues.apache.org/jira/browse/SPARK-27106 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27060) DDL Commands are accepting Keywords like create, drop as tableName
[ https://issues.apache.org/jira/browse/SPARK-27060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791907#comment-16791907 ] Thincrs commented on SPARK-27060: - A user of thincrs has selected this issue. Deadline: Wed, Mar 20, 2019 5:23 PM > DDL Commands are accepting Keywords like create, drop as tableName > -- > > Key: SPARK-27060 > URL: https://issues.apache.org/jira/browse/SPARK-27060 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.2, 2.4.0 >Reporter: Sachin Ramachandra Setty >Priority: Minor > > Seems to be a compatibility issue compared to other components such as Hive > and MySQL. DDL commands succeed even though the tableName is the same as a > keyword. Tested with columnNames as well, and the issue exists there too. > Whereas Hive-Beeline throws a ParseException and does not accept keywords > as tableName or columnName, MySQL accepts keywords only as columnName. > Spark-Behaviour : > {code} > Connected to: Spark SQL (version 2.3.2.0101) > CLI_DBMS_APPID > Beeline version 1.2.1.spark_2.3.2.0101 by Apache Hive > 0: jdbc:hive2://10.18.3.XXX:23040/default> create table create(id int); > +-+--+ > | Result | > +-+--+ > +-+--+ > No rows selected (0.255 seconds) > 0: jdbc:hive2://10.18.3.XXX:23040/default> create table drop(int int); > +-+--+ > | Result | > +-+--+ > +-+--+ > No rows selected (0.257 seconds) > 0: jdbc:hive2://10.18.3.XXX:23040/default> drop table drop; > +-+--+ > | Result | > +-+--+ > +-+--+ > No rows selected (0.236 seconds) > 0: jdbc:hive2://10.18.3.XXX:23040/default> drop table create; > +-+--+ > | Result | > +-+--+ > +-+--+ > No rows selected (0.168 seconds) > 0: jdbc:hive2://10.18.3.XXX:23040/default> create table tab1(float float); > +-+--+ > | Result | > +-+--+ > +-+--+ > No rows selected (0.111 seconds) > 0: jdbc:hive2://10.18.XXX:23040/default> create table double(double float); > +-+--+ > | Result | > +-+--+ > +-+--+ > No rows selected (0.093 seconds) > {code} > Hive-Behaviour : > {code} > Connected to: Apache Hive (version 3.1.0) > Driver: Hive JDBC (version 3.1.0) > Transaction isolation: TRANSACTION_REPEATABLE_READ > Beeline version 3.1.0 by Apache Hive > 0: jdbc:hive2://10.18.XXX:21066/> create table create(id int); > Error: Error while compiling statement: FAILED: ParseException line 1:13 > cannot recognize input near 'create' '(' 'id' in table name > (state=42000,code=4) > 0: jdbc:hive2://10.18.XXX:21066/> create table drop(id int); > Error: Error while compiling statement: FAILED: ParseException line 1:13 > cannot recognize input near 'drop' '(' 'id' in table name > (state=42000,code=4) > 0: jdbc:hive2://10.18XXX:21066/> create table tab1(float float); > Error: Error while compiling statement: FAILED: ParseException line 1:18 > cannot recognize input near 'float' 'float' ')' in column name or constraint > (state=42000,code=4) > 0: jdbc:hive2://10.18XXX:21066/> drop table create(id int); > Error: Error while compiling statement: FAILED: ParseException line 1:11 > cannot recognize input near 'create' '(' 'id' in table name > (state=42000,code=4) > 0: jdbc:hive2://10.18.XXX:21066/> drop table drop(id int); > Error: Error while compiling statement: FAILED: ParseException line 1:11 > cannot recognize input near 'drop' '(' 'id' in table name > (state=42000,code=4) > mySql : > CREATE TABLE CREATE(ID integer); > Error: near "CREATE": syntax error > CREATE TABLE DROP(ID integer); > Error: near "DROP": syntax error > CREATE TABLE TAB1(FLOAT FLOAT); > Success > {code} -- This message was sent by Atlassian
JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-26103) OutOfMemory error with large query plans
[ https://issues.apache.org/jira/browse/SPARK-26103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-26103. Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 23169 [https://github.com/apache/spark/pull/23169] > OutOfMemory error with large query plans > > > Key: SPARK-26103 > URL: https://issues.apache.org/jira/browse/SPARK-26103 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0, 2.3.1, 2.3.2 > Environment: Amazon EMR 5.19 > 1 c5.4xlarge master instance > 1 c5.4xlarge core instance > 2 c5.4xlarge task instances >Reporter: Dave DeCaprio >Assignee: Dave DeCaprio >Priority: Major > Fix For: 3.0.0 > > > Large query plans can cause OutOfMemory errors in the Spark driver. > We are creating data frames that are not extremely large but contain lots of > nested joins. These plans execute efficiently because of caching and > partitioning, but the text version of the query plans generated can be > hundreds of megabytes. Running many of these in parallel causes our driver > process to fail. > Exception in thread "main" java.lang.OutOfMemoryError: Java heap space at > java.util.Arrays.copyOfRange(Arrays.java:2694) at > java.lang.String.<init>(String.java:203) at > java.lang.StringBuilder.toString(StringBuilder.java:405) at > scala.StringContext.standardInterpolator(StringContext.scala:125) at > scala.StringContext.s(StringContext.scala:90) at > org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:70) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:52) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation.run(InsertIntoHadoopFsRelation.scala:108) > > > A similar error is reported in > [https://stackoverflow.com/questions/38307258/out-of-memory-error-when-writing-out-spark-dataframes-to-parquet-format] > > Code exists to truncate the string if the number of output columns is larger > than 25, but not if the rest of the query plan is huge. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
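Once the fix is in, a deployment could additionally cap plan-string growth; the config name below follows the Spark 3.0 documentation for this change, so verify it against your build before relying on it:
{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  // Truncate generated plan strings instead of letting them grow to
  // hundreds of megabytes under heavily nested joins.
  .config("spark.sql.maxPlanStringLength", "100000")
  .getOrCreate()
{code}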
[jira] [Assigned] (SPARK-26103) OutOfMemory error with large query plans
[ https://issues.apache.org/jira/browse/SPARK-26103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin reassigned SPARK-26103: -- Assignee: Dave DeCaprio > OutOfMemory error with large query plans > > > Key: SPARK-26103 > URL: https://issues.apache.org/jira/browse/SPARK-26103 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0, 2.3.1, 2.3.2 > Environment: Amazon EMR 5.19 > 1 c5.4xlarge master instance > 1 c5.4xlarge core instance > 2 c5.4xlarge task instances >Reporter: Dave DeCaprio >Assignee: Dave DeCaprio >Priority: Major > > Large query plans can cause OutOfMemory errors in the Spark driver. > We are creating data frames that are not extremely large but contain lots of > nested joins. These plans execute efficiently because of caching and > partitioning, but the text version of the query plans generated can be > hundreds of megabytes. Running many of these in parallel causes our driver > process to fail. > Exception in thread "main" java.lang.OutOfMemoryError: Java heap space at > java.util.Arrays.copyOfRange(Arrays.java:2694) at > java.lang.String.<init>(String.java:203) at > java.lang.StringBuilder.toString(StringBuilder.java:405) at > scala.StringContext.standardInterpolator(StringContext.scala:125) at > scala.StringContext.s(StringContext.scala:90) at > org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:70) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:52) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation.run(InsertIntoHadoopFsRelation.scala:108) > > > A similar error is reported in > [https://stackoverflow.com/questions/38307258/out-of-memory-error-when-writing-out-spark-dataframes-to-parquet-format] > > Code exists to truncate the string if the number of output columns is larger > than 25, but not if the rest of the query plan is huge. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-27152) Column equality does not work for aliased columns.
Ryan Radtke created SPARK-27152: --- Summary: Column equality does not work for aliased columns. Key: SPARK-27152 URL: https://issues.apache.org/jira/browse/SPARK-27152 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.0 Reporter: Ryan Radtke assert($"zip".as("zip_code") equals $"zip".as("zip_code")) will return false, while assert($"zip" equals $"zip") will return true. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
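A sketch of what is likely happening, assuming Catalyst's usual handling of aliases: each {{.as()}} call wraps the column in a new Alias expression carrying a fresh expression id, so two structurally identical aliased columns compare unequal while bare attribute references compare equal.
{code:scala}
import org.apache.spark.sql.functions.col

val a = col("zip").as("zip_code")
val b = col("zip").as("zip_code")

println(a.equals(b))   // false per the report: the Alias exprIds differ
println(a.expr)        // e.g. zip AS zip_code#12
println(b.expr)        // e.g. zip AS zip_code#13 -- note the differing id

println(col("zip").equals(col("zip"))) // true: plain attribute references match
{code}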
[jira] [Comment Edited] (SPARK-26961) Found Java-level deadlock in Spark Driver
[ https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791751#comment-16791751 ] Ajith S edited comment on SPARK-26961 at 3/13/19 2:37 PM: -- 1) Yes, the registerAsParallelCapable will return true, but if you inspect the classloader instance, parallelLockMap is still null as it was already initialized via the super class constructor, so *it has no effect on an already created instance* !image-2019-03-13-19-53-52-390.png! 2) URLClassLoader is parallel capable as it does the registration in a static block, which runs before the parent (ClassLoader) constructor is called. Also, as per the javadoc [https://docs.oracle.com/javase/8/docs/api/java/lang/ClassLoader.html] {code:java} Note that the ClassLoader class is registered as parallel capable by default. However, its subclasses still need to register themselves if they are parallel capable. {code} Hence MutableURLClassLoader lost its parallel capability by failing to register, unlike URLClassLoader was (Author: ajithshetty): 1) Yes, the registerAsParallelCapable will return true, but if you inspect the classloader instance, parallelLockMap is still null as it was already initialized via the super class constructor, so it has no effect !image-2019-03-13-19-53-52-390.png! 2) URLClassLoader is parallel capable as it does the registration in a static block, which runs before the parent (ClassLoader) constructor is called. Also, as per the javadoc [https://docs.oracle.com/javase/8/docs/api/java/lang/ClassLoader.html] {code:java} Note that the ClassLoader class is registered as parallel capable by default. However, its subclasses still need to register themselves if they are parallel capable. {code} Hence MutableURLClassLoader lost its parallel capability by failing to register, unlike URLClassLoader > Found Java-level deadlock in Spark Driver > - > > Key: SPARK-26961 > URL: https://issues.apache.org/jira/browse/SPARK-26961 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.3.0 >Reporter: Rong Jialei >Priority: Major > Attachments: image-2019-03-13-19-53-52-390.png > > > Our spark job usually will finish in minutes, however, we recently found it > take days to run, and we can only kill it when this happened. > An investigation show all worker container could not connect drive after > start, and driver is hanging, using jstack, we found a Java-level deadlock.
> > *Jstack output for deadlock part is showing below:* > > Found one Java-level deadlock: > = > "SparkUI-907": > waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a > org.apache.hadoop.conf.Configuration), > which is held by "ForkJoinPool-1-worker-57" > "ForkJoinPool-1-worker-57": > waiting to lock monitor 0x7f3860574298 (object 0x0005b7991168, a > org.apache.spark.util.MutableURLClassLoader), > which is held by "ForkJoinPool-1-worker-7" > "ForkJoinPool-1-worker-7": > waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a > org.apache.hadoop.conf.Configuration), > which is held by "ForkJoinPool-1-worker-57" > Java stack information for the threads listed above: > === > "SparkUI-907": > at org.apache.hadoop.conf.Configuration.getOverlay(Configuration.java:1328) > - waiting to lock <0x0005c0c1e5e0> (a > org.apache.hadoop.conf.Configuration) > at > org.apache.hadoop.conf.Configuration.handleDeprecation(Configuration.java:684) > at org.apache.hadoop.conf.Configuration.get(Configuration.java:1088) > at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1145) > at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2363) > at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2840) > at > org.apache.hadoop.fs.FsUrlStreamHandlerFactory.createURLStreamHandler(FsUrlStreamHandlerFactory.java:74) > at java.net.URL.getURLStreamHandler(URL.java:1142) > at java.net.URL.(URL.java:599) > at java.net.URL.(URL.java:490) > at java.net.URL.(URL.java:439) > at org.apache.spark.ui.JettyUtils$$anon$4.doRequest(JettyUtils.scala:176) > at org.apache.spark.ui.JettyUtils$$anon$4.doGet(JettyUtils.scala:161) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > at > org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) > at > org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772) > at > org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:171) > at > org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) > at >
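A small diagnostic sketch of the inspection described above. The {{parallelLockMap}} field is a JDK-internal detail of OpenJDK 8's ClassLoader, so the field name and the reflective access are assumptions that may not hold on other JDKs; MutableURLClassLoader may also be private[spark], in which case this only compiles inside Spark's own packages.
{code:scala}
import java.net.URL
import org.apache.spark.util.MutableURLClassLoader

// ClassLoader keeps a per-class-name lock table only when the loader was
// registered as parallel capable before construction; a null field means
// loadClass synchronizes on the whole loader instance instead.
val field = classOf[ClassLoader].getDeclaredField("parallelLockMap")
field.setAccessible(true)

val loader = new MutableURLClassLoader(Array.empty[URL], null)
println(field.get(loader) == null) // true => not parallel capable
{code}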
[jira] [Updated] (SPARK-27142) Provide REST API for SQL level information
[ https://issues.apache.org/jira/browse/SPARK-27142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated SPARK-27142: Description: Currently, SQL information for monitoring a Spark application is not available from the REST API, but only via the UI. REST provides only applications, jobs, stages, and environment. This Jira is targeted at providing a REST API so that SQL-level information can be retrieved. Details: https://issues.apache.org/jira/browse/SPARK-27142?focusedCommentId=16791728=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16791728 was: Currently, SQL information for monitoring a Spark application is not available from the REST API, but only via the UI. REST provides only applications, jobs, stages, and environment. This Jira is targeted at providing a REST API so that SQL-level information can be retrieved. > Provide REST API for SQL level information > -- > > Key: SPARK-27142 > URL: https://issues.apache.org/jira/browse/SPARK-27142 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.0.0 >Reporter: Ajith S >Priority: Minor > Attachments: image-2019-03-13-19-29-26-896.png > > > Currently, SQL information for monitoring a Spark application is not available > from the REST API, but only via the UI. REST provides only > applications, jobs, stages, and environment. This Jira is targeted at providing a REST > API so that SQL-level information can be retrieved > > Details: > https://issues.apache.org/jira/browse/SPARK-27142?focusedCommentId=16791728=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16791728 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
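For contrast, a sketch of how the existing monitoring REST API is consumed; the final {{/sql}} path is the hypothetical endpoint this ticket proposes and does not exist yet:
{code:scala}
import scala.io.Source

val base = "http://localhost:4040/api/v1"
val appId = "app-20190313120000-0000" // illustrative application id

// Endpoints available today: applications, jobs, stages, environment, ...
val jobsJson = Source.fromURL(s"$base/applications/$appId/jobs").mkString

// Proposed (hypothetical) addition:
// val sqlJson = Source.fromURL(s"$base/applications/$appId/sql").mkString
{code}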
[jira] [Comment Edited] (SPARK-26961) Found Java-level deadlock in Spark Driver
[ https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791751#comment-16791751 ] Ajith S edited comment on SPARK-26961 at 3/13/19 2:32 PM: -- 1) Yes, the registerAsParallelCapable will return true, but if you inspect the classloader instance, parallelLockMap is still null as it was already initalized via super class constructor. so it has no effect !image-2019-03-13-19-53-52-390.png! 2) URLClassLoader is parallel capable as it does registration in static block which is before calling parent(ClassLoader) constructor. Also as per javadoc [https://docs.oracle.com/javase/8/docs/api/java/lang/ClassLoader.html] {code:java} Note that the ClassLoader class is registered as parallel capable by default. However, its subclasses still need to register themselves if they are parallel capable. {code} Hence MutableURLClassLoader lost its parallel capability by failing to register unlike URLClassLoader was (Author: ajithshetty): Yes, the registerAsParallelCapable will return true, but if you inspect the classloader instance, parallelLockMap is still null as it was already initalized via super class constructor. so it has no effect !image-2019-03-13-19-53-52-390.png! URLClassLoader is parallel capable as it does registration in static block which is before calling parent(ClassLoader) constructor > Found Java-level deadlock in Spark Driver > - > > Key: SPARK-26961 > URL: https://issues.apache.org/jira/browse/SPARK-26961 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.3.0 >Reporter: Rong Jialei >Priority: Major > Attachments: image-2019-03-13-19-53-52-390.png > > > Our spark job usually will finish in minutes, however, we recently found it > take days to run, and we can only kill it when this happened. > An investigation show all worker container could not connect drive after > start, and driver is hanging, using jstack, we found a Java-level deadlock. 
> > *Jstack output for deadlock part is showing below:* > > Found one Java-level deadlock: > = > "SparkUI-907": > waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a > org.apache.hadoop.conf.Configuration), > which is held by "ForkJoinPool-1-worker-57" > "ForkJoinPool-1-worker-57": > waiting to lock monitor 0x7f3860574298 (object 0x0005b7991168, a > org.apache.spark.util.MutableURLClassLoader), > which is held by "ForkJoinPool-1-worker-7" > "ForkJoinPool-1-worker-7": > waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a > org.apache.hadoop.conf.Configuration), > which is held by "ForkJoinPool-1-worker-57" > Java stack information for the threads listed above: > === > "SparkUI-907": > at org.apache.hadoop.conf.Configuration.getOverlay(Configuration.java:1328) > - waiting to lock <0x0005c0c1e5e0> (a > org.apache.hadoop.conf.Configuration) > at > org.apache.hadoop.conf.Configuration.handleDeprecation(Configuration.java:684) > at org.apache.hadoop.conf.Configuration.get(Configuration.java:1088) > at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1145) > at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2363) > at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2840) > at > org.apache.hadoop.fs.FsUrlStreamHandlerFactory.createURLStreamHandler(FsUrlStreamHandlerFactory.java:74) > at java.net.URL.getURLStreamHandler(URL.java:1142) > at java.net.URL.(URL.java:599) > at java.net.URL.(URL.java:490) > at java.net.URL.(URL.java:439) > at org.apache.spark.ui.JettyUtils$$anon$4.doRequest(JettyUtils.scala:176) > at org.apache.spark.ui.JettyUtils$$anon$4.doGet(JettyUtils.scala:161) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > at > org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) > at > org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772) > at > org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:171) > at > org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) > at > org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) > at > org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) > at > org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) > at > org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) > at >
[jira] [Commented] (SPARK-26961) Found Java-level deadlock in Spark Driver
[ https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791739#comment-16791739 ] Sean Owen commented on SPARK-26961: --- When I run your class and print the result of registerAsParallelCapable, it returns true. Yes, parent initialization happens first, but URLClassLoader is also parallel capable. > Found Java-level deadlock in Spark Driver > - > > Key: SPARK-26961 > URL: https://issues.apache.org/jira/browse/SPARK-26961 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.3.0 >Reporter: Rong Jialei >Priority: Major > > Our spark job usually will finish in minutes, however, we recently found it > take days to run, and we can only kill it when this happened. > An investigation show all worker container could not connect drive after > start, and driver is hanging, using jstack, we found a Java-level deadlock. > > *Jstack output for deadlock part is showing below:* > > Found one Java-level deadlock: > = > "SparkUI-907": > waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a > org.apache.hadoop.conf.Configuration), > which is held by "ForkJoinPool-1-worker-57" > "ForkJoinPool-1-worker-57": > waiting to lock monitor 0x7f3860574298 (object 0x0005b7991168, a > org.apache.spark.util.MutableURLClassLoader), > which is held by "ForkJoinPool-1-worker-7" > "ForkJoinPool-1-worker-7": > waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a > org.apache.hadoop.conf.Configuration), > which is held by "ForkJoinPool-1-worker-57" > Java stack information for the threads listed above: > === > "SparkUI-907": > at org.apache.hadoop.conf.Configuration.getOverlay(Configuration.java:1328) > - waiting to lock <0x0005c0c1e5e0> (a > org.apache.hadoop.conf.Configuration) > at > org.apache.hadoop.conf.Configuration.handleDeprecation(Configuration.java:684) > at org.apache.hadoop.conf.Configuration.get(Configuration.java:1088) > at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1145) > at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2363) > at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2840) > at > org.apache.hadoop.fs.FsUrlStreamHandlerFactory.createURLStreamHandler(FsUrlStreamHandlerFactory.java:74) > at java.net.URL.getURLStreamHandler(URL.java:1142) > at java.net.URL.(URL.java:599) > at java.net.URL.(URL.java:490) > at java.net.URL.(URL.java:439) > at org.apache.spark.ui.JettyUtils$$anon$4.doRequest(JettyUtils.scala:176) > at org.apache.spark.ui.JettyUtils$$anon$4.doGet(JettyUtils.scala:161) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > at > org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) > at > org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772) > at > org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:171) > at > org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) > at > org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) > at > org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) > at > org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) > at > org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) > at > 
org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > at > org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493) > at > org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) > at > org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) > at org.spark_project.jetty.server.Server.handle(Server.java:534) > at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:320) > at > org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) > at > org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) > at org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:108) > at > org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) > at > org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) > at > org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) > at java.lang.Thread.run(Thread.java:748) > "ForkJoinPool-1-worker-57": > at
[jira] [Commented] (SPARK-26961) Found Java-level deadlock in Spark Driver
[ https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791751#comment-16791751 ] Ajith S commented on SPARK-26961: - Yes, the registerAsParallelCapable will return true, but if you inspect the classloader instance, parallelLockMap is still null as it was already initialized via the super class constructor, so it has no effect !image-2019-03-13-19-53-52-390.png! URLClassLoader is parallel capable as it does the registration in a static block, which runs before the parent (ClassLoader) constructor is called > Found Java-level deadlock in Spark Driver > - > > Key: SPARK-26961 > URL: https://issues.apache.org/jira/browse/SPARK-26961 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.3.0 >Reporter: Rong Jialei >Priority: Major > Attachments: image-2019-03-13-19-53-52-390.png > > > Our spark job usually will finish in minutes, however, we recently found it > take days to run, and we can only kill it when this happened. > An investigation show all worker container could not connect drive after > start, and driver is hanging, using jstack, we found a Java-level deadlock. > > *Jstack output for deadlock part is showing below:* > > Found one Java-level deadlock: > = > "SparkUI-907": > waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a > org.apache.hadoop.conf.Configuration), > which is held by "ForkJoinPool-1-worker-57" > "ForkJoinPool-1-worker-57": > waiting to lock monitor 0x7f3860574298 (object 0x0005b7991168, a > org.apache.spark.util.MutableURLClassLoader), > which is held by "ForkJoinPool-1-worker-7" > "ForkJoinPool-1-worker-7": > waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a > org.apache.hadoop.conf.Configuration), > which is held by "ForkJoinPool-1-worker-57" > Java stack information for the threads listed above: > === > "SparkUI-907": > at org.apache.hadoop.conf.Configuration.getOverlay(Configuration.java:1328) > - waiting to lock <0x0005c0c1e5e0> (a > org.apache.hadoop.conf.Configuration) > at > org.apache.hadoop.conf.Configuration.handleDeprecation(Configuration.java:684) > at org.apache.hadoop.conf.Configuration.get(Configuration.java:1088) > at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1145) > at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2363) > at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2840) > at > org.apache.hadoop.fs.FsUrlStreamHandlerFactory.createURLStreamHandler(FsUrlStreamHandlerFactory.java:74) > at java.net.URL.getURLStreamHandler(URL.java:1142) > at java.net.URL.(URL.java:599) > at java.net.URL.(URL.java:490) > at java.net.URL.(URL.java:439) > at org.apache.spark.ui.JettyUtils$$anon$4.doRequest(JettyUtils.scala:176) > at org.apache.spark.ui.JettyUtils$$anon$4.doGet(JettyUtils.scala:161) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > at > org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) > at > org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772) > at > org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:171) > at > org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) > at > org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) > at > org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) > at >
org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) > at > org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) > at > org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > at > org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493) > at > org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) > at > org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) > at org.spark_project.jetty.server.Server.handle(Server.java:534) > at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:320) > at > org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) > at > org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) > at org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:108) > at > org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) > at >
[jira] [Updated] (SPARK-26961) Found Java-level deadlock in Spark Driver
[ https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated SPARK-26961: Attachment: image-2019-03-13-19-53-52-390.png
[jira] [Commented] (SPARK-27137) Spark captured variable is null if the code is pasted via :paste
[ https://issues.apache.org/jira/browse/SPARK-27137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791744#comment-16791744 ] Osira Ben commented on SPARK-27137: --- Indeed, it works in the current master! Thank you! > Spark captured variable is null if the code is pasted via :paste > > > Key: SPARK-27137 > URL: https://issues.apache.org/jira/browse/SPARK-27137 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Osira Ben >Priority: Major > > If I execute this piece of code > {code:java} > val foo = "foo" > def f(arg: Any): Unit = { > Option(42).foreach(_ => java.util.Objects.requireNonNull(foo, "foo")) > } > sc.parallelize(Seq(1, 2), 2).foreach(f) > {code} > in spark2-shell via :paste, it throws > {code:java} > scala> :paste > // Entering paste mode (ctrl-D to finish) > val foo = "foo" > def f(arg: Any): Unit = { > Option(42).foreach(_ => java.util.Objects.requireNonNull(foo, "foo")) > } > sc.parallelize(Seq(1, 2), 2).foreach(f) > // Exiting paste mode, now interpreting. > 19/03/11 15:02:06 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 > (TID 2, hadoop.company.com, executor 1): java.lang.NullPointerException: foo > at java.util.Objects.requireNonNull(Objects.java:228) > {code} > However, if I execute it by pasting without :paste, or via spark2-shell -i, it > doesn't. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
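For anyone stuck on an affected version, one workaround sketch, under the assumption that the NPE comes from the paste-mode wrapper object (which holds foo) not being initialized on executors: pass the captured value explicitly so it travels inside the serialized closure instead of being read from a wrapper field.
{code:java}
// Hedged workaround sketch (assumption: the NPE stems from the REPL's paste
// wrapper not being initialized on executors); not the actual fix that
// landed in master.
val foo = "foo"
def f(captured: String)(arg: Any): Unit = {
  Option(42).foreach(_ => java.util.Objects.requireNonNull(captured, "foo"))
}
sc.parallelize(Seq(1, 2), 2).foreach(f(foo))
{code}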
[jira] [Updated] (SPARK-26961) Found Java-level deadlock in Spark Driver
[ https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated SPARK-26961: Attachment: (was: image-2019-03-13-19-51-38-708.png)
[jira] [Updated] (SPARK-26961) Found Java-level deadlock in Spark Driver
[ https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated SPARK-26961: Attachment: image-2019-03-13-19-51-38-708.png
[jira] [Updated] (SPARK-27142) Provide REST API for SQL level information
[ https://issues.apache.org/jira/browse/SPARK-27142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated SPARK-27142: Attachment: image-2019-03-13-19-29-26-896.png
[jira] [Commented] (SPARK-27142) Provide REST API for SQL level information
[ https://issues.apache.org/jira/browse/SPARK-27142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791728#comment-16791728 ] Ajith S commented on SPARK-27142: - OK, apologies for being abstract about this requirement. Let me explain. A single SQL query can result in multiple jobs, so for an end user running STS or spark-sql, the intended highest level of probe is the SQL they executed. This information can be seen in the SQL tab. Attaching a sample. !image-2019-03-13-19-29-26-896.png! But the same information cannot be accessed through the REST API exposed by Spark; users always have to rely on the jobs API, which may be difficult. So I intend to expose the information seen in the UI's SQL tab via a REST API. Mainly: # executionId : long # status : string - possible values COMPLETED/RUNNING/FAILED # description : string - executed SQL string # submissionTime : formatted time of SQL submission # duration : string - total run time # runningJobIds : Seq[Int] - sequence of running job ids # failedJobIds : Seq[Int] - sequence of failed job ids # successJobIds : Seq[Int] - sequence of success job ids > Provide REST API for SQL level information > -- > > Key: SPARK-27142 > URL: https://issues.apache.org/jira/browse/SPARK-27142 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.0.0 >Reporter: Ajith S >Priority: Minor > Attachments: image-2019-03-13-19-29-26-896.png > > > Currently, for monitoring a Spark application, SQL information is not available from REST but only via the UI. REST provides only applications, jobs, stages, and environment. This Jira is targeted to provide a REST API so that SQL level information can be found -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
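For concreteness, the fields listed above could map onto a response shape like the following sketch; the names come from the list in the comment, while the surrounding type is purely illustrative and not a committed API:
{code:java}
// Illustrative sketch of one element of the proposed REST payload; the
// wrapper name SqlExecutionSummary is an assumption, not Spark API.
case class SqlExecutionSummary(
    executionId: Long,
    status: String,            // COMPLETED / RUNNING / FAILED
    description: String,       // the executed SQL string
    submissionTime: String,    // formatted time of SQL submission
    duration: String,          // total run time
    runningJobIds: Seq[Int],
    failedJobIds: Seq[Int],
    successJobIds: Seq[Int])
{code}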
[jira] [Updated] (SPARK-27142) Provide REST API for SQL level information
[ https://issues.apache.org/jira/browse/SPARK-27142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated SPARK-27142: Attachment: (was: image-2019-03-13-19-19-27-831.png)
[jira] [Updated] (SPARK-27142) Provide REST API for SQL level information
[ https://issues.apache.org/jira/browse/SPARK-27142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated SPARK-27142: Attachment: (was: image-2019-03-13-19-19-24-951.png)
[jira] [Updated] (SPARK-27142) Provide REST API for SQL level information
[ https://issues.apache.org/jira/browse/SPARK-27142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated SPARK-27142: Attachment: image-2019-03-13-19-19-24-951.png
[jira] [Updated] (SPARK-27142) Provide REST API for SQL level information
[ https://issues.apache.org/jira/browse/SPARK-27142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated SPARK-27142: Attachment: image-2019-03-13-19-19-27-831.png
[jira] [Updated] (SPARK-27150) Scheduling Within an Application : Spark SQL randomly failed on UDF
[ https://issues.apache.org/jira/browse/SPARK-27150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Sean updated SPARK-27150: -- Description: I run the following (reduced) code multiple times on the exact same input files:
{code:java}
def myUdf(input: java.lang.String): Option[Long] = {
  None
}
...
val sparkMain = ... .getOrCreate()
val d = inputPaths.toList.par
val p = new scala.concurrent.forkjoin.ForkJoinPool(12)
try {
  d.tasksupport = new scala.collection.parallel.ForkJoinTaskSupport(p)
  d.foreach {
    case (inputPath) => {
      val spark = sparkMain.newSession()
      spark.udf.register("myUdf", udf(myUdf _))
      val df = spark.read.format("csv").option("inferSchema", "false").option("mode", "DROPMALFORMED").schema(mySchema).load(inputPath)
      df.createOrReplaceTempView("mytable")
      val sql = spark.sql(""" SELECT CAST( myUdf(updated_date) as long) FROM mytable """)
      sql.write.parquet( ... )
    }
  }
} finally {
  p.shutdown()
}
{code}
About once in every ten runs (spark-submits of the application), the driver fails with an exception related to Spark SQL and the UDF. However, as you can see, I have reduced the UDF to a minimum; it now returns None every time, and the problem still occurs. So I think the problem is more likely related to having the driver submit multiple jobs in parallel, aka "scheduling within an application". The exception is as follows: {code:java} Exception in thread "main" java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:65) at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala) Caused by: org.apache.spark.sql.AnalysisException: cannot resolve 'CAST(UDF(updated_date) AS BIGINT)' due to data type mismatch: cannot cast struct<> to bigint; line 5 pos 10; ...
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:93) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:85) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304) at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:106) at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:118) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1$1.apply(QueryPlan.scala:122) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.immutable.List.foreach(List.scala:381) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.immutable.List.map(List.scala:285) at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:122) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:127) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187) at
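If the root cause really is a race around name-based UDF registration across parallel sessions, one mitigation sketch is to bypass the registry and apply the typed udf() column function directly; this is a hypothetical rewrite of the snippet above, not a confirmed fix:
{code:java}
// Hypothetical mitigation sketch, assuming the failure is a race in the
// per-session UDF registry: use the typed udf() column function directly,
// which needs no named registration at all.
import org.apache.spark.sql.functions.{col, udf}

val myUdfCol = udf(myUdf _)
val result = df.select(myUdfCol(col("updated_date")).cast("long"))
result.write.parquet( ... ) // placeholder path, as in the original snippet
{code}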
[jira] [Commented] (SPARK-26961) Found Java-level deadlock in Spark Driver
[ https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791712#comment-16791712 ] Ajith S commented on SPARK-26961: - [~srowen] That too will not work. Here is my custom classloader:
{code:java}
class MYClassLoader(urls: Array[URL], parent: ClassLoader)
  extends URLClassLoader(urls, parent) {

  // Runs in the constructor body, i.e. only AFTER the ClassLoader and
  // URLClassLoader super constructors have already decided (and frozen)
  // whether this instance is parallel capable.
  ClassLoader.registerAsParallelCapable()

  override def loadClass(name: String): Class[_] = {
    super.loadClass(name)
  }
}
{code}
If we look at the class initialization flow, we see that the super constructor is called before the ClassLoader.registerAsParallelCapable() line is hit, hence it does not take effect:
{code:java}
<init>:280, ClassLoader (java.lang)
<init>:316, ClassLoader (java.lang)
<init>:76, SecureClassLoader (java.security)
<init>:100, URLClassLoader (java.net)
<init>:23, MYClassLoader (org.apache.spark.util.ajith)
{code}
As per [https://github.com/scala/bug/issues/11429], Scala 2.x does not have pure static support yet, so moving the classloader to a Java-based implementation may be the only option we have.
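The constructor ordering in those frames can be reproduced without any classloader machinery; a minimal sketch:
{code:java}
// Minimal sketch of the initialization order shown above: the parent
// constructor (where parallel capability is decided) completes before any
// statement in the subclass body runs.
class Parent {
  println("1: Parent constructor runs first")
}
class Child extends Parent {
  println("2: Child body runs second; a registerAsParallelCapable() here is too late")
}
new Child() // prints 1, then 2
{code}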
[jira] [Updated] (SPARK-27151) ClearCacheCommand should be case-object to avoid copies
[ https://issues.apache.org/jira/browse/SPARK-27151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-27151: - Priority: Trivial (was: Major) > ClearCacheCommand should be case-object to avoid copies > -- > > Key: SPARK-27151 > URL: https://issues.apache.org/jira/browse/SPARK-27151 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: Takeshi Yamamuro >Priority: Trivial > > To avoid unnecessary copies, `ClearCacheCommand` should be a `case object`. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
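The motivation (a parameterless case class still allocates a fresh instance per apply or copy, while a case object is a single shared instance) can be seen with a minimal sketch; the names below are simplified stand-ins, not the actual Spark command classes:
{code:java}
// Simplified stand-ins, not the real Spark definitions.
case class AsCaseClass()   // every AsCaseClass() call allocates a new object
case object AsCaseObject   // one shared instance for the whole JVM

assert(AsCaseClass() ne AsCaseClass()) // two distinct allocations
assert(AsCaseObject eq AsCaseObject)   // always the very same instance
{code}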
[jira] [Assigned] (SPARK-27151) ClearCacheCommand should be case-object to avoid copies
[ https://issues.apache.org/jira/browse/SPARK-27151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-27151: Assignee: (was: Apache Spark)
[jira] [Assigned] (SPARK-27151) ClearCacheCommand should be case-object to avoid copies
[ https://issues.apache.org/jira/browse/SPARK-27151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-27151: Assignee: Apache Spark
[jira] [Updated] (SPARK-27151) ClearCacheCommand should be case-object to avoid copies
[ https://issues.apache.org/jira/browse/SPARK-27151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro updated SPARK-27151: - Summary: ClearCacheCommand should be case-object to avoid copies (was: Makes ClearCacheCommand case-object)
[jira] [Created] (SPARK-27151) Makes ClearCacheCommand case-object
Takeshi Yamamuro created SPARK-27151: Summary: Makes ClearCacheCommand case-object Key: SPARK-27151 URL: https://issues.apache.org/jira/browse/SPARK-27151 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.0 Reporter: Takeshi Yamamuro To avoid unnecessary copies, `ClearCacheCommand` should be a `case object`. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-11284) ALS produces predictions as floats and should be double
[ https://issues.apache.org/jira/browse/SPARK-11284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791671#comment-16791671 ] Alex Combessie edited comment on SPARK-11284 at 3/13/19 1:00 PM: - Hello everyone, [~ddahlem], [~mengxr] I am still getting a type error when evaluating an ALS model in a pipeline. I have tested it on Spark 2.2.0.2.6.4.0-91. This is strange, as it seems the issue is closed. Here is the error message: _Caused by: java.lang.ClassCastException: java.lang.Float cannot be cast to java.lang.Double_ Happy to provide more details. I am running the classic MovieLens example on a 100k dataset. Any views on this? Thanks, Alex was (Author: alex_combessie): Hello everyone, I am still getting a Type error error when evaluating an ALS model in a pipeline. I have tested it on Spark 2.2.0.2.6.4.0-91. It is strange as it seems the issue is closed. Here is the error message: _Caused by: java.lang.ClassCastException: java.lang.Float cannot be cast to java.lang.Double_ Happy to provide more details. I am running the classic MovieLens example on a 100k dataset. Any views on this? Thanks, Alex > ALS produces predictions as floats and should be double > --- > > Key: SPARK-11284 > URL: https://issues.apache.org/jira/browse/SPARK-11284 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 1.5.1 > Environment: All >Reporter: Dominik Dahlem >Priority: Major > Labels: ml, recommender > Original Estimate: 1h > Remaining Estimate: 1h > > Using pyspark.ml and DataFrames, the ALS recommender cannot be evaluated > using the RegressionEvaluator, because of a type mismatch between the model > transformation and the evaluation APIs. One can work around this by casting > the prediction column into double before passing it into the evaluator. > However, this does not work with pipelines and cross validation. > Code and traceback below: > {code} > als = ALS(rank=10, maxIter=30, regParam=0.1, userCol='userID', > itemCol='movieID', ratingCol='rating') > model = als.fit(training) > predictions = model.transform(validation) > evaluator = RegressionEvaluator(predictionCol='prediction', > labelCol='rating') > validationRmse = evaluator.evaluate(predictions, > {evaluator.metricName: 'rmse'}) > {code} > Traceback: > validationRmse = evaluator.evaluate(predictions, {evaluator.metricName: > 'rmse'}) > File > "/Users/dominikdahlem/software/spark-1.6.0-SNAPSHOT-bin-custom-spark/python/lib/pyspark.zip/pyspark/ml/evaluation.py", > line 63, in evaluate > File > "/Users/dominikdahlem/software/spark-1.6.0-SNAPSHOT-bin-custom-spark/python/lib/pyspark.zip/pyspark/ml/evaluation.py", > line 94, in _evaluate > File > "/Users/dominikdahlem/software/spark-1.6.0-SNAPSHOT-bin-custom-spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", > line 813, in __call__ > File > "/Users/dominikdahlem/projects/repositories/spark/python/pyspark/sql/utils.py", > line 42, in deco > raise IllegalArgumentException(s.split(': ', 1)[1]) > pyspark.sql.utils.IllegalArgumentException: requirement failed: Column > prediction must be of type DoubleType but was actually FloatType. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
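The cast workaround mentioned in the issue body looks roughly like this; a sketch in Scala (the issue's example is PySpark), assuming the example's predictions DataFrame and evaluator already exist:
{code:java}
// Sketch of the described workaround: ALS emits Float predictions, the
// evaluator requires Double, so cast before evaluating. The column name is
// taken from the example above.
import org.apache.spark.sql.functions.col

val casted = predictions.withColumn("prediction", col("prediction").cast("double"))
val validationRmse = evaluator.evaluate(casted)
{code}
As the issue notes, this manual cast does not help inside a Pipeline or CrossValidator, where there is no hook to cast between the model's transform and the evaluator.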
[jira] [Updated] (SPARK-27150) Scheduling Within an Application : Spark SQL randomly failed on UDF
[ https://issues.apache.org/jira/browse/SPARK-27150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Sean updated SPARK-27150: -- Affects Version/s: 2.3.2 2.3.3
[jira] [Updated] (SPARK-27150) Scheduling Within an Application : Spark SQL randomly failed on UDF
[ https://issues.apache.org/jira/browse/SPARK-27150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Sean updated SPARK-27150: -- Affects Version/s: 2.4.0
[jira] [Commented] (SPARK-26961) Found Java-level deadlock in Spark Driver
[ https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791634#comment-16791634 ] Sean Owen commented on SPARK-26961: --- [~ajithshetty] I see, OK. It could happen in the class itself, in the constructor. The calls after the first would do nothing. > Found Java-level deadlock in Spark Driver > - > > Key: SPARK-26961 > URL: https://issues.apache.org/jira/browse/SPARK-26961 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.3.0 >Reporter: Rong Jialei >Priority: Major > > Our spark job usually will finish in minutes, however, we recently found it > take days to run, and we can only kill it when this happened. > An investigation show all worker container could not connect drive after > start, and driver is hanging, using jstack, we found a Java-level deadlock. > > *Jstack output for deadlock part is showing below:* > > Found one Java-level deadlock: > = > "SparkUI-907": > waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a > org.apache.hadoop.conf.Configuration), > which is held by "ForkJoinPool-1-worker-57" > "ForkJoinPool-1-worker-57": > waiting to lock monitor 0x7f3860574298 (object 0x0005b7991168, a > org.apache.spark.util.MutableURLClassLoader), > which is held by "ForkJoinPool-1-worker-7" > "ForkJoinPool-1-worker-7": > waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a > org.apache.hadoop.conf.Configuration), > which is held by "ForkJoinPool-1-worker-57" > Java stack information for the threads listed above: > === > "SparkUI-907": > at org.apache.hadoop.conf.Configuration.getOverlay(Configuration.java:1328) > - waiting to lock <0x0005c0c1e5e0> (a > org.apache.hadoop.conf.Configuration) > at > org.apache.hadoop.conf.Configuration.handleDeprecation(Configuration.java:684) > at org.apache.hadoop.conf.Configuration.get(Configuration.java:1088) > at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1145) > at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2363) > at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2840) > at > org.apache.hadoop.fs.FsUrlStreamHandlerFactory.createURLStreamHandler(FsUrlStreamHandlerFactory.java:74) > at java.net.URL.getURLStreamHandler(URL.java:1142) > at java.net.URL.(URL.java:599) > at java.net.URL.(URL.java:490) > at java.net.URL.(URL.java:439) > at org.apache.spark.ui.JettyUtils$$anon$4.doRequest(JettyUtils.scala:176) > at org.apache.spark.ui.JettyUtils$$anon$4.doGet(JettyUtils.scala:161) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > at > org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) > at > org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772) > at > org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:171) > at > org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) > at > org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) > at > org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) > at > org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) > at > org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) > at > org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > at > 
org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493) > at > org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) > at > org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) > at org.spark_project.jetty.server.Server.handle(Server.java:534) > at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:320) > at > org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) > at > org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) > at org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:108) > at > org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) > at > org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) > at > org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) > at java.lang.Thread.run(Thread.java:748) > "ForkJoinPool-1-worker-57": > at java.lang.ClassLoader.loadClass(ClassLoader.java:404) > -
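A minimal sketch of the one-time registration suggested in the comment above (the holder object name is hypothetical; this is not the actual Spark patch): doing the registration in a class initializer makes it run exactly once, before any concurrent access, so the calls after the first do nothing.
{code:scala}
import java.net.URL
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory

// Hypothetical holder object: the JVM runs this initializer at most once,
// and class initialization happens-before any later reference to the object.
object FsUrlHandlerSetup {
  private val registered: Boolean =
    try {
      // URL.setURLStreamHandlerFactory may be called at most once per JVM.
      URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory())
      true
    } catch {
      case _: Error => false // some other code already installed a factory
    }

  // Callers just touch the object; repeat calls are no-ops.
  def ensure(): Boolean = registered
}
{code}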
[jira] [Updated] (SPARK-27145) Close store after test, in the SQLAppStatusListenerSuite
[ https://issues.apache.org/jira/browse/SPARK-27145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-27145: -- Issue Type: Improvement (was: Bug) > Close store after test, in the SQLAppStatusListenerSuite > > > Key: SPARK-27145 > URL: https://issues.apache.org/jira/browse/SPARK-27145 > Project: Spark > Issue Type: Improvement > Components: SQL, Tests >Affects Versions: 2.3.3, 2.4.0, 3.0.0 >Reporter: shahid >Priority: Minor > > We create many stores in the SQLAppStatusListenerSuite, but we need to > close the store after each test. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
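A sketch of the cleanup pattern the ticket asks for, assuming a per-test InMemoryStore (the test body is a placeholder; the suite's real helpers may differ):
{code:scala}
import org.apache.spark.util.kvstore.InMemoryStore

// Hedged sketch: close the store in a finally block so it is released
// even when an assertion in the test body throws.
val store = new InMemoryStore()
try {
  // ... exercise SQLAppStatusListener against `store` here ...
} finally {
  store.close()
}
{code}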
[jira] [Resolved] (SPARK-27064) create StreamingWrite at the beginning of streaming execution
[ https://issues.apache.org/jira/browse/SPARK-27064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-27064. - Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 23981 [https://github.com/apache/spark/pull/23981] > create StreamingWrite at the beginning of streaming execution > > > Key: SPARK-27064 > URL: https://issues.apache.org/jira/browse/SPARK-27064 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27143) Provide REST API for JDBC/ODBC level information
[ https://issues.apache.org/jira/browse/SPARK-27143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791615#comment-16791615 ] Sean Owen commented on SPARK-27143: --- Likewise here, we have a metrics system already. This doesn't say what you intend to expose. I think a REST API opens new questions about security, etc. too, especially around SQL queries. > Provide REST API for JDBC/ODBC level information > > > Key: SPARK-27143 > URL: https://issues.apache.org/jira/browse/SPARK-27143 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Ajith S >Priority: Minor > > Currently, for monitoring a Spark application, JDBC/ODBC information is not > available from REST but only via the UI. REST provides only > applications, jobs, stages, and environment. This Jira is targeted at providing a REST > API so that JDBC/ODBC-level information like session statistics and SQL > statistics can be provided -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
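For reference, a sketch of what the existing monitoring REST API already exposes (host, port, and application id below are placeholders):
{code:scala}
// Hedged illustration: the /api/v1 endpoints the comment refers to.
val base  = "http://history-server:18080/api/v1" // placeholder host:port
val appId = "app-20190314000000-0000"            // placeholder application id
Seq(s"$base/applications",
    s"$base/applications/$appId/jobs",
    s"$base/applications/$appId/stages",
    s"$base/applications/$appId/environment")
  .foreach(url => println(scala.io.Source.fromURL(url).mkString))
{code}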
[jira] [Commented] (SPARK-27142) Provide REST API for SQL level information
[ https://issues.apache.org/jira/browse/SPARK-27142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791614#comment-16791614 ] Sean Owen commented on SPARK-27142: --- We have a metrics system for metrics and a SQL tab for SQL queries. I don't imagine we need a new REST API? This doesn't say what info you are trying to expose. > Provide REST API for SQL level information > -- > > Key: SPARK-27142 > URL: https://issues.apache.org/jira/browse/SPARK-27142 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.0.0 >Reporter: Ajith S >Priority: Minor > > Currently, for monitoring a Spark application, SQL information is not available > from REST but only via the UI. REST provides only > applications, jobs, stages, and environment. This Jira is targeted at providing a REST > API so that SQL-level information can be found -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27150) Scheduling Within an Application : Spark SQL randomly failed on UDF
[ https://issues.apache.org/jira/browse/SPARK-27150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Sean updated SPARK-27150: -- Description: I run the following (reduced) code multiple times on the exact same input files:
{code:java}
def myUdf(input: java.lang.String): Option[Long] = { None }
...
val sparkMain = ... .getOrCreate()

val d = inputPaths.toList.par
val p = new scala.concurrent.forkjoin.ForkJoinPool(12)
try {
  d.tasksupport = new scala.collection.parallel.ForkJoinTaskSupport(p)
  d.foreach { case (inputPath) =>
    val spark = sparkMain.newSession()
    spark.udf.register("myUdf", udf(myUdf _))
    val df = spark.read.format("csv").option("inferSchema", "false")
      .option("mode", "DROPMALFORMED").schema(mySchema).load(inputPath)
    df.createOrReplaceTempView("mytable")
    val sql = spark.sql(""" SELECT CAST( myUdf(updated_date) as long) FROM mytable """)
    sql.write.parquet( ... )
  }
} finally {
  p.shutdown()
}
{code}
About once in ten spark-submit runs of the application, the driver fails with an exception related to Spark SQL and the UDF. However, as you can see, I have reduced the UDF to a minimum; it now returns None every time, and the problem still occurs. So I think the problem is more likely related to having the driver submit multiple jobs in parallel, aka "scheduling within apps". The exception is as follows: {code:java} Exception in thread "main" java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:65) at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala) Caused by: org.apache.spark.sql.AnalysisException: cannot resolve 'CAST(UDF(updated_date) AS BIGINT)' due to data type mismatch: cannot cast struct<> to bigint; line 5 pos 10; ... 
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:93) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:85) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304) at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:106) at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:118) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1$1.apply(QueryPlan.scala:122) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.immutable.List.foreach(List.scala:381) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.immutable.List.map(List.scala:285) at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:122) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:127) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187) at
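A hedged workaround sketch for the report above (an assumption about the race, not a confirmed fix): serialize only the newSession() plus udf.register step behind a shared lock, so per-session analysis state is never cloned while another thread mutates it, while the read, the query, and the write stay parallel.
{code:scala}
// Sketch only: `sparkMain`, `d`, `mySchema` and `myUdf` are as in the report.
val sessionLock = new Object

d.foreach { inputPath =>
  val spark = sessionLock.synchronized {
    val s = sparkMain.newSession()
    s.udf.register("myUdf", udf(myUdf _))
    s
  }
  // everything below still runs concurrently across input paths
  val df = spark.read.format("csv").schema(mySchema).load(inputPath)
  df.createOrReplaceTempView("mytable")
  spark.sql("SELECT CAST( myUdf(updated_date) as long) FROM mytable")
    .write.parquet("...") // placeholder output path
}
{code}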
[jira] [Updated] (SPARK-27144) Explode with structType may throw NPE when the first column's nullable is false while the second column's nullable is true
[ https://issues.apache.org/jira/browse/SPARK-27144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-27144: - Description: Create a dataFrame containing two columns named [weight, animal]; the weight's nullable is false while the animal's nullable is true. Give a null value in the animal column, then construct a new column with {code:java} explode( array( struct(lit("weight").alias("key"), col("weight").cast(StringType).alias("value")), struct(lit("animal").alias("key"), col("animal").cast(StringType).alias("value")) ) ) {code} then select the struct with .*; Spark will throw an NPE {code:java} 19/03/13 14:39:10 INFO DAGScheduler: ResultStage 3 (show at SparkTest.scala:74) failed in 0.043 s due to Job aborted due to stage failure: Task 3 in stage 3.0 failed 1 times, most recent failure: Lost task 3.0 in stage 3.0 (TID 9, localhost, executor driver): java.lang.NullPointerException at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(UnsafeRowWriter.java:194) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.project_doConsume$(Unknown Source) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614) at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253) at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:109) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) {code} Code to reproduce:
{code:java}
val data = Seq(
  Row(20.0, "dog", "a"),
  Row(3.5, "cat", "b"),
  Row(0.06, null, "c"))
val schema = StructType(List(
  StructField("weight", DoubleType, false),
  StructField("animal", StringType, true),
  StructField("extra", StringType, true)))
val col1 = "weight"
val col2 = "animal"
val originalDF = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)

// This should fail in select(test.*)
val df1 = originalDF.withColumn("test",
  explode(array(
    struct(lit(col1).alias("key"), col(col1).cast(StringType).alias("value")),
    struct(lit(col2).alias("key"), col(col2).cast(StringType).alias("value")))))
df1.printSchema()
df1.select("test.*").show()

// This should succeed in select(test.*)
val df2 = originalDF.withColumn("test",
  explode(array(
    struct(lit(col2).alias("key"), col(col2).cast(StringType).alias("value")),
    struct(lit(col1).alias("key"), col(col1).cast(StringType).alias("value")))))
df2.printSchema()
df2.select("test.*").show()
{code}
was: Create a dataFrame containing two columns names [weight, animal], the weight's nullable is false while the animal' nullable is true. Give null value in the col animal, then construct a new column with {code:java} explode( array( struct(lit("weight").alias("key"), col("weight").cast(StringType).alias("value")), struct(lit("animal").alias("key"), col("animal").cast(StringType).alias("value")) ) ) {code} then select the struct with .*, Spark will throw NPE {code:java} 19/03/13 14:39:10 INFO DAGScheduler: ResultStage 3 (show at SparkTest.scala:74) failed in 0.043 s due to Job aborted due to stage failure: Task 3 in stage 3.0 failed 1 times, most recent failure: Lost task 3.0 in stage 3.0 (TID 9, localhost, executor driver): java.lang.NullPointerException at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(UnsafeRowWriter.java:194) at
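A hedged workaround sketch for the affected versions (an assumption, not the official fix): give both struct fields the same nullability before the explode, e.g. by routing the non-nullable column through when(), which yields a nullable result.
{code:scala}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.StringType

// Sketch only: when() without otherwise() produces a nullable column, so the
// "weight" field's nullability now matches the nullable "animal" field.
val weightAsNullable = when(col("weight").isNotNull, col("weight").cast(StringType))

val dfSafe = originalDF.withColumn("test",
  explode(array(
    struct(lit("weight").alias("key"), weightAsNullable.alias("value")),
    struct(lit("animal").alias("key"), col("animal").cast(StringType).alias("value")))))
dfSafe.select("test.*").show()
{code}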
[jira] [Updated] (SPARK-27150) Scheduling Within an Application : Spark SQL randomly failed on UDF
[ https://issues.apache.org/jira/browse/SPARK-27150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Sean updated SPARK-27150: -- Description: I run the following (reduced) code multiple times on the exact same input files (about 100):
{code:java}
def myUdf(input: java.lang.String): Option[Long] = { None }
...
val sparkMain = ... .getOrCreate()

val d = inputPaths.toList.par
val p = new scala.concurrent.forkjoin.ForkJoinPool(12)
try {
  d.tasksupport = new scala.collection.parallel.ForkJoinTaskSupport(p)
  d.foreach { case (inputPath) =>
    val spark = sparkMain.newSession()
    spark.udf.register("myUdf", udf(myUdf _))
    val df = spark.read.format("csv").option("inferSchema", "false")
      .option("mode", "DROPMALFORMED").schema(mySchema).load(inputPath)
    df.createOrReplaceTempView("mytable")
    val sql = spark.sql(""" SELECT CAST( myUdf(updated_date) as long) FROM mytable """)
    sql.write.parquet( ... )
  }
} finally {
  p.shutdown()
}
{code}
About once in ten spark-submit runs of the application, the driver fails with an exception related to Spark SQL and the UDF. However, as you can see, I have reduced the UDF to a minimum; it returns None every time, and the problem still occurs. I think the problem is related to having the driver submit multiple jobs, aka "scheduling within apps". The exception is as follows: {code:java} Exception in thread "main" java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:65) at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala) Caused by: org.apache.spark.sql.AnalysisException: cannot resolve 'CAST(UDF(updated_date) AS BIGINT)' due to data type mismatch: cannot cast struct<> to bigint; line 5 pos 10; ... 
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:93) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:85) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304) at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:106) at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:118) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1$1.apply(QueryPlan.scala:122) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.immutable.List.foreach(List.scala:381) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.immutable.List.map(List.scala:285) at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:122) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:127) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187) at
[jira] [Commented] (SPARK-27146) Add two Yarn Configs according to spark home page configuration Instructions document
[ https://issues.apache.org/jira/browse/SPARK-27146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791587#comment-16791587 ] Hyukjin Kwon commented on SPARK-27146: -- It doesn't necessarily document all the configurations. I'd explain why this is important to document on the site. Otherwise, let's not add them. > Add two Yarn Configs according to spark home page configuration Instructions > document > - > > Key: SPARK-27146 > URL: https://issues.apache.org/jira/browse/SPARK-27146 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 3.0.0 >Reporter: wangjiaochun >Priority: Minor > Fix For: 3.0.0 > > > On the web page http://spark.apache.org/docs/latest/running-on-yarn.html, there > are two configuration options (spark.yarn.dist.forceDownloadSchemes and > spark.blacklist.application.maxFailedExecutorsPerNode) not implemented in the yarn > config file. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27150) Scheduling Within an Application : Spark SQL randomly failed on UDF
[ https://issues.apache.org/jira/browse/SPARK-27150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Sean updated SPARK-27150: -- Description: I run the following (reduced) code multiple times on the exact same input files (about 100):
{code:java}
def myUdf(input: java.lang.String): Option[Long] = { None }
...
val sparkMain = ... .getOrCreate()

val d = inputPaths.toList.par
val p = new scala.concurrent.forkjoin.ForkJoinPool(12)
try {
  d.tasksupport = new scala.collection.parallel.ForkJoinTaskSupport(p)
  d.foreach { case (inputPath) =>
    val spark = sparkMain.newSession()
    spark.udf.register("myUdf", udf(myUdf _))
    val df = spark.read.format("csv").option("inferSchema", "false")
      .option("mode", "DROPMALFORMED").schema(mySchema).load(inputPath)
    df.createOrReplaceTempView("mytable")
    val sql = spark.sql(""" SELECT CAST( myUdf(updated_date) as long) FROM mytable """)
    sql.write.parquet( ... )
  }
} finally {
  p.shutdown()
}
{code}
About once in ten spark-submit runs of the application, the driver fails with an exception related to Spark SQL and the UDF. However, as you can see, I have reduced the UDF to a minimum; it returns None every time, and the problem still occurs. The exception is as follows: {code:java} Exception in thread "main" java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:65) at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala) Caused by: org.apache.spark.sql.AnalysisException: cannot resolve 'CAST(UDF(updated_date) AS BIGINT)' due to data type mismatch: cannot cast struct<> to bigint; line 5 pos 10; ... 
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:93) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:85) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304) at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:106) at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:118) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1$1.apply(QueryPlan.scala:122) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.immutable.List.foreach(List.scala:381) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.immutable.List.map(List.scala:285) at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:122) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:127) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187) at org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:127) at
[jira] [Resolved] (SPARK-27144) Explode with structType may throw NPE when the first column's nullable is false while the second column's nullable is true
[ https://issues.apache.org/jira/browse/SPARK-27144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-27144. -- Resolution: Cannot Reproduce > Explode with structType may throw NPE when the first column's nullable is > false while the second column's nullable is true > -- > > Key: SPARK-27144 > URL: https://issues.apache.org/jira/browse/SPARK-27144 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.0 > Environment: Spark 2.3.0, local mode. >Reporter: Yoga >Priority: Major > > Create a dataFrame containing two columns named [weight, animal]; the > weight's nullable is false while the animal's nullable is true. > Give a null value in the animal column, > then construct a new column with > {code:java} > explode( > array( > struct(lit("weight").alias("key"), > col("weight").cast(StringType).alias("value")), > struct(lit("animal").alias("key"), > col("animal").cast(StringType).alias("value")) > ) > ) > {code} > then select the struct with .*; Spark will throw an NPE > {code:java} > 19/03/13 14:39:10 INFO DAGScheduler: ResultStage 3 (show at > SparkTest.scala:74) failed in 0.043 s due to Job aborted due to stage > failure: Task 3 in stage 3.0 failed 1 times, most recent failure: Lost task > 3.0 in stage 3.0 (TID 9, localhost, executor driver): > java.lang.NullPointerException > at > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(UnsafeRowWriter.java:194) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.project_doConsume$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) > at org.apache.spark.scheduler.Task.run(Task.scala:109) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} > > Code to reproduce: > {code:java} > val data = Seq( > Row(20.0, "dog","a"), > Row(3.5, "cat","b"), > Row(0.06, null,"c") > ) > val schema = StructType(List( > StructField("weight", DoubleType, false), > StructField("animal", StringType, true), > StructField("extra", StringType, true) > ) > ) > val col1 = "weight" > val col2 = "animal" > // this should fail in select(test.*) > val df1 = 
originalDF.withColumn("test", > explode( > array( > struct(lit(col1).alias("key"), > col(col1).cast(StringType).alias("value")), > struct(lit(col2).alias("key"), > col(col2).cast(StringType).alias("value")) > ) > ) > ) > df1.printSchema() > df1.select("test.*").show() > // this should succeed in select(test.*) > val df2 = originalDF.withColumn("test", > explode( > array( > struct(lit(col2).alias("key"), > col(col2).cast(StringType).alias("value")), > struct(lit(col1).alias("key"), > col(col1).cast(StringType).alias("value")) > ) > ) > ) > df2.printSchema() > dfs.select("test.*").show() > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For
[jira] [Commented] (SPARK-27144) Explode with structType may throw NPE when the first column's nullable is false while the second column's nullable is true
[ https://issues.apache.org/jira/browse/SPARK-27144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791580#comment-16791580 ] Hyukjin Kwon commented on SPARK-27144: -- It works in the current master:
{code}
scala> df1.select("test.*").show()
+------+------+
|   key| value|
+------+------+
|weight|  20.0|
|animal|   dog|
|weight|   3.5|
|animal|   cat|
|weight|6.0E-6|
|animal|  null|
+------+------+
{code}
{code}
scala> df2.select("test.*").show()
+------+------+
|   key| value|
+------+------+
|animal|   dog|
|weight|  20.0|
|animal|   cat|
|weight|   3.5|
|animal|  null|
|weight|6.0E-6|
+------+------+
{code}
It'd be great if the JIRA that fixed this can be identified and backported if applicable. I am resolving this for now anyway. > Explode with structType may throw NPE when the first column's nullable is > false while the second column's nullable is true > -- > > Key: SPARK-27144 > URL: https://issues.apache.org/jira/browse/SPARK-27144 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.0 > Environment: Spark 2.3.0, local mode. >Reporter: Yoga >Priority: Major > > Create a dataFrame containing two columns named [weight, animal]; the > weight's nullable is false while the animal's nullable is true. > Give a null value in the animal column, > then construct a new column with > {code:java} > explode( > array( > struct(lit("weight").alias("key"), > col("weight").cast(StringType).alias("value")), > struct(lit("animal").alias("key"), > col("animal").cast(StringType).alias("value")) > ) > ) > {code} > then select the struct with .*; Spark will throw an NPE > {code:java} > 19/03/13 14:39:10 INFO DAGScheduler: ResultStage 3 (show at > SparkTest.scala:74) failed in 0.043 s due to Job aborted due to stage > failure: Task 3 in stage 3.0 failed 1 times, most recent failure: Lost task > 3.0 in stage 3.0 (TID 9, localhost, executor driver): > java.lang.NullPointerException > at > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(UnsafeRowWriter.java:194) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.project_doConsume$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) > at org.apache.spark.scheduler.Task.run(Task.scala:109) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} > > Code to reproduce: > {code:java} > val data = Seq( > Row(20.0, "dog","a"), > Row(3.5, "cat","b"), > Row(0.06, null,"c") > ) > val schema = StructType(List( > StructField("weight", DoubleType, false), > StructField("animal", StringType, true), > StructField("extra", StringType, true) > ) > ) > val col1 = "weight" > val col2 = "animal" > // this should fail in select(test.*) > val df1 = originalDF.withColumn("test", > explode( > array( > struct(lit(col1).alias("key"), > col(col1).cast(StringType).alias("value")), > struct(lit(col2).alias("key"), > col(col2).cast(StringType).alias("value")) > ) > ) > ) > df1.printSchema() > df1.select("test.*").show() > // this should
[jira] [Updated] (SPARK-27150) Scheduling Within an Application : Spark SQL randomly failed on UDF
[ https://issues.apache.org/jira/browse/SPARK-27150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Sean updated SPARK-27150: -- Description: I run the following (reduced) code multiple times on the exact same input files (about 100):
{code:java}
def myUdf(input: java.lang.String): Option[Long] = { None }
...
val sparkMain = ... .getOrCreate()

val d = inputPaths.toList.par
val p = new scala.concurrent.forkjoin.ForkJoinPool(12)
try {
  d.tasksupport = new scala.collection.parallel.ForkJoinTaskSupport(p)
  d.foreach { case (inputPath) =>
    val spark = sparkMain.newSession()
    spark.udf.register("myUdf", udf(myUdf _))
    val df = spark.read.format("csv").option("inferSchema", "false")
      .option("mode", "DROPMALFORMED").schema(mySchema).load(inputPath)
    df.createOrReplaceTempView("mytable")
    val sql = spark.sql(""" SELECT CAST( myUdf(updated_date) as long) FROM mytable """)
    sql.write.parquet( ... )
  }
}
{code}
About once in ten application submits, the driver fails with an exception related to Spark SQL and the UDF. However, as you can see, I have reduced the UDF to a minimum; it only returns None every time, and the problem still occurs. The exception is as follows: {code:java} Exception in thread "main" java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:65) at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala) Caused by: org.apache.spark.sql.AnalysisException: cannot resolve 'CAST(UDF(updated_date) AS BIGINT)' due to data type mismatch: cannot cast struct<> to bigint; line 5 pos 10; ... 
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:93) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:85) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304) at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:106) at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:118) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1$1.apply(QueryPlan.scala:122) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.immutable.List.foreach(List.scala:381) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.immutable.List.map(List.scala:285) at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:122) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:127) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187) at org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:127) at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:95) at
[jira] [Created] (SPARK-27150) Scheduling Within an Application : Spark SQL randomly failed on UDF
Josh Sean created SPARK-27150: - Summary: Scheduling Within an Application : Spark SQL randomly failed on UDF Key: SPARK-27150 URL: https://issues.apache.org/jira/browse/SPARK-27150 Project: Spark Issue Type: Bug Components: Spark Core, SQL Affects Versions: 2.3.1 Reporter: Josh Sean I run the following (reduced) code multiple times on the exact same input files (about 100):
{code:java}
def myUdf(input: java.lang.String): Option[Long] = { None }
...
val sparkMain = ... .getOrCreate()

val d = inputPaths.toList.par
val p = new scala.concurrent.forkjoin.ForkJoinPool(12)
try {
  d.tasksupport = new scala.collection.parallel.ForkJoinTaskSupport(p)
  d.foreach { case (inputPath) =>
    val spark = sparkMain.newSession()
    spark.udf.register("myUdf", udf(myUdf _))
    val df = spark.read
      .format("csv")
      .option("inferSchema", "false")
      .option("mode", "DROPMALFORMED")
      .schema(mySchema)
      .load(inputPath)
    df.createOrReplaceTempView("mytable")
    val sql = spark.sql(""" SELECT CAST( myUdf(updated_date) as long) FROM mytable """)
    sql
      .write
      .parquet( ... )
  }
}
{code}
About once in ten application submits, the driver fails with an exception related to Spark SQL and the UDF. However, as you can see, I have reduced the UDF to a minimum; it only returns None every time, and the problem still occurs. The exception is: {code:java} Exception in thread "main" java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:65) at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala) Caused by: org.apache.spark.sql.AnalysisException: cannot resolve 'CAST(UDF(updated_date) AS BIGINT)' due to data type mismatch: cannot cast struct<> to bigint; line 5 pos 10; ... 
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:93) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:85) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304) at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:106) at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:118) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1$1.apply(QueryPlan.scala:122) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.immutable.List.foreach(List.scala:381) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.immutable.List.map(List.scala:285) at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:122) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:127) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187) at
[jira] [Commented] (SPARK-27141) Use ConfigEntry for hardcoded configs Yarn
[ https://issues.apache.org/jira/browse/SPARK-27141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791489#comment-16791489 ] Sandeep Katta commented on SPARK-27141: --- @[~wangjch] I still see some hardcoded configs present [Code here|https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L215] Why is this Jira closed? > Use ConfigEntry for hardcoded configs Yarn > -- > > Key: SPARK-27141 > URL: https://issues.apache.org/jira/browse/SPARK-27141 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.0.0 >Reporter: wangjiaochun >Priority: Major > Fix For: 3.0.0 > > > Some of the following Yarn file-related configs do not use ConfigEntry values; try > to replace them. > ApplicationMaster > YarnAllocatorSuite > ApplicationMasterSuite > BaseYarnClusterSuite > YarnClusterSuite -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
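For context, a sketch of the ConfigEntry pattern the ticket is about (the entry below is illustrative; the real definitions and their defaults live in Spark's internal YARN config object):
{code:scala}
import java.util.concurrent.TimeUnit
import org.apache.spark.internal.config.ConfigBuilder

object YarnConfigSketch {
  // Define the key once as a typed entry with a default...
  val AM_MAX_WAIT_TIME = ConfigBuilder("spark.yarn.am.waitTime")
    .timeConf(TimeUnit.MILLISECONDS)
    .createWithDefaultString("100s")
}

// ...so call sites read sparkConf.get(YarnConfigSketch.AM_MAX_WAIT_TIME)
// instead of repeating the hardcoded string "spark.yarn.am.waitTime".
{code}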