[jira] [Created] (SPARK-21635) ACOS(2) and ASIN(2) should be null
Yuming Wang created SPARK-21635: --- Summary: ACOS(2) and ASIN(2) should be null Key: SPARK-21635 URL: https://issues.apache.org/jira/browse/SPARK-21635 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.3.0 Reporter: Yuming Wang ACOS(2) and ASIN(2) should be null; I have created a patch for Hive. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
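For context: arcsine and arccosine are only defined on [-1, 1], so ACOS(2) and ASIN(2) have no real result, and the JVM's `math.acos`/`math.asin` return NaN for such inputs rather than null. A minimal sketch of the null-returning behaviour the issue asks for, modelling SQL NULL as `None`; `sqlAcos` and `sqlAsin` are hypothetical helper names, not Spark's or Hive's implementation:

{code:scala}
object InverseTrig {
  // Hypothetical helpers: return None (SQL NULL) for inputs outside [-1, 1].
  // NaN also fails both comparisons, so it maps to None as well.
  def sqlAcos(x: Double): Option[Double] =
    if (x >= -1.0 && x <= 1.0) Some(math.acos(x)) else None

  def sqlAsin(x: Double): Option[Double] =
    if (x >= -1.0 && x <= 1.0) Some(math.asin(x)) else None

  def main(args: Array[String]): Unit = {
    println(math.acos(2.0).isNaN) // true: the plain JVM function yields NaN
    println(sqlAcos(2.0))         // None
    println(sqlAsin(0.0))         // Some(0.0)
  }
}
{code}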
[jira] [Commented] (SPARK-15799) Release SparkR on CRAN
[ https://issues.apache.org/jira/browse/SPARK-15799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113974#comment-16113974 ] Felix Cheung commented on SPARK-15799: -- We submitted the 2.2.0 release to CRAN and got some comments that we hope to resolve (or get an exception for, if we can). > Release SparkR on CRAN > -- > > Key: SPARK-15799 > URL: https://issues.apache.org/jira/browse/SPARK-15799 > Project: Spark > Issue Type: New Feature > Components: SparkR >Reporter: Xiangrui Meng > > Story: "As an R user, I would like to see SparkR released on CRAN, so I can > use SparkR easily in an existing R environment and have other packages built > on top of SparkR." > I made this JIRA with the following questions in mind: > * Are there known issues that prevent us releasing SparkR on CRAN? > * Do we want to package Spark jars in the SparkR release? > * Are there license issues? > * How does it fit into Spark's release process?
[jira] [Created] (SPARK-21634) Change OneRowRelation from a case object to case class
Reynold Xin created SPARK-21634: --- Summary: Change OneRowRelation from a case object to case class Key: SPARK-21634 URL: https://issues.apache.org/jira/browse/SPARK-21634 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.2.0 Reporter: Reynold Xin Assignee: Reynold Xin OneRowRelation is the only plan that is a case object, which causes some issues with makeCopy using a 0-arg constructor. This patch changes it from a case object to a case class. This blocks SPARK-21619.
[jira] [Updated] (SPARK-21626) The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
[ https://issues.apache.org/jira/browse/SPARK-21626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gu Chao updated SPARK-21626: Summary: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded. (was: "WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable") > The short-circuit local reads feature cannot be used because libhadoop cannot > be loaded. > > > Key: SPARK-21626 > URL: https://issues.apache.org/jira/browse/SPARK-21626 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 2.2.0 >Reporter: Gu Chao > > After starting spark-shell, It outputs: > {code:none} > Setting default log level to "WARN". > To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use > setLogLevel(newLevel). > 17/08/04 11:24:41 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > 17/08/04 11:24:44 WARN DomainSocketFactory: The short-circuit local reads > feature cannot be used because libhadoop cannot be loaded. > 17/08/04 11:24:48 WARN General: Plugin (Bundle) "org.datanucleus" is already > registered. Ensure you dont have multiple JAR versions of the same plugin in > the classpath. The URL "file:/opt/spark/jars/datanucleus-core-3.2.10.jar" is > already registered, and you are trying to register an identical plugin > located at URL > "file:/opt/spark-2.2.0-bin-hadoop2.6/jars/datanucleus-core-3.2.10.jar." > 17/08/04 11:24:48 WARN General: Plugin (Bundle) "org.datanucleus.api.jdo" is > already registered. Ensure you dont have multiple JAR versions of the same > plugin in the classpath. The URL > "file:/opt/spark/jars/datanucleus-api-jdo-3.2.6.jar" is already registered, > and you are trying to register an identical plugin located at URL > "file:/opt/spark-2.2.0-bin-hadoop2.6/jars/datanucleus-api-jdo-3.2.6.jar." 
> 17/08/04 11:24:48 WARN General: Plugin (Bundle) "org.datanucleus.store.rdbms" > is already registered. Ensure you dont have multiple JAR versions of the same > plugin in the classpath. The URL > "file:/opt/spark/jars/datanucleus-rdbms-3.2.9.jar" is already registered, and > you are trying to register an identical plugin located at URL > "file:/opt/spark-2.2.0-bin-hadoop2.6/jars/datanucleus-rdbms-3.2.9.jar." > 17/08/04 11:24:51 WARN ObjectStore: Failed to get database global_temp, > returning NoSuchObjectException > Spark context Web UI available at http://192.168.50.11:4040 > Spark context available as 'sc' (master = spark://hadoop:7077, app id = > app-20170804112442-0001). > Spark session available as 'spark'. > {code}
[jira] [Updated] (SPARK-21626) "WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable"
[ https://issues.apache.org/jira/browse/SPARK-21626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gu Chao updated SPARK-21626: Description: After starting spark-shell, It outputs: {code:none} Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel). 17/08/04 11:24:41 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 17/08/04 11:24:44 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded. 17/08/04 11:24:48 WARN General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/opt/spark/jars/datanucleus-core-3.2.10.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/opt/spark-2.2.0-bin-hadoop2.6/jars/datanucleus-core-3.2.10.jar." 17/08/04 11:24:48 WARN General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/opt/spark/jars/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/opt/spark-2.2.0-bin-hadoop2.6/jars/datanucleus-api-jdo-3.2.6.jar." 17/08/04 11:24:48 WARN General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/opt/spark/jars/datanucleus-rdbms-3.2.9.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/opt/spark-2.2.0-bin-hadoop2.6/jars/datanucleus-rdbms-3.2.9.jar." 
17/08/04 11:24:51 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException Spark context Web UI available at http://192.168.50.11:4040 Spark context available as 'sc' (master = spark://hadoop:7077, app id = app-20170804112442-0001). Spark session available as 'spark'. {code} was: After starting spark-shell, It outputs: Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel). 17/08/04 11:24:41 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 17/08/04 11:24:44 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded. 17/08/04 11:24:48 WARN General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/opt/spark/jars/datanucleus-core-3.2.10.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/opt/spark-2.2.0-bin-hadoop2.6/jars/datanucleus-core-3.2.10.jar." 17/08/04 11:24:48 WARN General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/opt/spark/jars/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/opt/spark-2.2.0-bin-hadoop2.6/jars/datanucleus-api-jdo-3.2.6.jar." 17/08/04 11:24:48 WARN General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/opt/spark/jars/datanucleus-rdbms-3.2.9.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/opt/spark-2.2.0-bin-hadoop2.6/jars/datanucleus-rdbms-3.2.9.jar." 
17/08/04 11:24:51 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException Spark context Web UI available at http://192.168.50.11:4040 Spark context available as 'sc' (master = spark://hadoop:7077, app id = app-20170804112442-0001). Spark session available as 'spark'. > "WARN NativeCodeLoader: Unable to load native-hadoop library for your > platform... using builtin-java classes where applicable" > -- > > Key: SPARK-21626 > URL: https://issues.apache.org/jira/browse/SPARK-21626 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 2.2.0 >Reporter: Gu Chao > > After starting spark-shell, It outputs: > {code:none} > Setting default log level to "WARN". > To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use > setLogLevel(newLevel). > 17/08/04 11:24:41 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes
[jira] [Updated] (SPARK-21626) "WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable"
[ https://issues.apache.org/jira/browse/SPARK-21626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gu Chao updated SPARK-21626: Description: After starting spark-shell, It outputs: Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel). 17/08/04 11:24:41 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 17/08/04 11:24:44 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded. 17/08/04 11:24:48 WARN General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/opt/spark/jars/datanucleus-core-3.2.10.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/opt/spark-2.2.0-bin-hadoop2.6/jars/datanucleus-core-3.2.10.jar." 17/08/04 11:24:48 WARN General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/opt/spark/jars/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/opt/spark-2.2.0-bin-hadoop2.6/jars/datanucleus-api-jdo-3.2.6.jar." 17/08/04 11:24:48 WARN General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/opt/spark/jars/datanucleus-rdbms-3.2.9.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/opt/spark-2.2.0-bin-hadoop2.6/jars/datanucleus-rdbms-3.2.9.jar." 
17/08/04 11:24:51 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException Spark context Web UI available at http://192.168.50.11:4040 Spark context available as 'sc' (master = spark://hadoop:7077, app id = app-20170804112442-0001). Spark session available as 'spark'. was: After starting spark-shell, It output: 17/08/03 18:24:16 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable > "WARN NativeCodeLoader: Unable to load native-hadoop library for your > platform... using builtin-java classes where applicable" > -- > > Key: SPARK-21626 > URL: https://issues.apache.org/jira/browse/SPARK-21626 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 2.2.0 >Reporter: Gu Chao > > After starting spark-shell, It outputs: > Setting default log level to "WARN". > To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use > setLogLevel(newLevel). > 17/08/04 11:24:41 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > 17/08/04 11:24:44 WARN DomainSocketFactory: The short-circuit local reads > feature cannot be used because libhadoop cannot be loaded. > 17/08/04 11:24:48 WARN General: Plugin (Bundle) "org.datanucleus" is already > registered. Ensure you dont have multiple JAR versions of the same plugin in > the classpath. The URL "file:/opt/spark/jars/datanucleus-core-3.2.10.jar" is > already registered, and you are trying to register an identical plugin > located at URL > "file:/opt/spark-2.2.0-bin-hadoop2.6/jars/datanucleus-core-3.2.10.jar." > 17/08/04 11:24:48 WARN General: Plugin (Bundle) "org.datanucleus.api.jdo" is > already registered. Ensure you dont have multiple JAR versions of the same > plugin in the classpath. 
The URL > "file:/opt/spark/jars/datanucleus-api-jdo-3.2.6.jar" is already registered, > and you are trying to register an identical plugin located at URL > "file:/opt/spark-2.2.0-bin-hadoop2.6/jars/datanucleus-api-jdo-3.2.6.jar." > 17/08/04 11:24:48 WARN General: Plugin (Bundle) "org.datanucleus.store.rdbms" > is already registered. Ensure you dont have multiple JAR versions of the same > plugin in the classpath. The URL > "file:/opt/spark/jars/datanucleus-rdbms-3.2.9.jar" is already registered, and > you are trying to register an identical plugin located at URL > "file:/opt/spark-2.2.0-bin-hadoop2.6/jars/datanucleus-rdbms-3.2.9.jar." > 17/08/04 11:24:51 WARN ObjectStore: Failed to get database global_temp, > returning NoSuchObjectException > Spark context Web UI available at http://192.168.50.11:4040 > Spark context available as 'sc' (master = spark://hadoop:7077, app id = > app-20170804112442-0001). > Spark session available as 'spark'.
[jira] [Created] (SPARK-21633) Unary Transformer in Python
Ajay Saini created SPARK-21633: -- Summary: Unary Transformer in Python Key: SPARK-21633 URL: https://issues.apache.org/jira/browse/SPARK-21633 Project: Spark Issue Type: New Feature Components: ML Affects Versions: 2.2.0 Reporter: Ajay Saini Currently, the abstract class UnaryTransformer is only implemented in Scala. To make PySpark easier to extend with custom transformers, it would be helpful to have an implementation of UnaryTransformer in Python as well. This task involves: - implementing the class UnaryTransformer in Python - testing the transform() functionality of the class to make sure it works
[jira] [Commented] (SPARK-21618) http(s) not accepted in spark-submit jar uri
[ https://issues.apache.org/jira/browse/SPARK-21618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113806#comment-16113806 ] Saisai Shao commented on SPARK-21618: - [~benmayne] If you try the master branch of Spark, which includes SPARK-21012, jars can be downloaded from http(s) URLs. Please give it a try. > http(s) not accepted in spark-submit jar uri > > > Key: SPARK-21618 > URL: https://issues.apache.org/jira/browse/SPARK-21618 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 2.1.1, 2.2.0 > Environment: pre-built for hadoop 2.6 and 2.7 on mac and ubuntu > 16.04. >Reporter: Ben Mayne >Priority: Minor > Labels: documentation > > The documentation suggests I should be able to use an http(s) uri for a jar > in spark-submit, but I haven't been successful > https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management > {noformat} > benmayne@Benjamins-MacBook-Pro ~ $ spark-submit --deploy-mode client --master > local[2] --class class.name.Test https://test.com/path/to/jar.jar > log4j:WARN No appenders could be found for logger > (org.apache.hadoop.metrics2.lib.MutableMetricsFactory). > log4j:WARN Please initialize the log4j system properly. > log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more > info.
> Exception in thread "main" java.io.IOException: No FileSystem for scheme: > https > at > org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2586) > at > org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2593) > at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91) > at > org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2632) > at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2614) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370) > at > org.apache.spark.deploy.SparkSubmit$.downloadFile(SparkSubmit.scala:865) > at > org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$1.apply(SparkSubmit.scala:316) > at > org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$1.apply(SparkSubmit.scala:316) > at scala.Option.map(Option.scala:146) > at > org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:316) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:153) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > benmayne@Benjamins-MacBook-Pro ~ $ > {noformat} > If I replace the path with a valid hdfs path > (hdfs:///user/benmayne/valid-jar.jar), it works as expected. I've seen the > same behavior across 2.2.0 (hadoop 2.6 & 2.7 on mac and ubuntu) and on 2.1.1 > on ubuntu. > this is the example that I'm trying to replicate from > https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management: > > > Spark uses the following URL scheme to allow different strategies for > > disseminating jars: > > file: - Absolute paths and file:/ URIs are served by the driver’s HTTP file > > server, and every executor pulls the file from the driver HTTP server. 
> > hdfs:, http:, https:, ftp: - these pull down files and JARs from the URI as > > expected > {noformat} > # Run on a Mesos cluster in cluster deploy mode with supervise > ./bin/spark-submit \ > --class org.apache.spark.examples.SparkPi \ > --master mesos://207.184.161.138:7077 \ > --deploy-mode cluster \ > --supervise \ > --executor-memory 20G \ > --total-executor-cores 100 \ > http://path/to/examples.jar \ > 1000 > {noformat}
[jira] [Commented] (SPARK-21624) Optimize communication cost of RF/GBT/DT
[ https://issues.apache.org/jira/browse/SPARK-21624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113793#comment-16113793 ] Peng Meng commented on SPARK-21624: --- Thanks [~mlnick], using Vector and compression is reasonable. I will submit a PR and show the performance data. Thanks. > Optimize communication cost of RF/GBT/DT > > > Key: SPARK-21624 > URL: https://issues.apache.org/jira/browse/SPARK-21624 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib >Affects Versions: 2.3.0 >Reporter: Peng Meng > > {quote}The implementation of RF is bound by either the cost of statistics > computation on workers or by communicating the sufficient statistics.{quote} > The statistics are stored in allStats: > {code:java} > /** >* Flat array of elements. >* Index for start of stats for a (feature, bin) is: >* index = featureOffsets(featureIndex) + binIndex * statsSize >*/ > private var allStats: Array[Double] = new Array[Double](allStatsSize) > {code} > The size of allStats may be very large, and it can be very sparse, especially > on the nodes near the leaves of the tree. > I have changed allStats from Array to SparseVector; my tests show the > communication cost drops by about 50%.
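The flat `allStats` layout and the sparsity argument can be illustrated in plain Scala. This is a hypothetical sketch, not Spark's aggregator code: `statIndex` mirrors the quoted index formula, `toSparse` keeps only non-zero entries as parallel index/value arrays (roughly what a sparse vector stores), and the bin counts and `statsSize` are made-up example values.

{code:scala}
object SparseStatsSketch {
  // Start index of the stats for (featureIndex, binIndex), per the quoted formula.
  def statIndex(featureOffsets: Array[Int], featureIndex: Int, binIndex: Int, statsSize: Int): Int =
    featureOffsets(featureIndex) + binIndex * statsSize

  // Keep only non-zero entries as (indices, values), like a sparse vector.
  def toSparse(dense: Array[Double]): (Array[Int], Array[Double]) = {
    val nz = dense.indices.filter(i => dense(i) != 0.0).toArray
    (nz, nz.map(dense(_)))
  }

  def main(args: Array[String]): Unit = {
    val statsSize = 3         // assumed: e.g. one count per class
    val numBins = Array(4, 2) // assumed: bins per feature
    // Offsets where each feature's block of stats begins: 0, 12.
    val featureOffsets = numBins.scanLeft(0)((off, n) => off + n * statsSize).init
    val allStats = new Array[Double](numBins.map(_ * statsSize).sum)

    // Record one observation for feature 1, bin 1, class 2.
    allStats(statIndex(featureOffsets, 1, 1, statsSize) + 2) += 1.0

    val (idx, vals) = toSparse(allStats)
    println(s"dense length = ${allStats.length}, non-zeros = ${idx.length}") // 18, 1
    println(idx.mkString(",") + " -> " + vals.mkString(","))                // 17 -> 1.0
  }
}
{code}

Near the leaves, few samples reach a node, so most (feature, bin, class) slots stay zero and the sparse form can be much smaller than the dense array that would otherwise be shipped.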
[jira] [Created] (SPARK-21632) There is no need to make attempts for createDirectory if the dir had existed
liuzhaokun created SPARK-21632: -- Summary: There is no need to make attempts for createDirectory if the dir had existed Key: SPARK-21632 URL: https://issues.apache.org/jira/browse/SPARK-21632 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.1.1 Reporter: liuzhaokun Priority: Minor There is no need to keep retrying createDirectory if the directory already exists. So I think we should log it and jump out of the loop.
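The proposed behaviour can be sketched as a bounded retry loop that stops as soon as the target directory exists. This is a minimal illustration assuming a fixed target path; `createDirWithRetries` is a hypothetical helper, `println` stands in for a logger, and this is not Spark's actual `createDirectory` code.

{code:scala}
import java.io.{File, IOException}
import java.nio.file.Files

object CreateDirSketch {
  // Hypothetical helper: try up to maxAttempts times, but stop early
  // (and log) as soon as the directory already exists.
  def createDirWithRetries(dir: File, maxAttempts: Int = 10): File = {
    var attempts = 0
    while (attempts < maxAttempts) {
      attempts += 1
      if (dir.isDirectory) {
        println(s"${dir.getPath} already exists; no further attempts needed")
        return dir
      }
      // mkdirs() can return false if another thread created the directory
      // concurrently; the isDirectory check on the next pass handles that case.
      if (dir.mkdirs()) return dir
    }
    throw new IOException(s"Failed to create ${dir.getPath} after $maxAttempts attempts")
  }

  def main(args: Array[String]): Unit = {
    val root = Files.createTempDirectory("sketch").toFile
    val target = new File(root, "spark-local")
    createDirWithRetries(target) // created on the first attempt
    createDirWithRetries(target) // returns immediately: already exists
  }
}
{code}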
[jira] [Commented] (SPARK-21627) analyze hive table compute stats for columns with mixed case exception
[ https://issues.apache.org/jira/browse/SPARK-21627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113778#comment-16113778 ] Liang-Chi Hsieh commented on SPARK-21627: - I think this has just been fixed by SPARK-21599. > analyze hive table compute stats for columns with mixed case exception > -- > > Key: SPARK-21627 > URL: https://issues.apache.org/jira/browse/SPARK-21627 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Bogdan Raducanu > > {code} > sql("create table tabel1(b int) partitioned by (partColumn int)") > sql("analyze table tabel1 compute statistics for columns partColumn, b") > {code} > {code} > java.util.NoSuchElementException: key not found: partColumn > at scala.collection.MapLike$class.default(MapLike.scala:228) > at scala.collection.AbstractMap.default(Map.scala:59) > at scala.collection.MapLike$class.apply(MapLike.scala:141) > at scala.collection.AbstractMap.apply(Map.scala:59) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1$$anonfun$apply$mcV$sp$3.apply(HiveExternalCatalog.scala:648) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1$$anonfun$apply$mcV$sp$3.apply(HiveExternalCatalog.scala:647) > at scala.collection.immutable.Map$Map2.foreach(Map.scala:137) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply$mcV$sp(HiveExternalCatalog.scala:647) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply(HiveExternalCatalog.scala:634) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply(HiveExternalCatalog.scala:634) > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97) > at > org.apache.spark.sql.hive.HiveExternalCatalog.alterTableStats(HiveExternalCatalog.scala:634) > at > 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.alterTableStats(SessionCatalog.scala:375) > at >
org.apache.spark.sql.execution.command.AnalyzeColumnCommand.run(AnalyzeColumnCommand.scala:57) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:78) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:75) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:91) > at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:185) > at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:185) > at org.apache.spark.sql.Dataset$$anonfun$47.apply(Dataset.scala:3036) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77) > at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3035) > at org.apache.spark.sql.Dataset.<init>(Dataset.scala:185) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:70) > at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:636) > ... 39 elided > {code} > Looks like a regression introduced by https://github.com/apache/spark/pull/18248 > In {{HiveExternalCatalog.alterTableStats}} {{colNameTypeMap}} contains lower > case column names.
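The failure mode described above (a map keyed by lower-case column names, looked up with the user's mixed-case name) can be reproduced in plain Scala. A minimal sketch: `colNameTypeMap` is reused only as a name, the column types are assumed, and normalising the lookup key is one possible fix, not necessarily what SPARK-21599 did.

{code:scala}
import java.util.Locale

object ColumnLookupSketch {
  // Keys stored in lower case, as described for colNameTypeMap.
  val colNameTypeMap: Map[String, String] =
    Map("b" -> "int", "partColumn".toLowerCase(Locale.ROOT) -> "int")

  // Normalise the lookup key the same way the keys were normalised.
  def lookupType(column: String): Option[String] =
    colNameTypeMap.get(column.toLowerCase(Locale.ROOT))

  def main(args: Array[String]): Unit = {
    // colNameTypeMap("partColumn") would throw
    // java.util.NoSuchElementException: key not found: partColumn
    println(colNameTypeMap.get("partColumn")) // None
    println(lookupType("partColumn"))         // Some(int)
  }
}
{code}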
[jira] [Commented] (SPARK-21630) Pmod should not throw a divide by zero exception
[ https://issues.apache.org/jira/browse/SPARK-21630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113777#comment-16113777 ] Liang-Chi Hsieh commented on SPARK-21630: - Maybe a duplicate of SPARK-21205? > Pmod should not throw a divide by zero exception > > > Key: SPARK-21630 > URL: https://issues.apache.org/jira/browse/SPARK-21630 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2, 2.1.1, 2.2.0 >Reporter: Herman van Hovell > > Pmod currently throws a divide by zero exception when the right input is 0. > It should - like Divide or Remainder - probably return null. > Here is a small reproducer: > {noformat} > scala> sql("select pmod(id, 0) from range(10)").show > 17/08/03 22:36:43 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3) > java.lang.ArithmeticException: / by zero > {noformat}
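The requested semantics can be sketched in plain Scala, with SQL NULL modelled as `None`. The positive-modulo step (add the divisor when the remainder is negative) is a common definition and an assumption here, not a copy of Spark's `Pmod` code; the point is only that a zero divisor yields `None` instead of throwing `ArithmeticException`. `pmod` below is a hypothetical helper.

{code:scala}
object PmodSketch {
  // Hypothetical null-returning pmod: None (SQL NULL) when the divisor is 0.
  def pmod(a: Long, n: Long): Option[Long] =
    if (n == 0L) None
    else {
      val r = a % n
      Some(if (r < 0) (r + n) % n else r)
    }

  def main(args: Array[String]): Unit = {
    // Mirrors the reproducer: pmod(id, 0) over range(10) yields NULL for every row.
    println((0L until 10L).forall(id => pmod(id, 0L).isEmpty)) // true
    println(pmod(-7L, 3L))                                      // Some(2)
  }
}
{code}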
[jira] [Commented] (SPARK-21631) Building Spark with SBT unsuccessful when source code in Mllib is modified, But with MVN is ok
[ https://issues.apache.org/jira/browse/SPARK-21631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113775#comment-16113775 ] Liang-Chi Hsieh commented on SPARK-21631: - Looks like the change just doesn't comply with the Spark code style? > Building Spark with SBT unsuccessful when source code in Mllib is modified, > But with MVN is ok > -- > > Key: SPARK-21631 > URL: https://issues.apache.org/jira/browse/SPARK-21631 > Project: Spark > Issue Type: Bug > Components: Build, MLlib >Affects Versions: 2.1.1 > Environment: ubuntu 14.04 > Spark 2.1.1 > MVN 3.3.9 > scala 2.11.8 >Reporter: Sean Wong > > I added > import org.apache.spark.internal.Logging > at the head of LinearRegression.scala file > Then, I try to build Spark using SBT. > However, here is the error: > *[info] Done packaging. > java.lang.RuntimeException: errors exist > at scala.sys.package$.error(package.scala:27) > at org.scalastyle.sbt.Tasks$.onHasErrors$1(Plugin.scala:132) > at > org.scalastyle.sbt.Tasks$.doScalastyleWithConfig$1(Plugin.scala:187) > at org.scalastyle.sbt.Tasks$.doScalastyle(Plugin.scala:195) > at > SparkBuild$$anonfun$cachedScalaStyle$1$$anonfun$17.apply(SparkBuild.scala:205) > at > SparkBuild$$anonfun$cachedScalaStyle$1$$anonfun$17.apply(SparkBuild.scala:192) > at sbt.FileFunction$$anonfun$cached$1.apply(Tracked.scala:235) > at sbt.FileFunction$$anonfun$cached$1.apply(Tracked.scala:235) > at > sbt.FileFunction$$anonfun$cached$2$$anonfun$apply$3$$anonfun$apply$4.apply(Tracked.scala:249) > at > sbt.FileFunction$$anonfun$cached$2$$anonfun$apply$3$$anonfun$apply$4.apply(Tracked.scala:245) > at sbt.Difference.apply(Tracked.scala:224) > at sbt.Difference.apply(Tracked.scala:206) > at > sbt.FileFunction$$anonfun$cached$2$$anonfun$apply$3.apply(Tracked.scala:245) > at > sbt.FileFunction$$anonfun$cached$2$$anonfun$apply$3.apply(Tracked.scala:244) > at sbt.Difference.apply(Tracked.scala:224) > at sbt.Difference.apply(Tracked.scala:200) > at
sbt.FileFunction$$anonfun$cached$2.apply(Tracked.scala:244) > at sbt.FileFunction$$anonfun$cached$2.apply(Tracked.scala:242) > at SparkBuild$$anonfun$cachedScalaStyle$1.apply(SparkBuild.scala:212) > at SparkBuild$$anonfun$cachedScalaStyle$1.apply(SparkBuild.scala:187) > at scala.Function1$$anonfun$compose$1.apply(Function1.scala:47) > at sbt.$tilde$greater$$anonfun$$u2219$1.apply(TypeFunctions.scala:40) > at sbt.std.Transform$$anon$4.work(System.scala:63) > at > sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:228) > at > sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:228) > at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:17) > at sbt.Execute.work(Execute.scala:237) > at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:228) > at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:228) > at > sbt.ConcurrentRestrictions$$anon$4$$anonfun$1.apply(ConcurrentRestrictions.scala:159) > at sbt.CompletionService$$anon$2.call(CompletionService.scala:28) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > [error] (mllib/*:scalaStyleOnCompile) errors exist* > After this, I switch to use MVN to build Spark, Everything is ok and the > building is successful. > So is this a bug for SBT building?
[jira] [Commented] (SPARK-20870) Update the output of spark-sql -H
[ https://issues.apache.org/jira/browse/SPARK-20870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113773#comment-16113773 ] Bravo Zhang commented on SPARK-20870: - Hi [~smilegator], I can't find the code that handles the help message in Spark. Is it managed in the Hive project? https://github.com/apache/hive/blob/master/cli/src/java/org/apache/hadoop/hive/cli/OptionsProcessor.java > Update the output of spark-sql -H > - > > Key: SPARK-20870 > URL: https://issues.apache.org/jira/browse/SPARK-20870 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Xiao Li > Labels: starter > > When we input `./bin/spark-sql -H`, the output is still based on Hive. We > need to check whether all of them are working correctly. If not supported, we > need to remove it from the list. > Also, update the first line to `usage: spark-sql` > {noformat} > usage: hive > -d,--define
[jira] [Comment Edited] (SPARK-21629) OR nullability is incorrect
[ https://issues.apache.org/jira/browse/SPARK-21629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113752#comment-16113752 ] Takeshi Yamamuro edited comment on SPARK-21629 at 8/4/17 12:58 AM: --- Could you give a concrete query and its result? Is it like the sequence below (I think this case is correct, though)?
{code}
scala> Seq((Some(1), 1), (None, 2)).toDF("a", "b").selectExpr("a > 0 OR b > 0").printSchema
root
 |-- ((a > 0) OR (b > 0)): boolean (nullable = true)

scala> Seq((1, 1), (0, 2)).toDF("a", "b").selectExpr("a > 0 OR b > 0").printSchema
root
 |-- ((a > 0) OR (b > 0)): boolean (nullable = false)
{code}
was (Author: maropu): Could you give a concrete query and its result? Is it like the sequence below?
{code}
scala> Seq((Some(1), 1), (None, 2)).toDF("a", "b").selectExpr("a > 0 OR b > 0").printSchema
root
 |-- ((a > 0) OR (b > 0)): boolean (nullable = true)

scala> Seq((1, 1), (0, 2)).toDF("a", "b").selectExpr("a > 0 OR b > 0").printSchema
root
 |-- ((a > 0) OR (b > 0)): boolean (nullable = false)
{code}
> OR nullability is incorrect > --- > > Key: SPARK-21629 > URL: https://issues.apache.org/jira/browse/SPARK-21629 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.1.1, 2.2.0 >Reporter: Herman van Hovell >Priority: Minor > > The SQL {{OR}} expression's nullability is slightly incorrect. It should only > be nullable when both of the input expressions are nullable, and not when > either of them is nullable.
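For reference, SQL `OR` follows three-valued (Kleene) logic: the result is TRUE if either side is TRUE, FALSE if both sides are FALSE, and NULL otherwise. The sketch below models this in plain Scala with `Option[Boolean]` (`None` standing for NULL); `sqlOr` is a hypothetical helper, not Spark's `Or` expression. It shows that `NULL OR FALSE` is NULL, which is relevant when deciding whether an `OR` with only one nullable input can be marked non-nullable.

{code:scala}
object ThreeValuedOr {
  // Kleene OR: TRUE dominates, then NULL, then FALSE.
  def sqlOr(a: Option[Boolean], b: Option[Boolean]): Option[Boolean] =
    (a, b) match {
      case (Some(true), _) | (_, Some(true)) => Some(true)
      case (Some(false), Some(false))        => Some(false)
      case _                                 => None
    }

  def main(args: Array[String]): Unit = {
    val values = Seq(Some(true), Some(false), None)
    // Print the full 3x3 truth table.
    for (a <- values; b <- values) println(s"$a OR $b = ${sqlOr(a, b)}")
  }
}
{code}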
[jira] [Commented] (SPARK-21629) OR nullability is incorrect
[ https://issues.apache.org/jira/browse/SPARK-21629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113752#comment-16113752 ] Takeshi Yamamuro commented on SPARK-21629: -- What's a concrete query and result example? Is it like the sequence below? {code} scala> Seq((Some(1), 1), (None, 2)).toDF("a", "b").selectExpr("a > 0 OR b > 0").printSchema root |-- ((a > 0) OR (b > 0)): boolean (nullable = true) scala> Seq((1, 1), (0, 2)).toDF("a", "b").selectExpr("a > 0 OR b > 0").printSchema root |-- ((a > 0) OR (b > 0)): boolean (nullable = false) {code} > OR nullability is incorrect > --- > > Key: SPARK-21629 > URL: https://issues.apache.org/jira/browse/SPARK-21629 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.1.1, 2.2.0 >Reporter: Herman van Hovell >Priority: Minor > > The SQL {{OR}} expression's nullability is slightly incorrect. It should only > be nullable when both of the input expressions are nullable, and not when > either of them is nullable.
[jira] [Created] (SPARK-21631) Building Spark with SBT unsuccessful when source code in Mllib is modified, But with MVN is ok
Sean Wong created SPARK-21631: - Summary: Building Spark with SBT unsuccessful when source code in Mllib is modified, But with MVN is ok Key: SPARK-21631 URL: https://issues.apache.org/jira/browse/SPARK-21631 Project: Spark Issue Type: Bug Components: Build, MLlib Affects Versions: 2.1.1 Environment: ubuntu 14.04 Spark 2.1.1 MVN 3.3.9 scala 2.11.8 Reporter: Sean Wong I added {{import org.apache.spark.internal.Logging}} at the head of the LinearRegression.scala file. Then I tried to build Spark using SBT, but got the following error: *[info] Done packaging. java.lang.RuntimeException: errors exist at scala.sys.package$.error(package.scala:27) at org.scalastyle.sbt.Tasks$.onHasErrors$1(Plugin.scala:132) at org.scalastyle.sbt.Tasks$.doScalastyleWithConfig$1(Plugin.scala:187) at org.scalastyle.sbt.Tasks$.doScalastyle(Plugin.scala:195) at SparkBuild$$anonfun$cachedScalaStyle$1$$anonfun$17.apply(SparkBuild.scala:205) at SparkBuild$$anonfun$cachedScalaStyle$1$$anonfun$17.apply(SparkBuild.scala:192) at sbt.FileFunction$$anonfun$cached$1.apply(Tracked.scala:235) at sbt.FileFunction$$anonfun$cached$1.apply(Tracked.scala:235) at sbt.FileFunction$$anonfun$cached$2$$anonfun$apply$3$$anonfun$apply$4.apply(Tracked.scala:249) at sbt.FileFunction$$anonfun$cached$2$$anonfun$apply$3$$anonfun$apply$4.apply(Tracked.scala:245) at sbt.Difference.apply(Tracked.scala:224) at sbt.Difference.apply(Tracked.scala:206) at sbt.FileFunction$$anonfun$cached$2$$anonfun$apply$3.apply(Tracked.scala:245) at sbt.FileFunction$$anonfun$cached$2$$anonfun$apply$3.apply(Tracked.scala:244) at sbt.Difference.apply(Tracked.scala:224) at sbt.Difference.apply(Tracked.scala:200) at sbt.FileFunction$$anonfun$cached$2.apply(Tracked.scala:244) at sbt.FileFunction$$anonfun$cached$2.apply(Tracked.scala:242) at SparkBuild$$anonfun$cachedScalaStyle$1.apply(SparkBuild.scala:212) at SparkBuild$$anonfun$cachedScalaStyle$1.apply(SparkBuild.scala:187) at scala.Function1$$anonfun$compose$1.apply(Function1.scala:47) at 
sbt.$tilde$greater$$anonfun$$u2219$1.apply(TypeFunctions.scala:40) at sbt.std.Transform$$anon$4.work(System.scala:63) at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:228) at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:228) at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:17) at sbt.Execute.work(Execute.scala:237) at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:228) at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:228) at sbt.ConcurrentRestrictions$$anon$4$$anonfun$1.apply(ConcurrentRestrictions.scala:159) at sbt.CompletionService$$anon$2.call(CompletionService.scala:28) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) [error] (mllib/*:scalaStyleOnCompile) errors exist* After this, I switched to MVN to build Spark, and everything worked: the build was successful. So is this a bug in the SBT build?
[jira] [Commented] (SPARK-20812) Add Mesos Secrets support to the spark dispatcher
[ https://issues.apache.org/jira/browse/SPARK-20812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113731#comment-16113731 ] Arthur Rand commented on SPARK-20812: - https://github.com/apache/spark/pull/18837 > Add Mesos Secrets support to the spark dispatcher > - > > Key: SPARK-20812 > URL: https://issues.apache.org/jira/browse/SPARK-20812 > Project: Spark > Issue Type: New Feature > Components: Mesos >Affects Versions: 2.3.0 >Reporter: Michael Gummelt > > Mesos 1.4 will support secrets. In order to support sending keytabs through > the Spark Dispatcher, or any other secret, we need to integrate this with the > Spark Dispatcher. > The integration should include support for both file-based and env-based > secrets.
[jira] [Commented] (SPARK-19870) Repeatable deadlock on BlockInfoManager and TorrentBroadcast
[ https://issues.apache.org/jira/browse/SPARK-19870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113727#comment-16113727 ] David Lewis commented on SPARK-19870: - I think I'm hitting a similar bug, here are two stack traces in the block manager, one waiting for read and one waiting for write: {code}java.lang.Object.wait(Native Method) java.lang.Object.wait(Object.java:502) org.apache.spark.storage.BlockInfoManager.lockForWriting(BlockInfoManager.scala:236) org.apache.spark.storage.BlockManager.removeBlock(BlockManager.scala:1323) org.apache.spark.storage.BlockManager$$anonfun$removeBroadcast$2.apply(BlockManager.scala:1314) org.apache.spark.storage.BlockManager$$anonfun$removeBroadcast$2.apply(BlockManager.scala:1314) scala.collection.Iterator$class.foreach(Iterator.scala:893) scala.collection.AbstractIterator.foreach(Iterator.scala:1336) org.apache.spark.storage.BlockManager.removeBroadcast(BlockManager.scala:1314) org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply$mcI$sp(BlockManagerSlaveEndpoint.scala:66) org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply(BlockManagerSlaveEndpoint.scala:66) org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply(BlockManagerSlaveEndpoint.scala:66) org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$1.apply(BlockManagerSlaveEndpoint.scala:82) scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617){code} and {code}java.lang.Object.wait(Native Method) java.lang.Object.wait(Object.java:502) org.apache.spark.storage.BlockInfoManager.lockForWriting(BlockInfoManager.scala:236) 
org.apache.spark.storage.BlockManager.removeBlock(BlockManager.scala:1323) org.apache.spark.storage.BlockManager$$anonfun$removeBroadcast$2.apply(BlockManager.scala:1314) org.apache.spark.storage.BlockManager$$anonfun$removeBroadcast$2.apply(BlockManager.scala:1314) scala.collection.Iterator$class.foreach(Iterator.scala:893) scala.collection.AbstractIterator.foreach(Iterator.scala:1336) org.apache.spark.storage.BlockManager.removeBroadcast(BlockManager.scala:1314) org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply$mcI$sp(BlockManagerSlaveEndpoint.scala:66) org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply(BlockManagerSlaveEndpoint.scala:66) org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$4.apply(BlockManagerSlaveEndpoint.scala:66) org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$1.apply(BlockManagerSlaveEndpoint.scala:82) scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) java.lang.Thread.run(Thread.java:748){code} > Repeatable deadlock on BlockInfoManager and TorrentBroadcast > > > Key: SPARK-19870 > URL: https://issues.apache.org/jira/browse/SPARK-19870 > Project: Spark > Issue Type: Bug > Components: Block Manager, Shuffle >Affects Versions: 2.0.2, 2.1.0 > Environment: ubuntu linux 14.04 x86_64 on ec2, hadoop cdh 5.10.0, > yarn coarse-grained. >Reporter: Steven Ruppert > Attachments: stack.txt > > > Running what I believe to be a fairly vanilla spark job, using the RDD api, > with several shuffles, a cached RDD, and finally a conversion to DataFrame to > save to parquet. 
I get a repeatable deadlock at the very last reducers of one > of the stages. > Roughly: > {noformat} > "Executor task launch worker-6" #56 daemon prio=5 os_prio=0 > tid=0x7fffd88d3000 nid=0x1022b9 waiting for monitor entry > [0x7fffb95f3000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:207) > - waiting to lock <0x0005445cfc00> (a > org.apache.spark.broadcast.TorrentBroadcast$) > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1269) > at > org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:206) > at > org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66) > - locked
[jira] [Commented] (SPARK-18406) Race between end-of-task and completion iterator read lock release
[ https://issues.apache.org/jira/browse/SPARK-18406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113702#comment-16113702 ] Taichi Sano commented on SPARK-18406: - Hello, I am experiencing an issue very similar to this. I am currently trying to do a groupByKeyAndWindow() with batch size of 1, window size of 80, and shift size of 1 from data that is being streamed from Kafka (ver 0.10) with Direct Streaming. Every once in a while, I encounter the AssertionError like so: 17/08/03 22:32:19 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 in stage 20936.0 (TID 4409) java.lang.AssertionError: assertion failed at scala.Predef$.assert(Predef.scala:156) at org.apache.spark.storage.BlockInfo.checkInvariants(BlockInfoManager.scala:84) at org.apache.spark.storage.BlockInfo.readerCount_$eq(BlockInfoManager.scala:66) at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2$$anonfun$apply$2.apply(BlockInfoManager.scala:367) at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2$$anonfun$apply$2.apply(BlockInfoManager.scala:366) at scala.Option.foreach(Option.scala:257) at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2.apply(BlockInfoManager.scala:366) at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2.apply(BlockInfoManager.scala:361) at scala.collection.Iterator$class.foreach(Iterator.scala:893) at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) at org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:361) at org.apache.spark.storage.BlockManager.releaseAllLocksForTask(BlockManager.scala:736) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:342) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:748) 17/08/03 22:32:19 
ERROR org.apache.spark.executor.Executor: Exception in task 0.1 in stage 20936.0 (TID 4410) java.lang.AssertionError: assertion failed at scala.Predef$.assert(Predef.scala:156) at org.apache.spark.storage.BlockInfo.checkInvariants(BlockInfoManager.scala:84) at org.apache.spark.storage.BlockInfo.readerCount_$eq(BlockInfoManager.scala:66) at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2$$anonfun$apply$2.apply(BlockInfoManager.scala:367) at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2$$anonfun$apply$2.apply(BlockInfoManager.scala:366) at scala.Option.foreach(Option.scala:257) at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2.apply(BlockInfoManager.scala:366) at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2.apply(BlockInfoManager.scala:361) at scala.collection.Iterator$class.foreach(Iterator.scala:893) at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) at org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:361) at org.apache.spark.storage.BlockManager.releaseAllLocksForTask(BlockManager.scala:736) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:342) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:748) 17/08/03 22:32:19 ERROR org.apache.spark.util.Utils: Uncaught exception in thread stdout writer for /opt/conda/bin/python java.lang.AssertionError: assertion failed: Block rdd_30291_0 is not locked for reading at scala.Predef$.assert(Predef.scala:170) at org.apache.spark.storage.BlockInfoManager.unlock(BlockInfoManager.scala:299) at org.apache.spark.storage.BlockManager.releaseLock(BlockManager.scala:720) at org.apache.spark.storage.BlockManager$$anonfun$1.apply$mcV$sp(BlockManager.scala:516) at 
org.apache.spark.util.CompletionIterator$$anon$1.completion(CompletionIterator.scala:46) at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:35) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) at scala.collection.Iterator$class.foreach(Iterator.scala:893) at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28) at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:509) at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:333) at
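The last assertion above ("Block rdd_30291_0 is not locked for reading") can be mimicked in a toy Python model of per-task read-lock bookkeeping. This is a hypothetical sketch; ToyBlockLocks and its methods are not Spark's BlockInfoManager. It shows one interleaving consistent with the issue title: if the end-of-task cleanup releases all of a task's read locks first, a later release from the completion iterator finds nothing left to release.

```python
from collections import Counter


class ToyBlockLocks:
    """Toy per-block, per-task read-lock bookkeeping (hypothetical model)."""

    def __init__(self) -> None:
        self.reader_count: Counter = Counter()  # block id -> outstanding read locks
        self.locks_by_task: dict = {}           # task id -> Counter of block ids

    def lock_for_reading(self, task: int, block: str) -> None:
        self.reader_count[block] += 1
        self.locks_by_task.setdefault(task, Counter())[block] += 1

    def unlock(self, task: int, block: str) -> None:
        """Release one read lock, as a completion iterator would."""
        held = self.locks_by_task.get(task, Counter())
        if held[block] <= 0:
            raise AssertionError(f"Block {block} is not locked for reading")
        held[block] -= 1
        self.reader_count[block] -= 1

    def release_all_locks_for_task(self, task: int) -> None:
        """End-of-task cleanup: drop every read lock the task still holds."""
        for block, count in self.locks_by_task.pop(task, Counter()).items():
            self.reader_count[block] -= count
            # Invariant check, analogous in spirit to BlockInfo.checkInvariants.
            if self.reader_count[block] < 0:
                raise AssertionError("assertion failed")


# One problematic interleaving: the end-of-task cleanup runs first, then the
# completion iterator (e.g. in a separate writer thread) releases its lock.
locks = ToyBlockLocks()
locks.lock_for_reading(task=4409, block="rdd_30291_0")
locks.release_all_locks_for_task(4409)
try:
    locks.unlock(task=4409, block="rdd_30291_0")
except AssertionError as err:
    print(err)
```

Running the sketch prints the model's "Block rdd_30291_0 is not locked for reading" error. The task and block ids are borrowed from the log purely for illustration.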
[jira] [Commented] (SPARK-20853) spark.ui.reverseProxy=true leads to hanging communication to master
[ https://issues.apache.org/jira/browse/SPARK-20853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113670#comment-16113670 ] Josh Bacon commented on SPARK-20853: For the record, I'm experiencing exactly the same behavior as described by [~tmckay]. If the total number of workers plus drivers exceeds 9 (each with spark.ui.reverseProxy enabled), the Master UI becomes unresponsive. If I remove workers or running drivers to bring the total back below that threshold, the Master UI becomes responsive again. > spark.ui.reverseProxy=true leads to hanging communication to master > --- > > Key: SPARK-20853 > URL: https://issues.apache.org/jira/browse/SPARK-20853 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.1.0 > Environment: ppc64le GNU/Linux, POWER8, only master node is reachable > externally other nodes are in an internal network >Reporter: Benno Staebler > Labels: network, web-ui > > When *reverse proxy is enabled* > {quote} > spark.ui.reverseProxy=true > spark.ui.reverseProxyUrl=/ > {quote} > first of all any invocation of the spark master Web UI hangs forever locally > (e.g. http://192.168.10.16:25001) and via external URL without any data > received. > One, sometimes two spark applications succeed without error and then workers > start throwing exceptions: > {quote} > Caused by: java.io.IOException: Failed to connect to /192.168.10.16:25050 > {quote} > The application dies during creation of SparkContext: > {quote} > 2017-05-22 16:11:23 INFO StandaloneAppClient$ClientEndpoint:54 - Connecting > to master spark://node0101:25000... > 2017-05-22 16:11:23 INFO TransportClientFactory:254 - Successfully created > connection to node0101/192.168.10.16:25000 after 169 ms (132 ms spent in > bootstraps) > 2017-05-22 16:11:43 INFO StandaloneAppClient$ClientEndpoint:54 - Connecting > to master spark://node0101:25000... > 2017-05-22 16:12:03 INFO StandaloneAppClient$ClientEndpoint:54 - Connecting > to master spark://node0101:25000... 
> 2017-05-22 16:12:23 ERROR StandaloneSchedulerBackend:70 - Application has > been killed. Reason: All masters are unresponsive! Giving up. > 2017-05-22 16:12:23 WARN StandaloneSchedulerBackend:66 - Application ID is > not initialized yet. > 2017-05-22 16:12:23 INFO Utils:54 - Successfully started service > 'org.apache.spark.network.netty.NettyBlockTransferService' on port 25056. > . > Caused by: java.lang.IllegalArgumentException: requirement failed: Can only > call getServletHandlers on a running MetricsSystem > {quote} > *This definitively does not happen without reverse proxy enabled!*
[jira] [Commented] (SPARK-19112) add codec for ZStandard
[ https://issues.apache.org/jira/browse/SPARK-19112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113658#comment-16113658 ] Marcelo Vanzin commented on SPARK-19112: Yes, we can't merge the PR until Facebook re-licenses the code. > add codec for ZStandard > --- > > Key: SPARK-19112 > URL: https://issues.apache.org/jira/browse/SPARK-19112 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Thomas Graves >Priority: Minor > > ZStandard: https://github.com/facebook/zstd and > http://facebook.github.io/zstd/ has been in use for a while now. v1.0 was > recently released. Hadoop > (https://issues.apache.org/jira/browse/HADOOP-13578) and others > (https://issues.apache.org/jira/browse/KAFKA-4514) are adopting it. > Zstd seems to give great results => Gzip level Compression with Lz4 level CPU. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19112) add codec for ZStandard
[ https://issues.apache.org/jira/browse/SPARK-19112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113653#comment-16113653 ] Adam Kennedy commented on SPARK-19112: -- Will this be impacted by LEGAL-303? zstd-jni embeds zstd which has the Facebook PATENTS file in it. > add codec for ZStandard > --- > > Key: SPARK-19112 > URL: https://issues.apache.org/jira/browse/SPARK-19112 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Thomas Graves >Priority: Minor > > ZStandard: https://github.com/facebook/zstd and > http://facebook.github.io/zstd/ has been in use for a while now. v1.0 was > recently released. Hadoop > (https://issues.apache.org/jira/browse/HADOOP-13578) and others > (https://issues.apache.org/jira/browse/KAFKA-4514) are adopting it. > Zstd seems to give great results => Gzip level Compression with Lz4 level CPU.
[jira] [Commented] (SPARK-21478) Unpersist a DF also unpersists related DFs
[ https://issues.apache.org/jira/browse/SPARK-21478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113635#comment-16113635 ] Roberto Mirizzi commented on SPARK-21478: - Hi [~smilegator], is that documented somewhere? > Unpersist a DF also unpersists related DFs > -- > > Key: SPARK-21478 > URL: https://issues.apache.org/jira/browse/SPARK-21478 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.1, 2.2.0 >Reporter: Roberto Mirizzi > > Starting with Spark 2.1.1 I observed this bug. Here are the steps to > reproduce it: > # create a DF > # persist it > # count the items in it > # create a new DF as a transformation of the previous one > # persist it > # count the items in it > # unpersist the first DF > Once you do that, you will see that the 2nd DF is gone as well. > The code to reproduce it is: > {code:java} > import spark.implicits._ // for toDF and $"value"; already in scope in spark-shell > val x1 = Seq(1).toDF() > x1.persist() > x1.count() > assert(x1.storageLevel.useMemory) > val x11 = x1.select($"value" * 2) > x11.persist() > x11.count() > assert(x11.storageLevel.useMemory) > x1.unpersist() > assert(!x1.storageLevel.useMemory) > //the following assertion FAILS > assert(x11.storageLevel.useMemory) > {code}
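The reported symptom is consistent with an unpersist that cascades to every cached plan built on top of the unpersisted one. The toy Python model below contrasts that cascading behavior with a non-cascading one. Plan, ToyCacheManager and the cascade flag are hypothetical names, not Spark's CacheManager. Whether Spark 2.1.1/2.2 is intended to behave this way is the question being asked in this thread.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Plan:
    """A toy logical plan: a name plus child plans (hypothetical, not Spark's class)."""
    name: str
    children: Tuple["Plan", ...] = ()

    def contains(self, other: "Plan") -> bool:
        """True if `other` is this plan or appears anywhere beneath it."""
        return self == other or any(child.contains(other) for child in self.children)


class ToyCacheManager:
    def __init__(self) -> None:
        self.cached: List[Plan] = []

    def persist(self, plan: Plan) -> None:
        if plan not in self.cached:
            self.cached.append(plan)

    def unpersist(self, plan: Plan, cascade: bool = True) -> None:
        if cascade:
            # Drop every cached plan that reads from the unpersisted one.
            self.cached = [p for p in self.cached if not p.contains(plan)]
        else:
            # Drop only the exact plan that was unpersisted.
            self.cached = [p for p in self.cached if p != plan]

    def is_cached(self, plan: Plan) -> bool:
        return plan in self.cached


# Mirrors the repro: x11 is a projection on top of x1.
x1 = Plan("LocalRelation [value]")
x11 = Plan("Project [value * 2]", (x1,))
```

In this model, persisting both plans and then calling unpersist(x1) also drops x11, matching the failing assertion in the repro. unpersist(x1, cascade=False) leaves x11 cached.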
[jira] [Resolved] (SPARK-21618) http(s) not accepted in spark-submit jar uri
[ https://issues.apache.org/jira/browse/SPARK-21618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-21618. --- Resolution: Duplicate > http(s) not accepted in spark-submit jar uri > > > Key: SPARK-21618 > URL: https://issues.apache.org/jira/browse/SPARK-21618 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 2.1.1, 2.2.0 > Environment: pre-built for hadoop 2.6 and 2.7 on mac and ubuntu > 16.04. >Reporter: Ben Mayne >Priority: Minor > Labels: documentation > > The documentation suggests I should be able to use an http(s) uri for a jar > in spark-submit, but I haven't been successful > https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management > {noformat} > benmayne@Benjamins-MacBook-Pro ~ $ spark-submit --deploy-mode client --master > local[2] --class class.name.Test https://test.com/path/to/jar.jar > log4j:WARN No appenders could be found for logger > (org.apache.hadoop.metrics2.lib.MutableMetricsFactory). > log4j:WARN Please initialize the log4j system properly. > log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more > info. 
> Exception in thread "main" java.io.IOException: No FileSystem for scheme: > https > at > org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2586) > at > org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2593) > at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91) > at > org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2632) > at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2614) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370) > at > org.apache.spark.deploy.SparkSubmit$.downloadFile(SparkSubmit.scala:865) > at > org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$1.apply(SparkSubmit.scala:316) > at > org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$1.apply(SparkSubmit.scala:316) > at scala.Option.map(Option.scala:146) > at > org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:316) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:153) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > benmayne@Benjamins-MacBook-Pro ~ $ > {noformat} > If I replace the path with a valid hdfs path > (hdfs:///user/benmayne/valid-jar.jar), it works as expected. I've seen the > same behavior across 2.2.0 (hadoop 2.6 & 2.7 on mac and ubuntu) and on 2.1.1 > on ubuntu. > this is the example that I'm trying to replicate from > https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management: > > > Spark uses the following URL scheme to allow different strategies for > > disseminating jars: > > file: - Absolute paths and file:/ URIs are served by the driver’s HTTP file > > server, and every executor pulls the file from the driver HTTP server. 
> > hdfs:, http:, https:, ftp: - these pull down files and JARs from the URI as > > expected {noformat} > # Run on a Mesos cluster in cluster deploy mode with supervise > ./bin/spark-submit \ > --class org.apache.spark.examples.SparkPi \ > --master mesos://207.184.161.138:7077 \ > --deploy-mode cluster \ > --supervise \ > --executor-memory 20G \ > --total-executor-cores 100 \ > http://path/to/examples.jar \ > 1000 > {noformat}
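On the affected versions, one workaround is to download the jar to a local path first and pass that path to spark-submit instead of the https URI. The helper below is a hypothetical Python sketch that uses only the standard library; it is not part of Spark. urllib.request.urlopen handles http, https, ftp and file URLs.

```python
import os
import shutil
import urllib.parse
import urllib.request


def fetch_jar_locally(uri: str, dest_dir: str) -> str:
    """Download the jar at `uri` into `dest_dir` and return the local path (hypothetical helper)."""
    # Keep the original file name, e.g. .../path/to/jar.jar -> jar.jar
    name = os.path.basename(urllib.parse.urlparse(uri).path) or "app.jar"
    local_path = os.path.join(dest_dir, name)
    # urlopen streams the response; copy it to disk without loading it all into memory.
    with urllib.request.urlopen(uri) as response, open(local_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return local_path
```

For example, after fetch_jar_locally("https://test.com/path/to/jar.jar", "/tmp/jars") (the /tmp/jars directory is an arbitrary choice), the reporter's command could become spark-submit --deploy-mode client --master local[2] --class class.name.Test /tmp/jars/jar.jar.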
[jira] [Commented] (SPARK-21595) introduction of spark.sql.windowExec.buffer.spill.threshold in spark 2.2 breaks existing workflow
[ https://issues.apache.org/jira/browse/SPARK-21595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113622#comment-16113622 ] Tejas Patil commented on SPARK-21595: - [~hvanhovell]: I am fine with either of the options you mentioned. One more option: right now, both the switch from the in-memory buffer to `UnsafeExternalSorter` and `UnsafeExternalSorter` spilling to disk are controlled by a single threshold. If we decouple them with separate thresholds, we get the "spill on memory pressure" behavior: the in-memory threshold can be kept small, while a higher spill-to-disk threshold avoids excessive disk spills. This is a fairly simple change. What do you think? > introduction of spark.sql.windowExec.buffer.spill.threshold in spark 2.2 > breaks existing workflow > - > > Key: SPARK-21595 > URL: https://issues.apache.org/jira/browse/SPARK-21595 > Project: Spark > Issue Type: Bug > Components: Documentation, PySpark >Affects Versions: 2.2.0 > Environment: pyspark on linux >Reporter: Stephan Reiling >Priority: Minor > Labels: documentation, regression > > My pyspark code has the following statement: > {code:java} > # assign row key for tracking > df = df.withColumn( > 'association_idx', > sqlf.row_number().over( > Window.orderBy('uid1', 'uid2') > ) > ) > {code} > where df is a long, skinny (450M rows, 10 columns) dataframe. So this creates > one large window for the whole dataframe to sort over. > In Spark 2.1 this works without problems; in Spark 2.2 it fails with either an > out-of-memory exception or a too-many-open-files exception, depending on memory > settings (which is what I tried first to fix this). > Monitoring the blockmgr, I see that Spark 2.1 creates 152 files, while Spark 2.2 > creates >110,000 files. 
> In the log I see the following messages (110,000 of these): > {noformat} > 17/08/01 08:55:37 INFO UnsafeExternalSorter: Spilling data because number of > spilledRecords crossed the threshold 4096 > 17/08/01 08:55:37 INFO UnsafeExternalSorter: Thread 156 spilling sort data of > 64.1 MB to disk (0 time so far) > 17/08/01 08:55:37 INFO UnsafeExternalSorter: Spilling data because number of > spilledRecords crossed the threshold 4096 > 17/08/01 08:55:37 INFO UnsafeExternalSorter: Thread 156 spilling sort data of > 64.1 MB to disk (1 time so far) > {noformat} > So I started hunting for clues in UnsafeExternalSorter, without luck. What I > had missed was this one message: > {noformat} > 17/08/01 08:55:37 INFO ExternalAppendOnlyUnsafeRowArray: Reached spill > threshold of 4096 rows, switching to > org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter > {noformat} > Which allowed me to track down the issue. > By changing the configuration to include: > {code:java} > spark.sql.windowExec.buffer.spill.threshold 2097152 > {code} > I got it to work again and with the same performance as spark 2.1. > I have workflows where I use windowing functions that do not fail, but took a > performance hit due to the excessive spilling when using the default of 4096. > I think to make it easier to track down these issues this config variable > should be included in the configuration documentation. > Maybe 4096 is too small of a default value?
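The file counts above line up with simple arithmetic. The hypothetical Python helper below estimates how many times the window buffer spills under a toy model of the two-threshold idea from Tejas's comment. In the model, rows stay in memory up to in_memory_threshold, and after that every row goes to an external sorter that spills each time spill_threshold records accumulate. It assumes roughly one spill file per spill and ignores spills triggered by memory pressure.

```python
def estimated_spills(num_rows: int, in_memory_threshold: int, spill_threshold: int) -> int:
    """Rough spill count for the window buffer under a toy two-threshold model.

    Today both thresholds are the same value
    (spark.sql.windowExec.buffer.spill.threshold); the comment proposes splitting them.
    """
    if num_rows <= in_memory_threshold:
        return 0  # everything fits in the in-memory buffer, nothing is spilled
    # Once switched to the external sorter, one spill per full batch of records.
    return num_rows // spill_threshold


rows = 450_000_000  # the reporter's 450M-row dataframe
print(estimated_spills(rows, 4096, 4096))            # single default threshold
print(estimated_spills(rows, 2_097_152, 2_097_152))  # reporter's workaround
print(estimated_spills(rows, 4096, 2_097_152))       # decoupled thresholds
```

For 450M rows, a single threshold of 4096 gives 109,863 spills, close to the >110,000 files observed. Both the 2097152 workaround and a small in-memory threshold paired with a 2097152 spill threshold give 214. The decoupled variant does this while keeping the in-memory buffer small.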
[jira] [Commented] (SPARK-21619) Fail the execution of canonicalized plans explicitly
[ https://issues.apache.org/jira/browse/SPARK-21619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113605#comment-16113605 ] Mark Hamstra commented on SPARK-21619: -- But part of the point of the split in my half-baked example is to fork the query execution pipeline before physical plan generation, allowing the cost of that generation to be parallelized with an instance per execution engine. Yes, maybe doing dispatch of physical plans via the CBO or other means is all that I should realistically hope for, but it doesn't mean that it isn't worth thinking about alternatives. > Fail the execution of canonicalized plans explicitly > > > Key: SPARK-21619 > URL: https://issues.apache.org/jira/browse/SPARK-21619 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.2.0 >Reporter: Reynold Xin >Assignee: Reynold Xin > > Canonicalized plans are not supposed to be executed. I ran into a case in > which there's some code that accidentally calls execute on a canonicalized > plan. This patch throws a more explicit exception when that happens.
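The guard described in this issue can be sketched in a few lines. ToyPlan is a hypothetical Python stand-in, not Spark's SparkPlan, and the exception type and message are illustrative. A plan produced by canonicalization carries a flag, and execute() refuses to run when that flag is set.

```python
class ToyPlan:
    """Hypothetical stand-in for a physical plan; not Spark's SparkPlan."""

    def __init__(self, name: str, is_canonicalized: bool = False) -> None:
        self.name = name
        self.is_canonicalized = is_canonicalized

    def canonicalized(self) -> "ToyPlan":
        # The canonical copy is only meant for comparing plans, never for running them.
        return ToyPlan(self.name, is_canonicalized=True)

    def execute(self) -> str:
        # Fail fast with an explicit message instead of an obscure downstream error.
        if self.is_canonicalized:
            raise RuntimeError("A canonicalized plan is not supposed to be executed.")
        return f"rows from {self.name}"
```

With this design, an accidental call such as ToyPlan("Scan").canonicalized().execute() fails immediately at the call site with a clear message.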
[jira] [Updated] (SPARK-21617) ALTER TABLE...ADD COLUMNS broken in Hive 2.1 for DS tables
[ https://issues.apache.org/jira/browse/SPARK-21617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated SPARK-21617: --- Summary: ALTER TABLE...ADD COLUMNS broken in Hive 2.1 for DS tables (was: ALTER TABLE...ADD COLUMNS creates invalid metadata in Hive metastore for DS tables) > ALTER TABLE...ADD COLUMNS broken in Hive 2.1 for DS tables > -- > > Key: SPARK-21617 > URL: https://issues.apache.org/jira/browse/SPARK-21617 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Marcelo Vanzin > > When you have a data source table and you run a "ALTER TABLE...ADD COLUMNS" > query, Spark will save invalid metadata to the Hive metastore. > Namely, it will overwrite the table's schema with the data frame's schema; > that is not desired for data source tables (where the schema is stored in a > table property instead). > Moreover, if you use a newer metastore client where > METASTORE_DISALLOW_INCOMPATIBLE_COL_TYPE_CHANGES is on by default, you > actually get an exception: > {noformat} > InvalidOperationException(message:The following columns have types > incompatible with the existing columns in their respective positions : > c1) > at > org.apache.hadoop.hive.metastore.MetaStoreUtils.throwExceptionIfIncompatibleColTypeChange(MetaStoreUtils.java:615) > at > org.apache.hadoop.hive.metastore.HiveAlterHandler.alterTable(HiveAlterHandler.java:133) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_table_core(HiveMetaStore.java:3704) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_table_with_environment_context(HiveMetaStore.java:3675) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99) > at com.sun.proxy.$Proxy26.alter_table_with_environment_context(Unknown > Source) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_table_with_environmentContext(HiveMetaStoreClient.java:402) > at > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.alter_table_with_environmentContext(SessionHiveMetaStoreClient.java:309) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154) > at com.sun.proxy.$Proxy27.alter_table_with_environmentContext(Unknown > Source) > at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:601) > {noformat} > That exception is handled by Spark in an odd way (see code in > {{HiveExternalCatalog.scala}}) which still stores invalid metadata. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
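For illustration, the metastore error above complains about columns whose types are "incompatible with the existing columns in their respective positions". The Scala sketch below is a hypothetical, simplified model of such a position-wise check. The object `ColTypeCheck` is invented for this example, and it requires exact type equality, whereas real Hive accepts some type conversions. It shows how overwriting a stored schema with a different one can fail on the very first column, while appended columns are never compared.

```scala
// Hypothetical, simplified model of a position-wise column type check.
// Real Hive accepts some type conversions; this sketch requires exact equality.
object ColTypeCheck {
  final case class Col(name: String, dataType: String)

  /** Names of new columns whose type differs from the old column at the same position.
    * zip stops at the shorter schema, so appended columns are never compared. */
  def incompatibleColumns(oldCols: Seq[Col], newCols: Seq[Col]): Seq[String] =
    oldCols.zip(newCols).collect {
      case (o, n) if o.dataType != n.dataType => n.name
    }
}
```

For example, a stored schema `c1 string` checked against a replacement `c1 int, c2 int` reports only `c1`.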
[jira] [Commented] (SPARK-21617) ALTER TABLE...ADD COLUMNS creates invalid metadata in Hive metastore for DS tables
[ https://issues.apache.org/jira/browse/SPARK-21617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113602#comment-16113602 ] Marcelo Vanzin commented on SPARK-21617: Here's the full test error from our internal build against 2.1: {noformat} 15:11:29.602 WARN org.apache.spark.sql.hive.test.TestHiveExternalCatalog: Could not alter schema of table `default`.`t1` in a Hive compatible way. Updating Hive metastore in Spark SQL specific format. [snip] Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. The following columns have types incompatible with the existing columns in their respective positions : c1 at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:624) at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:602) - alter datasource table add columns - partitioned - csv *** FAILED *** org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: at least one column must be specified for the table; at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:107) at org.apache.spark.sql.hive.HiveExternalCatalog.alterTableSchema(HiveExternalCatalog.scala:656) at org.apache.spark.sql.catalyst.catalog.SessionCatalog.alterTableSchema(SessionCatalog.scala:372) {noformat} So the exception above is just a warning, and the problem seems to actually be in how Spark is recovering from that situation (the exception handler in {{HiveExternalCatalog.alterTableSchema}}).
[jira] [Commented] (SPARK-21619) Fail the execution of canonicalized plans explicitly
[ https://issues.apache.org/jira/browse/SPARK-21619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113598#comment-16113598 ] Reynold Xin commented on SPARK-21619: - Just look at structured streaming. That would be one example. > Fail the execution of canonicalized plans explicitly > > > Key: SPARK-21619 > URL: https://issues.apache.org/jira/browse/SPARK-21619 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.2.0 >Reporter: Reynold Xin >Assignee: Reynold Xin > > Canonicalized plans are not supposed to be executed. I ran into a case in > which there's some code that accidentally calls execute on a canonicalized > plan. This patch throws a more explicit exception when that happens.
[jira] [Commented] (SPARK-21619) Fail the execution of canonicalized plans explicitly
[ https://issues.apache.org/jira/browse/SPARK-21619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113593#comment-16113593 ] Reynold Xin commented on SPARK-21619: - Just generate a different physical plan?
[jira] [Commented] (SPARK-21619) Fail the execution of canonicalized plans explicitly
[ https://issues.apache.org/jira/browse/SPARK-21619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113586#comment-16113586 ] Mark Hamstra commented on SPARK-21619: -- Or you can just enlighten me on how one should design a dispatch function for multiple expressions of semantically equivalent query plans under the current architecture. :) Dispatching based on a canonical form of a plan seems like an obvious solution to me, but maybe I'm missing something.
[jira] [Commented] (SPARK-21619) Fail the execution of canonicalized plans explicitly
[ https://issues.apache.org/jira/browse/SPARK-21619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113579#comment-16113579 ] Reynold Xin commented on SPARK-21619: - Ok so we are good with this one. Sorry, I don't see why this issue blocks or has any impact on supporting different execution engines. I have many prototypes done myself that do exactly what you were describing in the past under the current design. Maybe we just need to agree to disagree.
[jira] [Comment Edited] (SPARK-21619) Fail the execution of canonicalized plans explicitly
[ https://issues.apache.org/jira/browse/SPARK-21619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113555#comment-16113555 ] Mark Hamstra edited comment on SPARK-21619 at 8/3/17 10:01 PM: --- Yes, I absolutely understand that this issue and PR are meant to address an immediate need, and that a deeper redesign would be one or likely more separate issues. I'm more trying to raise awareness or improve my understanding than to delay or block progress on addressing the immediate need. I do have concerns, though, that making canonical plans unexecutable just because they are in canonical form does make certain evolutions of Spark more difficult. As one half-baked example, you could want to decouple query plans from a single execution engine, so that certain kinds of logical plans could be sent toward execution on one engine (or cluster configuration) while other plans could be directed to a separate engine (presumably more suitable to those plans in some way.) Splitting and forking Spark's query execution pipeline in that kind of way isn't really that difficult (I've done it in at least a proof-of-concept), and has some perhaps significant potential benefits. To do that, though, you'd really like to have a single, canonical form for any semantically equivalent queries by the time they reach your dispatch function for determining the destination execution engine for a query (and where results will be cached locally, etc.) Making the canonical form unexecutable throws a wrench into that.
[jira] [Commented] (SPARK-21619) Fail the execution of canonicalized plans explicitly
[ https://issues.apache.org/jira/browse/SPARK-21619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113571#comment-16113571 ] Mark Hamstra commented on SPARK-21619: -- _"Why would you want to execute multiple semantically equivalent plans in different forms?" -> Because they can be executed in different times, using different aliases, etc?_ Right, so for separate executions of semantically equivalent plans you need to maintain a mapping between the aliases of a particular plan and their canonical form, but after doing that you can more easily recover data and metadata associated with a prior execution of an equivalent plan.
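A rough sketch of the mapping described in this comment, assuming a toy plan model: the names `CanonicalDispatch`, `Plan`, `Column` and `ResultCache` are hypothetical, and "canonical form" here simply means dropping user-chosen aliases. Results of earlier executions are looked up by canonical form, while the plan that actually runs is still the caller's original, non-canonical plan.

```scala
import scala.collection.mutable

// Hypothetical toy model: a plan projects columns from a table, and each
// output column may carry a user-chosen alias.
object CanonicalDispatch {
  final case class Column(name: String, alias: Option[String])
  final case class Plan(table: String, columns: Seq[Column])

  // Toy canonical form: drop aliases, so plans that differ only in naming
  // map to the same key.
  def canonicalize(p: Plan): Plan =
    p.copy(columns = p.columns.map(_.copy(alias = None)))

  // Results of earlier executions, keyed by canonical form.
  final class ResultCache[R] {
    private val results = mutable.Map.empty[Plan, R]

    /** Reuses the result of an equivalent earlier plan, or runs the original plan and records it. */
    def run(p: Plan)(execute: Plan => R): R =
      results.getOrElseUpdate(canonicalize(p), execute(p))
  }
}
```

Two plans that differ only in their aliases share one cache entry, so the second one is served without running `execute` again.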
[jira] [Commented] (SPARK-21619) Fail the execution of canonicalized plans explicitly
[ https://issues.apache.org/jira/browse/SPARK-21619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113555#comment-16113555 ] Mark Hamstra commented on SPARK-21619: -- Yes, I absolutely understand that this issue and PR are meant to address an immediate need, and that a deeper redesign would be one or likely more separate issues. I'm more trying to raise awareness or improve my understanding than to delay or block progress on addressing the immediate need. I do have concerns, though, that making canonical plans unexecutable just because they are in canonical form does make certain evolutions of Spark more difficult. As one half-baked example, you could want to decouple query plans from a single execution engine, so that certain kinds of logical plans could be sent toward execution on one engine (or cluster configuration) while other plans could be directed to a separate engine (presumably more suitable to those plans in some way.) Splitting and forking Spark's query execution pipeline in that kind of way isn't really that difficult (I've done it in at least a proof-of-concept), and has some perhaps significant potential benefits. To do that, though, you'd really like to have a single, canonical form for any semantically equivalent queries by the time they reach your dispatch function for determining the destination execution engine for a query (and where results will be cached locally, etc.) Making the canonical form unexecutable throws a wrench into that.
[jira] [Commented] (SPARK-21619) Fail the execution of canonicalized plans explicitly
[ https://issues.apache.org/jira/browse/SPARK-21619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113538#comment-16113538 ] Reynold Xin commented on SPARK-21619: - Also self-joins are very difficult to handle. They have different expression ids for resolution, even though on both sides of the join the plans (at least the subtrees) are semantically equivalent.
[jira] [Commented] (SPARK-21619) Fail the execution of canonicalized plans explicitly
[ https://issues.apache.org/jira/browse/SPARK-21619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113536#comment-16113536 ] Reynold Xin commented on SPARK-21619: - Mark, that's a great point, but you are going into the existential question of how we should design query execution and potentially overthrow the entire architecture here. The way canonicalization is defined as-is in Spark is that it is not meant for execution. This ticket simply enforces that with a few lines of change. If we want to redesign how query execution should work (I don't see why we would want to, since I don't see much real practical benefit given we already have sameResult and semanticHash), we should do it separately. "Why would you want to execute multiple semantically equivalent plans in different forms?" -> Because they can be executed in different times, using different aliases, etc?
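To make the expression-id point concrete, here is a toy model loosely inspired by the sameResult/semanticHash approach mentioned in this thread. None of these classes are Spark's; `ToyCanonicalization`, `Attr` and `Scan` are hypothetical. Two scans of the same table with freshly assigned expression ids (as on the two sides of a self-join) compare unequal, but become equal once each id is rewritten to its position in the output.

```scala
// Hypothetical toy model (not Spark's classes): each attribute carries a name
// and an expression id; the two sides of a self-join get fresh ids.
object ToyCanonicalization {
  final case class Attr(name: String, exprId: Long)
  final case class Scan(table: String, output: Seq[Attr])

  // Canonical form: replace each expression id with its position in the output.
  def canonicalized(s: Scan): Scan =
    s.copy(output = s.output.zipWithIndex.map { case (a, i) => a.copy(exprId = i.toLong) })

  // Two plans produce the same result if their canonical forms are equal.
  def sameResult(a: Scan, b: Scan): Boolean = canonicalized(a) == canonicalized(b)

  // A hash consistent with sameResult: hash the canonical form.
  def semanticHash(s: Scan): Int = canonicalized(s).hashCode
}
```

The canonical form here exists only for comparison; its renumbered ids no longer identify the attributes the rest of a query would refer to.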
[jira] [Commented] (SPARK-21619) Fail the execution of canonicalized plans explicitly
[ https://issues.apache.org/jira/browse/SPARK-21619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113526#comment-16113526 ] Mark Hamstra commented on SPARK-21619: -- Two reasons, mostly: 1) To provide better guarantees that plans that are deemed to be semantically equivalent actually end up being expressed the same way before execution and thus go down the same code paths; 2) To simplify some downstream logic, so instead of needing to maintain a mapping between multiple, semantically equivalent plans and a single canonical form, after a certain canonicalization point the plans really are the same. To perhaps clear up my confusion, maybe you can answer the question going the other way: Why would you want to execute multiple semantically equivalent plans in different forms?
[jira] [Commented] (SPARK-21588) SQLContext.getConf(key, null) should return null, but it throws NPE
[ https://issues.apache.org/jira/browse/SPARK-21588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113519#comment-16113519 ] Burak Yavuz commented on SPARK-21588: - that's what I was proposing. `null` seemed more familiar than `` before I looked at the code. > SQLContext.getConf(key, null) should return null, but it throws NPE > --- > > Key: SPARK-21588 > URL: https://issues.apache.org/jira/browse/SPARK-21588 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Burak Yavuz >Priority: Minor > > SQLContext.get(key) for a key that is not defined in the conf, and doesn't > have a default value defined, throws a NoSuchElementException. In order to > avoid that, I used a null as the default value, which threw an NPE instead. If > it is null, it shouldn't try to parse the default value in `getConfString`
[jira] [Commented] (SPARK-21619) Fail the execution of canonicalized plans explicitly
[ https://issues.apache.org/jira/browse/SPARK-21619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113516#comment-16113516 ] Reynold Xin commented on SPARK-21619: - Sorry I don't understand your question or point at all. Why should a plan be canonicalized before execution?
[jira] [Commented] (SPARK-21588) SQLContext.getConf(key, null) should return null, but it throws NPE
[ https://issues.apache.org/jira/browse/SPARK-21588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113515#comment-16113515 ] Anton Okolnychyi commented on SPARK-21588: -- Sure, but the converter will not be called if the default value that you pass is "". However, the check can be extended to `defaultValue != null && defaultValue != ""` in the SQLConf#getConfString.
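A minimal sketch of the null guard under discussion, using a hypothetical `ToyConf` object rather than Spark's `SQLConf`: the converter registered for a key is applied to the caller's default only when that default is non-null. The toy converter calls `trim` before parsing, so invoking it on null would throw a NullPointerException, mirroring the reported symptom. The extra `!= ""` comparison suggested in the comment is omitted here.

```scala
// Hypothetical, simplified model of a SQL conf lookup (not Spark's SQLConf).
object ToyConf {
  // Registered keys and the converter that parses/validates their string values.
  // trim is called first, so passing null to it throws a NullPointerException.
  val converters: Map[String, String => Any] = Map(
    "spark.sql.shuffle.partitions" -> ((s: String) => s.trim.toInt)
  )

  // Explicitly set values; empty in this sketch.
  val settings: Map[String, String] = Map.empty

  def getConfString(key: String, defaultValue: String): String = {
    // Validate the caller's default only when there is one; skipping null
    // avoids running the converter on it.
    if (defaultValue != null) {
      converters.get(key).foreach(convert => convert(defaultValue))
    }
    settings.getOrElse(key, defaultValue)
  }
}
```

With the guard, `getConfString("spark.sql.shuffle.partitions", null)` returns null, while an unparsable non-null default such as `"abc"` still fails validation.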
[jira] [Commented] (SPARK-21619) Fail the execution of canonicalized plans explicitly
[ https://issues.apache.org/jira/browse/SPARK-21619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113510#comment-16113510 ] Mark Hamstra commented on SPARK-21619: -- Ok, but my point is that if plans are to be canonicalized for some reasons, maybe they should also be canonicalized before execution. It seems odd both to execute plans that are not in a canonical form and to not be able to execute plans that are in a canonical form. That view makes failing the execution of canonical plans look more like a workaround/hack (maybe needed in the short term) than a solution to a deeper issue.
[jira] [Commented] (SPARK-21619) Fail the execution of canonicalized plans explicitly
[ https://issues.apache.org/jira/browse/SPARK-21619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113498#comment-16113498 ] Reynold Xin commented on SPARK-21619: - A canonicalized plan is used for semantic comparison. This has nothing to do with the actual blocking of execution of a query plan. This is to avoid some buggy code accidentally executing a canonicalized plan, which is meant only for comparison and not for execution, and which can lead to silently incorrect results or weird exceptions at runtime.
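As a sketch of the fail-fast behaviour described here, assuming a hypothetical `ToyPlan` class with an `isCanonicalizedPlan` flag (not Spark's actual API): the canonicalized copy is marked, and `execute()` throws an explicit exception for it instead of running it.

```scala
// Hypothetical toy plan node (not Spark's SparkPlan): canonicalized copies are
// marked, and execute() fails fast on them with an explicit message.
object ExplicitCanonicalFailure {
  final case class ToyPlan(description: String, isCanonicalizedPlan: Boolean = false) {
    // A copy intended only for comparison.
    def canonicalized: ToyPlan = copy(isCanonicalizedPlan = true)

    def execute(): Seq[Int] = {
      if (isCanonicalizedPlan) {
        throw new IllegalStateException(
          s"A canonicalized plan is not supposed to be executed: $description")
      }
      Seq(1, 2, 3) // stand-in for real execution
    }
  }
}
```

The point of the explicit check is that misuse surfaces immediately with a clear message rather than as a confusing failure later at runtime.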
[jira] [Commented] (SPARK-21619) Fail the execution of canonicalized plans explicitly
[ https://issues.apache.org/jira/browse/SPARK-21619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113494#comment-16113494 ] Mark Hamstra commented on SPARK-21619: -- Can you provide a little more context, Reynold, since on its face it would seem that if plans are to be blocked from executing based on their form, then non-canonical plans would be the ones that should be blocked.
[jira] [Created] (SPARK-21630) Pmod should not throw a divide by zero exception
Herman van Hovell created SPARK-21630: - Summary: Pmod should not throw a divide by zero exception Key: SPARK-21630 URL: https://issues.apache.org/jira/browse/SPARK-21630 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.2.0, 2.1.1, 2.0.2 Reporter: Herman van Hovell Pmod currently throws a divide by zero exception when the right input is 0. Like Divide or Remainder, it should probably return null. Here is a small reproducer: {noformat} scala> sql("select pmod(id, 0) from range(10)").show 17/08/03 22:36:43 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3) java.lang.ArithmeticException: / by zero {noformat}
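A standalone sketch of the proposed semantics, assuming Long inputs and using `Option` to stand in for SQL NULL. The object `NullSafePmod` is hypothetical; Spark's expression covers several numeric types in generated code. A zero divisor yields `None` instead of throwing, and for a positive divisor a negative remainder is shifted into the range [0, n).

```scala
// Hypothetical standalone model of a null-returning positive modulus.
object NullSafePmod {
  /** Positive modulus of a by n; None (SQL NULL) when the divisor is 0. */
  def pmod(a: Long, n: Long): Option[Long] =
    if (n == 0L) None // instead of ArithmeticException: / by zero
    else {
      val r = a % n // Scala/Java remainder takes the sign of the dividend
      Some(if (r < 0) (r + n) % n else r)
    }
}
```

Applied to the reproducer's inputs (0 through 9 with divisor 0), every row would yield `None`, i.e. NULL, instead of failing the task.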
[jira] [Created] (SPARK-21629) OR nullability is incorrect
Herman van Hovell created SPARK-21629: - Summary: OR nullability is incorrect Key: SPARK-21629 URL: https://issues.apache.org/jira/browse/SPARK-21629 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.2.0, 2.1.1, 2.0.0 Reporter: Herman van Hovell Priority: Minor The SQL {{OR}} expression's nullability is slightly incorrect. It should only be nullable when both of the input expressions are nullable, and not when either of them is nullable.
[jira] [Commented] (SPARK-21588) SQLContext.getConf(key, null) should return null, but it throws NPE
[ https://issues.apache.org/jira/browse/SPARK-21588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113422#comment-16113422 ] Burak Yavuz commented on SPARK-21588: - [~vinodkc] [~aokolnychyi] It happens when the config has a value converter, for example `spark.sql.shuffle.partitions`. Basically any non-string SQL conf.
[jira] [Commented] (SPARK-21588) SQLContext.getConf(key, null) should return null, but it throws NPE
[ https://issues.apache.org/jira/browse/SPARK-21588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113412#comment-16113412 ] Anton Okolnychyi commented on SPARK-21588: -- I did not manage to reproduce this. I tried: {code} spark.sqlContext.getConf("spark.sql.streaming.checkpointLocation", null) // null spark.sqlContext.getConf("spark.sql.thriftserver.scheduler.pool", null) // null spark.sqlContext.getConf("spark.sql.sources.outputCommitterClass", null) // null spark.sqlContext.getConf("blabla", null) // null spark.sqlContext.getConf("spark.sql.sources.outputCommitterClass") // {code} I got an NPE only when I called getConf(key, null) for a parameter with a default value. For example, {code} spark.sqlContext.getConf("spark.sql.thriftServer.incrementalCollect", "") // spark.sqlContext.getConf("spark.sql.thriftServer.incrementalCollect", null) // NPE {code}
[jira] [Commented] (SPARK-15799) Release SparkR on CRAN
[ https://issues.apache.org/jira/browse/SPARK-15799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113404#comment-16113404 ] Brendan Dwyer commented on SPARK-15799: --- Is there any update on this?
[jira] [Commented] (SPARK-21595) introduction of spark.sql.windowExec.buffer.spill.threshold in spark 2.2 breaks existing workflow
[ https://issues.apache.org/jira/browse/SPARK-21595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113388#comment-16113388 ] Herman van Hovell commented on SPARK-21595: --- The old and the new code are not exactly the same. The old code path would switch to a disk-spilling buffer once a window grew larger than 4096 rows. The key difference is that the old code path would not actually start spilling at that point; spilling only happened when Spark came under memory pressure and the memory manager started forcing spills. The current version is overly aggressive and starts spilling at a much earlier stage. We have seen similar problems with customer workloads on our end. We need to either set this to a more sensible default or restore the old behavior. > introduction of spark.sql.windowExec.buffer.spill.threshold in spark 2.2 > breaks existing workflow > - > > Key: SPARK-21595 > URL: https://issues.apache.org/jira/browse/SPARK-21595 > Project: Spark > Issue Type: Bug > Components: Documentation, PySpark >Affects Versions: 2.2.0 > Environment: pyspark on linux >Reporter: Stephan Reiling >Priority: Minor > Labels: documentation, regression > > My pyspark code has the following statement: > {code:java} > # assign row key for tracking > df = df.withColumn( > 'association_idx', > sqlf.row_number().over( > Window.orderBy('uid1', 'uid2') > ) > ) > {code} > where df is a long, skinny (450M rows, 10 columns) dataframe. So this creates > one large window for the whole dataframe to sort over. > In spark 2.1 this works without problem; in spark 2.2 this fails either with > out of memory exception or too many open files exception, depending on memory > settings (which is what I tried first to fix this). > Monitoring the blockmgr, I see that spark 2.1 creates 152 files, spark 2.2 > creates >110,000 files.
> In the log I see the following messages (110,000 of these): > {noformat} > 17/08/01 08:55:37 INFO UnsafeExternalSorter: Spilling data because number of > spilledRecords crossed the threshold 4096 > 17/08/01 08:55:37 INFO UnsafeExternalSorter: Thread 156 spilling sort data of > 64.1 MB to disk (0 time so far) > 17/08/01 08:55:37 INFO UnsafeExternalSorter: Spilling data because number of > spilledRecords crossed the threshold 4096 > 17/08/01 08:55:37 INFO UnsafeExternalSorter: Thread 156 spilling sort data of > 64.1 MB to disk (1 time so far) > {noformat} > So I started hunting for clues in UnsafeExternalSorter, without luck. What I > had missed was this one message: > {noformat} > 17/08/01 08:55:37 INFO ExternalAppendOnlyUnsafeRowArray: Reached spill > threshold of 4096 rows, switching to > org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter > {noformat} > Which allowed me to track down the issue. > By changing the configuration to include: > {code:java} > spark.sql.windowExec.buffer.spill.threshold 2097152 > {code} > I got it to work again and with the same performance as spark 2.1. > I have workflows where I use windowing functions that do not fail, but took a > performance hit due to the excessive spilling when using the default of 4096. > I think to make it easier to track down these issues this config variable > should be included in the configuration documentation. > Maybe 4096 is too small of a default value?
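A back-of-the-envelope Python estimate of why the report sees so many files, assuming (as the repeated "crossed the threshold 4096" log lines suggest) one spill per `threshold` rows and one spill file per spill. Real file counts also depend on the sorter and memory settings, so treat this only as an order-of-magnitude check; `estimated_spills` is a hypothetical helper.

```python
def estimated_spills(num_rows: int, spill_threshold: int) -> int:
    """Ceiling of num_rows / spill_threshold, i.e. one spill per full buffer."""
    if spill_threshold <= 0:
        raise ValueError("spill_threshold must be positive")
    return -(-num_rows // spill_threshold)


rows = 450_000_000  # the "long, skinny" DataFrame from the report
print(estimated_spills(rows, 4096))       # 109864 with the default threshold
print(estimated_spills(rows, 2_097_152))  # 215 with the reporter's workaround value
```

The first figure is in the same range as the >110,000 files reported, which is at least consistent with the 4096-row threshold driving the file count.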
[jira] [Commented] (SPARK-21097) Dynamic allocation will preserve cached data
[ https://issues.apache.org/jira/browse/SPARK-21097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113264#comment-16113264 ] Brad commented on SPARK-21097: -- I'm still working on thoroughly benchmarking and testing this change. If anyone is interested in this, send me a message. Thanks. > Dynamic allocation will preserve cached data > > > Key: SPARK-21097 > URL: https://issues.apache.org/jira/browse/SPARK-21097 > Project: Spark > Issue Type: Improvement > Components: Block Manager, Scheduler, Spark Core >Affects Versions: 2.2.0, 2.3.0 >Reporter: Brad > Attachments: Preserving Cached Data with Dynamic Allocation.pdf > > > We want to use dynamic allocation to distribute resources among many notebook > users on our spark clusters. One difficulty is that if a user has cached data > then we are either prevented from de-allocating any of their executors, or we > are forced to drop their cached data, which can lead to a bad user experience. > We propose adding a feature to preserve cached data by copying it to other > executors before de-allocation. This behavior would be enabled by a simple > spark config like "spark.dynamicAllocation.recoverCachedData". Now when an > executor reaches its configured idle timeout, instead of just killing it on > the spot, we will stop sending it new tasks, replicate all of its rdd blocks > onto other executors, and then kill it. If there is an issue while we > replicate the data, like an error, it takes too long, or there isn't enough > space, then we will fall back to the original behavior and drop the data and > kill the executor. > This feature should allow anyone with notebook users to use their cluster > resources more efficiently. Also, since it will be completely opt-in, it is > unlikely to cause problems for other use cases.
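A rough Python sketch of the decommissioning flow the proposal describes: stop scheduling on the idle executor, try to replicate its cached blocks elsewhere within a deadline, and fall back to dropping them if anything goes wrong. Every name here (`Executor`, `decommission`, `ReplicationFailed`, the free-bytes capacity model) is hypothetical; only the overall control flow comes from the proposal.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List


class ReplicationFailed(Exception):
    """Raised when cached blocks cannot be moved in time or do not fit."""


@dataclass
class Executor:
    name: str
    free_bytes: int
    blocks: Dict[str, int] = field(default_factory=dict)  # block id -> size in bytes
    accepting_tasks: bool = True
    alive: bool = True


def decommission(victim: Executor, peers: List[Executor], timeout_s: float) -> str:
    """Try to copy the victim's cached blocks to peers, then kill the victim."""
    victim.accepting_tasks = False  # stop sending it new tasks
    deadline = time.monotonic() + timeout_s
    try:
        for block_id, size in victim.blocks.items():
            if time.monotonic() > deadline:
                raise ReplicationFailed("replication took too long")
            target = next(
                (p for p in peers if p is not victim and p.alive and p.free_bytes >= size),
                None,
            )
            if target is None:
                raise ReplicationFailed("not enough space on other executors")
            target.blocks[block_id] = size
            target.free_bytes -= size
        outcome = "replicated"
    except ReplicationFailed:
        # Fall back to the original behaviour and drop the data. Blocks already
        # copied are left on the peers; this sketch does not roll them back.
        outcome = "dropped"
    victim.blocks.clear()
    victim.alive = False  # the executor is killed in both cases
    return outcome
```

Picking the first peer with enough room is the simplest placement policy; a real implementation would likely balance blocks across peers.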
[jira] [Updated] (SPARK-21453) Cached Kafka consumer may be closed too early
[ https://issues.apache.org/jira/browse/SPARK-21453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-21453: - Summary: Cached Kafka consumer may be closed too early (was: Streaming kafka source (structured spark)) > Cached Kafka consumer may be closed too early > - > > Key: SPARK-21453 > URL: https://issues.apache.org/jira/browse/SPARK-21453 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.2.0 > Environment: Spark 2.2.0 and kafka 0.10.2.0 >Reporter: Pablo Panero >Priority: Minor > > On a streaming job using built-in kafka source and sink (over SSL), I > am getting the following exception: > Config of the source: > {code:java} > val df = spark.readStream > .format("kafka") > .option("kafka.bootstrap.servers", config.bootstrapServers) > .option("failOnDataLoss", value = false) > .option("kafka.connections.max.idle.ms", 360) > //SSL: this only applies to communication between Spark and Kafka > brokers; you are still responsible for separately securing Spark inter-node > communication. > .option("kafka.security.protocol", "SASL_SSL") > .option("kafka.sasl.mechanism", "GSSAPI") > .option("kafka.sasl.kerberos.service.name", "kafka") > .option("kafka.ssl.truststore.location", "/etc/pki/java/cacerts") > .option("kafka.ssl.truststore.password", "changeit") > .option("subscribe", config.topicConfigList.keys.mkString(",")) > .load() > {code} > Config of the sink: > {code:java} > .writeStream > .option("checkpointLocation", > s"${config.checkpointDir}/${topicConfig._1}/") > .format("kafka") > .option("kafka.bootstrap.servers", config.bootstrapServers) > .option("kafka.connections.max.idle.ms", 360) > //SSL: this only applies to communication between Spark and Kafka > brokers; you are still responsible for separately securing Spark inter-node > communication. 
> .option("kafka.security.protocol", "SASL_SSL") > .option("kafka.sasl.mechanism", "GSSAPI") > .option("kafka.sasl.kerberos.service.name", "kafka") > .option("kafka.ssl.truststore.location", "/etc/pki/java/cacerts") > .option("kafka.ssl.truststore.password", "changeit") > .start() > {code} > {code:java} > 17/07/18 10:11:58 WARN SslTransportLayer: Failed to send SSL Close message > java.io.IOException: Broken pipe > at sun.nio.ch.FileDispatcherImpl.write0(Native Method) > at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) > at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) > at sun.nio.ch.IOUtil.write(IOUtil.java:65) > at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471) > at > org.apache.kafka.common.network.SslTransportLayer.flush(SslTransportLayer.java:195) > at > org.apache.kafka.common.network.SslTransportLayer.close(SslTransportLayer.java:163) > at org.apache.kafka.common.utils.Utils.closeAll(Utils.java:731) > at > org.apache.kafka.common.network.KafkaChannel.close(KafkaChannel.java:54) > at org.apache.kafka.common.network.Selector.doClose(Selector.java:540) > at org.apache.kafka.common.network.Selector.close(Selector.java:531) > at > org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:378) > at org.apache.kafka.common.network.Selector.poll(Selector.java:303) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:349) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:226) > at > org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1047) > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:995) > at > org.apache.spark.sql.kafka010.CachedKafkaConsumer.poll(CachedKafkaConsumer.scala:298) > at > org.apache.spark.sql.kafka010.CachedKafkaConsumer.org$apache$spark$sql$kafka010$CachedKafkaConsumer$$fetchData(CachedKafkaConsumer.scala:206) > at > 
org.apache.spark.sql.kafka010.CachedKafkaConsumer$$anonfun$get$1.apply(CachedKafkaConsumer.scala:117) > at > org.apache.spark.sql.kafka010.CachedKafkaConsumer$$anonfun$get$1.apply(CachedKafkaConsumer.scala:106) > at > org.apache.spark.util.UninterruptibleThread.runUninterruptibly(UninterruptibleThread.scala:85) > at > org.apache.spark.sql.kafka010.CachedKafkaConsumer.runUninterruptiblyIfPossible(CachedKafkaConsumer.scala:68) > at > org.apache.spark.sql.kafka010.CachedKafkaConsumer.get(CachedKafkaConsumer.scala:106) > at > org.apache.spark.sql.kafka010.KafkaSourceRDD$$anon$1.getNext(KafkaSourceRDD.scala:157) > at >
[jira] [Assigned] (SPARK-20713) Speculative task that got CommitDenied exception shows up as failed
[ https://issues.apache.org/jira/browse/SPARK-20713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves reassigned SPARK-20713: - Assignee: (was: Nuochen Lyu) > Speculative task that got CommitDenied exception shows up as failed > --- > > Key: SPARK-20713 > URL: https://issues.apache.org/jira/browse/SPARK-20713 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.1 >Reporter: Thomas Graves > Fix For: 2.3.0 > > > When running speculative tasks, you can end up getting a task failure on a > speculative task (the other task succeeded) because that task got a > CommitDenied exception when really it was "killed" by the driver. It is a > race between when the driver kills and when the executor tries to commit. > I think ideally we should fix up the task state on this to be killed because > the fact that this task failed doesn't matter since the other speculative > task succeeded. Tasks showing up as failures confuse the user and could make > other scheduler cases harder. > This is somewhat related to SPARK-13343 where I think we should be correctly > accounting for speculative tasks. Only one of the 2 tasks really succeeded and > committed, and the other should be marked differently.
[jira] [Assigned] (SPARK-20713) Speculative task that got CommitDenied exception shows up as failed
[ https://issues.apache.org/jira/browse/SPARK-20713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves reassigned SPARK-20713: - Assignee: Nuochen Lyu
[jira] [Resolved] (SPARK-20713) Speculative task that got CommitDenied exception shows up as failed
[ https://issues.apache.org/jira/browse/SPARK-20713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-20713. --- Resolution: Fixed Assignee: Nuochen Lyu Fix Version/s: 2.3.0
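A small Python sketch of the state fix-up SPARK-20713 asks for, assuming a hypothetical model in which each task attempt reports an end reason: a speculative attempt that was denied the commit because another attempt of the same partition already succeeded is reported as killed rather than failed. `EndReason`, `TaskState` and `final_states` are illustrative names, not Spark's scheduler API.

```python
from enum import Enum
from typing import Dict, List, Tuple


class EndReason(Enum):
    SUCCESS = "SUCCESS"
    COMMIT_DENIED = "COMMIT_DENIED"
    OTHER_FAILURE = "OTHER_FAILURE"


class TaskState(Enum):
    FINISHED = "FINISHED"
    FAILED = "FAILED"
    KILLED = "KILLED"


def final_states(attempts: List[Tuple[int, str, EndReason]]) -> Dict[str, TaskState]:
    """Map attempt id -> state; attempts are (partition, attempt_id, reason)."""
    succeeded = {p for p, _, r in attempts if r is EndReason.SUCCESS}
    states: Dict[str, TaskState] = {}
    for partition, attempt_id, reason in attempts:
        if reason is EndReason.SUCCESS:
            states[attempt_id] = TaskState.FINISHED
        elif reason is EndReason.COMMIT_DENIED and partition in succeeded:
            # Lost the commit race to a successful attempt: report it as killed.
            states[attempt_id] = TaskState.KILLED
        else:
            states[attempt_id] = TaskState.FAILED
    return states
```

A commit denial with no successful sibling attempt is still treated as a failure here, since nothing else produced the partition's output.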
[jira] [Commented] (SPARK-21453) Streaming kafka source (structured spark)
[ https://issues.apache.org/jira/browse/SPARK-21453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113224#comment-16113224 ] Shixiong Zhu commented on SPARK-21453: -- Reopened this one. There might be a bug in caching Kafka consumers. [~ppanero] could you provide the logs, please?
[jira] [Reopened] (SPARK-21453) Streaming kafka source (structured spark)
[ https://issues.apache.org/jira/browse/SPARK-21453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu reopened SPARK-21453.
[jira] [Commented] (SPARK-21453) Streaming kafka source (structured spark)
[ https://issues.apache.org/jira/browse/SPARK-21453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113220#comment-16113220 ] Shixiong Zhu commented on SPARK-21453: -- [~ppanero] could you create a new ticket for the Kafka producer issue?
[jira] [Commented] (SPARK-21453) Streaming kafka source (structured spark)
[ https://issues.apache.org/jira/browse/SPARK-21453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113218#comment-16113218 ] Shixiong Zhu commented on SPARK-21453: -- I'm aware of the Kafka producer issue. Right now, a workaround is to increase "spark.kafka.producer.cache.timeout" to a value large enough to keep Spark from closing an in-use Kafka producer.
[jira] [Commented] (SPARK-21367) R older version of Roxygen2 on Jenkins
[ https://issues.apache.org/jira/browse/SPARK-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113145#comment-16113145 ] Felix Cheung commented on SPARK-21367: -- still seeing it Warning messages: 1: In check_dep_version(pkg, version, compare) : Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 2: In check_dep_version(pkg, version, compare) : Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 * installing *source* package 'SparkR' ... https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80213/console > R older version of Roxygen2 on Jenkins > -- > > Key: SPARK-21367 > URL: https://issues.apache.org/jira/browse/SPARK-21367 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.3.0 >Reporter: Felix Cheung >Assignee: shane knapp > Attachments: R.paks > > > Getting this message from a recent build. > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79461/console > Warning messages: > 1: In check_dep_version(pkg, version, compare) : > Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 > 2: In check_dep_version(pkg, version, compare) : > Need roxygen2 >= 5.0.0 but loaded version is 4.1.1 > * installing *source* package 'SparkR' ... > ** R > We have been running with 5.0.1 and haven't changed for a year. > NOTE: Roxygen 6.x has some big changes and IMO we should not move to that yet. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
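The warning above comes from a minimum-version check on dotted version strings. As a hedged illustration only (this is not the R implementation of `check_dep_version`), the Java sketch below compares versions component by component and reproduces the warning text; `DepVersionCheck` and its methods are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical helper: mimics a "Need pkg >= required but loaded version is X" check.
final class DepVersionCheck {

    // Compares dotted numeric versions component by component; missing parts count as 0.
    static int compareVersions(String a, String b) {
        int[] x = Arrays.stream(a.split("\\.")).mapToInt(Integer::parseInt).toArray();
        int[] y = Arrays.stream(b.split("\\.")).mapToInt(Integer::parseInt).toArray();
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? x[i] : 0;
            int yi = i < y.length ? y[i] : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    // Returns the warning text, or null when the loaded version satisfies the minimum.
    static String check(String pkg, String required, String loaded) {
        if (compareVersions(loaded, required) >= 0) {
            return null;
        }
        return "Need " + pkg + " >= " + required + " but loaded version is " + loaded;
    }

    public static void main(String[] args) {
        System.out.println(check("roxygen2", "5.0.0", "4.1.1")); // the warning seen on Jenkins
        System.out.println(check("roxygen2", "5.0.0", "5.0.1")); // null: 5.0.1 is new enough
    }
}
```

Comparing numerically per component (rather than as plain strings) matters once a component reaches two digits, e.g. "10.0.0" versus "9.0.0".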
[jira] [Commented] (SPARK-21618) http(s) not accepted in spark-submit jar uri
[ https://issues.apache.org/jira/browse/SPARK-21618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113133#comment-16113133 ] John Zhuge commented on SPARK-21618: We have not backported HADOOP-14383 to CDH5. > http(s) not accepted in spark-submit jar uri > > > Key: SPARK-21618 > URL: https://issues.apache.org/jira/browse/SPARK-21618 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 2.1.1, 2.2.0 > Environment: pre-built for hadoop 2.6 and 2.7 on mac and ubuntu > 16.04. >Reporter: Ben Mayne >Priority: Minor > Labels: documentation > > The documentation suggests I should be able to use an http(s) uri for a jar > in spark-submit, but I haven't been successful > https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management > {noformat} > benmayne@Benjamins-MacBook-Pro ~ $ spark-submit --deploy-mode client --master > local[2] --class class.name.Test https://test.com/path/to/jar.jar > log4j:WARN No appenders could be found for logger > (org.apache.hadoop.metrics2.lib.MutableMetricsFactory). > log4j:WARN Please initialize the log4j system properly. > log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more > info. 
> Exception in thread "main" java.io.IOException: No FileSystem for scheme: > https > at > org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2586) > at > org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2593) > at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91) > at > org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2632) > at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2614) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370) > at > org.apache.spark.deploy.SparkSubmit$.downloadFile(SparkSubmit.scala:865) > at > org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$1.apply(SparkSubmit.scala:316) > at > org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$1.apply(SparkSubmit.scala:316) > at scala.Option.map(Option.scala:146) > at > org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:316) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:153) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > benmayne@Benjamins-MacBook-Pro ~ $ > {noformat} > If I replace the path with a valid hdfs path > (hdfs:///user/benmayne/valid-jar.jar), it works as expected. I've seen the > same behavior across 2.2.0 (hadoop 2.6 & 2.7 on mac and ubuntu) and on 2.1.1 > on ubuntu. > this is the example that I'm trying to replicate from > https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management: > > > Spark uses the following URL scheme to allow different strategies for > > disseminating jars: > > file: - Absolute paths and file:/ URIs are served by the driver’s HTTP file > > server, and every executor pulls the file from the driver HTTP server. 
> > hdfs:, http:, https:, ftp: - these pull down files and JARs from the URI as > > expected > {noformat} > # Run on a Mesos cluster in cluster deploy mode with supervise > ./bin/spark-submit \ > --class org.apache.spark.examples.SparkPi \ > --master mesos://207.184.161.138:7077 \ > --deploy-mode cluster \ > --supervise \ > --executor-memory 20G \ > --total-executor-cores 100 \ > http://path/to/examples.jar \ > 1000 > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
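Until the Hadoop `FileSystem` on the classpath can resolve the `https` scheme, one possible client-side workaround, sketched here under the assumption that the jar is reachable from the submitting machine, is to download it with the JDK's own URL handlers and pass the resulting local path to `spark-submit`. `JarPrefetch` is a hypothetical helper, not part of Spark or Hadoop.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical workaround helper: download the jar on the submitting machine, then pass
// the local path to spark-submit so that Hadoop never has to resolve the https scheme.
final class JarPrefetch {

    // Copies the resource at `source` into `targetDir`, keeping the last path segment as the file name.
    static Path fetch(URI source, Path targetDir) throws IOException {
        String path = source.getPath();
        String fileName = path.substring(path.lastIndexOf('/') + 1);
        Path target = targetDir.resolve(fileName);
        // URL.openStream() uses the JDK's built-in protocol handlers (http, https, file, ...).
        try (InputStream in = source.toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JarPrefetch <jar-url>");
            return;
        }
        Path local = fetch(URI.create(args[0]), Files.createTempDirectory("jars"));
        System.out.println(local); // pass this path to spark-submit as the application jar
    }
}
```

This only sidesteps the scheme lookup for the primary jar on the client; it does not change how executors fetch dependencies.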
[jira] [Comment Edited] (SPARK-18838) High latency of event processing for large jobs
[ https://issues.apache.org/jira/browse/SPARK-18838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113067#comment-16113067 ] Miles Crawford edited comment on SPARK-18838 at 8/3/17 5:13 PM: We have disabled our eventlog listener, which is unfortunate, but seemed to help a lot. Nevertheless, we still get dropped events, which causes the UI to screw up, jobs to hang, and so forth. Can we do anything to identify which listener is backing up? Are there any workarounds for this issue? was (Author: milesc): We have disabled our eventlog listener, which is unfortunate, but seemed to help alot. Nevertheless, we still get dropped events, which causes the UI to screw up, jobs to hang, and so forth. Can we do anything to identify which listener is backing up? Are there any workarounds for this issue? > High latency of event processing for large jobs > --- > > Key: SPARK-18838 > URL: https://issues.apache.org/jira/browse/SPARK-18838 > Project: Spark > Issue Type: Improvement >Affects Versions: 2.0.0 >Reporter: Sital Kedia > Attachments: perfResults.pdf, SparkListernerComputeTime.xlsx > > > Currently we are observing the issue of very high event processing delay in > the driver's `ListenerBus` for large jobs with many tasks. Many critical > components of the scheduler, like `ExecutorAllocationManager` and > `HeartbeatReceiver`, depend on the `ListenerBus` events, and this delay might > hurt the job performance significantly or even fail the job. For example, a > significant delay in receiving the `SparkListenerTaskStart` might cause > `ExecutorAllocationManager` to mistakenly remove an executor which is > not idle. > The problem is that the event processor in `ListenerBus` is a single thread > which loops through all the listeners for each event and processes each event > synchronously > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala#L94. 
> This single-threaded processor often becomes the bottleneck for large jobs. > Also, if one of the listeners is very slow, all the listeners will pay the > price of the delay incurred by the slow listener. In addition, a slow > listener can cause events to be dropped from the event queue, which might be > fatal to the job. > To solve the above problems, we propose to get rid of the event queue and the > single-threaded event processor. Instead, each listener will have its own > dedicated single-threaded executor service. Whenever an event is posted, it > will be submitted to the executor service of every listener. The > single-threaded executor service will guarantee in-order processing of the events > per listener. The queue used for the executor service will be bounded to > guarantee we do not grow the memory indefinitely. The downside of this > approach is that a separate event queue per listener will increase the driver memory > footprint. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
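The proposal above can be sketched with standard `java.util.concurrent` classes: one `ThreadPoolExecutor` per listener with exactly one thread (in-order delivery) and an `ArrayBlockingQueue` (bounded memory). This is a hedged sketch, not Spark's `LiveListenerBus`; `PerListenerBus` is a hypothetical name, and "drop the event and count it" when a queue is full is an assumed overflow policy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Hypothetical sketch of the proposal (not Spark's LiveListenerBus): each listener owns a
// single-threaded executor with a bounded queue, so a slow listener only delays itself.
final class PerListenerBus<E> {

    private static final class Slot<T> {
        final Consumer<T> listener;
        final AtomicLong dropped = new AtomicLong();
        final ThreadPoolExecutor executor;

        Slot(Consumer<T> listener, int capacity) {
            this.listener = listener;
            // 1 core / 1 max thread: events reach this listener in posting order.
            // ArrayBlockingQueue: at most `capacity` pending events per listener.
            // Rejection handler (assumed policy): drop the event and count it.
            this.executor = new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS,
                    new ArrayBlockingQueue<Runnable>(capacity),
                    (task, pool) -> dropped.incrementAndGet());
        }
    }

    private final List<Slot<E>> slots = new ArrayList<>();
    private final int capacity;

    PerListenerBus(int capacity) {
        this.capacity = capacity;
    }

    void addListener(Consumer<E> listener) {
        slots.add(new Slot<>(listener, capacity));
    }

    // Hands the event to every listener's own executor; the poster never waits on a listener.
    void post(E event) {
        for (Slot<E> slot : slots) {
            slot.executor.execute(() -> slot.listener.accept(event));
        }
    }

    long droppedFor(int index) {
        return slots.get(index).dropped.get();
    }

    // Stops intake, then waits for each listener to drain its queue.
    void stop() throws InterruptedException {
        for (Slot<E> slot : slots) {
            slot.executor.shutdown();
        }
        for (Slot<E> slot : slots) {
            slot.executor.awaitTermination(10, TimeUnit.SECONDS);
        }
    }
}
```

The trade-off named in the proposal shows up directly: every listener costs one thread plus one queue of up to `capacity` pending tasks on the driver.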
[jira] [Commented] (SPARK-17669) Strange behavior using Datasets
[ https://issues.apache.org/jira/browse/SPARK-17669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113083#comment-16113083 ] Miles Crawford commented on SPARK-17669: This UI behavior is caused by SPARK-18838 - events are being dropped so the UI cannot show accurate status. > Strange behavior using Datasets > --- > > Key: SPARK-17669 > URL: https://issues.apache.org/jira/browse/SPARK-17669 > Project: Spark > Issue Type: Bug > Components: SQL, Web UI >Affects Versions: 2.0.0 >Reporter: Miles Crawford > > I recently migrated my application to Spark 2.0, and everything worked well, > except for one function that uses "toDS" and the ML libraries. > This stage used to complete in 15 minutes or so on 1.6.2, and now takes > almost two hours. > The UI shows very strange behavior - completed stages still being worked on, > concurrent work on tons of stages, including ones from downstream jobs: > https://dl.dropboxusercontent.com/u/231152/spark.png > The only source change I made was changing "toDF" to "toDS()" before handing > my RDDs to the ML libraries. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18838) High latency of event processing for large jobs
[ https://issues.apache.org/jira/browse/SPARK-18838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113067#comment-16113067 ] Miles Crawford commented on SPARK-18838: We have disabled our eventlog listener, which is unfortunate, but seemed to help alot. Nevertheless, we still get dropped events, which causes the UI to screw up, jobs to hang, and so forth. Can we do anything to identify which listener is backing up? > High latency of event processing for large jobs > --- > > Key: SPARK-18838 > URL: https://issues.apache.org/jira/browse/SPARK-18838 > Project: Spark > Issue Type: Improvement >Affects Versions: 2.0.0 >Reporter: Sital Kedia > Attachments: perfResults.pdf, SparkListernerComputeTime.xlsx > > > Currently we are observing the issue of very high event processing delay in > the driver's `ListenerBus` for large jobs with many tasks. Many critical > components of the scheduler, like `ExecutorAllocationManager` and > `HeartbeatReceiver`, depend on the `ListenerBus` events, and this delay might > hurt the job performance significantly or even fail the job. For example, a > significant delay in receiving the `SparkListenerTaskStart` might cause > `ExecutorAllocationManager` to mistakenly remove an executor which is > not idle. > The problem is that the event processor in `ListenerBus` is a single thread > which loops through all the listeners for each event and processes each event > synchronously > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala#L94. > This single-threaded processor often becomes the bottleneck for large jobs. > Also, if one of the listeners is very slow, all the listeners will pay the > price of the delay incurred by the slow listener. In addition, a slow > listener can cause events to be dropped from the event queue, which might be > fatal to the job. 
> To solve the above problems, we propose to get rid of the event queue and the > single-threaded event processor. Instead, each listener will have its own > dedicated single-threaded executor service. Whenever an event is posted, it > will be submitted to the executor service of every listener. The > single-threaded executor service will guarantee in-order processing of the events > per listener. The queue used for the executor service will be bounded to > guarantee we do not grow the memory indefinitely. The downside of this > approach is that a separate event queue per listener will increase the driver memory > footprint. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
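On the question of which listener is backing up: one hedged way to find out, sketched below in Java with hypothetical names and without using any Spark API, is to time each listener's handling of every event inside the single dispatch loop and report the listener with the largest cumulative time.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical diagnostic, not a Spark API: a single-threaded dispatcher (like the current bus)
// that accumulates how long each named listener spends handling events.
final class TimedDispatcher<E> {
    private final Map<String, Consumer<E>> listeners = new LinkedHashMap<>();
    private final Map<String, Long> nanosByListener = new LinkedHashMap<>();

    void addListener(String name, Consumer<E> listener) {
        listeners.put(name, listener);
        nanosByListener.put(name, 0L);
    }

    // Delivers the event to each listener in turn and charges the elapsed time to that listener.
    void post(E event) {
        for (Map.Entry<String, Consumer<E>> entry : listeners.entrySet()) {
            long start = System.nanoTime();
            entry.getValue().accept(event);
            nanosByListener.merge(entry.getKey(), System.nanoTime() - start, Long::sum);
        }
    }

    // Name of the listener with the largest cumulative handling time, or null if there are none.
    String slowest() {
        return nanosByListener.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }

    // Snapshot of cumulative handling time per listener, in nanoseconds.
    Map<String, Long> totals() {
        return new LinkedHashMap<>(nanosByListener);
    }
}
```

Because every listener shares one thread in this model, the slowest total is also the listener most responsible for the queue filling up and events being dropped.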
[jira] [Comment Edited] (SPARK-18838) High latency of event processing for large jobs
[ https://issues.apache.org/jira/browse/SPARK-18838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113067#comment-16113067 ] Miles Crawford edited comment on SPARK-18838 at 8/3/17 4:51 PM: We have disabled our eventlog listener, which is unfortunate, but seemed to help alot. Nevertheless, we still get dropped events, which causes the UI to screw up, jobs to hang, and so forth. Can we do anything to identify which listener is backing up? Are there any workarounds for this issue? was (Author: milesc): We have disabled our eventlog listener, which is unfortunate, but seemed to help alot. Nevertheless, we still get dropped events, which causes the UI to screw up, jobs to hang, and so forth. Can we do anything to identify which listener is backing up? > High latency of event processing for large jobs > --- > > Key: SPARK-18838 > URL: https://issues.apache.org/jira/browse/SPARK-18838 > Project: Spark > Issue Type: Improvement >Affects Versions: 2.0.0 >Reporter: Sital Kedia > Attachments: perfResults.pdf, SparkListernerComputeTime.xlsx > > > Currently we are observing the issue of very high event processing delay in > the driver's `ListenerBus` for large jobs with many tasks. Many critical > components of the scheduler, like `ExecutorAllocationManager` and > `HeartbeatReceiver`, depend on the `ListenerBus` events, and this delay might > hurt the job performance significantly or even fail the job. For example, a > significant delay in receiving the `SparkListenerTaskStart` might cause > `ExecutorAllocationManager` to mistakenly remove an executor which is > not idle. > The problem is that the event processor in `ListenerBus` is a single thread > which loops through all the listeners for each event and processes each event > synchronously > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala#L94. > This single-threaded processor often becomes the bottleneck for large jobs. 
> Also, if one of the listeners is very slow, all the listeners will pay the > price of the delay incurred by the slow listener. In addition, a slow > listener can cause events to be dropped from the event queue, which might be > fatal to the job. > To solve the above problems, we propose to get rid of the event queue and the > single-threaded event processor. Instead, each listener will have its own > dedicated single-threaded executor service. Whenever an event is posted, it > will be submitted to the executor service of every listener. The > single-threaded executor service will guarantee in-order processing of the events > per listener. The queue used for the executor service will be bounded to > guarantee we do not grow the memory indefinitely. The downside of this > approach is that a separate event queue per listener will increase the driver memory > footprint. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-21599) Collecting column statistics for datasource tables may fail with java.util.NoSuchElementException
[ https://issues.apache.org/jira/browse/SPARK-21599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-21599. - Resolution: Fixed Fix Version/s: 2.3.0 > Collecting column statistics for datasource tables may fail with > java.util.NoSuchElementException > - > > Key: SPARK-21599 > URL: https://issues.apache.org/jira/browse/SPARK-21599 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Dilip Biswal >Assignee: Dilip Biswal > Fix For: 2.3.0 > > > Collecting column level statistics for non compatible hive tables using > {code} > ANALYZE TABLE FOR COLUMNS > {code} > may fail with the following exception. > {code} > key not found: a > java.util.NoSuchElementException: key not found: a > at scala.collection.MapLike$class.default(MapLike.scala:228) > at scala.collection.AbstractMap.default(Map.scala:59) > at scala.collection.MapLike$class.apply(MapLike.scala:141) > at scala.collection.AbstractMap.apply(Map.scala:59) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1$$anonfun$apply$mcV$sp$3.apply(HiveExternalCatalog.scala:657) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1$$anonfun$apply$mcV$sp$3.apply(HiveExternalCatalog.scala:656) > at scala.collection.immutable.Map$Map2.foreach(Map.scala:137) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply$mcV$sp(HiveExternalCatalog.scala:656) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply(HiveExternalCatalog.scala:634) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply(HiveExternalCatalog.scala:634) > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97) > at > org.apache.spark.sql.hive.HiveExternalCatalog.alterTableStats(HiveExternalCatalog.scala:634) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.alterTableStats(SessionCatalog.scala:375) > 
at > org.apache.spark.sql.execution.command.AnalyzeColumnCommand.run(AnalyzeColumnCommand.scala:57) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21598) Collect usability/events information from Spark History Server
[ https://issues.apache.org/jira/browse/SPARK-21598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113040#comment-16113040 ] Eric Vandenberg commented on SPARK-21598: - [~steve_l] Do you have any input or thoughts here? The goal is to collect more information than is available in typical metrics. I would like to directly correlate replay times with other replay activity attributes like job size, user impact (i.e., was the user waiting for a response in real time?), etc. This is about usability more than operations: this information would make it easier to target and measure specific improvements to the Spark History Server user experience. We often have internal users who complain about history server performance, and we need a way to directly reference and understand their experience, since the Spark History Server is critical for our internal debugging. If there's a way to capture this information using metrics alone, I would like to learn more, but from my understanding metrics aren't designed to capture this level of information. > Collect usability/events information from Spark History Server > -- > > Key: SPARK-21598 > URL: https://issues.apache.org/jira/browse/SPARK-21598 > Project: Spark > Issue Type: Improvement > Components: Scheduler >Affects Versions: 2.0.2 >Reporter: Eric Vandenberg >Priority: Minor > > The Spark History Server doesn't currently have a way to collect > usability/performance information on its main activity, loading/replay of history files. > We'd like to collect this information to monitor, target and measure > improvements in the Spark debugging experience (via history server usage). > Once available, these usability events could be analyzed using other analytics > tools. 
> The event info to collect: > SparkHistoryReplayEvent( > logPath: String, > logCompressionType: String, > logReplayException: String, // if an error > logReplayAction: String, // user replay vs. checkForLogs replay > logCompleteFlag: Boolean, > logFileSize: Long, > logFileSizeUncompressed: Long, > logLastModifiedTimestamp: Long, > logCreationTimestamp: Long, > logJobId: Long, > logNumEvents: Int, > logNumStages: Int, > logNumTasks: Int, > logReplayDurationMillis: Long > ) > The main Spark engine has a SparkListenerInterface through which all compute > engine events are broadcast. It probably doesn't make sense to reuse this > abstraction for broadcasting Spark History Server events, since the "events" > are not related or compatible with one another. Also note that the metrics > registry collects history caching metrics but doesn't provide the kind of > information above. > The proposal here is to add some basic event listener infrastructure to > capture history server activity events. This would work similarly to how the > SparkListener infrastructure works. It could be configured in a similar > manner, e.g. spark.history.listeners=MyHistoryListenerClass. > Open to feedback / suggestions / comments on the approach or alternatives. > cc: [~vanzin] [~cloud_fan] [~ajbozarth] [~jiangxb1987] -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
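A minimal Java sketch of the proposed listener infrastructure, assuming listeners are named by fully qualified class name in a comma-separated setting like the proposed `spark.history.listeners`, and that each listener class has a no-argument constructor. Since this is only a proposal, every name below is hypothetical, and `HistoryReplayEvent` carries just a subset of the proposed fields.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical subset of the proposed SparkHistoryReplayEvent fields.
final class HistoryReplayEvent {
    final String logPath;
    final long logFileSize;
    final long logReplayDurationMillis;

    HistoryReplayEvent(String logPath, long logFileSize, long logReplayDurationMillis) {
        this.logPath = logPath;
        this.logFileSize = logFileSize;
        this.logReplayDurationMillis = logReplayDurationMillis;
    }
}

// Hypothetical listener interface for history server activity events.
interface HistoryListener {
    void onReplay(HistoryReplayEvent event);
}

// Instantiates listeners from a comma-separated list of class names, in the spirit of the
// proposed spark.history.listeners setting; each class needs a no-argument constructor.
final class HistoryListenerBus {
    private final List<HistoryListener> listeners = new ArrayList<>();

    HistoryListenerBus(String classNames) throws ReflectiveOperationException {
        for (String name : classNames.split(",")) {
            String trimmed = name.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray commas and blank entries
            }
            Class<?> cls = Class.forName(trimmed);
            listeners.add((HistoryListener) cls.getDeclaredConstructor().newInstance());
        }
    }

    // Delivers a replay event to every configured listener.
    void post(HistoryReplayEvent event) {
        for (HistoryListener listener : listeners) {
            listener.onReplay(event);
        }
    }

    List<HistoryListener> listeners() {
        return Collections.unmodifiableList(listeners);
    }
}

// Example listener: totals replay time and counts replays.
final class ReplayTimeListener implements HistoryListener {
    private long totalMillis;
    private int replays;

    @Override
    public void onReplay(HistoryReplayEvent event) {
        totalMillis += event.logReplayDurationMillis;
        replays++;
    }

    long totalMillis() { return totalMillis; }

    int replays() { return replays; }
}
```

Keeping this bus separate from `SparkListenerInterface` matches the point above that history server events are unrelated to compute engine events.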
[jira] [Commented] (SPARK-21143) Fail to fetch blocks >1MB in size in presence of conflicting Netty version
[ https://issues.apache.org/jira/browse/SPARK-21143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112974#comment-16112974 ] Sean Owen commented on SPARK-21143: --- [~baz33] please see the JIRA and make the change if you want to see it done. > Fail to fetch blocks >1MB in size in presence of conflicting Netty version > -- > > Key: SPARK-21143 > URL: https://issues.apache.org/jira/browse/SPARK-21143 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.1 >Reporter: Ryan Williams >Priority: Minor > > One of my spark libraries inherited a transitive-dependency on Netty > 4.1.6.Final (vs. Spark's 4.0.42.Final), and I observed a strange failure I > wanted to document: fetches of blocks larger than 1MB (pre-compression, > afaict) seem to trigger a code path that results in {{AbstractMethodError}}'s > and ultimately stage failures. > I put a minimal repro in [this github > repo|https://github.com/ryan-williams/spark-bugs/tree/netty]: {{collect}} on > a 1-partition RDD with 1032 {{Array\[Byte\]}}'s of size 1000 works, but at > 1033 {{Array}}'s it dies in a confusing way. > Not sure what fixing/mitigating this in Spark would look like, other than > defensively shading+renaming netty. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19552) Upgrade Netty version to 4.1.8 final
[ https://issues.apache.org/jira/browse/SPARK-19552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112973#comment-16112973 ] Sean Owen commented on SPARK-19552: --- You will still have to make Spark work with 4.1.x even if it's shaded, but you're welcome to do that. I think the linked PR above did that, and may still accomplish the necessary changes. We'd have to figure out whether it breaks any user code too. But yeah shading is probably the way to go, as with jetty. > Upgrade Netty version to 4.1.8 final > > > Key: SPARK-19552 > URL: https://issues.apache.org/jira/browse/SPARK-19552 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 2.1.0 >Reporter: Adam Roberts >Priority: Minor > > Netty 4.1.8 was recently released but isn't API compatible with previous > major versions (like Netty 4.0.x), see > http://netty.io/news/2017/01/30/4-0-44-Final-4-1-8-Final.html for details. > This version does include a fix for a security concern but not one we'd be > exposed to with Spark "out of the box". Let's upgrade the version we use to > be on the safe side as the security fix I'm especially interested in is not > available in the 4.0.x release line. > We should move up anyway to take on a bunch of other big fixes cited in the > release notes (and if anyone were to use Spark with netty and tcnative, they > shouldn't be exposed to the security problem) - we should be good citizens > and make this change. > As this 4.1 version involves API changes we'll need to implement a few > methods and possibly adjust the Sasl tests. This JIRA and associated pull > request starts the process which I'll work on - and any help would be much > appreciated! 
Currently I know: > {code} > @Override > public void write(ChannelHandlerContext ctx, Object msg, ChannelPromise > promise) > throws Exception { > if (!foundEncryptionHandler) { > foundEncryptionHandler = > ctx.channel().pipeline().get(encryptHandlerName) != null; <-- this > returns false and causes test failures > } > ctx.write(msg, promise); > } > {code} > Here's what changes will be required (at least): > {code} > common/network-common/src/main/java/org/apache/spark/network/crypto/TransportCipher.java{code} > requires touch, retain and transferred methods > {code} > common/network-common/src/main/java/org/apache/spark/network/sasl/SaslEncryption.java{code} > requires the above methods too > {code}common/network-common/src/test/java/org/apache/spark/network/protocol/MessageWithHeaderSuite.java{code} > With "dummy" implementations so we can at least compile and test, we'll see > five new test failures to address. > These are > {code} > org.apache.spark.network.sasl.SparkSaslSuite.testFileRegionEncryption > org.apache.spark.network.sasl.SparkSaslSuite.testSaslEncryption > org.apache.spark.network.shuffle.ExternalShuffleSecuritySuite.testEncryption > org.apache.spark.rpc.netty.NettyRpcEnvSuite.send with SASL encryption > org.apache.spark.rpc.netty.NettyRpcEnvSuite.ask with SASL encryption > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-21628) Explicitly specify Java version in maven compiler plugin so IntelliJ imports project correctly
[ https://issues.apache.org/jira/browse/SPARK-21628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-21628. --- Resolution: Duplicate > Explicitly specify Java version in maven compiler plugin so IntelliJ imports > project correctly > -- > > Key: SPARK-21628 > URL: https://issues.apache.org/jira/browse/SPARK-21628 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 2.2.0 >Reporter: Andrew Ray >Priority: Minor > > see > https://stackoverflow.com/questions/27037657/stop-intellij-idea-to-switch-java-language-level-every-time-the-pom-is-reloaded -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21627) analyze hive table compute stats for columns with mixed case exception
[ https://issues.apache.org/jira/browse/SPARK-21627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bogdan Raducanu updated SPARK-21627: Summary: analyze hive table compute stats for columns with mixed case exception (was: hive compute stats for columns exception with column name camel case) > analyze hive table compute stats for columns with mixed case exception > -- > > Key: SPARK-21627 > URL: https://issues.apache.org/jira/browse/SPARK-21627 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Bogdan Raducanu > > {code} > sql("create table tabel1(b int) partitioned by (partColumn int)") > sql("analyze table tabel1 compute statistics for columns partColumn, b") > {code} > {code} > java.util.NoSuchElementException: key not found: partColumn > at scala.collection.MapLike$class.default(MapLike.scala:228) > at scala.collection.AbstractMap.default(Map.scala:59) > at scala.collection.MapLike$class.apply(MapLike.scala:141) > at scala.collection.AbstractMap.apply(Map.scala:59) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1$$anonfun$apply$mcV$sp$3.apply(HiveExternalCatalog.scala:648) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1$$anonfun$apply$mcV$sp$3.apply(HiveExternalCatalog.scala:647) > at scala.collection.immutable.Map$Map2.foreach(Map.scala:137) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply$mcV$sp(HiveExternalCatalog.scala:647) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply(HiveExternalCatalog.scala:634) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply(HiveExternalCatalog.scala:634) > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97) > at > org.apache.spark.sql.hive.HiveExternalCatalog.alterTableStats(HiveExternalCatalog.scala:634) > at > 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.alterTableStats(SessionCatalog.scala:375) > at > org.apache.spark.sql.execution.command.AnalyzeColumnCommand.run(AnalyzeColumnCommand.scala:57) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:78) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:75) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:91) > at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:185) > at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:185) > at org.apache.spark.sql.Dataset$$anonfun$47.apply(Dataset.scala:3036) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77) > at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3035) > at org.apache.spark.sql.Dataset.<init>(Dataset.scala:185) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:70) > at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:636) > ... 39 elided > {code} > Looks like a regression introduced by https://github.com/apache/spark/pull/18248 > In {{HiveExternalCatalog.alterTableStats}}, {{colNameTypeMap}} contains > lower-case column names. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
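The failure mode described above (a map keyed by lower-cased column names, looked up with the user's mixed-case name `partColumn`) can be reproduced in a few lines of Java. This is a hedged illustration, not the actual `HiveExternalCatalog` code or the eventual fix; `ColumnLookup` and its methods are hypothetical, the `"int"` type strings are placeholder values, and the two lookup variants are simply two possible ways to make the lookup case-insensitive.

```java
import java.util.Locale;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.TreeMap;

// Hypothetical illustration, not HiveExternalCatalog's code: the map is keyed by lower-cased
// column names, but the lookup uses the name as the user typed it.
final class ColumnLookup {

    // Strict lookup that fails the way Scala's Map.apply does: "key not found: <name>".
    static String strictGet(Map<String, String> colNameTypeMap, String column) {
        String type = colNameTypeMap.get(column);
        if (type == null) {
            throw new NoSuchElementException("key not found: " + column);
        }
        return type;
    }

    // Possible fix 1: normalize the requested name the same way the keys were normalized.
    static String normalizedGet(Map<String, String> colNameTypeMap, String column) {
        return strictGet(colNameTypeMap, column.toLowerCase(Locale.ROOT));
    }

    // Possible fix 2: copy the entries into a map whose key ordering ignores case.
    static Map<String, String> caseInsensitive(Map<String, String> source) {
        Map<String, String> result = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        result.putAll(source);
        return result;
    }
}
```

`Locale.ROOT` keeps the lower-casing independent of the JVM's default locale, which would otherwise change results for some identifiers (for example under a Turkish locale).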
[jira] [Created] (SPARK-21628) Explicitly specify Java version in maven compiler plugin so IntelliJ imports project correctly
Andrew Ray created SPARK-21628: -- Summary: Explicitly specify Java version in maven compiler plugin so IntelliJ imports project correctly Key: SPARK-21628 URL: https://issues.apache.org/jira/browse/SPARK-21628 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 2.2.0 Reporter: Andrew Ray Priority: Minor see https://stackoverflow.com/questions/27037657/stop-intellij-idea-to-switch-java-language-level-every-time-the-pom-is-reloaded
[jira] [Commented] (SPARK-19552) Upgrade Netty version to 4.1.8 final
[ https://issues.apache.org/jira/browse/SPARK-19552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112879#comment-16112879 ] BDeus commented on SPARK-19552: --- I have the same problem with gRPC too. If we don't want to upgrade to 4.1.x, can we at least discuss the possibility of shading it? > Upgrade Netty version to 4.1.8 final > > > Key: SPARK-19552 > URL: https://issues.apache.org/jira/browse/SPARK-19552 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 2.1.0 >Reporter: Adam Roberts >Priority: Minor > > Netty 4.1.8 was recently released but isn't API compatible with previous > major versions (like Netty 4.0.x), see > http://netty.io/news/2017/01/30/4-0-44-Final-4-1-8-Final.html for details. > This version does include a fix for a security concern but not one we'd be > exposed to with Spark "out of the box". Let's upgrade the version we use to > be on the safe side as the security fix I'm especially interested in is not > available in the 4.0.x release line. > We should move up anyway to take on a bunch of other big fixes cited in the > release notes (and if anyone were to use Spark with netty and tcnative, they > shouldn't be exposed to the security problem) - we should be good citizens > and make this change. > As this 4.1 version involves API changes we'll need to implement a few > methods and possibly adjust the Sasl tests. This JIRA and associated pull > request starts the process which I'll work on - and any help would be much > appreciated!
Currently I know: > {code} > @Override > public void write(ChannelHandlerContext ctx, Object msg, ChannelPromise > promise) > throws Exception { > if (!foundEncryptionHandler) { > foundEncryptionHandler = > ctx.channel().pipeline().get(encryptHandlerName) != null; // <-- this returns false and causes test failures > } > ctx.write(msg, promise); > } > {code} > Here's what changes will be required (at least): > {code} > common/network-common/src/main/java/org/apache/spark/network/crypto/TransportCipher.java{code} > requires touch, retain and transferred methods > {code} > common/network-common/src/main/java/org/apache/spark/network/sasl/SaslEncryption.java{code} > requires the above methods too > {code}common/network-common/src/test/java/org/apache/spark/network/protocol/MessageWithHeaderSuite.java{code} > With "dummy" implementations so we can at least compile and test, we'll see > five new test failures to address. > These are > {code} > org.apache.spark.network.sasl.SparkSaslSuite.testFileRegionEncryption > org.apache.spark.network.sasl.SparkSaslSuite.testSaslEncryption > org.apache.spark.network.shuffle.ExternalShuffleSecuritySuite.testEncryption > org.apache.spark.rpc.netty.NettyRpcEnvSuite.send with SASL encryption > org.apache.spark.rpc.netty.NettyRpcEnvSuite.ask with SASL encryption > {code}
[jira] [Commented] (SPARK-21143) Fail to fetch blocks >1MB in size in presence of conflicting Netty version
[ https://issues.apache.org/jira/browse/SPARK-21143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112873#comment-16112873 ] Basile Deustua commented on SPARK-21143: I have the exact same issue with io.grpc, which heavily uses Netty 4.1.x. It's very disappointing that the Spark community won't upgrade the Netty version, or at least shade 4.0.x in the jar, so that we can choose the version we want to use. Being constrained to stay on 4.0.x by Spark's dependency is a bit frustrating. > Fail to fetch blocks >1MB in size in presence of conflicting Netty version > -- > > Key: SPARK-21143 > URL: https://issues.apache.org/jira/browse/SPARK-21143 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.1 >Reporter: Ryan Williams >Priority: Minor > > One of my spark libraries inherited a transitive-dependency on Netty > 4.1.6.Final (vs. Spark's 4.0.42.Final), and I observed a strange failure I > wanted to document: fetches of blocks larger than 1MB (pre-compression, > afaict) seem to trigger a code path that results in {{AbstractMethodError}}'s > and ultimately stage failures. > I put a minimal repro in [this github > repo|https://github.com/ryan-williams/spark-bugs/tree/netty]: {{collect}} on > a 1-partition RDD with 1032 {{Array\[Byte\]}}'s of size 1000 works, but at > 1033 {{Array}}'s it dies in a confusing way. > Not sure what fixing/mitigating this in Spark would look like, other than > defensively shading+renaming netty.
[jira] [Commented] (SPARK-21570) File __spark_libs__XXX.zip does not exist on networked file system w/ yarn
[ https://issues.apache.org/jira/browse/SPARK-21570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112868#comment-16112868 ] Albert Chu commented on SPARK-21570: There's no scheme; I'm just using "file://" so it's treated like a local file system. > File __spark_libs__XXX.zip does not exist on networked file system w/ yarn > -- > > Key: SPARK-21570 > URL: https://issues.apache.org/jira/browse/SPARK-21570 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 2.2.0 >Reporter: Albert Chu > > I have a set of scripts that run Spark with data in a networked file system. > One of my unit tests to make sure things don't break between Spark releases > is to simply run a word count (via org.apache.spark.examples.JavaWordCount) > on a file in the networked file system. This test broke with Spark 2.2.0 > when I use yarn to launch the job (using the spark standalone scheduler > things still work). I'm currently using Hadoop 2.7.0. I get the following > error: > {noformat} > Diagnostics: File > file:/p/lcratery/achu/testing/rawnetworkfs/test/1181015/node-0/spark/node-0/spark-292938be-7ae3-460f-aca7-294083ebb790/__spark_libs__695301535722158702.zip > does not exist > java.io.FileNotFoundException: File > file:/p/lcratery/achu/testing/rawnetworkfs/test/1181015/node-0/spark/node-0/spark-292938be-7ae3-460f-aca7-294083ebb790/__spark_libs__695301535722158702.zip > does not exist > at > org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:606) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:819) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:596) > at > org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421) > at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253) > at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63) > at
org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361) > at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:748) > {noformat} > While debugging, I sat and watched the directory and did see that > /p/lcratery/achu/testing/rawnetworkfs/test/1181015/node-0/spark/node-0/spark-292938be-7ae3-460f-aca7-294083ebb790/__spark_libs__695301535722158702.zip > does show up at some point. > Wondering if it's possible something racy was introduced. Nothing in the > Spark 2.2.0 release notes suggests any type of configuration change that > needs to be done. > Thanks
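The trace above shows the staged archive being resolved through Hadoop's RawLocalFileSystem during YARN localization (FSDownload.copy). A "file:" URI does carry a scheme, namely file, which Hadoop maps to its local file system, so each node presumably checks the path against its own view of the networked mount. A minimal plain-Java sketch using java.net.URI (not Hadoop's Path class) of how the scheme is parsed; the class name and all paths are hypothetical stand-ins.

```java
import java.net.URI;

// Plain-Java illustration only; paths are hypothetical stand-ins for the
// staged __spark_libs__ archive in the report.
class FileSchemeSketch {

    // Returns the URI scheme, or "(none)" when the string has no scheme.
    static String schemeOf(String uri) {
        String scheme = URI.create(uri).getScheme();
        return scheme == null ? "(none)" : scheme;
    }

    public static void main(String[] args) {
        System.out.println(schemeOf("file:/shared/staging/__spark_libs__123.zip")); // file
        System.out.println(schemeOf("hdfs://namenode:8020/user/alice/input.txt"));  // hdfs
        System.out.println(schemeOf("/shared/staging/input.txt"));                  // (none)
    }
}
```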
[jira] [Updated] (SPARK-21627) hive compute stats for columns exception with column name camel case
[ https://issues.apache.org/jira/browse/SPARK-21627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-21627: -- Affects Version/s: (was: 3.0.0) 2.3.0 master = 2.3.0 right now > hive compute stats for columns exception with column name camel case > > > Key: SPARK-21627 > URL: https://issues.apache.org/jira/browse/SPARK-21627 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Bogdan Raducanu > > {code} > sql("create table tabel1(b int) partitioned by (partColumn int)") > sql("analyze table tabel1 compute statistics for columns partColumn, b") > {code} > {code} > java.util.NoSuchElementException: key not found: partColumn > at scala.collection.MapLike$class.default(MapLike.scala:228) > at scala.collection.AbstractMap.default(Map.scala:59) > at scala.collection.MapLike$class.apply(MapLike.scala:141) > at scala.collection.AbstractMap.apply(Map.scala:59) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1$$anonfun$apply$mcV$sp$3.apply(HiveExternalCatalog.scala:648) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1$$anonfun$apply$mcV$sp$3.apply(HiveExternalCatalog.scala:647) > at scala.collection.immutable.Map$Map2.foreach(Map.scala:137) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply$mcV$sp(HiveExternalCatalog.scala:647) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply(HiveExternalCatalog.scala:634) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply(HiveExternalCatalog.scala:634) > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97) > at > org.apache.spark.sql.hive.HiveExternalCatalog.alterTableStats(HiveExternalCatalog.scala:634) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.alterTableStats(SessionCatalog.scala:375) > at > 
org.apache.spark.sql.execution.command.AnalyzeColumnCommand.run(AnalyzeColumnCommand.scala:57) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:78) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:75) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:91) > at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:185) > at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:185) > at org.apache.spark.sql.Dataset$$anonfun$47.apply(Dataset.scala:3036) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77) > at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3035) > at org.apache.spark.sql.Dataset.<init>(Dataset.scala:185) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:70) > at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:636) > ... 39 elided > {code} > Looks like a regression introduced by https://github.com/apache/spark/pull/18248 > In {{HiveExternalCatalog.alterTableStats}} {{colNameTypeMap}} contains lower > case column names.
[jira] [Commented] (SPARK-21627) hive compute stats for columns exception with column name camel case
[ https://issues.apache.org/jira/browse/SPARK-21627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112851#comment-16112851 ] Bogdan Raducanu commented on SPARK-21627: - I expect it fails only in master branch. That's why it's 3.0 > hive compute stats for columns exception with column name camel case > > > Key: SPARK-21627 > URL: https://issues.apache.org/jira/browse/SPARK-21627 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Bogdan Raducanu > > {code} > sql("create table tabel1(b int) partitioned by (partColumn int)") > sql("analyze table tabel1 compute statistics for columns partColumn, b") > {code} > {code} > java.util.NoSuchElementException: key not found: partColumn > at scala.collection.MapLike$class.default(MapLike.scala:228) > at scala.collection.AbstractMap.default(Map.scala:59) > at scala.collection.MapLike$class.apply(MapLike.scala:141) > at scala.collection.AbstractMap.apply(Map.scala:59) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1$$anonfun$apply$mcV$sp$3.apply(HiveExternalCatalog.scala:648) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1$$anonfun$apply$mcV$sp$3.apply(HiveExternalCatalog.scala:647) > at scala.collection.immutable.Map$Map2.foreach(Map.scala:137) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply$mcV$sp(HiveExternalCatalog.scala:647) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply(HiveExternalCatalog.scala:634) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply(HiveExternalCatalog.scala:634) > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97) > at > org.apache.spark.sql.hive.HiveExternalCatalog.alterTableStats(HiveExternalCatalog.scala:634) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.alterTableStats(SessionCatalog.scala:375) > at > 
org.apache.spark.sql.execution.command.AnalyzeColumnCommand.run(AnalyzeColumnCommand.scala:57) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:78) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:75) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:91) > at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:185) > at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:185) > at org.apache.spark.sql.Dataset$$anonfun$47.apply(Dataset.scala:3036) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77) > at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3035) > at org.apache.spark.sql.Dataset.<init>(Dataset.scala:185) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:70) > at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:636) > ... 39 elided > {code} > Looks like a regression introduced by https://github.com/apache/spark/pull/18248 > In {{HiveExternalCatalog.alterTableStats}} {{colNameTypeMap}} contains lower > case column names.
[jira] [Commented] (SPARK-21627) hive compute stats for columns exception with column name camel case
[ https://issues.apache.org/jira/browse/SPARK-21627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112821#comment-16112821 ] Hyukjin Kwon commented on SPARK-21627: -- Would you mind fixing {{Affects Version/s:}}? I guess we don't have Spark 3.0.0 yet. > hive compute stats for columns exception with column name camel case > > > Key: SPARK-21627 > URL: https://issues.apache.org/jira/browse/SPARK-21627 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Bogdan Raducanu > > {code} > sql("create table tabel1(b int) partitioned by (partColumn int)") > sql("analyze table tabel1 compute statistics for columns partColumn, b") > {code} > {code} > java.util.NoSuchElementException: key not found: partColumn > at scala.collection.MapLike$class.default(MapLike.scala:228) > at scala.collection.AbstractMap.default(Map.scala:59) > at scala.collection.MapLike$class.apply(MapLike.scala:141) > at scala.collection.AbstractMap.apply(Map.scala:59) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1$$anonfun$apply$mcV$sp$3.apply(HiveExternalCatalog.scala:648) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1$$anonfun$apply$mcV$sp$3.apply(HiveExternalCatalog.scala:647) > at scala.collection.immutable.Map$Map2.foreach(Map.scala:137) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply$mcV$sp(HiveExternalCatalog.scala:647) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply(HiveExternalCatalog.scala:634) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply(HiveExternalCatalog.scala:634) > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97) > at > org.apache.spark.sql.hive.HiveExternalCatalog.alterTableStats(HiveExternalCatalog.scala:634) > at > 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.alterTableStats(SessionCatalog.scala:375) > at > org.apache.spark.sql.execution.command.AnalyzeColumnCommand.run(AnalyzeColumnCommand.scala:57) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:78) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:75) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:91) > at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:185) > at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:185) > at org.apache.spark.sql.Dataset$$anonfun$47.apply(Dataset.scala:3036) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77) > at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3035) > at org.apache.spark.sql.Dataset.<init>(Dataset.scala:185) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:70) > at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:636) > ... 39 elided > {code} > Looks like a regression introduced by https://github.com/apache/spark/pull/18248 > In {{HiveExternalCatalog.alterTableStats}} {{colNameTypeMap}} contains lower > case column names.
[jira] [Assigned] (SPARK-21602) Add map_keys and map_values functions to R
[ https://issues.apache.org/jira/browse/SPARK-21602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-21602: Assignee: Hyukjin Kwon > Add map_keys and map_values functions to R > -- > > Key: SPARK-21602 > URL: https://issues.apache.org/jira/browse/SPARK-21602 > Project: Spark > Issue Type: Improvement > Components: SparkR >Affects Versions: 2.3.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon > Fix For: 2.3.0 > > > We have {{map_keys}} and {{map_values}} functions in other language APIs. > It would be nicer to have both in the R API too.
[jira] [Resolved] (SPARK-21602) Add map_keys and map_values functions to R
[ https://issues.apache.org/jira/browse/SPARK-21602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-21602. -- Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 18809 [https://github.com/apache/spark/pull/18809] > Add map_keys and map_values functions to R > -- > > Key: SPARK-21602 > URL: https://issues.apache.org/jira/browse/SPARK-21602 > Project: Spark > Issue Type: Improvement > Components: SparkR >Affects Versions: 2.3.0 >Reporter: Hyukjin Kwon > Fix For: 2.3.0 > > > We have {{map_keys}} and {{map_values}} functions in other language APIs. > It would be nicer to have both in the R API too.
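In the existing Spark SQL APIs, map_keys returns an unordered array of a map column's keys and map_values an unordered array of its values. A plain-Java analogue for a single map value, to show the shape of the result; this is not the SparkR API, and the class and method names are hypothetical. A LinkedHashMap is used only so the printed order is predictable.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java analogue of map_keys / map_values for one map value.
class MapKeysValuesSketch {

    // All keys of the map, as a list.
    static <K, V> List<K> mapKeys(Map<K, V> m) {
        return new ArrayList<>(m.keySet());
    }

    // All values of the map, as a list (same iteration order as mapKeys).
    static <K, V> List<V> mapValues(Map<K, V> m) {
        return new ArrayList<>(m.values());
    }

    public static void main(String[] args) {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("a", 1);
        m.put("b", 2);
        System.out.println(mapKeys(m));   // [a, b]
        System.out.println(mapValues(m)); // [1, 2]
    }
}
```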
[jira] [Updated] (SPARK-21627) hive compute stats for columns exception with column name camel case
[ https://issues.apache.org/jira/browse/SPARK-21627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bogdan Raducanu updated SPARK-21627: Description: {code} sql("create table tabel1(b int) partitioned by (partColumn int)") sql("analyze table tabel1 compute statistics for columns partColumn, b") {code} {code} java.util.NoSuchElementException: key not found: partColumn at scala.collection.MapLike$class.default(MapLike.scala:228) at scala.collection.AbstractMap.default(Map.scala:59) at scala.collection.MapLike$class.apply(MapLike.scala:141) at scala.collection.AbstractMap.apply(Map.scala:59) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1$$anonfun$apply$mcV$sp$3.apply(HiveExternalCatalog.scala:648) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1$$anonfun$apply$mcV$sp$3.apply(HiveExternalCatalog.scala:647) at scala.collection.immutable.Map$Map2.foreach(Map.scala:137) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply$mcV$sp(HiveExternalCatalog.scala:647) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply(HiveExternalCatalog.scala:634) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply(HiveExternalCatalog.scala:634) at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97) at org.apache.spark.sql.hive.HiveExternalCatalog.alterTableStats(HiveExternalCatalog.scala:634) at org.apache.spark.sql.catalyst.catalog.SessionCatalog.alterTableStats(SessionCatalog.scala:375) at org.apache.spark.sql.execution.command.AnalyzeColumnCommand.run(AnalyzeColumnCommand.scala:57) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:78) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:75) at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:91) at 
org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:185) at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:185) at org.apache.spark.sql.Dataset$$anonfun$47.apply(Dataset.scala:3036) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77) at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3035) at org.apache.spark.sql.Dataset.<init>(Dataset.scala:185) at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:70) at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:636) ... 39 elided {code} Looks like a regression introduced by https://github.com/apache/spark/pull/18248 In {{HiveExternalCatalog.alterTableStats}} {{colNameTypeMap}} contains lower case column names. was: {code} sql("create table tabel1(b int) partitioned by (partColumn int)") sql("analyze table tabel1 compute statistics for columns partColumn, b") {code} {code} java.util.NoSuchElementException: key not found: partColumn at scala.collection.MapLike$class.default(MapLike.scala:228) at scala.collection.AbstractMap.default(Map.scala:59) at scala.collection.MapLike$class.apply(MapLike.scala:141) at scala.collection.AbstractMap.apply(Map.scala:59) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1$$anonfun$apply$mcV$sp$3.apply(HiveExternalCatalog.scala:648) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1$$anonfun$apply$mcV$sp$3.apply(HiveExternalCatalog.scala:647) at scala.collection.immutable.Map$Map2.foreach(Map.scala:137) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply$mcV$sp(HiveExternalCatalog.scala:647) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply(HiveExternalCatalog.scala:634) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply(HiveExternalCatalog.scala:634) at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97) at
org.apache.spark.sql.hive.HiveExternalCatalog.alterTableStats(HiveExternalCatalog.scala:634) at org.apache.spark.sql.catalyst.catalog.SessionCatalog.alterTableStats(SessionCatalog.scala:375) at org.apache.spark.sql.execution.command.AnalyzeColumnCommand.run(AnalyzeColumnCommand.scala:57) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:78) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:75) at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:91) at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:185) at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:185) at org.apache.spark.sql.Dataset$$anonfun$47.apply(Dataset.scala:3036) at
[jira] [Updated] (SPARK-21627) hive compute stats for columns exception with column name camel case
[ https://issues.apache.org/jira/browse/SPARK-21627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bogdan Raducanu updated SPARK-21627: Description: {code} sql("create table tabel1(b int) partitioned by (partColumn int)") sql("analyze table tabel1 compute statistics for columns partColumn, b") {code} {code} java.util.NoSuchElementException: key not found: partColumn at scala.collection.MapLike$class.default(MapLike.scala:228) at scala.collection.AbstractMap.default(Map.scala:59) at scala.collection.MapLike$class.apply(MapLike.scala:141) at scala.collection.AbstractMap.apply(Map.scala:59) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1$$anonfun$apply$mcV$sp$3.apply(HiveExternalCatalog.scala:648) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1$$anonfun$apply$mcV$sp$3.apply(HiveExternalCatalog.scala:647) at scala.collection.immutable.Map$Map2.foreach(Map.scala:137) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply$mcV$sp(HiveExternalCatalog.scala:647) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply(HiveExternalCatalog.scala:634) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply(HiveExternalCatalog.scala:634) at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97) at org.apache.spark.sql.hive.HiveExternalCatalog.alterTableStats(HiveExternalCatalog.scala:634) at org.apache.spark.sql.catalyst.catalog.SessionCatalog.alterTableStats(SessionCatalog.scala:375) at org.apache.spark.sql.execution.command.AnalyzeColumnCommand.run(AnalyzeColumnCommand.scala:57) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:78) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:75) at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:91) at 
org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:185) at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:185) at org.apache.spark.sql.Dataset$$anonfun$47.apply(Dataset.scala:3036) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77) at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3035) at org.apache.spark.sql.Dataset.<init>(Dataset.scala:185) at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:70) at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:636) ... 39 elided {code} Looks like a regression introduced by https://github.com/apache/spark/pull/18248 In {code}HiveExternalCatalog.alterTableStats{code} {code}colNameTypeMap{code} contains lower case column names. was: {code} sql("create table tabel1(b int) partitioned by (partColumn int)") sql("analyze table tabel1 compute statistics for columns partColumn, b") {code} {code} java.util.NoSuchElementException: key not found: partColumn at scala.collection.MapLike$class.default(MapLike.scala:228) at scala.collection.AbstractMap.default(Map.scala:59) at scala.collection.MapLike$class.apply(MapLike.scala:141) at scala.collection.AbstractMap.apply(Map.scala:59) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1$$anonfun$apply$mcV$sp$3.apply(HiveExternalCatalog.scala:648) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1$$anonfun$apply$mcV$sp$3.apply(HiveExternalCatalog.scala:647) at scala.collection.immutable.Map$Map2.foreach(Map.scala:137) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply$mcV$sp(HiveExternalCatalog.scala:647) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply(HiveExternalCatalog.scala:634) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply(HiveExternalCatalog.scala:634) at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97) at
org.apache.spark.sql.hive.HiveExternalCatalog.alterTableStats(HiveExternalCatalog.scala:634) at org.apache.spark.sql.catalyst.catalog.SessionCatalog.alterTableStats(SessionCatalog.scala:375) at org.apache.spark.sql.execution.command.AnalyzeColumnCommand.run(AnalyzeColumnCommand.scala:57) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:78) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:75) at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:91) at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:185) at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:185) at org.apache.spark.sql.Dataset$$anonfun$47.apply(Dataset.scala:3036) at
[jira] [Created] (SPARK-21627) hive compute stats for columns exception with column name camel case
Bogdan Raducanu created SPARK-21627: --- Summary: hive compute stats for columns exception with column name camel case Key: SPARK-21627 URL: https://issues.apache.org/jira/browse/SPARK-21627 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Bogdan Raducanu {code} sql("create table tabel1(b int) partitioned by (partColumn int)") sql("analyze table tabel1 compute statistics for columns partColumn, b") {code} {code} java.util.NoSuchElementException: key not found: partColumn at scala.collection.MapLike$class.default(MapLike.scala:228) at scala.collection.AbstractMap.default(Map.scala:59) at scala.collection.MapLike$class.apply(MapLike.scala:141) at scala.collection.AbstractMap.apply(Map.scala:59) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1$$anonfun$apply$mcV$sp$3.apply(HiveExternalCatalog.scala:648) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1$$anonfun$apply$mcV$sp$3.apply(HiveExternalCatalog.scala:647) at scala.collection.immutable.Map$Map2.foreach(Map.scala:137) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply$mcV$sp(HiveExternalCatalog.scala:647) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply(HiveExternalCatalog.scala:634) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$alterTableStats$1.apply(HiveExternalCatalog.scala:634) at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97) at org.apache.spark.sql.hive.HiveExternalCatalog.alterTableStats(HiveExternalCatalog.scala:634) at org.apache.spark.sql.catalyst.catalog.SessionCatalog.alterTableStats(SessionCatalog.scala:375) at org.apache.spark.sql.execution.command.AnalyzeColumnCommand.run(AnalyzeColumnCommand.scala:57) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:78) at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:75) at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:91) at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:185) at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:185) at org.apache.spark.sql.Dataset$$anonfun$47.apply(Dataset.scala:3036) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77) at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3035) at org.apache.spark.sql.Dataset.<init>(Dataset.scala:185) at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:70) at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:636) ... 39 elided {code} Looks like a regression introduced by https://github.com/apache/spark/pull/18248 in {{HiveExternalCatalog.alterTable}}
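The trace ends in {{java.util.NoSuchElementException: key not found: partColumn}}, which is what Scala's {{Map.apply}} throws for a missing key. A minimal sketch of that failure mode, assuming (hypothetically) that the stored column statistics are keyed by lowercased names, the way the Hive metastore stores column names, while the lookup uses the camel-case name from the query. The object and map are illustrative only, not the {{HiveExternalCatalog}} code.

```scala
import java.util.Locale

object CaseMismatchSketch {
  // Hypothetical stats map keyed by lowercased column names (assumption, see lead-in).
  val statsByColumn: Map[String, Long] = Map("partcolumn" -> 10L, "b" -> 3L)

  // Exact-key lookup: Map.apply throws NoSuchElementException("key not found: ...").
  def exactLookup(name: String): Long = statsByColumn(name)

  // Normalizing the requested name first makes the lookup case-insensitive.
  def normalizedLookup(name: String): Option[Long] =
    statsByColumn.get(name.toLowerCase(Locale.ROOT))

  def main(args: Array[String]): Unit = {
    println(normalizedLookup("partColumn")) // Some(10)
    try {
      exactLookup("partColumn")
      println("unexpectedly found")
    } catch {
      case e: NoSuchElementException => println(e.getMessage) // key not found: partColumn
    }
  }
}
```

Whether names should be normalized when stats are written, when they are read, or both is a design choice this sketch does not settle; it only shows why an exact-key lookup fails.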
[jira] [Commented] (SPARK-21086) CrossValidator, TrainValidationSplit should preserve all models after fitting
[ https://issues.apache.org/jira/browse/SPARK-21086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112623#comment-16112623 ] Nick Pentreath commented on SPARK-21086: I just want to understand _why_ folks want to keep all the models. Is it actually the models (and model data) they want, or an easier, "official API" way to link the param permutations with their cross-validation scores, to see which param combinations produce which scores? (If the latter, https://issues.apache.org/jira/browse/SPARK-18704 is actually the solution.) > CrossValidator, TrainValidationSplit should preserve all models after fitting > - > > Key: SPARK-21086 > URL: https://issues.apache.org/jira/browse/SPARK-21086 > Project: Spark > Issue Type: New Feature > Components: ML >Affects Versions: 2.2.0 >Reporter: Joseph K. Bradley > > I've heard multiple requests for having CrossValidatorModel and > TrainValidationSplitModel preserve the full list of fitted models. This > sounds very valuable. > One decision should be made before we do this: Should we save and load the > models in ML persistence? That could blow up the size of a saved Pipeline if > the models are large. > * I suggest *not* saving the models by default but allowing saving if > specified. We could specify whether to save the model as an extra Param for > CrossValidatorModelWriter, but we would have to make sure to expose > CrossValidatorModelWriter as a public API and modify the return type of > CrossValidatorModel.write to be CrossValidatorModelWriter (but this will not > be a breaking change).
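If what users want is the second option, linking each parameter combination to its score, the pairing itself is small. A minimal sketch, assuming the inputs come from a fitted {{CrossValidatorModel}} ({{getEstimatorParamMaps}} and {{avgMetrics}}, whose entries line up by index) and from the evaluator's {{isLargerBetter}}. It is written against plain Scala types so it runs without Spark; {{ParamScoreSketch}} and {{rankByMetric}} are hypothetical names, and the values in {{main}} are made up for illustration.

```scala
object ParamScoreSketch {
  // Pair each parameter combination with its average metric and sort best-first.
  // With Spark ML, paramMaps would be cvModel.getEstimatorParamMaps and avgMetrics
  // cvModel.avgMetrics; isLargerBetter would come from the evaluator.
  def rankByMetric[P](paramMaps: Seq[P], avgMetrics: Seq[Double], isLargerBetter: Boolean): Seq[(P, Double)] = {
    require(paramMaps.length == avgMetrics.length, "expected one metric per parameter combination")
    val paired = paramMaps.zip(avgMetrics)
    if (isLargerBetter) paired.sortBy(-_._2) else paired.sortBy(_._2)
  }

  def main(args: Array[String]): Unit = {
    // Illustrative values only, not real results.
    val ranked = rankByMetric(
      Seq("regParam=0.1", "regParam=0.01", "regParam=1.0"),
      Seq(0.81, 0.86, 0.74),
      isLargerBetter = true)
    ranked.foreach { case (params, score) => println(s"$params -> $score") }
  }
}
```

Keeping only the (params, score) pairs avoids the persistence-size concern raised in the description, since no fitted models need to be retained.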
[jira] [Commented] (SPARK-20922) Unsafe deserialization in Spark LauncherConnection
[ https://issues.apache.org/jira/browse/SPARK-20922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112602#comment-16112602 ] Sean Owen commented on SPARK-20922: --- If you'd email a suggested CVE description to priv...@spark.apache.org, we can go through the motions of reporting it as one. The ASF process is: https://www.apache.org/security/ https://www.apache.org/security/projects.html > Unsafe deserialization in Spark LauncherConnection > -- > > Key: SPARK-20922 > URL: https://issues.apache.org/jira/browse/SPARK-20922 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.1.1 >Reporter: Aditya Sharad >Assignee: Marcelo Vanzin > Labels: security > Fix For: 2.0.3, 2.1.2, 2.2.0, 2.3.0 > > Attachments: spark-deserialize-master.zip > > > The {{run()}} method of the class > {{org.apache.spark.launcher.LauncherConnection}} performs unsafe > deserialization of data received by its socket. This makes Spark applications > launched programmatically using the {{SparkLauncher}} framework potentially > vulnerable to remote code execution by an attacker with access to any user > account on the local machine. Such an attacker could send a malicious > serialized Java object to multiple ports on the local machine, and if this > port matches the one (randomly) chosen by the Spark launcher, the malicious > object will be deserialized. By making use of gadget chains in code present > on the Spark application classpath, the deserialization process can lead to > RCE or privilege escalation. > This vulnerability is identified by the “Unsafe deserialization” rule on > lgtm.com: > https://lgtm.com/projects/g/apache/spark/snapshot/80fdc2c9d1693f5b3402a79ca4ec76f6e422ff13/files/launcher/src/main/java/org/apache/spark/launcher/LauncherConnection.java#V58 > > Attached is a proof-of-concept exploit involving a simple > {{SparkLauncher}}-based application and a known gadget chain in the Apache > Commons Beanutils library referenced by Spark. 
> See the readme file for demonstration instructions.
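A common mitigation for this class of bug is to restrict which classes a socket-facing {{ObjectInputStream}} may resolve, by overriding {{resolveClass}}. The sketch below shows that general pattern in Scala; {{AllowlistObjectInputStream}}, {{AllowlistDemo}} and the prefix list are hypothetical, and this is not presented as the fix that was merged for this issue. A real allowlist should be as narrow as the protocol allows.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream, InvalidClassException,
  ObjectInputStream, ObjectOutputStream, ObjectStreamClass}

// Hypothetical allowlisting stream: refuses to resolve any class whose fully qualified
// name does not start with an allowed prefix, so gadget-chain classes elsewhere on the
// classpath are never instantiated.
class AllowlistObjectInputStream(in: InputStream, allowedPrefixes: Seq[String])
    extends ObjectInputStream(in) {
  override protected def resolveClass(desc: ObjectStreamClass): Class[_] = {
    val name = desc.getName
    if (!allowedPrefixes.exists(prefix => name.startsWith(prefix))) {
      throw new InvalidClassException(name, "class not allowed for deserialization")
    }
    super.resolveClass(desc)
  }
}

object AllowlistDemo {
  // Serialize obj, then read it back through the allowlisting stream.
  def roundTrip(obj: AnyRef, allowedPrefixes: Seq[String]): AnyRef = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(obj) finally out.close()
    val in = new AllowlistObjectInputStream(new ByteArrayInputStream(bytes.toByteArray), allowedPrefixes)
    try in.readObject() finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val allowed = Seq("java.lang.")
    // Integer and its superclass Number are both under java.lang, so this succeeds.
    println(roundTrip(Integer.valueOf(42), allowed)) // 42
    try {
      roundTrip(new java.util.ArrayList[String](), allowed)
    } catch {
      case e: InvalidClassException => println(s"rejected: ${e.getMessage}")
    }
  }
}
```

Note that {{resolveClass}} is also called for superclass descriptors, so every class in a permitted type's hierarchy must be allowed too.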
[jira] [Commented] (SPARK-20922) Unsafe deserialization in Spark LauncherConnection
[ https://issues.apache.org/jira/browse/SPARK-20922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112600#comment-16112600 ] Aditya Sharad commented on SPARK-20922: --- Apologies for the delay in getting back to you. I believe we first got in touch privately to report this, but in future we'll discuss the details and fix on private@ first if that fits better into your workflow. The scope is indeed limited to attacks from local users and the issue is now publicly disclosed. However, I would argue neither of these points disqualifies the vulnerability reported here for the purposes of getting a CVE assigned. Depending on the configuration and the intentions of an attacker, the repercussions of this vulnerability are potentially extremely severe despite the limited scope: - The worst case is obviously when Spark runs as an administrative user. - In the more common case where Spark runs under a user account that is also responsible for other services (like Hadoop, HDFS), the repercussions can be very severe. This is the case in the default Cloudera setup, for example. In that particular scenario, an attacker can cause a widespread outage by simply wiping all data that belongs to the 'hdfs' user. The repercussions reach far beyond Spark itself. - In the 'best' case, Spark is set up to use a dedicated user account. Here we're looking at a DoS to Spark specifically, with a severe risk for data loss. An attacker can stop the service and wipe all of Spark's data. We have seen significantly less severe vulnerabilities for which a CVE is assigned. The prime reasons for doing so are to advise users and to maintain a visible record of the issue that isn't project-specific, which I think would be appropriate in this case. Please let me know if there's anything I can help with. I am willing to file separately for the CVE if that is easier, but I do not wish to do so without first having your agreement and finding out if Spark has a preferred CVE route. 
If you'd like to discuss this further off-list, please feel free to contact me on adi...@semmle.com.
[jira] [Issue Comment Deleted] (SPARK-21625) sqrt(negative number) should be null
[ https://issues.apache.org/jira/browse/SPARK-21625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] panbingkun updated SPARK-21625: --- Comment: was deleted (was: case class Sqrt(child: Expression) extends UnaryMathExpression(math.sqrt, "SQRT") { protected override def nullSafeEval(input: Any): Any = { if (input.asInstanceOf[Double] < 0) { null } else { f(input.asInstanceOf[Double]) } } override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { nullSafeCodeGen(ctx, ev, c => { s""" if ($c < 0) { ${ev.isNull} = true; } else { ${ev.value} = java.lang.Math.sqrt($c); } """ }) } }) > sqrt(negative number) should be null > > > Key: SPARK-21625 > URL: https://issues.apache.org/jira/browse/SPARK-21625 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Yuming Wang > > Both Hive and MySQL return NULL: > {code:sql} > hive> select SQRT(-10.0); > OK > NULL > Time taken: 0.384 seconds, Fetched: 1 row(s) > {code} > {code:sql} > mysql> select sqrt(-10.0); > +---+ > | sqrt(-10.0) | > +---+ > | NULL | > +---+ > 1 row in set (0.00 sec) > {code}
[jira] [Commented] (SPARK-21625) sqrt(negative number) should be null
[ https://issues.apache.org/jira/browse/SPARK-21625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112571#comment-16112571 ] panbingkun commented on SPARK-21625:
{code}
case class Sqrt(child: Expression) extends UnaryMathExpression(math.sqrt, "SQRT") {
  protected override def nullSafeEval(input: Any): Any = {
    if (input.asInstanceOf[Double] < 0) {
      null
    } else {
      f(input.asInstanceOf[Double])
    }
  }

  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    nullSafeCodeGen(ctx, ev, c => {
      s"""
        if ($c < 0) {
          ${ev.isNull} = true;
        } else {
          ${ev.value} = java.lang.Math.sqrt($c);
        }
      """
    })
  }
}
{code}
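On the JVM, {{math.sqrt}} of a negative number returns NaN rather than null, which is why negative input needs explicit handling if the result should match Hive and MySQL. A minimal sketch of the proposed semantics in plain Scala, using {{Option}} with {{None}} standing in for SQL NULL; {{sqrtOrNull}} is a hypothetical helper, not Spark's API.

```scala
object SqrtSemanticsSketch {
  // Hypothetical helper mirroring the requested SQL semantics:
  // NULL (None) for negative input, the ordinary square root otherwise.
  // NaN input is not negative, so it passes through as Some(NaN).
  def sqrtOrNull(x: Double): Option[Double] =
    if (x < 0) None else Some(math.sqrt(x))

  def main(args: Array[String]): Unit = {
    println(math.sqrt(-10.0))  // NaN (plain JVM behaviour)
    println(sqrtOrNull(-10.0)) // None
    println(sqrtOrNull(16.0))  // Some(4.0)
  }
}
```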
[jira] [Assigned] (SPARK-21605) Let IntelliJ IDEA correctly detect Language level and Target byte code version
[ https://issues.apache.org/jira/browse/SPARK-21605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-21605: - Assignee: Chang chen > Let IntelliJ IDEA correctly detect Language level and Target byte code version > -- > > Key: SPARK-21605 > URL: https://issues.apache.org/jira/browse/SPARK-21605 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.3.0 >Reporter: Chang chen >Assignee: Chang chen >Priority: Minor > Labels: IDE, maven > Fix For: 2.3.0 > > > With SPARK-21592, removing source and target properties from > maven-compiler-plugin lets IntelliJ IDEA use default Language level and > Target byte code version which are 1.4.
[jira] [Resolved] (SPARK-21605) Let IntelliJ IDEA correctly detect Language level and Target byte code version
[ https://issues.apache.org/jira/browse/SPARK-21605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-21605. --- Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 18808 [https://github.com/apache/spark/pull/18808]
[jira] [Commented] (SPARK-21626) "WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable"
[ https://issues.apache.org/jira/browse/SPARK-21626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112564#comment-16112564 ] Gu Chao commented on SPARK-21626: - [~srowen] I can solve this problem, but I do not know why. {code:shell} export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native/:$LD_LIBRARY_PATH {code} > "WARN NativeCodeLoader: Unable to load native-hadoop library for your > platform... using builtin-java classes where applicable" > -- > > Key: SPARK-21626 > URL: https://issues.apache.org/jira/browse/SPARK-21626 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 2.2.0 >Reporter: Gu Chao > > After starting spark-shell, it outputs: > 17/08/03 18:24:16 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable
[jira] [Comment Edited] (SPARK-21626) "WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable"
[ https://issues.apache.org/jira/browse/SPARK-21626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112564#comment-16112564 ] Gu Chao edited comment on SPARK-21626 at 8/3/17 10:57 AM: -- [~srowen] I can solve this problem, but I do not know why. {code:none} export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native/:$LD_LIBRARY_PATH {code} was (Author: gu chao): [~srowen] I can solve this problem, but I do not know why. {code:shell} export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native/:$LD_LIBRARY_PATH {code}
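As for why the export helps: the warning is printed when the native {{libhadoop}} cannot be loaded from the JVM's native library search path ({{java.library.path}}), and on Linux the JVM builds the default value of that property from {{LD_LIBRARY_PATH}}, so adding {{$HADOOP_HOME/lib/native}} there puts the library on the path. A small diagnostic sketch, assuming the library is loaded through {{System.loadLibrary("hadoop")}} (treat that detail as an assumption); {{NativeHadoopCheck}} is a hypothetical name.

```scala
import java.io.File

object NativeHadoopCheck {
  // Directories the JVM searches when System.loadLibrary is called.
  def libraryPath: Seq[String] =
    Option(System.getProperty("java.library.path")).toSeq
      .flatMap(_.split(File.pathSeparator).toSeq)
      .filter(_.nonEmpty)

  // Try to load libhadoop the way a System.loadLibrary("hadoop") caller would,
  // reporting the outcome instead of throwing.
  def canLoadHadoop: Boolean =
    try { System.loadLibrary("hadoop"); true }
    catch { case _: UnsatisfiedLinkError => false }

  def main(args: Array[String]): Unit = {
    libraryPath.foreach(dir => println(s"search dir: $dir"))
    println(s"libhadoop loadable: $canLoadHadoop")
  }
}
```

As the resolution notes, the warning is harmless either way: without the native library, Hadoop falls back to its built-in Java implementations.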
[jira] [Resolved] (SPARK-21626) "WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable"
[ https://issues.apache.org/jira/browse/SPARK-21626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-21626. --- Resolution: Not A Problem Not a problem, not even specific to Spark. It means what it says, and is not an error. Search the internet.
[jira] [Created] (SPARK-21626) "WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable"
Gu Chao created SPARK-21626: --- Summary: "WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable" Key: SPARK-21626 URL: https://issues.apache.org/jira/browse/SPARK-21626 Project: Spark Issue Type: Bug Components: Spark Shell Affects Versions: 2.2.0 Reporter: Gu Chao After starting spark-shell, it outputs: 17/08/03 18:24:16 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[jira] [Comment Edited] (SPARK-21618) http(s) not accepted in spark-submit jar uri
[ https://issues.apache.org/jira/browse/SPARK-21618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112529#comment-16112529 ] Steve Loughran edited comment on SPARK-21618 at 8/3/17 10:09 AM: - If you're relying on hadoop-common to provide the FS connection, no, not yet, and it's not something I'm in a rush to backport, given its unexpected consequences. Once I'm happy it could go into 2.8.x, but I think it'd need more explicit Spark tests for that: something to bring up Jetty and serve over HTTPS, perhaps. Actually, maybe a test could just use a JAR off Maven Central... the JAR classes don't actually need to be executed, and for security reasons you wouldn't execute them (the artifact wouldn't have its checksums/signatures verified, after all). was (Author: ste...@apache.org): If you're relying on hadoop-common to provide the FS connection, no, not yet > http(s) not accepted in spark-submit jar uri > > > Key: SPARK-21618 > URL: https://issues.apache.org/jira/browse/SPARK-21618 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 2.1.1, 2.2.0 > Environment: pre-built for hadoop 2.6 and 2.7 on mac and ubuntu > 16.04. >Reporter: Ben Mayne >Priority: Minor > Labels: documentation > > The documentation suggests I should be able to use an http(s) uri for a jar > in spark-submit, but I haven't been successful > https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management > {noformat} > benmayne@Benjamins-MacBook-Pro ~ $ spark-submit --deploy-mode client --master > local[2] --class class.name.Test https://test.com/path/to/jar.jar > log4j:WARN No appenders could be found for logger > (org.apache.hadoop.metrics2.lib.MutableMetricsFactory). > log4j:WARN Please initialize the log4j system properly. > log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more > info. 
> Exception in thread "main" java.io.IOException: No FileSystem for scheme: > https > at > org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2586) > at > org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2593) > at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91) > at > org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2632) > at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2614) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370) > at > org.apache.spark.deploy.SparkSubmit$.downloadFile(SparkSubmit.scala:865) > at > org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$1.apply(SparkSubmit.scala:316) > at > org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$1.apply(SparkSubmit.scala:316) > at scala.Option.map(Option.scala:146) > at > org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:316) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:153) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > benmayne@Benjamins-MacBook-Pro ~ $ > {noformat} > If I replace the path with a valid hdfs path > (hdfs:///user/benmayne/valid-jar.jar), it works as expected. I've seen the > same behavior across 2.2.0 (hadoop 2.6 & 2.7 on mac and ubuntu) and on 2.1.1 > on ubuntu. > this is the example that I'm trying to replicate from > https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management: > > > Spark uses the following URL scheme to allow different strategies for > > disseminating jars: > > file: - Absolute paths and file:/ URIs are served by the driver’s HTTP file > > server, and every executor pulls the file from the driver HTTP server. 
> > hdfs:, http:, https:, ftp: - these pull down files and JARs from the URI as > > expected {noformat} > # Run on a Mesos cluster in cluster deploy mode with supervise > ./bin/spark-submit \ > --class org.apache.spark.examples.SparkPi \ > --master mesos://207.184.161.138:7077 \ > --deploy-mode cluster \ > --supervise \ > --executor-memory 20G \ > --total-executor-cores 100 \ > http://path/to/examples.jar \ > 1000 > {noformat}
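The stack trace shows the {{https}} scheme being resolved through Hadoop's {{FileSystem}} registry, which has no implementation for it in these Hadoop versions. One way around that, sketched below, is to special-case {{http}}/{{https}} URIs and download them with the plain JDK before anything consults Hadoop. {{HttpJarFetchSketch}} and its methods are hypothetical names; this illustrates the idea and is not how {{SparkSubmit}} handles it.

```scala
import java.io.File
import java.net.URI
import java.nio.file.{Files, StandardCopyOption}
import java.util.Locale

object HttpJarFetchSketch {
  // True when the URI's scheme is http or https, i.e. one the Hadoop FileSystem
  // lookup in the trace above could not resolve.
  def needsDirectHttpFetch(uri: URI): Boolean =
    Option(uri.getScheme).map(_.toLowerCase(Locale.ROOT)).exists(s => s == "http" || s == "https")

  // Download the resource with the JDK's URL handler into targetDir, keeping its file name.
  def fetch(uri: URI, targetDir: File): File = {
    val target = new File(targetDir, new File(uri.getPath).getName)
    val in = uri.toURL.openStream()
    try Files.copy(in, target.toPath, StandardCopyOption.REPLACE_EXISTING)
    finally in.close()
    target
  }
}
```

For example, {{needsDirectHttpFetch(new URI("https://test.com/path/to/jar.jar"))}} is true while an {{hdfs:///...}} URI is false and would keep going through Hadoop. As Steve notes, anything downloaded this way has no checksum or signature verification.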
[jira] [Resolved] (SPARK-21611) Error class name for log in several classes.
[ https://issues.apache.org/jira/browse/SPARK-21611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-21611. --- Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 18816 [https://github.com/apache/spark/pull/18816] > Error class name for log in several classes. > > > Key: SPARK-21611 > URL: https://issues.apache.org/jira/browse/SPARK-21611 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.2.0 >Reporter: zuotingbing >Assignee: zuotingbing >Priority: Trivial > Fix For: 2.3.0 > > > Several classes log with the wrong class name, for example: > 2017-08-02 16:43:37,695 INFO CompositeService: Operation log root directory > is created: /tmp/mr/operation_logs > "Operation log root directory is created" is actually logged in SessionManager.java.
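A mislabelled line like this typically happens when a logger is created with a fixed class name instead of the class that is actually doing the logging (an assumption about the cause here). A minimal Scala sketch of the alternative, deriving the name from the runtime class; the trait and classes are hypothetical stand-ins, and the sketch formats a string instead of using a real logging library.

```scala
// Hypothetical trait: derive the logger name from the runtime class, so a subclass
// reports its own name instead of the name of the class that declared the logger.
trait LogNameFromRuntimeClass {
  protected def logName: String = getClass.getName.stripSuffix("$")
}

// Hypothetical stand-ins for a parent service and a subclass that logs through it.
class ParentService extends LogNameFromRuntimeClass {
  def logInfo(msg: String): String = s"INFO $logName: $msg"
}

class ChildManager extends ParentService

object LogNameDemo {
  def main(args: Array[String]): Unit = {
    // The prefix names ChildManager, not ParentService.
    println(new ChildManager().logInfo("Operation log root directory is created"))
  }
}
```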
[jira] [Commented] (SPARK-21618) http(s) not accepted in spark-submit jar uri
[ https://issues.apache.org/jira/browse/SPARK-21618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112529#comment-16112529 ] Steve Loughran commented on SPARK-21618: If you're relying on hadoop-common to provide the FS connection, no, not yet
[jira] [Commented] (SPARK-21618) http(s) not accepted in spark-submit jar uri
[ https://issues.apache.org/jira/browse/SPARK-21618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112524#comment-16112524 ] Sean Owen commented on SPARK-21618: --- I see, so this may really not work in general. At least we'd update the Spark docs then.
[jira] [Commented] (SPARK-21618) http(s) not accepted in spark-submit jar uri
[ https://issues.apache.org/jira/browse/SPARK-21618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112520#comment-16112520 ] Steve Loughran commented on SPARK-21618: BTW, we haven't backported HADOOP-14383 into HDP; don't know about CDH (check with [~jzhuge]?), and I'm assuming EMR doesn't have it either, as S3 is their distribution mechanism > http(s) not accepted in spark-submit jar uri > > > Key: SPARK-21618 > URL: https://issues.apache.org/jira/browse/SPARK-21618 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 2.1.1, 2.2.0 > Environment: pre-built for hadoop 2.6 and 2.7 on mac and ubuntu > 16.04. >Reporter: Ben Mayne >Priority: Minor > Labels: documentation > > The documentation suggests I should be able to use an http(s) uri for a jar > in spark-submit, but I haven't been successful > https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management > {noformat} > benmayne@Benjamins-MacBook-Pro ~ $ spark-submit --deploy-mode client --master > local[2] --class class.name.Test https://test.com/path/to/jar.jar > log4j:WARN No appenders could be found for logger > (org.apache.hadoop.metrics2.lib.MutableMetricsFactory). > log4j:WARN Please initialize the log4j system properly. > log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more > info. 
> Exception in thread "main" java.io.IOException: No FileSystem for scheme: https
>     at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2586)
>     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2593)
>     at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
>     at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2632)
>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2614)
>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
>     at org.apache.spark.deploy.SparkSubmit$.downloadFile(SparkSubmit.scala:865)
>     at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$1.apply(SparkSubmit.scala:316)
>     at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$1.apply(SparkSubmit.scala:316)
>     at scala.Option.map(Option.scala:146)
>     at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:316)
>     at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:153)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> benmayne@Benjamins-MacBook-Pro ~ $
> {noformat}
> If I replace the path with a valid hdfs path (hdfs:///user/benmayne/valid-jar.jar), it works as expected. I've seen the same behavior across 2.2.0 (hadoop 2.6 & 2.7 on mac and ubuntu) and on 2.1.1 on ubuntu.
> this is the example that I'm trying to replicate from https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management:
>
> > Spark uses the following URL scheme to allow different strategies for disseminating jars:
> > file: - Absolute paths and file:/ URIs are served by the driver’s HTTP file server, and every executor pulls the file from the driver HTTP server.
> > hdfs:, http:, https:, ftp: - these pull down files and JARs from the URI as expected
> {noformat}
> # Run on a Mesos cluster in cluster deploy mode with supervise
> ./bin/spark-submit \
>   --class org.apache.spark.examples.SparkPi \
>   --master mesos://207.184.161.138:7077 \
>   --deploy-mode cluster \
>   --supervise \
>   --executor-memory 20G \
>   --total-executor-cores 100 \
>   http://path/to/examples.jar \
>   1000
> {noformat}
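The stack trace shows `SparkSubmit$.downloadFile` resolving the jar through `org.apache.hadoop.fs.FileSystem.get`. That call fails with "No FileSystem for scheme: https" on Hadoop builds that have no HTTP(S) filesystem, which is the gap HADOOP-14383 addresses. One possible client-side workaround is to download the jar with the JDK's own URL handlers and pass the resulting local path to spark-submit. This is an assumption, not a fix proposed on this issue. `JarPrefetch`, `needsPrefetch`, and `prefetch` are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: fetch http(s)/ftp jars locally before calling spark-submit. */
public class JarPrefetch {
    // Schemes that java.net.URL can open on its own, without any Hadoop FileSystem.
    private static final List<String> URL_SCHEMES = Arrays.asList("http", "https", "ftp");

    /** True if the jar URI uses a scheme this sketch downloads itself. */
    static boolean needsPrefetch(String jar) {
        String scheme = URI.create(jar).getScheme();
        return scheme != null && URL_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
    }

    /** Downloads the jar into destDir and returns the local path; other URIs are returned unchanged. */
    static String prefetch(String jar, Path destDir) throws IOException {
        if (!needsPrefetch(jar)) {
            return jar; // hdfs:, file:, or a plain path: leave it to spark-submit
        }
        URI uri = URI.create(jar);
        String path = uri.getPath();
        String name = (path == null) ? "" : path.substring(path.lastIndexOf('/') + 1);
        if (name.isEmpty()) {
            throw new IllegalArgumentException("No file name in " + jar);
        }
        Path target = destDir.resolve(name);
        try (InputStream in = uri.toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JarPrefetch <jar-uri>");
            System.exit(1);
        }
        Path dir = Files.createTempDirectory("spark-jars");
        System.out.println(prefetch(args[0], dir));
    }
}
```

For example, `java JarPrefetch https://test.com/path/to/jar.jar` prints a local path. That path could replace the https URI in the client-mode command from the report. In cluster deploy mode the application jar must be visible across the cluster, so a driver-local copy like this is mainly useful in client mode.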