[jira] [Commented] (SPARK-12172) Consider removing SparkR internal RDD APIs
[ https://issues.apache.org/jira/browse/SPARK-12172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665908#comment-16665908 ] Felix Cheung commented on SPARK-12172: -- sounds good > Consider removing SparkR internal RDD APIs > -- > > Key: SPARK-12172 > URL: https://issues.apache.org/jira/browse/SPARK-12172 > Project: Spark > Issue Type: Task > Components: SparkR >Reporter: Felix Cheung >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25828) Bumping Version of kubernetes.client to latest version
[ https://issues.apache.org/jira/browse/SPARK-25828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Erlandson reassigned SPARK-25828: -- Assignee: Ilan Filonenko > Bumping Version of kubernetes.client to latest version > -- > > Key: SPARK-25828 > URL: https://issues.apache.org/jira/browse/SPARK-25828 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: Ilan Filonenko >Assignee: Ilan Filonenko >Priority: Minor > Fix For: 3.0.0 > > > Upgrade the Kubernetes client version to at least > [4.0.0|https://mvnrepository.com/artifact/io.fabric8/kubernetes-client/4.0.0] > as we are falling behind on fabric8 updates. This will be an update to both > kubernetes/core and kubernetes/integration-tests -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-25828) Bumping Version of kubernetes.client to latest version
[ https://issues.apache.org/jira/browse/SPARK-25828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Erlandson resolved SPARK-25828. Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 22820 [https://github.com/apache/spark/pull/22820] > Bumping Version of kubernetes.client to latest version > -- > > Key: SPARK-25828 > URL: https://issues.apache.org/jira/browse/SPARK-25828 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: Ilan Filonenko >Assignee: Ilan Filonenko >Priority: Minor > Fix For: 3.0.0 > > > Upgrade the Kubernetes client version to at least > [4.0.0|https://mvnrepository.com/artifact/io.fabric8/kubernetes-client/4.0.0] > as we are falling behind on fabric8 updates. This will be an update to both > kubernetes/core and kubernetes/integration-tests -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25858) Passing Field Metadata to Parquet
[ https://issues.apache.org/jira/browse/SPARK-25858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinli Shang updated SPARK-25858: Description:

h1. Problem Statement

The Spark WriteSupport class for Parquet is hardcoded to org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport and is not configurable. Currently, this class doesn't carry over the field metadata in StructType to MessageType. However, Parquet column encryption (Parquet-1396, Parquet-1178) requires the field metadata inside Parquet's MessageType, so that the metadata can be used to control column encryption.

h1. Technical Solution

# Extend the SparkToParquetSchemaConverter class and override the convert() method to carry over the field metadata.
# Extend ParquetWriteSupport to use the extended converter from #1. Extending, rather than changing, the built-in WriteSupport mitigates risk.
# Change Spark to make the WriteSupport class configurable, so users can opt into the extended WriteSupport from #2. The default remains org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.

h1. Technical Details

{{Note: see the linked repository in the Verification section for the full code.}}

h2. Extend SparkToParquetSchemaConverter

{code:scala}
class SparkToParquetMetadataSchemaConverter extends SparkToParquetSchemaConverter {

  override def convert(catalystSchema: StructType): MessageType = {
    Types
      .buildMessage()
      .addFields(catalystSchema.map(convertFieldWithMetadata): _*)
      .named(ParquetSchemaConverter.SPARK_PARQUET_SCHEMA_NAME)
  }

  private def convertFieldWithMetadata(field: StructField): Type = {
    val extField = new ExtType[Any](convertField(field))
    val metaBuilder = new MetadataBuilder().withMetadata(field.metadata)
    extField.setMetadata(metaBuilder.getMap)
    extField
  }
}
{code}

h2. Extend ParquetWriteSupport

{code:scala}
class CryptoParquetWriteSupport extends ParquetWriteSupport {
  override def init(configuration: Configuration): WriteContext = {
    val converter = new SparkToParquetMetadataSchemaConverter(configuration)
    createContext(configuration, converter)
  }
}
{code}

h2. Make WriteSupport configurable

{code:scala}
class ParquetFileFormat {
  override def prepareWrite(...) = {
    ...
    if (conf.get(ParquetOutputFormat.WRITE_SUPPORT_CLASS) == null) {
      ParquetOutputFormat.setWriteSupportClass(job, classOf[ParquetWriteSupport])
    }
    ...
  }
}
{code}

h1. Verification

The [ParquetHelloWorld.java|https://github.com/shangxinli/parquet-writesupport-extensions/blob/master/src/main/java/com/uber/ParquetHelloWorld.java] in the GitHub repository [parquet-writesupport-extensions|https://github.com/shangxinli/parquet-writesupport-extensions] demonstrates passing down the field metadata and performing column encryption.

h1. Dependency

* Parquet-1178
* Parquet-1396
* Parquet-1397
[jira] [Created] (SPARK-25858) Passing Field Metadata to Parquet
Xinli Shang created SPARK-25858: --- Summary: Passing Field Metadata to Parquet Key: SPARK-25858 URL: https://issues.apache.org/jira/browse/SPARK-25858 Project: Spark Issue Type: New Feature Components: Input/Output Affects Versions: 2.3.2 Reporter: Xinli Shang

h1. Problem Statement

The Spark WriteSupport class for Parquet is hardcoded to org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport and is not configurable. Currently, this class doesn't carry over the field metadata in StructType to MessageType. However, Parquet column encryption (Parquet-1396, Parquet-1178) requires the field metadata inside Parquet's MessageType, so that the metadata can be used to control column encryption.

h1. Technical Solution

# Extend the SparkToParquetSchemaConverter class and override the convert() method to carry over the field metadata.
# Extend ParquetWriteSupport to use the extended converter from #1. Extending, rather than changing, the built-in WriteSupport mitigates risk.
# Change Spark to make the WriteSupport class configurable, so users can opt into the extended WriteSupport from #2. The default remains org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.

h1. Technical Details

h2. Extend SparkToParquetSchemaConverter

{code:scala}
class SparkToParquetMetadataSchemaConverter extends SparkToParquetSchemaConverter {

  override def convert(catalystSchema: StructType): MessageType = {
    Types
      .buildMessage()
      .addFields(catalystSchema.map(convertFieldWithMetadata): _*)
      .named(ParquetSchemaConverter.SPARK_PARQUET_SCHEMA_NAME)
  }

  private def convertFieldWithMetadata(field: StructField): Type = {
    val extField = new ExtType[Any](convertField(field))
    val metaBuilder = new MetadataBuilder().withMetadata(field.metadata)
    extField.setMetadata(metaBuilder.getMap)
    extField
  }
}
{code}

h2. Extend ParquetWriteSupport

{code:scala}
class CryptoParquetWriteSupport extends ParquetWriteSupport {
  override def init(configuration: Configuration): WriteContext = {
    val converter = new SparkToParquetMetadataSchemaConverter(configuration)
    createContext(configuration, converter)
  }
}
{code}

h2. Make WriteSupport configurable

{code:scala}
class ParquetFileFormat {
  override def prepareWrite(...) = {
    ...
    if (conf.get(ParquetOutputFormat.WRITE_SUPPORT_CLASS) == null) {
      ParquetOutputFormat.setWriteSupportClass(job, classOf[ParquetWriteSupport])
    }
    ...
  }
}
{code}

h1. Verification

The [ParquetHelloWorld.java|https://github.com/shangxinli/parquet-writesupport-extensions/blob/master/src/main/java/com/uber/ParquetHelloWorld.java] in the GitHub repository [parquet-writesupport-extensions|https://github.com/shangxinli/parquet-writesupport-extensions] demonstrates passing down the field metadata and performing column encryption.

h1. Dependency

* Parquet-1178
* Parquet-1396
* Parquet-1397

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
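If the proposal above lands, opting into the extended WriteSupport could be done through Spark's spark.hadoop.* passthrough to the Hadoop configuration. This is only a sketch: the property key is the value of ParquetOutputFormat.WRITE_SUPPORT_CLASS ({{parquet.write.support.class}}), while the class and jar names here are hypothetical.

```shell
# Sketch only (assumes the proposed configurability is merged):
# route Parquet's write-support property through spark.hadoop.*.
# com.example.CryptoParquetWriteSupport and app.jar are hypothetical names.
spark-submit \
  --class com.example.WriteJob \
  --conf spark.hadoop.parquet.write.support.class=com.example.CryptoParquetWriteSupport \
  app.jar
```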
[jira] [Created] (SPARK-25857) Document delegation token code in Spark
Marcelo Vanzin created SPARK-25857: -- Summary: Document delegation token code in Spark Key: SPARK-25857 URL: https://issues.apache.org/jira/browse/SPARK-25857 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.0.0 Reporter: Marcelo Vanzin By this I mean not user documentation, but documenting the functionality provided in the {{org.apache.spark.deploy.security}} and related packages, so that other developers making changes there can refer to it. It seems to be a source of confusion every time somebody needs to touch that code, so it would be good to have a document explaining how it all works, including how it's hooked up to different resource managers. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-25821) Remove SQLContext methods deprecated as of Spark 1.4
[ https://issues.apache.org/jira/browse/SPARK-25821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-25821. --- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 22815 [https://github.com/apache/spark/pull/22815] > Remove SQLContext methods deprecated as of Spark 1.4 > > > Key: SPARK-25821 > URL: https://issues.apache.org/jira/browse/SPARK-25821 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 2.4.0 >Reporter: Sean Owen >Assignee: Sean Owen >Priority: Major > Fix For: 3.0.0 > > > There are several SQLContext methods that have been deprecated since Spark > 1.4, such as: > {code:java} > @deprecated("Use read.parquet() instead.", "1.4.0") > @scala.annotation.varargs > def parquetFile(paths: String*): DataFrame = { > if (paths.isEmpty) { > emptyDataFrame > } else { > read.parquet(paths : _*) > } > }{code} > Let's remove them in Spark 3. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-25851) Fix deprecated API warning in SQLListener
[ https://issues.apache.org/jira/browse/SPARK-25851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-25851. --- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 22848 [https://github.com/apache/spark/pull/22848] > Fix deprecated API warning in SQLListener > - > > Key: SPARK-25851 > URL: https://issues.apache.org/jira/browse/SPARK-25851 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Trivial > Fix For: 3.0.0 > > > In https://github.com/apache/spark/pull/21596, Jackson is upgraded to 2.9.6. > There are some deprecated API warnings in SQLListener. > Create a trivial PR to fix them. > ``` > [warn] SQLListener.scala:92: method uncheckedSimpleType in class TypeFactory > is deprecated: see corresponding Javadoc for more information. > [warn] val objectType = typeFactory.uncheckedSimpleType(classOf[Object]) > [warn] > [warn] SQLListener.scala:93: method constructSimpleType in class TypeFactory > is deprecated: see corresponding Javadoc for more information. > [warn] typeFactory.constructSimpleType(classOf[(_, _)], classOf[(_, _)], > Array(objectType, objectType)) > [warn] > [warn] SQLListener.scala:97: method uncheckedSimpleType in class TypeFactory > is deprecated: see corresponding Javadoc for more information. > [warn] val longType = typeFactory.uncheckedSimpleType(classOf[Long]) > [warn] > [warn] SQLListener.scala:98: method constructSimpleType in class TypeFactory > is deprecated: see corresponding Javadoc for more information. > [warn] typeFactory.constructSimpleType(classOf[(_, _)], classOf[(_, _)], > Array(longType, longType)) > ``` -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25851) Fix deprecated API warning in SQLListener
[ https://issues.apache.org/jira/browse/SPARK-25851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-25851: - Assignee: Gengliang Wang > Fix deprecated API warning in SQLListener > - > > Key: SPARK-25851 > URL: https://issues.apache.org/jira/browse/SPARK-25851 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Trivial > Fix For: 3.0.0 > > > In https://github.com/apache/spark/pull/21596, Jackson is upgraded to 2.9.6. > There are some deprecated API warnings in SQLListener. > Create a trivial PR to fix them. > ``` > [warn] SQLListener.scala:92: method uncheckedSimpleType in class TypeFactory > is deprecated: see corresponding Javadoc for more information. > [warn] val objectType = typeFactory.uncheckedSimpleType(classOf[Object]) > [warn] > [warn] SQLListener.scala:93: method constructSimpleType in class TypeFactory > is deprecated: see corresponding Javadoc for more information. > [warn] typeFactory.constructSimpleType(classOf[(_, _)], classOf[(_, _)], > Array(objectType, objectType)) > [warn] > [warn] SQLListener.scala:97: method uncheckedSimpleType in class TypeFactory > is deprecated: see corresponding Javadoc for more information. > [warn] val longType = typeFactory.uncheckedSimpleType(classOf[Long]) > [warn] > [warn] SQLListener.scala:98: method constructSimpleType in class TypeFactory > is deprecated: see corresponding Javadoc for more information. > [warn] typeFactory.constructSimpleType(classOf[(_, _)], classOf[(_, _)], > Array(longType, longType)) > ``` -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-25854) mvn helper script always exits w/1, causing mvn builds to fail
[ https://issues.apache.org/jira/browse/SPARK-25854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-25854. --- Resolution: Fixed Fix Version/s: 2.4.0 3.0.0 2.3.3 2.2.3 Issue resolved by pull request 22854 [https://github.com/apache/spark/pull/22854] > mvn helper script always exits w/1, causing mvn builds to fail > -- > > Key: SPARK-25854 > URL: https://issues.apache.org/jira/browse/SPARK-25854 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.2.2, 2.3.2, 2.4.1 >Reporter: shane knapp >Assignee: shane knapp >Priority: Critical > Fix For: 2.2.3, 2.3.3, 3.0.0, 2.4.0 > > > the final line in the mvn helper script in build/ attempts to shut down the > zinc server. due to the zinc server being set up w/a 30min timeout, by the > time the mvn test instantiation finishes, the server times out. > this means that when the mvn script tries to shut down zinc, it returns w/an > exit code of 1. this will then automatically fail the entire build (even if > the build passes). > i propose the following: > 1) up the timeout to 3h > 2) put some logic at the end of the script to better handle killing the zinc > server > PR coming now. > [~srowen] [~cloud_fan] [~joshrosen] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
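A plausible shape for the second idea, sketched with placeholder functions (the real script is build/mvn; every name below is hypothetical):

```shell
# Hypothetical sketch of the proposed fix for build/mvn: capture the mvn
# exit code first, tolerate a failed zinc shutdown, and return the build
# result rather than the shutdown result.
finish_build() {
  "$@"                  # stand-in for the real mvn invocation
  local mvn_exit=$?
  # Stand-in for the zinc shutdown. With the 30-minute timeout the server
  # is often already gone, so the shutdown fails; `|| true` swallows that.
  false || true
  return "$mvn_exit"
}

finish_build true  && echo "build passed"
finish_build false || echo "build failed"
```

The point is simply that the script's exit status must come from mvn, not from whatever command happens to run last.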
[jira] [Commented] (SPARK-25855) Don't use Erasure Coding for event log files
[ https://issues.apache.org/jira/browse/SPARK-25855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665690#comment-16665690 ] Thomas Graves commented on SPARK-25855: --- It seems like it depends on whether you care to see the event logs before the app is finished. If you are using the driver UI, generally people would use it while the app is running; once it's finished it sounds like the log would show up and you could see it from the history server. So probably not a problem there. But if you are using the history server to view all UIs and expect logs to be there, it would be a big problem. So it does sound like EC is better left off by default, so as not to confuse users. Were you going to make it configurable? > Don't use Erasure Coding for event log files > > > Key: SPARK-25855 > URL: https://issues.apache.org/jira/browse/SPARK-25855 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Imran Rashid >Priority: Major > > While testing spark with hdfs erasure coding (new in hadoop 3), we ran into a > bug with the event logs. The main issue was a bug in hdfs (HDFS-14027), but > it did make us wonder whether Spark should be using EC for event log files in > general. It's a poor choice because EC currently implements {{hflush()}} and > {{hsync()}} as no-ops, which means you won't see anything in your event logs > until the app is complete. That isn't necessarily a bug, but isn't really > great. So I think we should ensure EC is always off for event logs. > IIUC there is *not* a problem with applications which die without properly > closing the outputstream. It'll take a while for the NN to realize the > client is gone and finish the block, but the data should get there eventually. > Also related are SPARK-24787 & SPARK-19531. > The space savings from EC would be nice as the event logs can get somewhat > large, but I think other factors outweigh this. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
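Until Spark disables EC for event logs itself, an operator-side workaround is to pin the event-log directory to plain replication. This is a sketch under assumptions: it uses Hadoop 3's {{hdfs ec}} admin CLI (whose {{-replicate}} option forces a directory to use replication instead of an inherited EC policy), and the directory path is hypothetical.

```shell
# Sketch (path hypothetical): force the event-log directory to use
# replication rather than an inherited erasure-coding policy.
hdfs ec -setPolicy -path /spark-history -replicate

# Verify the directory no longer reports an EC policy.
hdfs ec -getPolicy -path /spark-history
```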
[jira] [Assigned] (SPARK-25856) Remove AverageLike and CountLike classes.
[ https://issues.apache.org/jira/browse/SPARK-25856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25856: Assignee: (was: Apache Spark) > Remove AverageLike and CountLike classes. > - > > Key: SPARK-25856 > URL: https://issues.apache.org/jira/browse/SPARK-25856 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.1 >Reporter: Dilip Biswal >Priority: Minor > > These two classes were added for regr_ expression support (SPARK-23907). > The regr_ expressions have since been removed, so we can remove these base > classes and inline the logic into the concrete classes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25856) Remove AverageLike and CountLike classes.
[ https://issues.apache.org/jira/browse/SPARK-25856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665649#comment-16665649 ] Apache Spark commented on SPARK-25856: -- User 'dilipbiswal' has created a pull request for this issue: https://github.com/apache/spark/pull/22856 > Remove AverageLike and CountLike classes. > - > > Key: SPARK-25856 > URL: https://issues.apache.org/jira/browse/SPARK-25856 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.1 >Reporter: Dilip Biswal >Priority: Minor > > These two classes were added for regr_ expression support (SPARK-23907). > The regr_ expressions have since been removed, so we can remove these base > classes and inline the logic into the concrete classes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25856) Remove AverageLike and CountLike classes.
[ https://issues.apache.org/jira/browse/SPARK-25856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25856: Assignee: Apache Spark > Remove AverageLike and CountLike classes. > - > > Key: SPARK-25856 > URL: https://issues.apache.org/jira/browse/SPARK-25856 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.1 >Reporter: Dilip Biswal >Assignee: Apache Spark >Priority: Minor > > These two classes were added for regr_ expression support (SPARK-23907). > The regr_ expressions have since been removed, so we can remove these base > classes and inline the logic into the concrete classes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25856) Remove AverageLike and CountLike classes.
Dilip Biswal created SPARK-25856: Summary: Remove AverageLike and CountLike classes. Key: SPARK-25856 URL: https://issues.apache.org/jira/browse/SPARK-25856 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.1 Reporter: Dilip Biswal These two classes were added for regr_ expression support (SPARK-23907). Those expressions have since been removed, so we can remove these base classes and inline the logic in the concrete classes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
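The refactoring proposed in SPARK-25856 (deleting a base class whose only purpose was sharing logic with the now-removed regr_ expressions) can be sketched with hypothetical, much-simplified classes; Python is used here for brevity, and these are not Spark's actual AverageLike/CountLike definitions:

```python
class CountLike:
    """Before: shared base class added for the (since removed) regr_ expressions."""
    def __init__(self):
        self._count = 0

    def update(self):
        self._count += 1

    @property
    def result(self):
        return self._count


class Count(CountLike):
    """Before: the concrete aggregate inherits all of its logic."""
    pass


class CountInlined:
    """After: the base class is gone and the same logic lives in the concrete class."""
    def __init__(self):
        self._count = 0

    def update(self):
        self._count += 1

    @property
    def result(self):
        return self._count
```

Once `Count` is the only subclass left, the base class is indirection with no sharing benefit, which is the motivation for inlining.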
[jira] [Updated] (SPARK-25816) Functions does not resolve Columns correctly
[ https://issues.apache.org/jira/browse/SPARK-25816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Zhang updated SPARK-25816: Attachment: final_allDatatypes_Spark.avro > Functions does not resolve Columns correctly > > > Key: SPARK-25816 > URL: https://issues.apache.org/jira/browse/SPARK-25816 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0, 2.3.1 >Reporter: Brian Zhang >Priority: Critical > Attachments: final_allDatatypes_Spark.avro, source.snappy.parquet > > > When a column name is duplicated between the current DataFrame and the > original DataFrame it was selected from, Spark 2.3.0 and 2.3.1 do not > resolve the column correctly when it is used in an expression, causing a > casting issue. The same code works in Spark 2.2.1. > Please see the code below to reproduce the issue: > import org.apache.spark._ > import org.apache.spark.rdd._ > import org.apache.spark.storage.StorageLevel._ > import org.apache.spark.sql._ > import org.apache.spark.sql.DataFrame > import org.apache.spark.sql.types._ > import org.apache.spark.sql.functions._ > import org.apache.spark.sql.catalyst.expressions._ > import org.apache.spark.sql.Column > val v0 = spark.read.parquet("/data/home/bzinfa/bz/source.snappy.parquet") > val v00 = v0.toDF(v0.schema.fields.indices.view.map("" + _):_*) > val v5 = v00.select($"13".as("0"),$"14".as("1"),$"15".as("2")) > val v5_2 = $"2" > v5.where(lit(500).<(v5_2(new Column(new MapKeys(v5_2.expr))(lit(0) > // v00's 3rd column is binary and its 16th is a map > Error: > org.apache.spark.sql.AnalysisException: cannot resolve 'map_keys(`2`)' due to > data type mismatch: argument 1 requires map type, however, '`2`' is of binary > type.; > > 'Project [0#1591, 1#1592, 2#1593] +- 'Filter (500 < > 2#1593[map_keys(2#1561)[0]]) +- > Project [13#1572 AS 0#1591, 14#1573 AS 1#1592, 15#1574 AS 2#1593, 2#1561] +- > Project [c_bytes#1527 AS 0#1559, c_union#1528 AS 1#1560, 
c_fixed#1529 AS > 2#1561, c_boolean#1530 AS 3#1562, c_float#1531 AS 4#1563, c_double#1532 AS > 5#1564, c_int#1533 AS 6#1565, c_long#1534L AS 7#1566L, c_string#1535 AS > 8#1567, c_decimal_18_2#1536 AS 9#1568, c_decimal_28_2#1537 AS 10#1569, > c_decimal_38_2#1538 AS 11#1570, c_date#1539 AS 12#1571, simple_struct#1540 AS > 13#1572, simple_array#1541 AS 14#1573, simple_map#1542 AS 15#1574] +- > Relation[c_bytes#1527,c_union#1528,c_fixed#1529,c_boolean#1530,c_float#1531,c_double#1532,c_int#1533,c_long#1534L,c_string#1535,c_decimal_18_2#1536,c_decimal_28_2#1537,c_decimal_38_2#1538,c_date#1539,simple_struct#1540,simple_array#1541,simple_map#1542] > parquet -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25816) Functions does not resolve Columns correctly
[ https://issues.apache.org/jira/browse/SPARK-25816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665637#comment-16665637 ] Brian Zhang commented on SPARK-25816: - Here is another reproduce that should be related to this same issue: val v0 = sqlContext.read.avro("final_allDatatypes_Spark.avro"); val v00 = v0.toDF(v0.schema.fields.indices.view.map("" + _):_*) val v001 = v00.select($"0".as("0"), $"1".as("1"),$"2".as("2"),$"3".as("3"),$"4".as("4"),$"5".as("5"),$"6".as("6"),$"7".as("7"),$"8".as("8")) val v013 = $"8" val v010 = map(v013, v013) v001.where(map(v013, v010)(v013)(v013)==="dummy") org.apache.spark.sql.AnalysisException: Reference '8' is ambiguous, could be: 8, 8.; at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:213) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveChildren(LogicalPlan.scala:97) at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$36.apply(Analyzer.scala:822) at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$36.apply(Analyzer.scala:824) at org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:53) at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveReferences$$resolve(Analyzer.scala:821) at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveReferences$$resolve$2.apply(Analyzer.scala:830) at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveReferences$$resolve$2.apply(Analyzer.scala:830) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304) at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveReferences$$resolve(Analyzer.scala:830) at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveReferences$$resolve$2.apply(Analyzer.scala:830) at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveReferences$$resolve$2.apply(Analyzer.scala:830) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304) at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveReferences$$resolve(Analyzer.scala:830) at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$9$$anonfun$applyOrElse$36.apply(Analyzer.scala:891) at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$9$$anonfun$applyOrElse$36.apply(Analyzer.scala:891) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:106) at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:118) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:127) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187) at org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:127) at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$9.applyOrElse(Analyzer.scala:891) at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$9.applyOrElse(Analyzer.scala:833) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
[jira] [Commented] (SPARK-23206) Additional Memory Tuning Metrics
[ https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665628#comment-16665628 ] Edwina Lu commented on SPARK-23206: --- [~irashid], yes, I am planning to work on the other tasks for adding the metrics at the stage level and in the UI. I am planning to see how the final APIs will look with SPARK-23206, and want to include these metrics for stage and UI as well. > Additional Memory Tuning Metrics > > > Key: SPARK-23206 > URL: https://issues.apache.org/jira/browse/SPARK-23206 > Project: Spark > Issue Type: Umbrella > Components: Spark Core >Affects Versions: 2.2.1 >Reporter: Edwina Lu >Priority: Major > Attachments: ExecutorsTab.png, ExecutorsTab2.png, > MemoryTuningMetricsDesignDoc.pdf, SPARK-23206 Design Doc.pdf, StageTab.png > > > At LinkedIn, we have multiple clusters, running thousands of Spark > applications, and these numbers are growing rapidly. We need to ensure that > these Spark applications are well tuned – cluster resources, including > memory, should be used efficiently so that the cluster can support running > more applications concurrently, and applications should run quickly and > reliably. > Currently there is limited visibility into how much memory executors are > using, and users are guessing numbers for executor and driver memory sizing. > These estimates are often much larger than needed, leading to memory wastage. > Examining the metrics for one cluster for a month, the average percentage of > used executor memory (max JVM used memory across executors / > spark.executor.memory) is 35%, leading to an average of 591GB unused memory > per application (number of executors * (spark.executor.memory - max JVM used > memory)). 
Spark has multiple memory regions (user memory, execution memory, > storage memory, and overhead memory), and to understand how memory is being > used and fine-tune allocation between regions, it would be useful to have > information about how much memory is being used for the different regions. > To improve visibility into memory usage for the driver and executors and > different memory regions, the following additional memory metrics can be > tracked for each executor and driver: > * JVM used memory: the JVM heap size for the executor/driver. > * Execution memory: memory used for computation in shuffles, joins, sorts > and aggregations. > * Storage memory: memory used for caching and propagating internal data across > the cluster. > * Unified memory: sum of execution and storage memory. > The peak values for each memory metric can be tracked for each executor, and > also per stage. This information can be shown in the Spark UI and the REST > APIs. Information for peak JVM used memory can help with determining > appropriate values for spark.executor.memory and spark.driver.memory, and > information about the unified memory region can help with determining > appropriate values for spark.memory.fraction and > spark.memory.storageFraction. Stage memory information can help identify > which stages are most memory intensive, and users can look into the relevant > code to determine if it can be optimized. > The memory metrics can be gathered by adding the current JVM used memory, > execution memory and storage memory to the heartbeat. SparkListeners are > modified to collect the new metrics for the executors, stages and Spark > history log. Only interesting values (peak values per stage per executor) are > recorded in the Spark history log, to minimize the amount of additional > logging. > We have attached our design documentation with this ticket and would like to > receive feedback from the community for this proposal. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
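The unused-memory estimate quoted in the description above (number of executors * (spark.executor.memory - max JVM used memory)) can be checked back-of-envelope. The numbers below are hypothetical except for the 35% average utilization figure, which comes from the ticket:

```python
# Back-of-envelope check of the memory-waste formula from the description.
executor_memory_gb = 9.0   # hypothetical spark.executor.memory per executor
utilization = 0.35         # max JVM used memory / spark.executor.memory (from ticket)
num_executors = 100        # hypothetical executor count for one application

max_jvm_used_gb = executor_memory_gb * utilization
# unused memory = number of executors * (spark.executor.memory - max JVM used memory)
unused_gb = num_executors * (executor_memory_gb - max_jvm_used_gb)
```

With these illustrative inputs the formula yields several hundred gigabytes of unused memory per application, the same order of magnitude as the 591GB average reported in the ticket.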
[jira] [Comment Edited] (SPARK-24793) Make spark-submit more useful with k8s
[ https://issues.apache.org/jira/browse/SPARK-24793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665578#comment-16665578 ] Stavros Kontopoulos edited comment on SPARK-24793 at 10/26/18 7:48 PM: --- >From a quick glance you can't just use the k8s backend to check status of the >driver. Standalone and mesos mode can support this because they are using the >rest client which is a common api always available at spark core. We can't add >k8s dependency by default at that point in the code. You then either use >reflection if k8s master is passed to load a class from the backend side or >query the K8s api server by extending that rest client and mapping pod status >to driver status to keep UX the same. I will try the reflection thing as it >is used elsewhere as well, especially yarn stuff. was (Author: skonto): >From a quick glance you can't just use the k8s backend to check status of the >driver. Standalone and mesos mode can support this because they are using the >rest client which is a common api always available at spark core. We can't add >k8s dependency by default at that point in the code. You then either use >reflection if k8s master is passed to load a class from the backend side or >query the K8s api server by extending that rest client and mapping pod status >to driver status to keep UX the same. > Make spark-submit more useful with k8s > -- > > Key: SPARK-24793 > URL: https://issues.apache.org/jira/browse/SPARK-24793 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 2.3.0 >Reporter: Anirudh Ramanathan >Assignee: Anirudh Ramanathan >Priority: Major > > Support controlling the lifecycle of Spark Application through spark-submit. > For example: > {{ > --kill app_name If given, kills the driver specified. > --status app_name If given, requests the status of the driver > specified. > }} > Potentially also --list to list all spark drivers running. 
> Given that our submission client can actually launch jobs into many different > namespaces, we'll need an additional specification of the namespace through a > --namespace flag potentially. > I think this is pretty useful to have instead of forcing a user to use > kubectl to manage the lifecycle of any k8s Spark Application. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-23206) Additional Memory Tuning Metrics
[ https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665594#comment-16665594 ] Imran Rashid edited comment on SPARK-23206 at 10/26/18 7:34 PM: Hi [~elu], just wondering are you still planning on working on the other tasks here related to getting these metrics in the UI? was (Author: irashid): Hi [~elu], just wondering are you still planning on working on the other tasks here related to get these metrics in the UI? > Additional Memory Tuning Metrics > > > Key: SPARK-23206 > URL: https://issues.apache.org/jira/browse/SPARK-23206 > Project: Spark > Issue Type: Umbrella > Components: Spark Core >Affects Versions: 2.2.1 >Reporter: Edwina Lu >Priority: Major > Attachments: ExecutorsTab.png, ExecutorsTab2.png, > MemoryTuningMetricsDesignDoc.pdf, SPARK-23206 Design Doc.pdf, StageTab.png > > > At LinkedIn, we have multiple clusters, running thousands of Spark > applications, and these numbers are growing rapidly. We need to ensure that > these Spark applications are well tuned – cluster resources, including > memory, should be used efficiently so that the cluster can support running > more applications concurrently, and applications should run quickly and > reliably. > Currently there is limited visibility into how much memory executors are > using, and users are guessing numbers for executor and driver memory sizing. > These estimates are often much larger than needed, leading to memory wastage. > Examining the metrics for one cluster for a month, the average percentage of > used executor memory (max JVM used memory across executors / > spark.executor.memory) is 35%, leading to an average of 591GB unused memory > per application (number of executors * (spark.executor.memory - max JVM used > memory)). 
Spark has multiple memory regions (user memory, execution memory, > storage memory, and overhead memory), and to understand how memory is being > used and fine-tune allocation between regions, it would be useful to have > information about how much memory is being used for the different regions. > To improve visibility into memory usage for the driver and executors and > different memory regions, the following additional memory metrics can be > tracked for each executor and driver: > * JVM used memory: the JVM heap size for the executor/driver. > * Execution memory: memory used for computation in shuffles, joins, sorts > and aggregations. > * Storage memory: memory used for caching and propagating internal data across > the cluster. > * Unified memory: sum of execution and storage memory. > The peak values for each memory metric can be tracked for each executor, and > also per stage. This information can be shown in the Spark UI and the REST > APIs. Information for peak JVM used memory can help with determining > appropriate values for spark.executor.memory and spark.driver.memory, and > information about the unified memory region can help with determining > appropriate values for spark.memory.fraction and > spark.memory.storageFraction. Stage memory information can help identify > which stages are most memory intensive, and users can look into the relevant > code to determine if it can be optimized. > The memory metrics can be gathered by adding the current JVM used memory, > execution memory and storage memory to the heartbeat. SparkListeners are > modified to collect the new metrics for the executors, stages and Spark > history log. Only interesting values (peak values per stage per executor) are > recorded in the Spark history log, to minimize the amount of additional > logging. > We have attached our design documentation with this ticket and would like to > receive feedback from the community for this proposal. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23206) Additional Memory Tuning Metrics
[ https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665594#comment-16665594 ] Imran Rashid commented on SPARK-23206: -- Hi [~elu], just wondering, are you still planning on working on the other tasks here related to getting these metrics in the UI? > Additional Memory Tuning Metrics > > > Key: SPARK-23206 > URL: https://issues.apache.org/jira/browse/SPARK-23206 > Project: Spark > Issue Type: Umbrella > Components: Spark Core >Affects Versions: 2.2.1 >Reporter: Edwina Lu >Priority: Major > Attachments: ExecutorsTab.png, ExecutorsTab2.png, > MemoryTuningMetricsDesignDoc.pdf, SPARK-23206 Design Doc.pdf, StageTab.png > > > At LinkedIn, we have multiple clusters, running thousands of Spark > applications, and these numbers are growing rapidly. We need to ensure that > these Spark applications are well tuned – cluster resources, including > memory, should be used efficiently so that the cluster can support running > more applications concurrently, and applications should run quickly and > reliably. > Currently there is limited visibility into how much memory executors are > using, and users are guessing numbers for executor and driver memory sizing. > These estimates are often much larger than needed, leading to memory wastage. > Examining the metrics for one cluster for a month, the average percentage of > used executor memory (max JVM used memory across executors / > spark.executor.memory) is 35%, leading to an average of 591GB unused memory > per application (number of executors * (spark.executor.memory - max JVM used > memory)). Spark has multiple memory regions (user memory, execution memory, > storage memory, and overhead memory), and to understand how memory is being > used and fine-tune allocation between regions, it would be useful to have > information about how much memory is being used for the different regions. 
> To improve visibility into memory usage for the driver and executors and > different memory regions, the following additional memory metrics can be > tracked for each executor and driver: > * JVM used memory: the JVM heap size for the executor/driver. > * Execution memory: memory used for computation in shuffles, joins, sorts > and aggregations. > * Storage memory: memory used for caching and propagating internal data across > the cluster. > * Unified memory: sum of execution and storage memory. > The peak values for each memory metric can be tracked for each executor, and > also per stage. This information can be shown in the Spark UI and the REST > APIs. Information for peak JVM used memory can help with determining > appropriate values for spark.executor.memory and spark.driver.memory, and > information about the unified memory region can help with determining > appropriate values for spark.memory.fraction and > spark.memory.storageFraction. Stage memory information can help identify > which stages are most memory intensive, and users can look into the relevant > code to determine if it can be optimized. > The memory metrics can be gathered by adding the current JVM used memory, > execution memory and storage memory to the heartbeat. SparkListeners are > modified to collect the new metrics for the executors, stages and Spark > history log. Only interesting values (peak values per stage per executor) are > recorded in the Spark history log, to minimize the amount of additional > logging. > We have attached our design documentation with this ticket and would like to > receive feedback from the community for this proposal. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-24793) Make spark-submit more useful with k8s
[ https://issues.apache.org/jira/browse/SPARK-24793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665578#comment-16665578 ] Stavros Kontopoulos edited comment on SPARK-24793 at 10/26/18 7:15 PM: --- >From a quick glance you can't just use the k8s backend to check status of the >driver. Standalone and mesos mode can support this because they are using the >rest client which is a common api always available at spark core. We can't add >k8s dependency by default at that point in the code. You then either use >reflection if k8s master is passed to load a class from the backend side or >query the K8s api server by extending that rest client and mapping pod status >to driver status to keep UX the same. was (Author: skonto): >From a quick glance you can't just use the k8s backend to check status of the >driver. Standalone and mesos mode can support this because they are using the >rest client which is a common api always available at spark core. We can't add >k8s dependency by default at that point in the code. You then either use >reflection or hit the api server with a rest api. > Make spark-submit more useful with k8s > -- > > Key: SPARK-24793 > URL: https://issues.apache.org/jira/browse/SPARK-24793 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 2.3.0 >Reporter: Anirudh Ramanathan >Assignee: Anirudh Ramanathan >Priority: Major > > Support controlling the lifecycle of Spark Application through spark-submit. > For example: > {{ > --kill app_name If given, kills the driver specified. > --status app_name If given, requests the status of the driver > specified. > }} > Potentially also --list to list all spark drivers running. > Given that our submission client can actually launch jobs into many different > namespaces, we'll need an additional specification of the namespace through a > --namespace flag potentially. 
> I think this is pretty useful to have instead of forcing a user to use > kubectl to manage the lifecycle of any k8s Spark Application. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24793) Make spark-submit more useful with k8s
[ https://issues.apache.org/jira/browse/SPARK-24793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665578#comment-16665578 ] Stavros Kontopoulos commented on SPARK-24793: - >From a quick glance you can't just use the k8s backend to check status of the >driver. Standalone and mesos mode can support this because they are using the >rest client which is a common api always available at spark core. We can't add >k8s dependency by default at that point in the code. You then either use >reflection or hit the api server with a rest api. > Make spark-submit more useful with k8s > -- > > Key: SPARK-24793 > URL: https://issues.apache.org/jira/browse/SPARK-24793 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 2.3.0 >Reporter: Anirudh Ramanathan >Assignee: Anirudh Ramanathan >Priority: Major > > Support controlling the lifecycle of Spark Application through spark-submit. > For example: > {{ > --kill app_name If given, kills the driver specified. > --status app_name If given, requests the status of the driver > specified. > }} > Potentially also --list to list all spark drivers running. > Given that our submission client can actually launch jobs into many different > namespaces, we'll need an additional specification of the namespace through a > --namespace flag potentially. > I think this is pretty useful to have instead of forcing a user to use > kubectl to manage the lifecycle of any k8s Spark Application. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
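The reflection route described in the comment above, loading a backend class only when a k8s master URL is actually given so that spark-core takes no hard k8s dependency, can be sketched as follows (Python for brevity; the module name is hypothetical and not Spark's real backend):

```python
import importlib.util

# Hypothetical module name; Spark's real k8s backend is a Scala class, not this.
K8S_BACKEND_MODULE = "spark_k8s_status_backend"


def status_backend_for(master: str) -> str:
    """Pick how a --status/--kill request would be served for a given master URL."""
    if master.startswith("k8s://"):
        # Reflective lookup: the k8s dependency is only required when a
        # k8s master is actually requested.
        if importlib.util.find_spec(K8S_BACKEND_MODULE) is not None:
            return "k8s-backend"
        return "k8s-backend-missing"
    # Standalone/Mesos keep using the REST client that always ships in core.
    return "rest-client"
```

The point of the lookup-by-name is that the common code path never imports the k8s module directly, mirroring how `Class.forName` would be used on the JVM to avoid a compile-time dependency.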
[jira] [Assigned] (SPARK-25839) Implement use of KryoPool in KryoSerializer
[ https://issues.apache.org/jira/browse/SPARK-25839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25839: Assignee: (was: Apache Spark) > Implement use of KryoPool in KryoSerializer > --- > > Key: SPARK-25839 > URL: https://issues.apache.org/jira/browse/SPARK-25839 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.0.2, 2.3.1, 2.3.2 >Reporter: Patrick Brown >Priority: Minor > > The current implementation of KryoSerializer does not use KryoPool, which is > recommended by Kryo due to the creation of a Kryo instance being slow. > > The current implementation references the KryoSerializerInstance private > variable cachedKryo as effectively being a pool of size 1. However (in my > admittedly somewhat limited research) it seems that frequently (such as in > the ClosureCleaner ensureSerializable method) a new instance of > KryoSerializerInstance is created, which in turn forces a new instance of > Kryo itself to be created; this instance is then dropped from scope, so > the "pool" is not re-used. > > I have a small set of proposed changes we have been using on an internal > production application (running 24x7 for 6+ months, processing 10k+ jobs a > day) which implement a KryoPool inside KryoSerializer that is then > used by each KryoSerializerInstance to borrow a Kryo instance. > > I believe this is mainly a performance improvement for applications > processing a large number of small jobs, where the cost of instantiating Kryo > instances is a larger portion of execution time compared to larger jobs. > > I have discussed this proposed change on the dev mailing list and it was > suggested I create this issue and a PR. It was also suggested I accompany > that with some performance metrics, which I plan to do. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25839) Implement use of KryoPool in KryoSerializer
[ https://issues.apache.org/jira/browse/SPARK-25839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25839: Assignee: Apache Spark > Implement use of KryoPool in KryoSerializer > --- > > Key: SPARK-25839 > URL: https://issues.apache.org/jira/browse/SPARK-25839 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.0.2, 2.3.1, 2.3.2 >Reporter: Patrick Brown >Assignee: Apache Spark >Priority: Minor > > The current implementation of KryoSerializer does not use KryoPool, which is > recommended by Kryo due to the creation of a Kryo instance being slow. > > The current implementation references the KryoSerializerInstance private > variable cachedKryo as effectively being a pool of size 1. However (in my > admittedly somewhat limited research) it seems that frequently (such as in > the ClosureCleaner ensureSerializable method) a new instance of > KryoSerializerInstance is created, which in turn forces a new instance of > Kryo itself to be created; this instance is then dropped from scope, so > the "pool" is not re-used. > > I have a small set of proposed changes we have been using on an internal > production application (running 24x7 for 6+ months, processing 10k+ jobs a > day) which implement a KryoPool inside KryoSerializer that is then > used by each KryoSerializerInstance to borrow a Kryo instance. > > I believe this is mainly a performance improvement for applications > processing a large number of small jobs, where the cost of instantiating Kryo > instances is a larger portion of execution time compared to larger jobs. > > I have discussed this proposed change on the dev mailing list and it was > suggested I create this issue and a PR. It was also suggested I accompany > that with some performance metrics, which I plan to do. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25839) Implement use of KryoPool in KryoSerializer
[ https://issues.apache.org/jira/browse/SPARK-25839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665574#comment-16665574 ] Apache Spark commented on SPARK-25839: -- User 'patrickbrownsync' has created a pull request for this issue: https://github.com/apache/spark/pull/22855 > Implement use of KryoPool in KryoSerializer > --- > > Key: SPARK-25839 > URL: https://issues.apache.org/jira/browse/SPARK-25839 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.0.2, 2.3.1, 2.3.2 >Reporter: Patrick Brown >Priority: Minor > > The current implementation of KryoSerializer does not use KryoPool, which is > recommended by Kryo due to the creation of a Kryo instance being slow. > > The current implementation references the KryoSerializerInstance private > variable cachedKryo as effectively being a pool of size 1. However (in my > admittedly somewhat limited research) it seems that frequently (such as in > the ClosureCleaner ensureSerializable method) a new instance of > KryoSerializerInstance is created, which in turn forces a new instance of > Kryo itself to be created; this instance is then dropped from scope, so > the "pool" is not re-used. > > I have a small set of proposed changes we have been using on an internal > production application (running 24x7 for 6+ months, processing 10k+ jobs a > day) which implement a KryoPool inside KryoSerializer that is then > used by each KryoSerializerInstance to borrow a Kryo instance. > > I believe this is mainly a performance improvement for applications > processing a large number of small jobs, where the cost of instantiating Kryo > instances is a larger portion of execution time compared to larger jobs. > > I have discussed this proposed change on the dev mailing list and it was > suggested I create this issue and a PR. It was also suggested I accompany > that with some performance metrics, which I plan to do. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
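The borrow/release pattern the issue proposes can be sketched with plain JDK types. This is a hypothetical illustration of the pooling idea only; Kryo's actual pool API (and Spark's eventual implementation) differ, and the names `InstancePool`, `borrow`, and `release` are invented for this sketch:

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Supplier;

// Minimal thread-safe object pool: expensive-to-create instances (like Kryo)
// are returned to the pool after use instead of being dropped from scope.
final class InstancePool<T> {
    private final ConcurrentLinkedQueue<T> pool = new ConcurrentLinkedQueue<>();
    private final Supplier<T> factory;

    InstancePool(Supplier<T> factory) { this.factory = factory; }

    T borrow() {
        T t = pool.poll();
        return (t != null) ? t : factory.get(); // create only on a pool miss
    }

    void release(T t) { pool.offer(t); } // hand back for reuse
}

public class PoolDemo {
    public static void main(String[] args) {
        // StringBuilder stands in for a costly-to-build serializer instance.
        InstancePool<StringBuilder> pool = new InstancePool<>(StringBuilder::new);
        StringBuilder a = pool.borrow();
        pool.release(a);
        StringBuilder b = pool.borrow(); // reuses the released instance
        System.out.println(a == b);      // prints "true"
    }
}
```

The point of the ticket is exactly this reuse: each short-lived `KryoSerializerInstance` would borrow from a shared pool instead of building (and discarding) its own Kryo.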
[jira] [Commented] (SPARK-25855) Don't use Erasure Coding for event log files
[ https://issues.apache.org/jira/browse/SPARK-25855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665500#comment-16665500 ] Xiao Chen commented on SPARK-25855: --- +1 to the idea. Since Spark's event log behavior depends on hflush'ing / hsync'ing, it should not use EC. If the file ends up being large, one can post-process it to convert it to EC after the file is closed. > Don't use Erasure Coding for event log files > > > Key: SPARK-25855 > URL: https://issues.apache.org/jira/browse/SPARK-25855 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Imran Rashid >Priority: Major > > While testing spark with hdfs erasure coding (new in hadoop 3), we ran into a > bug with the event logs. The main issue was a bug in hdfs (HDFS-14027), but > it did make us wonder whether Spark should be using EC for event log files in > general. It's a poor choice because EC currently implements {{hflush()}} or > {{hsync()}} as no-ops, which means you won't see anything in your event logs > until the app is complete. That isn't necessarily a bug, but isn't really > great. So I think we should ensure EC is always off for event logs. > IIUC there is *not* a problem with applications which die without properly > closing the output stream. It'll take a while for the NN to realize the > client is gone and finish the block, but the data should get there eventually. > Also related are SPARK-24787 & SPARK-19531. > The space savings from EC would be nice as the event logs can get somewhat > large, but I think other factors outweigh this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
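For context, an operator can inspect or clear the erasure-coding policy on the event-log directory from the command line. The directory path below is an example, and `hdfs ec` subcommand flags vary by Hadoop 3.x release, so treat this as a sketch and consult `hdfs ec -help` on your cluster:

```shell
# Example path; substitute your spark.eventLog.dir location.
hdfs ec -getPolicy -path /spark-history     # show the effective EC policy
hdfs ec -unsetPolicy -path /spark-history   # fall back to plain replication
```

With replication in effect, hflush()/hsync() behave as the event-log writer expects, which is what this issue argues Spark should guarantee.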
[jira] [Commented] (SPARK-25855) Don't use Erasure Coding for event log files
[ https://issues.apache.org/jira/browse/SPARK-25855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665487#comment-16665487 ] Imran Rashid commented on SPARK-25855: -- cc [~tgraves] [~ste...@apache.org] [~vanzin] who might be interested in this. Also [~xiaochen], who helped explain the hdfs side to me and can make sure I didn't make a mistake. I'll post a pr shortly but would appreciate opinions on whether or not this is a good idea. > Don't use Erasure Coding for event log files > > > Key: SPARK-25855 > URL: https://issues.apache.org/jira/browse/SPARK-25855 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Imran Rashid >Priority: Major > > While testing spark with hdfs erasure coding (new in hadoop 3), we ran into a > bug with the event logs. The main issue was a bug in hdfs (HDFS-14027), but > it did make us wonder whether Spark should be using EC for event log files in > general. It's a poor choice because EC currently implements {{hflush()}} or > {{hsync()}} as no-ops, which means you won't see anything in your event logs > until the app is complete. That isn't necessarily a bug, but isn't really > great. So I think we should ensure EC is always off for event logs. > IIUC there is *not* a problem with applications which die without properly > closing the output stream. It'll take a while for the NN to realize the > client is gone and finish the block, but the data should get there eventually. > Also related are SPARK-24787 & SPARK-19531. > The space savings from EC would be nice as the event logs can get somewhat > large, but I think other factors outweigh this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25855) Don't use Erasure Coding for event log files
[ https://issues.apache.org/jira/browse/SPARK-25855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid updated SPARK-25855: - Description: While testing spark with hdfs erasure coding (new in hadoop 3), we ran into a bug with the event logs. The main issue was a bug in hdfs (HDFS-14027), but it did make us wonder whether Spark should be using EC for event log files in general. It's a poor choice because EC currently implements {{hflush()}} or {{hsync()}} as no-ops, which means you won't see anything in your event logs until the app is complete. That isn't necessarily a bug, but isn't really great. So I think we should ensure EC is always off for event logs. IIUC there is *not* a problem with applications which die without properly closing the output stream. It'll take a while for the NN to realize the client is gone and finish the block, but the data should get there eventually. Also related are SPARK-24787 & SPARK-19531. The space savings from EC would be nice as the event logs can get somewhat large, but I think other factors outweigh this. was: While testing spark with hdfs erasure coding (new in hadoop 3), we ran into a bug with the event logs. The main issue was a bug in hdfs (HDFS-14027), but it did make us wonder whether Spark should be using EC for event log files in general. It's a poor choice because EC currently implements {{hflush()}} or {{hsync()}} as no-ops, which means you won't see anything in your event logs until the app is complete. That isn't necessarily a bug, but isn't really great. So I think we should ensure EC is always off for event logs. Also related are SPARK-24787 & SPARK-19531. The space savings from EC would be nice as the event logs can get somewhat large, but I think other factors outweigh this. 
> Don't use Erasure Coding for event log files > > > Key: SPARK-25855 > URL: https://issues.apache.org/jira/browse/SPARK-25855 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Imran Rashid >Priority: Major > > While testing spark with hdfs erasure coding (new in hadoop 3), we ran into a > bug with the event logs. The main issue was a bug in hdfs (HDFS-14027), but > it did make us wonder whether Spark should be using EC for event log files in > general. It's a poor choice because EC currently implements {{hflush()}} or > {{hsync()}} as no-ops, which means you won't see anything in your event logs > until the app is complete. That isn't necessarily a bug, but isn't really > great. So I think we should ensure EC is always off for event logs. > IIUC there is *not* a problem with applications which die without properly > closing the output stream. It'll take a while for the NN to realize the > client is gone and finish the block, but the data should get there eventually. > Also related are SPARK-24787 & SPARK-19531. > The space savings from EC would be nice as the event logs can get somewhat > large, but I think other factors outweigh this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25855) Don't use Erasure Coding for event log files
Imran Rashid created SPARK-25855: Summary: Don't use Erasure Coding for event log files Key: SPARK-25855 URL: https://issues.apache.org/jira/browse/SPARK-25855 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 2.4.0 Reporter: Imran Rashid While testing spark with hdfs erasure coding (new in hadoop 3), we ran into a bug with the event logs. The main issue was a bug in hdfs (HDFS-14027), but it did make us wonder whether Spark should be using EC for event log files in general. It's a poor choice because EC currently implements {{hflush()}} or {{hsync()}} as no-ops, which means you won't see anything in your event logs until the app is complete. That isn't necessarily a bug, but isn't really great. So I think we should ensure EC is always off for event logs. Also related are SPARK-24787 & SPARK-19531. The space savings from EC would be nice as the event logs can get somewhat large, but I think other factors outweigh this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25804) JDOPersistenceManager leak when query via JDBC
[ https://issues.apache.org/jira/browse/SPARK-25804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-25804: Attachment: image-2018-10-27-01-44-07-972.png > JDOPersistenceManager leak when query via JDBC > -- > > Key: SPARK-25804 > URL: https://issues.apache.org/jira/browse/SPARK-25804 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 >Reporter: pin_zhang >Priority: Major > Attachments: image-2018-10-27-01-44-07-972.png > > > 1. start-thriftserver.sh under SPARK2.3.1 > 2. Create table and insert values > create table test_leak (id string, index int); > insert into test_leak values('id1',1) > 3. Create a JDBC client to query the table > import java.sql.*; > public class HiveClient { > public static void main(String[] args) throws Exception { > String driverName = "org.apache.hive.jdbc.HiveDriver"; > Class.forName(driverName); > Connection con = DriverManager.getConnection( > "jdbc:hive2://localhost:1/default", "test", "test"); > Statement stmt = con.createStatement(); > String sql = "select * from test_leak"; > int loop = 100; > while (loop-- > 0) { > ResultSet rs = stmt.executeQuery(sql); > rs.next(); > System.out.println(new java.sql.Timestamp(System.currentTimeMillis()) +" > : " + rs.getString(1)); > rs.close(); > if (loop % 100 == 0) { > Thread.sleep(1); > } > } > con.close(); > } > } > 4. Dump the HS2 heap: org.datanucleus.api.jdo.JDOPersistenceManager instances keep > increasing. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-25255) Add getActiveSession to SparkSession in PySpark
[ https://issues.apache.org/jira/browse/SPARK-25255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-25255. - Resolution: Fixed Thanks for the PR and fixing this issue :) > Add getActiveSession to SparkSession in PySpark > --- > > Key: SPARK-25255 > URL: https://issues.apache.org/jira/browse/SPARK-25255 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 2.4.0 >Reporter: holdenk >Assignee: Huaxin Gao >Priority: Trivial > Labels: starter > Fix For: 3.0.0 > > > Add getActiveSession to PySpark session API. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25255) Add getActiveSession to SparkSession in PySpark
[ https://issues.apache.org/jira/browse/SPARK-25255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-25255: Fix Version/s: 3.0.0 > Add getActiveSession to SparkSession in PySpark > --- > > Key: SPARK-25255 > URL: https://issues.apache.org/jira/browse/SPARK-25255 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 2.4.0 >Reporter: holdenk >Priority: Trivial > Labels: starter > Fix For: 3.0.0 > > > Add getActiveSession to PySpark session API. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25255) Add getActiveSession to SparkSession in PySpark
[ https://issues.apache.org/jira/browse/SPARK-25255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk reassigned SPARK-25255: --- Assignee: Huaxin Gao > Add getActiveSession to SparkSession in PySpark > --- > > Key: SPARK-25255 > URL: https://issues.apache.org/jira/browse/SPARK-25255 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 2.4.0 >Reporter: holdenk >Assignee: Huaxin Gao >Priority: Trivial > Labels: starter > Fix For: 3.0.0 > > > Add getActiveSession to PySpark session API. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25854) mvn helper script always exits w/1, causing mvn builds to fail
[ https://issues.apache.org/jira/browse/SPARK-25854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665351#comment-16665351 ] Apache Spark commented on SPARK-25854: -- User 'shaneknapp' has created a pull request for this issue: https://github.com/apache/spark/pull/22854 > mvn helper script always exits w/1, causing mvn builds to fail > -- > > Key: SPARK-25854 > URL: https://issues.apache.org/jira/browse/SPARK-25854 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.2.2, 2.3.2, 2.4.1 >Reporter: shane knapp >Assignee: shane knapp >Priority: Critical > > the final line in the mvn helper script in build/ attempts to shut down the > zinc server. due to the zinc server being set up w/a 30min timeout, by the > time the mvn test instantiation finishes, the server times out. > this means that when the mvn script tries to shut down zinc, it returns w/an > exit code of 1. this will then automatically fail the entire build (even if > the build passes). > i propose the following: > 1) up the timeout to 3h > 2) put some logic at the end of the script to better handle killing the zinc > server > PR coming now. > [~srowen] [~cloud_fan] [~joshrosen] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
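The second part of the proposal (better handling of the zinc shutdown) amounts to making sure the shutdown step can never decide the script's exit code. A minimal sketch, with hypothetical variable names and zinc flags (the real build/mvn script differs in detail):

```shell
#!/usr/bin/env bash
# Idea: capture mvn's exit status first, tolerate a zinc server that has
# already timed out, and report mvn's status rather than the shutdown step's.
MVN_RET=0   # in the real script this would be $? captured right after mvn runs

shutdown_zinc() {
  # "${ZINC_BIN:-false}" stands in for the zinc binary; the default `false`
  # simulates a server that already shut down. `|| true` swallows the failure
  # so this function always returns 0.
  "${ZINC_BIN:-false}" -shutdown -port "${ZINC_PORT:-3030}" 2>/dev/null || true
}

shutdown_zinc
echo "build exit code preserved: ${MVN_RET}"
# the real script would finish with: exit ${MVN_RET}
```

Even when the zinc process is long gone (the timed-out case described above), the function returns 0 and the build's own status is what the script reports.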
[jira] [Commented] (SPARK-25854) mvn helper script always exits w/1, causing mvn builds to fail
[ https://issues.apache.org/jira/browse/SPARK-25854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665350#comment-16665350 ] Apache Spark commented on SPARK-25854: -- User 'shaneknapp' has created a pull request for this issue: https://github.com/apache/spark/pull/22854 > mvn helper script always exits w/1, causing mvn builds to fail > -- > > Key: SPARK-25854 > URL: https://issues.apache.org/jira/browse/SPARK-25854 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.2.2, 2.3.2, 2.4.1 >Reporter: shane knapp >Assignee: shane knapp >Priority: Critical > > the final line in the mvn helper script in build/ attempts to shut down the > zinc server. due to the zinc server being set up w/a 30min timeout, by the > time the mvn test instantiation finishes, the server times out. > this means that when the mvn script tries to shut down zinc, it returns w/an > exit code of 1. this will then automatically fail the entire build (even if > the build passes). > i propose the following: > 1) up the timeout to 3h > 2) put some logic at the end of the script to better handle killing the zinc > server > PR coming now. > [~srowen] [~cloud_fan] [~joshrosen] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25854) mvn helper script always exits w/1, causing mvn builds to fail
[ https://issues.apache.org/jira/browse/SPARK-25854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25854: Assignee: shane knapp (was: Apache Spark) > mvn helper script always exits w/1, causing mvn builds to fail > -- > > Key: SPARK-25854 > URL: https://issues.apache.org/jira/browse/SPARK-25854 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.2.2, 2.3.2, 2.4.1 >Reporter: shane knapp >Assignee: shane knapp >Priority: Critical > > the final line in the mvn helper script in build/ attempts to shut down the > zinc server. due to the zinc server being set up w/a 30min timeout, by the > time the mvn test instantiation finishes, the server times out. > this means that when the mvn script tries to shut down zinc, it returns w/an > exit code of 1. this will then automatically fail the entire build (even if > the build passes). > i propose the following: > 1) up the timeout to 3h > 2) put some logic at the end of the script to better handle killing the zinc > server > PR coming now. > [~srowen] [~cloud_fan] [~joshrosen] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25854) mvn helper script always exits w/1, causing mvn builds to fail
[ https://issues.apache.org/jira/browse/SPARK-25854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25854: Assignee: Apache Spark (was: shane knapp) > mvn helper script always exits w/1, causing mvn builds to fail > -- > > Key: SPARK-25854 > URL: https://issues.apache.org/jira/browse/SPARK-25854 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.2.2, 2.3.2, 2.4.1 >Reporter: shane knapp >Assignee: Apache Spark >Priority: Critical > > the final line in the mvn helper script in build/ attempts to shut down the > zinc server. due to the zinc server being set up w/a 30min timeout, by the > time the mvn test instantiation finishes, the server times out. > this means that when the mvn script tries to shut down zinc, it returns w/an > exit code of 1. this will then automatically fail the entire build (even if > the build passes). > i propose the following: > 1) up the timeout to 3h > 2) put some logic at the end of the script to better handle killing the zinc > server > PR coming now. > [~srowen] [~cloud_fan] [~joshrosen] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25854) mvn helper script always exits w/1, causing mvn builds to fail
[ https://issues.apache.org/jira/browse/SPARK-25854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665349#comment-16665349 ] shane knapp commented on SPARK-25854: - https://github.com/apache/spark/pull/22854 > mvn helper script always exits w/1, causing mvn builds to fail > -- > > Key: SPARK-25854 > URL: https://issues.apache.org/jira/browse/SPARK-25854 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.2.2, 2.3.2, 2.4.1 >Reporter: shane knapp >Assignee: shane knapp >Priority: Critical > > the final line in the mvn helper script in build/ attempts to shut down the > zinc server. due to the zinc server being set up w/a 30min timeout, by the > time the mvn test instantiation finishes, the server times out. > this means that when the mvn script tries to shut down zinc, it returns w/an > exit code of 1. this will then automatically fail the entire build (even if > the build passes). > i propose the following: > 1) up the timeout to 3h > 2) put some logic at the end of the script to better handle killing the zinc > server > PR coming now. > [~srowen] [~cloud_fan] [~joshrosen] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25854) mvn helper script always exits w/1, causing mvn builds to fail
shane knapp created SPARK-25854: --- Summary: mvn helper script always exits w/1, causing mvn builds to fail Key: SPARK-25854 URL: https://issues.apache.org/jira/browse/SPARK-25854 Project: Spark Issue Type: Bug Components: Build Affects Versions: 2.3.2, 2.2.2, 2.4.1 Reporter: shane knapp Assignee: shane knapp the final line in the mvn helper script in build/ attempts to shut down the zinc server. due to the zinc server being set up w/a 30min timeout, by the time the mvn test instantiation finishes, the server times out. this means that when the mvn script tries to shut down zinc, it returns w/an exit code of 1. this will then automatically fail the entire build (even if the build passes). i propose the following: 1) up the timeout to 3h 2) put some logic at the end of the script to better handle killing the zinc server PR coming now. [~srowen] [~cloud_fan] [~joshrosen] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25845) Fix MatchError for calendar interval type in rangeBetween
[ https://issues.apache.org/jira/browse/SPARK-25845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665330#comment-16665330 ] Apache Spark commented on SPARK-25845: -- User 'jiangxb1987' has created a pull request for this issue: https://github.com/apache/spark/pull/22853 > Fix MatchError for calendar interval type in rangeBetween > - > > Key: SPARK-25845 > URL: https://issues.apache.org/jira/browse/SPARK-25845 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.0 >Reporter: Reynold Xin >Priority: Major > > WindowSpecDefinition checks that the frame start is less than the end, but CalendarIntervalType is not > comparable, so it would throw the following exception at runtime: > > > {noformat} > scala.MatchError: CalendarIntervalType (of class > org.apache.spark.sql.types.CalendarIntervalType$) at > org.apache.spark.sql.catalyst.util.TypeUtils$.getInterpretedOrdering(TypeUtils.scala:58) > at > org.apache.spark.sql.catalyst.expressions.BinaryComparison.ordering$lzycompute(predicates.scala:592) > at > org.apache.spark.sql.catalyst.expressions.BinaryComparison.ordering(predicates.scala:592) > at > org.apache.spark.sql.catalyst.expressions.GreaterThan.nullSafeEval(predicates.scala:797) > at > org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:496) > at > org.apache.spark.sql.catalyst.expressions.SpecifiedWindowFrame.isGreaterThan(windowExpressions.scala:245) > at > org.apache.spark.sql.catalyst.expressions.SpecifiedWindowFrame.checkInputDataTypes(windowExpressions.scala:216) > at > org.apache.spark.sql.catalyst.expressions.Expression.resolved$lzycompute(Expression.scala:171) > at > org.apache.spark.sql.catalyst.expressions.Expression.resolved(Expression.scala:171) > at > org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$childrenResolved$1.apply(Expression.scala:183) > at > org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$childrenResolved$1.apply(Expression.scala:183) > at > 
scala.collection.IndexedSeqOptimized$class.prefixLengthImpl(IndexedSeqOptimized.scala:38) > at > scala.collection.IndexedSeqOptimized$class.forall(IndexedSeqOptimized.scala:43) > at scala.collection.mutable.ArrayBuffer.forall(ArrayBuffer.scala:48) at > org.apache.spark.sql.catalyst.expressions.Expression.childrenResolved(Expression.scala:183) > at > org.apache.spark.sql.catalyst.expressions.WindowSpecDefinition.resolved$lzycompute(windowExpressions.scala:48) > at > org.apache.spark.sql.catalyst.expressions.WindowSpecDefinition.resolved(windowExpressions.scala:48) > at > org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$childrenResolved$1.apply(Expression.scala:183) > at > org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$childrenResolved$1.apply(Expression.scala:183) > at > scala.collection.LinearSeqOptimized$class.forall(LinearSeqOptimized.scala:83) >{noformat} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25845) Fix MatchError for calendar interval type in rangeBetween
[ https://issues.apache.org/jira/browse/SPARK-25845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665329#comment-16665329 ] Apache Spark commented on SPARK-25845: -- User 'jiangxb1987' has created a pull request for this issue: https://github.com/apache/spark/pull/22853 > Fix MatchError for calendar interval type in rangeBetween > - > > Key: SPARK-25845 > URL: https://issues.apache.org/jira/browse/SPARK-25845 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.0 >Reporter: Reynold Xin >Priority: Major > > WindowSpecDefinition checks that the frame start is less than the end, but CalendarIntervalType is not > comparable, so it would throw the following exception at runtime: > > > {noformat} > scala.MatchError: CalendarIntervalType (of class > org.apache.spark.sql.types.CalendarIntervalType$) at > org.apache.spark.sql.catalyst.util.TypeUtils$.getInterpretedOrdering(TypeUtils.scala:58) > at > org.apache.spark.sql.catalyst.expressions.BinaryComparison.ordering$lzycompute(predicates.scala:592) > at > org.apache.spark.sql.catalyst.expressions.BinaryComparison.ordering(predicates.scala:592) > at > org.apache.spark.sql.catalyst.expressions.GreaterThan.nullSafeEval(predicates.scala:797) > at > org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:496) > at > org.apache.spark.sql.catalyst.expressions.SpecifiedWindowFrame.isGreaterThan(windowExpressions.scala:245) > at > org.apache.spark.sql.catalyst.expressions.SpecifiedWindowFrame.checkInputDataTypes(windowExpressions.scala:216) > at > org.apache.spark.sql.catalyst.expressions.Expression.resolved$lzycompute(Expression.scala:171) > at > org.apache.spark.sql.catalyst.expressions.Expression.resolved(Expression.scala:171) > at > org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$childrenResolved$1.apply(Expression.scala:183) > at > org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$childrenResolved$1.apply(Expression.scala:183) > at > 
scala.collection.IndexedSeqOptimized$class.prefixLengthImpl(IndexedSeqOptimized.scala:38) > at > scala.collection.IndexedSeqOptimized$class.forall(IndexedSeqOptimized.scala:43) > at scala.collection.mutable.ArrayBuffer.forall(ArrayBuffer.scala:48) at > org.apache.spark.sql.catalyst.expressions.Expression.childrenResolved(Expression.scala:183) > at > org.apache.spark.sql.catalyst.expressions.WindowSpecDefinition.resolved$lzycompute(windowExpressions.scala:48) > at > org.apache.spark.sql.catalyst.expressions.WindowSpecDefinition.resolved(windowExpressions.scala:48) > at > org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$childrenResolved$1.apply(Expression.scala:183) > at > org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$childrenResolved$1.apply(Expression.scala:183) > at > scala.collection.LinearSeqOptimized$class.forall(LinearSeqOptimized.scala:83) >{noformat} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25845) Fix MatchError for calendar interval type in rangeBetween
[ https://issues.apache.org/jira/browse/SPARK-25845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25845: Assignee: Apache Spark > Fix MatchError for calendar interval type in rangeBetween > - > > Key: SPARK-25845 > URL: https://issues.apache.org/jira/browse/SPARK-25845 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.0 >Reporter: Reynold Xin >Assignee: Apache Spark >Priority: Major > > WindowSpecDefinition checks that the frame start is less than the end, but CalendarIntervalType is not > comparable, so it would throw the following exception at runtime: > > > {noformat} > scala.MatchError: CalendarIntervalType (of class > org.apache.spark.sql.types.CalendarIntervalType$) at > org.apache.spark.sql.catalyst.util.TypeUtils$.getInterpretedOrdering(TypeUtils.scala:58) > at > org.apache.spark.sql.catalyst.expressions.BinaryComparison.ordering$lzycompute(predicates.scala:592) > at > org.apache.spark.sql.catalyst.expressions.BinaryComparison.ordering(predicates.scala:592) > at > org.apache.spark.sql.catalyst.expressions.GreaterThan.nullSafeEval(predicates.scala:797) > at > org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:496) > at > org.apache.spark.sql.catalyst.expressions.SpecifiedWindowFrame.isGreaterThan(windowExpressions.scala:245) > at > org.apache.spark.sql.catalyst.expressions.SpecifiedWindowFrame.checkInputDataTypes(windowExpressions.scala:216) > at > org.apache.spark.sql.catalyst.expressions.Expression.resolved$lzycompute(Expression.scala:171) > at > org.apache.spark.sql.catalyst.expressions.Expression.resolved(Expression.scala:171) > at > org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$childrenResolved$1.apply(Expression.scala:183) > at > org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$childrenResolved$1.apply(Expression.scala:183) > at > scala.collection.IndexedSeqOptimized$class.prefixLengthImpl(IndexedSeqOptimized.scala:38) > at > 
scala.collection.IndexedSeqOptimized$class.forall(IndexedSeqOptimized.scala:43) > at scala.collection.mutable.ArrayBuffer.forall(ArrayBuffer.scala:48) at > org.apache.spark.sql.catalyst.expressions.Expression.childrenResolved(Expression.scala:183) > at > org.apache.spark.sql.catalyst.expressions.WindowSpecDefinition.resolved$lzycompute(windowExpressions.scala:48) > at > org.apache.spark.sql.catalyst.expressions.WindowSpecDefinition.resolved(windowExpressions.scala:48) > at > org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$childrenResolved$1.apply(Expression.scala:183) > at > org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$childrenResolved$1.apply(Expression.scala:183) > at > scala.collection.LinearSeqOptimized$class.forall(LinearSeqOptimized.scala:83) >{noformat} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25845) Fix MatchError for calendar interval type in rangeBetween
[ https://issues.apache.org/jira/browse/SPARK-25845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25845: Assignee: (was: Apache Spark) > Fix MatchError for calendar interval type in rangeBetween > - > > Key: SPARK-25845 > URL: https://issues.apache.org/jira/browse/SPARK-25845 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.0 >Reporter: Reynold Xin >Priority: Major > > WindowSpecDefinition checks that the frame start is less than the end, but CalendarIntervalType is not > comparable, so it would throw the following exception at runtime: > > > {noformat} > scala.MatchError: CalendarIntervalType (of class > org.apache.spark.sql.types.CalendarIntervalType$) at > org.apache.spark.sql.catalyst.util.TypeUtils$.getInterpretedOrdering(TypeUtils.scala:58) > at > org.apache.spark.sql.catalyst.expressions.BinaryComparison.ordering$lzycompute(predicates.scala:592) > at > org.apache.spark.sql.catalyst.expressions.BinaryComparison.ordering(predicates.scala:592) > at > org.apache.spark.sql.catalyst.expressions.GreaterThan.nullSafeEval(predicates.scala:797) > at > org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:496) > at > org.apache.spark.sql.catalyst.expressions.SpecifiedWindowFrame.isGreaterThan(windowExpressions.scala:245) > at > org.apache.spark.sql.catalyst.expressions.SpecifiedWindowFrame.checkInputDataTypes(windowExpressions.scala:216) > at > org.apache.spark.sql.catalyst.expressions.Expression.resolved$lzycompute(Expression.scala:171) > at > org.apache.spark.sql.catalyst.expressions.Expression.resolved(Expression.scala:171) > at > org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$childrenResolved$1.apply(Expression.scala:183) > at > org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$childrenResolved$1.apply(Expression.scala:183) > at > scala.collection.IndexedSeqOptimized$class.prefixLengthImpl(IndexedSeqOptimized.scala:38) > at > 
scala.collection.IndexedSeqOptimized$class.forall(IndexedSeqOptimized.scala:43) > at scala.collection.mutable.ArrayBuffer.forall(ArrayBuffer.scala:48) at > org.apache.spark.sql.catalyst.expressions.Expression.childrenResolved(Expression.scala:183) > at > org.apache.spark.sql.catalyst.expressions.WindowSpecDefinition.resolved$lzycompute(windowExpressions.scala:48) > at > org.apache.spark.sql.catalyst.expressions.WindowSpecDefinition.resolved(windowExpressions.scala:48) > at > org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$childrenResolved$1.apply(Expression.scala:183) > at > org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$childrenResolved$1.apply(Expression.scala:183) > at > scala.collection.LinearSeqOptimized$class.forall(LinearSeqOptimized.scala:83) >{noformat} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
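For context, a query of the following shape exercises the rangeBetween path with a calendar interval frame and hits the MatchError above. The table t and columns ts, v are illustrative names, not taken from the ticket:

{code:sql}
-- window frame bounded by a calendar interval; frame-boundary validation
-- compares the bounds, which fails for CalendarIntervalType
select ts, sum(v) over (
  order by ts
  range between interval 1 day preceding and current row
) as s
from t;
{code}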
[jira] [Commented] (SPARK-12172) Consider removing SparkR internal RDD APIs
[ https://issues.apache.org/jira/browse/SPARK-12172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665254#comment-16665254 ] Shivaram Venkataraman commented on SPARK-12172: --- +1 - I think if spark.lapply uses only one or two functions we could even inline them > Consider removing SparkR internal RDD APIs > -- > > Key: SPARK-12172 > URL: https://issues.apache.org/jira/browse/SPARK-12172 > Project: Spark > Issue Type: Task > Components: SparkR >Reporter: Felix Cheung >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25023) Clarify Spark security documentation
[ https://issues.apache.org/jira/browse/SPARK-25023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-25023: -- Priority: Minor (was: Major) Issue Type: Improvement (was: Bug) > Clarify Spark security documentation > > > Key: SPARK-25023 > URL: https://issues.apache.org/jira/browse/SPARK-25023 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 2.2.2 >Reporter: Thomas Graves >Priority: Minor > > I was reading through our deployment docs and security docs and it's not clear > at all which security features each deployment mode supports. I think we > should clarify that security is off by default in all > deployments. We may also want to clarify the types of communication > that would need to be secured, as well as what is multi-tenant safe > versus not. Standalone mode, for instance, is in my opinion simply not > secure: we do talk about using spark.authenticate for a secret, but all > applications would use the same secret. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-25836) (Temporarily) disable automatic build/test of kubernetes-integration-tests
[ https://issues.apache.org/jira/browse/SPARK-25836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-25836. --- Resolution: Duplicate Fix Version/s: 2.4.0 Target Version/s: (was: 2.4.0) > (Temporarily) disable automatic build/test of kubernetes-integration-tests > -- > > Key: SPARK-25836 > URL: https://issues.apache.org/jira/browse/SPARK-25836 > Project: Spark > Issue Type: Task > Components: Build, Kubernetes >Affects Versions: 2.4.0 >Reporter: Sean Owen >Priority: Minor > Fix For: 2.4.0 > > > During 2.4.0 RC4 testing, we noticed an issue with > kubernetes-integration-tests and Scala 2.12 (SPARK-25835), and that the build > was actually publishing kubernetes-integration-tests. The tests are also > complicated in some ways and require some setup to run. This is being > simplified in SPARK-25809 for later. > Given the above, these tests can instead be run ad hoc manually for now. > A quick fix is to not enable this module even when the kubernetes > profile is active. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-25835) Propagate scala 2.12 profile in k8s integration tests
[ https://issues.apache.org/jira/browse/SPARK-25835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-25835. --- Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 22838 [https://github.com/apache/spark/pull/22838] > Propagate scala 2.12 profile in k8s integration tests > - > > Key: SPARK-25835 > URL: https://issues.apache.org/jira/browse/SPARK-25835 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Stavros Kontopoulos >Assignee: Stavros Kontopoulos >Priority: Minor > Fix For: 2.4.0 > > > The > [line|https://github.com/apache/spark/blob/master/resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh#L106] > that calls k8s integration tests ignores the scala version: -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25835) Propagate scala 2.12 profile in k8s integration tests
[ https://issues.apache.org/jira/browse/SPARK-25835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-25835: - Assignee: Stavros Kontopoulos > Propagate scala 2.12 profile in k8s integration tests > - > > Key: SPARK-25835 > URL: https://issues.apache.org/jira/browse/SPARK-25835 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Stavros Kontopoulos >Assignee: Stavros Kontopoulos >Priority: Minor > Fix For: 2.4.0 > > > The > [line|https://github.com/apache/spark/blob/master/resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh#L106] > that calls k8s integration tests ignores the scala version: -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25023) Clarify Spark security documentation
[ https://issues.apache.org/jira/browse/SPARK-25023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25023: Assignee: (was: Apache Spark) > Clarify Spark security documentation > > > Key: SPARK-25023 > URL: https://issues.apache.org/jira/browse/SPARK-25023 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 2.2.2 >Reporter: Thomas Graves >Priority: Major > > I was reading through our deployment docs and security docs and it's not clear > at all which security features each deployment mode supports. I think we > should clarify that security is off by default in all > deployments. We may also want to clarify the types of communication > that would need to be secured, as well as what is multi-tenant safe > versus not. Standalone mode, for instance, is in my opinion simply not > secure: we do talk about using spark.authenticate for a secret, but all > applications would use the same secret. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25023) Clarify Spark security documentation
[ https://issues.apache.org/jira/browse/SPARK-25023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665197#comment-16665197 ] Apache Spark commented on SPARK-25023: -- User 'tgravescs' has created a pull request for this issue: https://github.com/apache/spark/pull/22852 > Clarify Spark security documentation > > > Key: SPARK-25023 > URL: https://issues.apache.org/jira/browse/SPARK-25023 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 2.2.2 >Reporter: Thomas Graves >Priority: Major > > I was reading through our deployment docs and security docs and it's not clear > at all which security features each deployment mode supports. I think we > should clarify that security is off by default in all > deployments. We may also want to clarify the types of communication > that would need to be secured, as well as what is multi-tenant safe > versus not. Standalone mode, for instance, is in my opinion simply not > secure: we do talk about using spark.authenticate for a secret, but all > applications would use the same secret. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25023) Clarify Spark security documentation
[ https://issues.apache.org/jira/browse/SPARK-25023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25023: Assignee: Apache Spark > Clarify Spark security documentation > > > Key: SPARK-25023 > URL: https://issues.apache.org/jira/browse/SPARK-25023 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 2.2.2 >Reporter: Thomas Graves >Assignee: Apache Spark >Priority: Major > > I was reading through our deployment docs and security docs and it's not clear > at all which security features each deployment mode supports. I think we > should clarify that security is off by default in all > deployments. We may also want to clarify the types of communication > that would need to be secured, as well as what is multi-tenant safe > versus not. Standalone mode, for instance, is in my opinion simply not > secure: we do talk about using spark.authenticate for a secret, but all > applications would use the same secret. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25852) we should filter the workOffers of which freeCores>0 for better performance
[ https://issues.apache.org/jira/browse/SPARK-25852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-25852: Priority: Major (was: Minor) > we should filter the workOffers of which freeCores>0 for better performance > --- > > Key: SPARK-25852 > URL: https://issues.apache.org/jira/browse/SPARK-25852 > Project: Spark > Issue Type: Improvement > Components: Scheduler >Affects Versions: 2.3.2 >Reporter: zuotingbing >Priority: Major > Attachments: 2018-10-26_162822.png > > > We should filter out the workOffers whose freeCores is 0, for better performance. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25852) we should filter the workOffers of which freeCores>0 for better performance
[ https://issues.apache.org/jira/browse/SPARK-25852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-25852: Description: We should filter out the workOffers whose freeCores is 0, for better performance. (was: We should filter out the workOffers whose freeCores is 0 when making fake resource offers on all executors.) > we should filter the workOffers of which freeCores>0 for better performance > --- > > Key: SPARK-25852 > URL: https://issues.apache.org/jira/browse/SPARK-25852 > Project: Spark > Issue Type: Improvement > Components: Scheduler >Affects Versions: 2.3.2 >Reporter: zuotingbing >Priority: Minor > Attachments: 2018-10-26_162822.png > > > We should filter out the workOffers whose freeCores is 0, for better performance. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25852) we should filter the workOffers of which freeCores>0 for better performance
[ https://issues.apache.org/jira/browse/SPARK-25852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-25852: Summary: we should filter the workOffers of which freeCores>0 for better performance (was: we should filter the workOffers of which freeCores>0 when make fake resource offers on all executors) > we should filter the workOffers of which freeCores>0 for better performance > --- > > Key: SPARK-25852 > URL: https://issues.apache.org/jira/browse/SPARK-25852 > Project: Spark > Issue Type: Improvement > Components: Scheduler >Affects Versions: 2.3.2 >Reporter: zuotingbing >Priority: Minor > Attachments: 2018-10-26_162822.png > > > We should filter out the workOffers whose freeCores is 0 when making fake resource > offers on all executors. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25797) Views created via 2.1 cannot be read via 2.2+
[ https://issues.apache.org/jira/browse/SPARK-25797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665072#comment-16665072 ] Apache Spark commented on SPARK-25797: -- User 'seancxmao' has created a pull request for this issue: https://github.com/apache/spark/pull/22851 > Views created via 2.1 cannot be read via 2.2+ > - > > Key: SPARK-25797 > URL: https://issues.apache.org/jira/browse/SPARK-25797 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1, 2.3.2 >Reporter: Chenxiao Mao >Priority: Major > > We ran into this issue when we updated our Spark from 2.1 to 2.3. Below is a > simple example to reproduce the issue. > Create views via Spark 2.1 > {code:sql} > create view v1 as > select (cast(1 as decimal(18,0)) + cast(1 as decimal(18,0))) c1; > {code} > Query views via Spark 2.3 > {code:sql} > select * from v1; > Error in query: Cannot up cast `c1` from decimal(20,0) to c1#3906: > decimal(19,0) as it may truncate > {code} > After investigation, we found that this is because when a view is created via > Spark 2.1, the expanded text is saved instead of the original text. > Unfortunately, the expanded text below is buggy. > {code:sql} > spark-sql> desc extended v1; > c1 decimal(19,0) NULL > Detailed Table Information > Database default > Table v1 > Type VIEW > View Text SELECT `gen_attr_0` AS `c1` FROM (SELECT (CAST(CAST(1 AS > DECIMAL(18,0)) AS DECIMAL(19,0)) + CAST(CAST(1 AS DECIMAL(18,0)) AS > DECIMAL(19,0))) AS `gen_attr_0`) AS gen_subquery_0 > {code} > We can see that c1 is decimal(19,0); however, in the expanded text there is > decimal(19,0) + decimal(19,0), which results in decimal(20,0). Since Spark > 2.2, a decimal(20,0) in a query is not allowed to be cast to the view definition > column type decimal(19,0). ([https://github.com/apache/spark/pull/16561]) > I further tested other decimal calculations. Only add/subtract has this issue. 
> Create views via 2.1: > {code:sql} > create view v1 as > select (cast(1 as decimal(18,0)) + cast(1 as decimal(18,0))) c1; > create view v2 as > select (cast(1 as decimal(18,0)) - cast(1 as decimal(18,0))) c1; > create view v3 as > select (cast(1 as decimal(18,0)) * cast(1 as decimal(18,0))) c1; > create view v4 as > select (cast(1 as decimal(18,0)) / cast(1 as decimal(18,0))) c1; > create view v5 as > select (cast(1 as decimal(18,0)) % cast(1 as decimal(18,0))) c1; > create view v6 as > select cast(1 as decimal(18,0)) c1 > union > select cast(1 as decimal(19,0)) c1; > {code} > Query views via Spark 2.3 > {code:sql} > select * from v1; > Error in query: Cannot up cast `c1` from decimal(20,0) to c1#3906: > decimal(19,0) as it may truncate > select * from v2; > Error in query: Cannot up cast `c1` from decimal(20,0) to c1#3909: > decimal(19,0) as it may truncate > select * from v3; > 1 > select * from v4; > 1 > select * from v5; > 0 > select * from v6; > 1 > {code} > Views created via Spark 2.2+ don't have this issue because Spark 2.2+ does > not generate expanded text for view > (https://issues.apache.org/jira/browse/SPARK-18209). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
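The widening itself follows the usual decimal add/subtract typing rule (scale = max(s1,s2), precision = max(s1,s2) + max(p1-s1,p2-s2) + 1), which explains the observed decimal(20,0):

{code:sql}
-- decimal(19,0) + decimal(19,0):
--   scale     = max(0,0)           = 0
--   precision = 0 + max(19,19) + 1 = 20
-- so the expanded view text yields decimal(20,0), wider than the
-- recorded view column type decimal(19,0)
select cast(1 as decimal(19,0)) + cast(1 as decimal(19,0)) as c1;
{code}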
[jira] [Updated] (SPARK-25853) Parts of spark components (DAG Visualization and executors page) not available in Internet Explorer
[ https://issues.apache.org/jira/browse/SPARK-25853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] aastha updated SPARK-25853: --- Summary: Parts of spark components (DAG Visualization and executors page) not available in Internet Explorer (was: Parts of spark components not available in Internet Explorer) > Parts of spark components (DAG Visualization and executors page) not available > in Internet Explorer > -- > > Key: SPARK-25853 > URL: https://issues.apache.org/jira/browse/SPARK-25853 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.2.0, 2.3.2 >Reporter: aastha >Priority: Major > Fix For: 2.3.3 > > Attachments: dag_error_ie.png, dag_not_rendered_ie.png, > dag_on_chrome.png, execuotrs_not_rendered_ie.png, executors_error_ie.png, > executors_on_chrome.png > > > Spark UI has some limitations when working with Internet Explorer. The DAG > component as well as the Executors page does not render; both work on Firefox and > Chrome. I have tested on recent Internet Explorer 11.483.15063.0. Since it works > on Chrome and Firefox, their versions should not matter. > For the executors page, the root cause is that the document.baseURI property is > undefined in Internet Explorer. When I debug by providing the property > myself, it shows up fine. > For the DAG component, developer tools haven't helped. > Attaching screenshots for Chrome and IE UI and debug console messages. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25853) Parts of spark components not available in Internet Explorer
[ https://issues.apache.org/jira/browse/SPARK-25853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] aastha updated SPARK-25853: --- Attachment: executors_on_chrome.png executors_error_ie.png execuotrs_not_rendered_ie.png dag_on_chrome.png dag_not_rendered_ie.png dag_error_ie.png > Parts of spark components not available in Internet Explorer > > > Key: SPARK-25853 > URL: https://issues.apache.org/jira/browse/SPARK-25853 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.2.0, 2.3.2 >Reporter: aastha >Priority: Major > Fix For: 2.3.3 > > Attachments: dag_error_ie.png, dag_not_rendered_ie.png, > dag_on_chrome.png, execuotrs_not_rendered_ie.png, executors_error_ie.png, > executors_on_chrome.png > > > Spark UI has some limitations when working with Internet Explorer. The DAG > component as well as the Executors page does not render; both work on Firefox and > Chrome. I have tested on recent Internet Explorer 11.483.15063.0. Since it works > on Chrome and Firefox, their versions should not matter. > For the executors page, the root cause is that the document.baseURI property is > undefined in Internet Explorer. When I debug by providing the property > myself, it shows up fine. > For the DAG component, developer tools haven't helped. > Attaching screenshots for Chrome and IE UI and debug console messages. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25853) Parts of spark components not available in Internet Explorer
aastha created SPARK-25853: -- Summary: Parts of spark components not available in Internet Explorer Key: SPARK-25853 URL: https://issues.apache.org/jira/browse/SPARK-25853 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 2.3.2, 2.2.0 Reporter: aastha Fix For: 2.3.3 Spark UI has some limitations when working with Internet Explorer. The DAG component as well as the Executors page does not render; both work on Firefox and Chrome. I have tested on recent Internet Explorer 11.483.15063.0. Since it works on Chrome and Firefox, their versions should not matter. For the executors page, the root cause is that the document.baseURI property is undefined in Internet Explorer. When I debug by providing the property myself, it shows up fine. For the DAG component, developer tools haven't helped. Attaching screenshots for Chrome and IE UI and debug console messages. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
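For the executors page, a feature-detecting fallback along these lines would sidestep the missing property. This is a sketch, not the submitted patch; the stub below only imitates an IE-like document object:

```javascript
// Fallback for browsers (e.g. IE 11) that do not implement document.baseURI:
// prefer the native property, then a <base> element's href, then the page URL.
function getBaseURI(doc) {
  if (doc.baseURI !== undefined) {
    return doc.baseURI;
  }
  var base = doc.getElementsByTagName ? doc.getElementsByTagName('base')[0] : null;
  return (base && base.href) || doc.location.href;
}

// Minimal stub imitating IE's document (illustrative only).
var ieLikeDoc = {
  getElementsByTagName: function () { return []; },
  location: { href: 'http://driver-host:4040/executors/' }
};
console.log(getBaseURI(ieLikeDoc)); // → http://driver-host:4040/executors/
```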
[jira] [Updated] (SPARK-25852) we should filter the workOffers of which freeCores>0 when make fake resource offers on all executors
[ https://issues.apache.org/jira/browse/SPARK-25852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-25852: Component/s: (was: Spark Core) Scheduler > we should filter the workOffers of which freeCores>0 when make fake resource > offers on all executors > > > Key: SPARK-25852 > URL: https://issues.apache.org/jira/browse/SPARK-25852 > Project: Spark > Issue Type: Improvement > Components: Scheduler >Affects Versions: 2.3.2 >Reporter: zuotingbing >Priority: Minor > Attachments: 2018-10-26_162822.png > > > We should filter out the workOffers whose freeCores is 0 when making fake resource > offers on all executors. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25852) we should filter the workOffers of which freeCores>0 when make fake resource offers on all executors
[ https://issues.apache.org/jira/browse/SPARK-25852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25852: Assignee: (was: Apache Spark) > we should filter the workOffers of which freeCores>0 when make fake resource > offers on all executors > > > Key: SPARK-25852 > URL: https://issues.apache.org/jira/browse/SPARK-25852 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.2 >Reporter: zuotingbing >Priority: Minor > Attachments: 2018-10-26_162822.png > > > We should filter out the workOffers whose freeCores is 0 when making fake resource > offers on all executors. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25852) we should filter the workOffers of which freeCores>0 when make fake resource offers on all executors
[ https://issues.apache.org/jira/browse/SPARK-25852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-25852: Attachment: 2018-10-26_162822.png > we should filter the workOffers of which freeCores>0 when make fake resource > offers on all executors > > > Key: SPARK-25852 > URL: https://issues.apache.org/jira/browse/SPARK-25852 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.2 >Reporter: zuotingbing >Priority: Minor > Attachments: 2018-10-26_162822.png > > > We should filter out the workOffers whose freeCores is 0 when making fake resource > offers on all executors. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25852) we should filter the workOffers of which freeCores>0 when make fake resource offers on all executors
[ https://issues.apache.org/jira/browse/SPARK-25852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664948#comment-16664948 ] Apache Spark commented on SPARK-25852: -- User 'zuotingbing' has created a pull request for this issue: https://github.com/apache/spark/pull/22849 > we should filter the workOffers of which freeCores>0 when make fake resource > offers on all executors > > > Key: SPARK-25852 > URL: https://issues.apache.org/jira/browse/SPARK-25852 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.2 >Reporter: zuotingbing >Priority: Minor > Attachments: 2018-10-26_162822.png > > > We should filter out the workOffers whose freeCores is 0 when making fake resource > offers on all executors. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25852) we should filter the workOffers of which freeCores>0 when make fake resource offers on all executors
[ https://issues.apache.org/jira/browse/SPARK-25852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25852: Assignee: Apache Spark > we should filter the workOffers of which freeCores>0 when make fake resource > offers on all executors > > > Key: SPARK-25852 > URL: https://issues.apache.org/jira/browse/SPARK-25852 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.2 >Reporter: zuotingbing >Assignee: Apache Spark >Priority: Minor > Attachments: 2018-10-26_162822.png > > > We should filter out the workOffers whose freeCores is 0 when making fake resource > offers on all executors. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25852) we should filter the workOffers of which freeCores>0 when make fake resource offers on all executors
[ https://issues.apache.org/jira/browse/SPARK-25852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-25852: Summary: we should filter the workOffers of which freeCores>0 when make fake resource offers on all executors (was: we should filter the workOffers of which freeCores=0 when make fake resource offers on all executors) > we should filter the workOffers of which freeCores>0 when make fake resource > offers on all executors > > > Key: SPARK-25852 > URL: https://issues.apache.org/jira/browse/SPARK-25852 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.2 >Reporter: zuotingbing >Priority: Minor > > We should filter out the workOffers whose freeCores is 0 when making fake resource > offers on all executors. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25852) we should filter the workOffers of which freeCores=0 when make fake resource offers on all executors
zuotingbing created SPARK-25852: --- Summary: we should filter the workOffers of which freeCores=0 when make fake resource offers on all executors Key: SPARK-25852 URL: https://issues.apache.org/jira/browse/SPARK-25852 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 2.3.2 Reporter: zuotingbing We should filter out the workOffers whose freeCores is 0 when making fake resource offers on all executors. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
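In code terms, the proposed change amounts to a small guard where the scheduler builds its offer list. A sketch against the 2.3.x WorkerOffer shape (case class WorkerOffer(executorId, host, cores)); treat this as illustrative rather than the submitted patch:

{code:scala}
// Offers with no free cores cannot host a task, so drop them before
// shuffling offers and building per-offer task buffers.
val usableOffers = offers.filter(_.cores > 0)
{code}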
[jira] [Assigned] (SPARK-25851) Fix deprecated API warning in SQLListener
[ https://issues.apache.org/jira/browse/SPARK-25851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25851: Assignee: (was: Apache Spark) > Fix deprecated API warning in SQLListener > - > > Key: SPARK-25851 > URL: https://issues.apache.org/jira/browse/SPARK-25851 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gengliang Wang >Priority: Trivial > > In https://github.com/apache/spark/pull/21596, Jackson is upgraded to 2.9.6. > There are some deprecated API warnings in SQLListener. > Create a trivial PR to fix them. > ``` > [warn] SQLListener.scala:92: method uncheckedSimpleType in class TypeFactory > is deprecated: see corresponding Javadoc for more information. > [warn] val objectType = typeFactory.uncheckedSimpleType(classOf[Object]) > [warn] > [warn] SQLListener.scala:93: method constructSimpleType in class TypeFactory > is deprecated: see corresponding Javadoc for more information. > [warn] typeFactory.constructSimpleType(classOf[(_, _)], classOf[(_, _)], > Array(objectType, objectType)) > [warn] > [warn] SQLListener.scala:97: method uncheckedSimpleType in class TypeFactory > is deprecated: see corresponding Javadoc for more information. > [warn] val longType = typeFactory.uncheckedSimpleType(classOf[Long]) > [warn] > [warn] SQLListener.scala:98: method constructSimpleType in class TypeFactory > is deprecated: see corresponding Javadoc for more information. > [warn] typeFactory.constructSimpleType(classOf[(_, _)], classOf[(_, _)], > Array(longType, longType)) > ``` -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25851) Fix deprecated API warning in SQLListener
[ https://issues.apache.org/jira/browse/SPARK-25851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664891#comment-16664891 ] Apache Spark commented on SPARK-25851: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/22848 > Fix deprecated API warning in SQLListener > - > > Key: SPARK-25851 > URL: https://issues.apache.org/jira/browse/SPARK-25851 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gengliang Wang >Priority: Trivial > > In https://github.com/apache/spark/pull/21596, Jackson is upgraded to 2.9.6. > There are some deprecated API warnings in SQLListener. > Create a trivial PR to fix them. > ``` > [warn] SQLListener.scala:92: method uncheckedSimpleType in class TypeFactory > is deprecated: see corresponding Javadoc for more information. > [warn] val objectType = typeFactory.uncheckedSimpleType(classOf[Object]) > [warn] > [warn] SQLListener.scala:93: method constructSimpleType in class TypeFactory > is deprecated: see corresponding Javadoc for more information. > [warn] typeFactory.constructSimpleType(classOf[(_, _)], classOf[(_, _)], > Array(objectType, objectType)) > [warn] > [warn] SQLListener.scala:97: method uncheckedSimpleType in class TypeFactory > is deprecated: see corresponding Javadoc for more information. > [warn] val longType = typeFactory.uncheckedSimpleType(classOf[Long]) > [warn] > [warn] SQLListener.scala:98: method constructSimpleType in class TypeFactory > is deprecated: see corresponding Javadoc for more information. > [warn] typeFactory.constructSimpleType(classOf[(_, _)], classOf[(_, _)], > Array(longType, longType)) > ``` -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25851) Fix deprecated API warning in SQLListener
[ https://issues.apache.org/jira/browse/SPARK-25851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25851: Assignee: Apache Spark > Fix deprecated API warning in SQLListener > - > > Key: SPARK-25851 > URL: https://issues.apache.org/jira/browse/SPARK-25851 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gengliang Wang >Assignee: Apache Spark >Priority: Trivial > > In https://github.com/apache/spark/pull/21596, Jackson is upgraded to 2.9.6. > There are some deprecated API warnings in SQLListener. > Create a trivial PR to fix them. > ``` > [warn] SQLListener.scala:92: method uncheckedSimpleType in class TypeFactory > is deprecated: see corresponding Javadoc for more information. > [warn] val objectType = typeFactory.uncheckedSimpleType(classOf[Object]) > [warn] > [warn] SQLListener.scala:93: method constructSimpleType in class TypeFactory > is deprecated: see corresponding Javadoc for more information. > [warn] typeFactory.constructSimpleType(classOf[(_, _)], classOf[(_, _)], > Array(objectType, objectType)) > [warn] > [warn] SQLListener.scala:97: method uncheckedSimpleType in class TypeFactory > is deprecated: see corresponding Javadoc for more information. > [warn] val longType = typeFactory.uncheckedSimpleType(classOf[Long]) > [warn] > [warn] SQLListener.scala:98: method constructSimpleType in class TypeFactory > is deprecated: see corresponding Javadoc for more information. > [warn] typeFactory.constructSimpleType(classOf[(_, _)], classOf[(_, _)], > Array(longType, longType)) > ``` -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25851) Fix deprecated API warning in SQLListener
Gengliang Wang created SPARK-25851: -- Summary: Fix deprecated API warning in SQLListener Key: SPARK-25851 URL: https://issues.apache.org/jira/browse/SPARK-25851 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Gengliang Wang
[jira] [Commented] (SPARK-25850) Make the split threshold for the code generated method configurable
[ https://issues.apache.org/jira/browse/SPARK-25850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664857#comment-16664857 ] Apache Spark commented on SPARK-25850: -- User 'yucai' has created a pull request for this issue: https://github.com/apache/spark/pull/22847 > Make the split threshold for the code generated method configurable > --- > > Key: SPARK-25850 > URL: https://issues.apache.org/jira/browse/SPARK-25850 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: yucai >Priority: Major > > As per the discussion in > [https://github.com/apache/spark/pull/22823/files#r228400706,] add a new > configuration spark.sql.codegen.methodSplitThreshold to make the split > threshold for the code generated method configurable. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25850) Make the split threshold for the code generated method configurable
[ https://issues.apache.org/jira/browse/SPARK-25850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25850: Assignee: Apache Spark
[jira] [Assigned] (SPARK-25850) Make the split threshold for the code generated method configurable
[ https://issues.apache.org/jira/browse/SPARK-25850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25850: Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-25850) Make the split threshold for the code generated method configurable
[ https://issues.apache.org/jira/browse/SPARK-25850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664854#comment-16664854 ] Apache Spark commented on SPARK-25850: -- User 'yucai' has created a pull request for this issue: https://github.com/apache/spark/pull/22847
[jira] [Created] (SPARK-25850) Make the split threshold for the code generated method configurable
yucai created SPARK-25850: - Summary: Make the split threshold for the code generated method configurable Key: SPARK-25850 URL: https://issues.apache.org/jira/browse/SPARK-25850 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: yucai
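The idea behind a configurable spark.sql.codegen.methodSplitThreshold can be sketched as a toy model: accumulate generated statements into the current method until adding the next one would push it past the threshold, then start a new method. This is an illustrative sketch only; the function name and the size heuristic are assumptions, not Spark's actual codegen logic.

```python
# Toy illustration of size-based method splitting in code generation.
# All names are hypothetical; Spark's real codegen is far more involved.

def split_into_methods(statements, threshold):
    """Group generated statements into methods, starting a new method
    whenever the current one would exceed `threshold` characters."""
    methods, current, size = [], [], 0
    for stmt in statements:
        # Flush the current method if this statement would overflow it
        # (but never emit an empty method).
        if current and size + len(stmt) > threshold:
            methods.append(current)
            current, size = [], 0
        current.append(stmt)
        size += len(stmt)
    if current:
        methods.append(current)
    return methods

# Six 11-character statements with a 30-character budget -> 3 methods.
stmts = [f"x{i} = f(x{i - 1});" for i in range(1, 7)]
assert len(split_into_methods(stmts, threshold=30)) == 3
```

A lower threshold yields more, smaller methods, which is the knob the proposed configuration would expose for tuning JIT-friendliness of the generated code.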
[jira] [Assigned] (SPARK-25797) Views created via 2.1 cannot be read via 2.2+
[ https://issues.apache.org/jira/browse/SPARK-25797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25797: Assignee: (was: Apache Spark) > Views created via 2.1 cannot be read via 2.2+ > - > > Key: SPARK-25797 > URL: https://issues.apache.org/jira/browse/SPARK-25797 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1, 2.3.2 >Reporter: Chenxiao Mao >Priority: Major > > We ran into this issue when we updated our Spark from 2.1 to 2.3. Below is a > simple example to reproduce the issue. > Create views via Spark 2.1 > {code:sql} > create view v1 as > select (cast(1 as decimal(18,0)) + cast(1 as decimal(18,0))) c1; > {code} > Query views via Spark 2.3 > {code:sql} > select * from v1; > Error in query: Cannot up cast `c1` from decimal(20,0) to c1#3906: > decimal(19,0) as it may truncate > {code} > After investigation, we found that this is because when a view is created via > Spark 2.1, the expanded text is saved instead of the original text. > Unfortunately, the below expanded text is buggy. > {code:sql} > spark-sql> desc extended v1; > c1 decimal(19,0) NULL > Detailed Table Information > Database default > Table v1 > Type VIEW > View Text SELECT `gen_attr_0` AS `c1` FROM (SELECT (CAST(CAST(1 AS > DECIMAL(18,0)) AS DECIMAL(19,0)) + CAST(CAST(1 AS DECIMAL(18,0)) AS > DECIMAL(19,0))) AS `gen_attr_0`) AS gen_subquery_0 > {code} > We can see that c1 is decimal(19,0); however, the expanded text contains > decimal(19,0) + decimal(19,0), which results in decimal(20,0). Since Spark > 2.2, a decimal(20,0) in the query is not allowed to be cast to the view definition column > decimal(19,0). ([https://github.com/apache/spark/pull/16561]) > I further tested other decimal calculations. Only add/subtract has this issue. 
> Create views via 2.1: > {code:sql} > create view v1 as > select (cast(1 as decimal(18,0)) + cast(1 as decimal(18,0))) c1; > create view v2 as > select (cast(1 as decimal(18,0)) - cast(1 as decimal(18,0))) c1; > create view v3 as > select (cast(1 as decimal(18,0)) * cast(1 as decimal(18,0))) c1; > create view v4 as > select (cast(1 as decimal(18,0)) / cast(1 as decimal(18,0))) c1; > create view v5 as > select (cast(1 as decimal(18,0)) % cast(1 as decimal(18,0))) c1; > create view v6 as > select cast(1 as decimal(18,0)) c1 > union > select cast(1 as decimal(19,0)) c1; > {code} > Query views via Spark 2.3 > {code:sql} > select * from v1; > Error in query: Cannot up cast `c1` from decimal(20,0) to c1#3906: > decimal(19,0) as it may truncate > select * from v2; > Error in query: Cannot up cast `c1` from decimal(20,0) to c1#3909: > decimal(19,0) as it may truncate > select * from v3; > 1 > select * from v4; > 1 > select * from v5; > 0 > select * from v6; > 1 > {code} > Views created via Spark 2.2+ don't have this issue because Spark 2.2+ does > not generate expanded text for views > (https://issues.apache.org/jira/browse/SPARK-18209). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
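Why add/subtract alone trips the error follows from Spark SQL's documented result-type rule for decimal addition and subtraction, which carries one extra precision digit. A plain-Python sketch of that rule (the function name is hypothetical; the formula follows Spark's DecimalPrecision rules, shown without the cap at precision 38):

```python
# Sketch of Spark SQL's result-type rule for decimal(p1,s1) +/- decimal(p2,s2).
# Hypothetical helper name; formula per Spark's DecimalPrecision rules,
# ignoring the 38-digit precision cap.

def add_sub_result_type(p1, s1, p2, s2):
    """Return (precision, scale) of d1 +/- d2."""
    scale = max(s1, s2)
    precision = max(p1 - s1, p2 - s2) + scale + 1  # +1 digit for the carry
    return precision, scale

# The expanded view text widens each decimal(18,0) operand to decimal(19,0);
# adding two of those yields decimal(20,0), which cannot be upcast back to
# the stored view schema's decimal(19,0) -- matching the reported error.
assert add_sub_result_type(19, 0, 19, 0) == (20, 0)
```

The same carry digit appears for subtraction, which is why v1 and v2 fail while the other operators in the reproduction happen to produce readable results.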
[jira] [Assigned] (SPARK-25797) Views created via 2.1 cannot be read via 2.2+
[ https://issues.apache.org/jira/browse/SPARK-25797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25797: Assignee: Apache Spark
[jira] [Commented] (SPARK-25797) Views created via 2.1 cannot be read via 2.2+
[ https://issues.apache.org/jira/browse/SPARK-25797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664798#comment-16664798 ] Apache Spark commented on SPARK-25797: -- User 'seancxmao' has created a pull request for this issue: https://github.com/apache/spark/pull/22846
[jira] [Resolved] (SPARK-23084) Add unboundedPreceding(), unboundedFollowing() and currentRow() to PySpark
[ https://issues.apache.org/jira/browse/SPARK-23084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-23084. - Resolution: Won't Fix Fix Version/s: (was: 2.4.0) This was merged but then reverted due to https://issues.apache.org/jira/browse/SPARK-25842 > Add unboundedPreceding(), unboundedFollowing() and currentRow() to PySpark > --- > > Key: SPARK-23084 > URL: https://issues.apache.org/jira/browse/SPARK-23084 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 2.3.0 >Reporter: Xiao Li >Assignee: Huaxin Gao >Priority: Major > > Add the new APIs (introduced by https://github.com/apache/spark/pull/18814) > to PySpark. Also update the rangeBetween API > {noformat} > /** > * Window function: returns the special frame boundary that represents the > first row in the > * window partition. > * > * @group window_funcs > * @since 2.3.0 > */ > def unboundedPreceding(): Column = Column(UnboundedPreceding) > /** > * Window function: returns the special frame boundary that represents the > last row in the > * window partition. > * > * @group window_funcs > * @since 2.3.0 > */ > def unboundedFollowing(): Column = Column(UnboundedFollowing) > /** > * Window function: returns the special frame boundary that represents the > current row in the > * window partition. > * > * @group window_funcs > * @since 2.3.0 > */ > def currentRow(): Column = Column(CurrentRow) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
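The frame boundaries these helpers name can be illustrated without Spark: over an ordered partition, a row frame from unboundedPreceding to currentRow yields a running aggregate, while unboundedPreceding to unboundedFollowing covers the whole partition. A plain-Python model (illustrative only; these names and the function are not the PySpark API):

```python
# Illustrative model of row-based window frame boundaries (not PySpark).
UNBOUNDED_PRECEDING = float("-inf")
UNBOUNDED_FOLLOWING = float("inf")
CURRENT_ROW = 0

def window_sum(values, start, end):
    """Sum over a row frame [i + start, i + end] for each row i."""
    out = []
    for i in range(len(values)):
        lo = 0 if start == UNBOUNDED_PRECEDING else max(0, i + start)
        hi = (len(values) - 1 if end == UNBOUNDED_FOLLOWING
              else min(len(values) - 1, i + end))
        out.append(sum(values[lo:hi + 1]))
    return out

# unboundedPreceding -> currentRow behaves as a running sum:
assert window_sum([1, 2, 3, 4], UNBOUNDED_PRECEDING, CURRENT_ROW) == [1, 3, 6, 10]
```

An unbounded frame on both sides makes every row see the full-partition aggregate, which is the semantics the reverted Scala helpers above expose as special frame boundary values.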
[jira] [Reopened] (SPARK-23084) Add unboundedPreceding(), unboundedFollowing() and currentRow() to PySpark
[ https://issues.apache.org/jira/browse/SPARK-23084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin reopened SPARK-23084: -