[jira] [Commented] (SPARK-12172) Consider removing SparkR internal RDD APIs

2018-10-26 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-12172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665908#comment-16665908
 ] 

Felix Cheung commented on SPARK-12172:
--

sounds good

> Consider removing SparkR internal RDD APIs
> --
>
> Key: SPARK-12172
> URL: https://issues.apache.org/jira/browse/SPARK-12172
> Project: Spark
>  Issue Type: Task
>  Components: SparkR
>Reporter: Felix Cheung
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25828) Bumping Version of kubernetes.client to latest version

2018-10-26 Thread Erik Erlandson (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Erlandson reassigned SPARK-25828:
--

Assignee: Ilan Filonenko

> Bumping Version of kubernetes.client to latest version
> --
>
> Key: SPARK-25828
> URL: https://issues.apache.org/jira/browse/SPARK-25828
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Ilan Filonenko
>Assignee: Ilan Filonenko
>Priority: Minor
> Fix For: 3.0.0
>
>
> Upgrade the Kubernetes client version to at least 
> [4.0.0|https://mvnrepository.com/artifact/io.fabric8/kubernetes-client/4.0.0] 
> as we are falling behind on fabric8 updates. This will be an update to both 
> kubernetes/core and kubernetes/integration-tests
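
For reference, the coordinate from the mvnrepository link above expressed as an sbt-style dependency (a sketch only; the actual change belongs in the kubernetes/core and kubernetes/integration-tests build files listed in the description):

{code:scala}
// fabric8 client coordinate taken from the link in the description above.
libraryDependencies += "io.fabric8" % "kubernetes-client" % "4.0.0"
{code}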



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25828) Bumping Version of kubernetes.client to latest version

2018-10-26 Thread Erik Erlandson (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Erlandson resolved SPARK-25828.

   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 22820
[https://github.com/apache/spark/pull/22820]

> Bumping Version of kubernetes.client to latest version
> --
>
> Key: SPARK-25828
> URL: https://issues.apache.org/jira/browse/SPARK-25828
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Ilan Filonenko
>Assignee: Ilan Filonenko
>Priority: Minor
> Fix For: 3.0.0
>
>
> Upgrade the Kubernetes client version to at least 
> [4.0.0|https://mvnrepository.com/artifact/io.fabric8/kubernetes-client/4.0.0] 
> as we are falling behind on fabric8 updates. This will be an update to both 
> kubernetes/core and kubernetes/integration-tests



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25858) Passing Field Metadata to Parquet

2018-10-26 Thread Xinli Shang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinli Shang updated SPARK-25858:

Description: 
h1. Problem Statement

The Spark WriteSupport class for Parquet is hardcoded to use 
org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport, which 
is not configurable. Currently, this class doesn’t carry over the field 
metadata in StructType to MessageType. However, Parquet column encryption 
(Parquet-1396, Parquet-1178) requires the field metadata inside MessageType of 
Parquet, so that the metadata can be used to control column encryption.
h1. Technical Solution
 # Extend SparkToParquetSchemaConverter class and override convert() method to 
add the functionality of carrying over the field metadata
 # Extend ParquetWriteSupport and use the extended converter in #1. The 
extension avoids changing the built-in WriteSupport to mitigate the risk.
 # Change Spark code to make the WriteSupport class configurable to let the 
user configure to use the extended WriteSupport in #2.  The default 
WriteSupport is still 
org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.

h1. Technical Details

{{Note: The code below is in a somewhat messy format; the link below shows the correct format.}}
h2. Extend SparkToParquetSchemaConverter class

class SparkToParquetMetadataSchemaConverter extends SparkToParquetSchemaConverter {

  override def convert(catalystSchema: StructType): MessageType = {
    Types
      .buildMessage()
      .addFields(catalystSchema.map(convertFieldWithMetadata): _*)
      .named(ParquetSchemaConverter.SPARK_PARQUET_SCHEMA_NAME)
  }

  private def convertFieldWithMetadata(field: StructField): Type = {
    val extField = new ExtType[Any](convertField(field))
    val metaBuilder = new MetadataBuilder().withMetadata(field.metadata)
    val metaData = metaBuilder.getMap
    extField.setMetadata(metaData)
    return extField
  }
}
h2. Extend ParquetWriteSupport

class CryptoParquetWriteSupport extends ParquetWriteSupport {

  override def init(configuration: Configuration): WriteContext = {
    val converter = new SparkToParquetMetadataSchemaConverter(configuration)
    createContext(configuration, converter)
  }
}
h2. Make WriteSupport configurable

class ParquetFileFormat {

  override def prepareWrite(...) {
    ...
    if (conf.get(ParquetOutputFormat.WRITE_SUPPORT_CLASS) == null) {
      ParquetOutputFormat.setWriteSupportClass(job, classOf[ParquetWriteSupport])
    }
    ...
  }
}
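
For illustration, a minimal usage sketch of item 3 above, assuming Spark ends up installing the default WriteSupport only when {{parquet.write.support.class}} is not already set, and that the CryptoParquetWriteSupport class sketched above is on the classpath:

{code:scala}
import org.apache.parquet.hadoop.ParquetOutputFormat
import org.apache.spark.sql.SparkSession

// Point Parquet at the extended WriteSupport before writing; the key constant is
// ParquetOutputFormat.WRITE_SUPPORT_CLASS ("parquet.write.support.class").
val spark = SparkSession.builder().appName("crypto-write-support-demo").getOrCreate()
spark.sparkContext.hadoopConfiguration.set(
  ParquetOutputFormat.WRITE_SUPPORT_CLASS,
  classOf[CryptoParquetWriteSupport].getName)

spark.range(10).toDF("id").write.parquet("/tmp/metadata-aware-output")  // hypothetical output path
{code}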
h1. Verification

The 
[ParquetHelloWorld.java|https://github.com/shangxinli/parquet-writesupport-extensions/blob/master/src/main/java/com/uber/ParquetHelloWorld.java] 
in the GitHub repository 
[parquet-writesupport-extensions|https://github.com/shangxinli/parquet-writesupport-extensions] 
has a sample verification of passing down the field metadata and performing 
column encryption.
h1. Dependency
 * Parquet-1178
 * Parquet-1396
 * Parquet-1397

  was:
h1. Problem Statement

The Spark WriteSupport class for Parquet is hardcoded to use 
org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport, which 
is not configurable. Currently, this class doesn’t carry over the field 
metadata in StructType to MessageType. However, Parquet column encryption 
(Parquet-1396, Parquet-1178) requires the field metadata inside MessageType of 
Parquet, so that the metadata can be used to control column encryption.
h1. Technical Solution
 # Extend SparkToParquetSchemaConverter class and override convert() method to 
add the functionality of carrying over the field metadata
 # Extend ParquetWriteSupport and use the extended converter in #1. The 
extension avoids changing the built-in WriteSupport to mitigate the risk.
 # Change Spark code to make the WriteSupport class configurable to let the 
user configure to use the extended WriteSupport in #2.  The default 
WriteSupport is still 
org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.

h1. Technical Details
h2. Extend SparkToParquetSchemaConverter class

class SparkToParquetMetadataSchemaConverter extends SparkToParquetSchemaConverter {

  override def convert(catalystSchema: StructType): MessageType = {
    Types
      .buildMessage()
      .addFields(catalystSchema.map(convertFieldWithMetadata): _*)
      .named(ParquetSchemaConverter.SPARK_PARQUET_SCHEMA_NAME)
  }

  private def convertFieldWithMetadata(field: StructField): Type = {
    val extField = new ExtType[Any](convertField(field))
    val metaBuilder = new MetadataBuilder().withMetadata(field.metadata)
    val 

[jira] [Updated] (SPARK-25858) Passing Field Metadata to Parquet

2018-10-26 Thread Xinli Shang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinli Shang updated SPARK-25858:

Description: 
h1. Problem Statement

The Spark WriteSupport class for Parquet is hardcoded to use 
org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport, which 
is not configurable. Currently, this class doesn’t carry over the field 
metadata in StructType to MessageType. However, Parquet column encryption 
(Parquet-1396, Parquet-1178) requires the field metadata inside MessageType of 
Parquet, so that the metadata can be used to control column encryption.
h1. Technical Solution
 # Extend SparkToParquetSchemaConverter class and override convert() method to 
add the functionality of carrying over the field metadata
 # Extend ParquetWriteSupport and use the extended converter in #1. The 
extension avoids changing the built-in WriteSupport to mitigate the risk.
 # Change Spark code to make the WriteSupport class configurable to let the 
user configure to use the extended WriteSupport in #2.  The default 
WriteSupport is still 
org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.

h1. Technical Details
h2. Extend SparkToParquetSchemaConverter class

class SparkToParquetMetadataSchemaConverter extends SparkToParquetSchemaConverter {

  override def convert(catalystSchema: StructType): MessageType = {
    Types
      .buildMessage()
      .addFields(catalystSchema.map(convertFieldWithMetadata): _*)
      .named(ParquetSchemaConverter.SPARK_PARQUET_SCHEMA_NAME)
  }

  private def convertFieldWithMetadata(field: StructField): Type = {
    val extField = new ExtType[Any](convertField(field))
    val metaBuilder = new MetadataBuilder().withMetadata(field.metadata)
    val metaData = metaBuilder.getMap
    extField.setMetadata(metaData)
    return extField
  }
}
h2. Extend ParquetWriteSupport

class CryptoParquetWriteSupport extends ParquetWriteSupport {

  override def init(configuration: Configuration): WriteContext = {
    val converter = new SparkToParquetMetadataSchemaConverter(configuration)
    createContext(configuration, converter)
  }
}
h2. Make WriteSupport configurable

class ParquetFileFormat {

  override def prepareWrite(...) {
    ...
    if (conf.get(ParquetOutputFormat.WRITE_SUPPORT_CLASS) == null) {
      ParquetOutputFormat.setWriteSupportClass(job, classOf[ParquetWriteSupport])
    }
    ...
  }
}
h1. Verification

The 
[ParquetHelloWorld.java|https://github.com/shangxinli/parquet-writesupport-extensions/blob/master/src/main/java/com/uber/ParquetHelloWorld.java] 
in the GitHub repository 
[parquet-writesupport-extensions|https://github.com/shangxinli/parquet-writesupport-extensions] 
has a sample verification of passing down the field metadata and performing 
column encryption.
h1. Dependency
 * Parquet-1178
 * Parquet-1396
 * Parquet-1397

  was:
h1. Problem Statement

The Spark WriteSupport class for Parquet is hardcoded to use 
org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport, which 
is not configurable. Currently, this class doesn’t carry over the field 
metadata in StructType to MessageType. However, Parquet column encryption 
(Parquet-1396, Parquet-1178) requires the field metadata inside MessageType of 
Parquet, so that the metadata can be used to control column encryption.
h1. Technical Solution
 # Extend SparkToParquetSchemaConverter class and override convert() method to 
add the functionality of carrying over the field metadata
 # Extend ParquetWriteSupport and use the extended converter in #1. The 
extension avoids changing the built-in WriteSupport to mitigate the risk.
 # Change Spark code to make the WriteSupport class configurable to let the 
user configure to use the extended WriteSupport in #2.  The default 
WriteSupport is still 
org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.

h1. Technical Details
h2. Extend SparkToParquetSchemaConverter class

class SparkToParquetMetadataSchemaConverter extends SparkToParquetSchemaConverter {

  override def convert(catalystSchema: StructType): MessageType = {
    Types
      .buildMessage()
      .addFields(catalystSchema.map(convertFieldWithMetadata): _*)
      .named(ParquetSchemaConverter.SPARK_PARQUET_SCHEMA_NAME)
  }

  private def convertFieldWithMetadata(field: StructField): Type = {
    val extField = new ExtType[Any](convertField(field))
    val metaBuilder = new MetadataBuilder().withMetadata(field.metadata)
    val metaData = metaBuilder.getMap
    extField.setMetadata(metaData)
    return extField

[jira] [Updated] (SPARK-25858) Passing Field Metadata to Parquet

2018-10-26 Thread Xinli Shang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinli Shang updated SPARK-25858:

Description: 
h1. Problem Statement

The Spark WriteSupport class for Parquet is hardcoded to use 
org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport, which 
is not configurable. Currently, this class doesn’t carry over the field 
metadata in StructType to MessageType. However, Parquet column encryption 
(Parquet-1396, Parquet-1178) requires the field metadata inside MessageType of 
Parquet, so that the metadata can be used to control column encryption.
h1. Technical Solution
 # Extend SparkToParquetSchemaConverter class and override convert() method to 
add the functionality of carrying over the field metadata
 # Extend ParquetWriteSupport and use the extended converter in #1. The 
extension avoids changing the built-in WriteSupport to mitigate the risk.
 # Change Spark code to make the WriteSupport class configurable to let the 
user configure to use the extended WriteSupport in #2.  The default 
WriteSupport is still 
org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.

h1. Technical Details
h2. Extend SparkToParquetSchemaConverter class

class SparkToParquetMetadataSchemaConverter extends SparkToParquetSchemaConverter {

  override def convert(catalystSchema: StructType): MessageType = {
    Types
      .buildMessage()
      .addFields(catalystSchema.map(convertFieldWithMetadata): _*)
      .named(ParquetSchemaConverter.SPARK_PARQUET_SCHEMA_NAME)
  }

  private def convertFieldWithMetadata(field: StructField): Type = {
    val extField = new ExtType[Any](convertField(field))
    val metaBuilder = new MetadataBuilder().withMetadata(field.metadata)
    val metaData = metaBuilder.getMap
    extField.setMetadata(metaData)
    return extField
  }
}
h2. Extend ParquetWriteSupport

class CryptoParquetWriteSupport extends ParquetWriteSupport {

  override def init(configuration: Configuration): WriteContext = {
    val converter = new SparkToParquetMetadataSchemaConverter(configuration)
    createContext(configuration, converter)
  }
}
h2. Make WriteSupport configurable

class ParquetFileFormat {

  override def prepareWrite(...) {
    ...
    if (conf.get(ParquetOutputFormat.WRITE_SUPPORT_CLASS) == null) {
      ParquetOutputFormat.setWriteSupportClass(job, classOf[ParquetWriteSupport])
    }
    ...
  }
}
h1. Verification

The 
[ParquetHelloWorld.java|https://github.com/shangxinli/parquet-writesupport-extensions/blob/master/src/main/java/com/uber/ParquetHelloWorld.java] 
in the GitHub repository 
[parquet-writesupport-extensions|https://github.com/shangxinli/parquet-writesupport-extensions] 
has a sample verification of passing down the field metadata and performing 
column encryption.
h1. Dependency
 * Parquet-1178
 * Parquet-1396
 * Parquet-1397

  was:
h1. Problem Statement

The Spark WriteSupport class for Parquet is hardcoded to use 
org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport, which 
is not configurable. Currently, this class doesn’t carry over the field 
metadata in StructType to MessageType. However, Parquet column encryption 
(Parquet-1396, Parquet-1178) requires the field metadata inside MessageType of 
Parquet, so that the metadata can be used to control column encryption. 
h1. Technical Solution 
 # Extend SparkToParquetSchemaConverter class and override convert() method to 
add the functionality of carrying over the field metadata
 # Extend ParquetWriteSupport and use the extended converter in #1. The 
extension avoids changing the built-in WriteSupport to mitigate the risk.
 # Change Spark code to make the WriteSupport class configurable to let the 
user configure to use the extended WriteSupport in #2.  The default 
WriteSupport is still 
org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport. 

h1. Technical Details 
h2. Extend SparkToParquetSchemaConverter class 

class SparkToParquetMetadataSchemaConverter extends SparkToParquetSchemaConverter {

  override def convert(catalystSchema: StructType): MessageType = {
    Types
      .buildMessage()
      .addFields(catalystSchema.map(convertFieldWithMetadata): _*)
      .named(ParquetSchemaConverter.SPARK_PARQUET_SCHEMA_NAME)
  }

  private def convertFieldWithMetadata(field: StructField): Type = {
    val extField = new ExtType[Any](convertField(field))
    val metaBuilder = new MetadataBuilder().withMetadata(field.metadata)
    val metaData = metaBuilder.getMap
    extField.setMetadata(metaData)
    return extField
  }
}
h2. Extend ParquetWriteSupport

class CryptoParquetWriteSupport extends ParquetWriteSupport {

 

   override 

[jira] [Created] (SPARK-25858) Passing Field Metadata to Parquet

2018-10-26 Thread Xinli Shang (JIRA)
Xinli Shang created SPARK-25858:
---

 Summary: Passing Field Metadata to Parquet
 Key: SPARK-25858
 URL: https://issues.apache.org/jira/browse/SPARK-25858
 Project: Spark
  Issue Type: New Feature
  Components: Input/Output
Affects Versions: 2.3.2
Reporter: Xinli Shang


h1. Problem Statement

The Spark WriteSupport class for Parquet is hardcoded to use 
org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport, which 
is not configurable. Currently, this class doesn’t carry over the field 
metadata in StructType to MessageType. However, Parquet column encryption 
(Parquet-1396, Parquet-1178) requires the field metadata inside MessageType of 
Parquet, so that the metadata can be used to control column encryption. 
h1. Technical Solution 
 # Extend SparkToParquetSchemaConverter class and override convert() method to 
add the functionality of carrying over the field metadata
 # Extend ParquetWriteSupport and use the extended converter in #1. The 
extension avoids changing the built-in WriteSupport to mitigate the risk.
 # Change Spark code to make the WriteSupport class configurable to let the 
user configure to use the extended WriteSupport in #2.  The default 
WriteSupport is still 
org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport. 

h1. Technical Details 
h2. Extend SparkToParquetSchemaConverter class 

class SparkToParquetMetadataSchemaConverter extends SparkToParquetSchemaConverter {

  override def convert(catalystSchema: StructType): MessageType = {
    Types
      .buildMessage()
      .addFields(catalystSchema.map(convertFieldWithMetadata): _*)
      .named(ParquetSchemaConverter.SPARK_PARQUET_SCHEMA_NAME)
  }

  private def convertFieldWithMetadata(field: StructField): Type = {
    val extField = new ExtType[Any](convertField(field))
    val metaBuilder = new MetadataBuilder().withMetadata(field.metadata)
    val metaData = metaBuilder.getMap
    extField.setMetadata(metaData)
    return extField
  }
}
h2. Extend ParquetWriteSupport

class CryptoParquetWriteSupport extends ParquetWriteSupport {

  override def init(configuration: Configuration): WriteContext = {
    val converter = new SparkToParquetMetadataSchemaConverter(configuration)
    createContext(configuration, converter)
  }
}
h2. Make WriteSupport configurable

class ParquetFileFormat {

  override def prepareWrite(...) {
    ...
    if (conf.get(ParquetOutputFormat.WRITE_SUPPORT_CLASS) == null) {
      ParquetOutputFormat.setWriteSupportClass(job, classOf[ParquetWriteSupport])
    }
    ...
  }
}
h1. Verification 

The 
[ParquetHelloWorld.java|https://github.com/shangxinli/parquet-writesupport-extensions/blob/master/src/main/java/com/uber/ParquetHelloWorld.java] 
in the GitHub repository 
[parquet-writesupport-extensions|https://github.com/shangxinli/parquet-writesupport-extensions] 
has a sample verification of passing down the field metadata and performing 
column encryption.
h1. Dependency
 * Parquet-1178
 * Parquet-1396
 * Parquet-1397



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25857) Document delegation token code in Spark

2018-10-26 Thread Marcelo Vanzin (JIRA)
Marcelo Vanzin created SPARK-25857:
--

 Summary: Document delegation token code in Spark
 Key: SPARK-25857
 URL: https://issues.apache.org/jira/browse/SPARK-25857
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.0.0
Reporter: Marcelo Vanzin


By this I mean not user documentation, but documenting the functionality 
provided in the {{org.apache.spark.deploy.security}} and related packages, so 
that other developers making changes there can refer to it.

It seems to be a source of confusion every time somebody needs to touch that code, 
so it would be good to have a document explaining how it all works, including 
how it's hooked up to different resource managers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25821) Remove SQLContext methods deprecated as of Spark 1.4

2018-10-26 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-25821.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 22815
[https://github.com/apache/spark/pull/22815]

> Remove SQLContext methods deprecated as of Spark 1.4
> 
>
> Key: SPARK-25821
> URL: https://issues.apache.org/jira/browse/SPARK-25821
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Sean Owen
>Assignee: Sean Owen
>Priority: Major
> Fix For: 3.0.0
>
>
> There are several SQLContext methods that have been deprecated since Spark 
> 1.4, like:
> {code:java}
> @deprecated("Use read.parquet() instead.", "1.4.0")
> @scala.annotation.varargs
> def parquetFile(paths: String*): DataFrame = {
>   if (paths.isEmpty) {
> emptyDataFrame
>   } else {
> read.parquet(paths : _*)
>   }
> }{code}
> Let's remove them in Spark 3.
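
For users of the removed methods, the deprecation message above already names the replacement; a minimal migration sketch (paths are hypothetical):

{code:scala}
// Before (removed in Spark 3): sqlContext.parquetFile("/data/a.parquet", "/data/b.parquet")
// After, via DataFrameReader as the deprecation notice suggests:
val df = spark.read.parquet("/data/a.parquet", "/data/b.parquet")
{code}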



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25851) Fix deprecated API warning in SQLListener

2018-10-26 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-25851.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 22848
[https://github.com/apache/spark/pull/22848]

> Fix deprecated API warning in SQLListener
> -
>
> Key: SPARK-25851
> URL: https://issues.apache.org/jira/browse/SPARK-25851
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Trivial
> Fix For: 3.0.0
>
>
> In https://github.com/apache/spark/pull/21596, Jackson is upgraded to 2.9.6.
> There are some deprecated API warnings in SQLListener.
> Create a trivial PR to fix them.
> ```
> [warn] SQLListener.scala:92: method uncheckedSimpleType in class TypeFactory 
> is deprecated: see corresponding Javadoc for more information.
> [warn] val objectType = typeFactory.uncheckedSimpleType(classOf[Object])
> [warn] 
> [warn] SQLListener.scala:93: method constructSimpleType in class TypeFactory 
> is deprecated: see corresponding Javadoc for more information.
> [warn] typeFactory.constructSimpleType(classOf[(_, _)], classOf[(_, _)], 
> Array(objectType, objectType))
> [warn] 
> [warn] SQLListener.scala:97: method uncheckedSimpleType in class TypeFactory 
> is deprecated: see corresponding Javadoc for more information.
> [warn] val longType = typeFactory.uncheckedSimpleType(classOf[Long])
> [warn] 
> [warn] SQLListener.scala:98: method constructSimpleType in class TypeFactory 
> is deprecated: see corresponding Javadoc for more information.
> [warn] typeFactory.constructSimpleType(classOf[(_, _)], classOf[(_, _)], 
> Array(longType, longType))
> ```
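
For reference, a minimal sketch of one way to avoid the deprecated calls, assuming Jackson 2.9's {{TypeFactory.constructType}} and the two-argument {{constructSimpleType}} overload are suitable replacements; the actual change is whatever PR 22848 does:

{code:scala}
import com.fasterxml.jackson.databind.`type`.TypeFactory

// Build the same (Object, Object) and (Long, Long) tuple types without the deprecated
// uncheckedSimpleType / three-argument constructSimpleType overloads.
val typeFactory = TypeFactory.defaultInstance()

val objectType = typeFactory.constructType(classOf[Object])
val objectTupleType =
  typeFactory.constructSimpleType(classOf[(_, _)], Array(objectType, objectType))

val longType = typeFactory.constructType(classOf[Long])
val longTupleType =
  typeFactory.constructSimpleType(classOf[(_, _)], Array(longType, longType))
{code}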



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25851) Fix deprecated API warning in SQLListener

2018-10-26 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-25851:
-

Assignee: Gengliang Wang

> Fix deprecated API warning in SQLListener
> -
>
> Key: SPARK-25851
> URL: https://issues.apache.org/jira/browse/SPARK-25851
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Trivial
> Fix For: 3.0.0
>
>
> In https://github.com/apache/spark/pull/21596, Jackson is upgraded to 2.9.6.
> There are some deprecated API warnings in SQLListener.
> Create a trivial PR to fix them.
> ```
> [warn] SQLListener.scala:92: method uncheckedSimpleType in class TypeFactory 
> is deprecated: see corresponding Javadoc for more information.
> [warn] val objectType = typeFactory.uncheckedSimpleType(classOf[Object])
> [warn] 
> [warn] SQLListener.scala:93: method constructSimpleType in class TypeFactory 
> is deprecated: see corresponding Javadoc for more information.
> [warn] typeFactory.constructSimpleType(classOf[(_, _)], classOf[(_, _)], 
> Array(objectType, objectType))
> [warn] 
> [warn] SQLListener.scala:97: method uncheckedSimpleType in class TypeFactory 
> is deprecated: see corresponding Javadoc for more information.
> [warn] val longType = typeFactory.uncheckedSimpleType(classOf[Long])
> [warn] 
> [warn] SQLListener.scala:98: method constructSimpleType in class TypeFactory 
> is deprecated: see corresponding Javadoc for more information.
> [warn] typeFactory.constructSimpleType(classOf[(_, _)], classOf[(_, _)], 
> Array(longType, longType))
> ```



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25854) mvn helper script always exits w/1, causing mvn builds to fail

2018-10-26 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-25854.
---
   Resolution: Fixed
Fix Version/s: 2.4.0
   3.0.0
   2.3.3
   2.2.3

Issue resolved by pull request 22854
[https://github.com/apache/spark/pull/22854]

> mvn helper script always exits w/1, causing mvn builds to fail
> --
>
> Key: SPARK-25854
> URL: https://issues.apache.org/jira/browse/SPARK-25854
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.2.2, 2.3.2, 2.4.1
>Reporter: shane knapp
>Assignee: shane knapp
>Priority: Critical
> Fix For: 2.2.3, 2.3.3, 3.0.0, 2.4.0
>
>
> The final line in the mvn helper script in build/ attempts to shut down the 
> zinc server.  Because the zinc server is set up with a 30-minute timeout, by the 
> time the mvn test invocation finishes, the server has already timed out.
> This means that when the mvn script tries to shut down zinc, it returns with an 
> exit code of 1, which then automatically fails the entire build (even if 
> the build passes).
> I propose the following:
> 1) up the timeout to 3h
> 2) add some logic at the end of the script to handle killing the zinc 
> server more robustly
> PR coming now.
> [~srowen] [~cloud_fan] [~joshrosen]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25855) Don't use Erasure Coding for event log files

2018-10-26 Thread Thomas Graves (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665690#comment-16665690
 ] 

Thomas Graves commented on SPARK-25855:
---

It seems like it depends on whether you care to see the event logs before the 
application has finished.  If you are using the driver UI, people generally use it 
while the application is running, and once it has finished it sounds like the log 
would show up and you could view it from the history server.  So probably not a 
problem there.  But if you are using the history server to view all UIs and expect 
the logs to be there, it would be a big problem.

So it does sound like it is better to have it off by default, so as not to confuse 
users.  Were you going to make it configurable?

> Don't use Erasure Coding for event log files
> 
>
> Key: SPARK-25855
> URL: https://issues.apache.org/jira/browse/SPARK-25855
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Imran Rashid
>Priority: Major
>
> While testing Spark with HDFS erasure coding (new in Hadoop 3), we ran into a 
> bug with the event logs.  The main issue was a bug in HDFS (HDFS-14027), but 
> it did make us wonder whether Spark should be using EC for event log files in 
> general.  It's a poor choice because EC currently implements {{hflush()}} or 
> {{hsync()}} as no-ops, which means you won't see anything in your event logs 
> until the app is complete.  That isn't necessarily a bug, but isn't really 
> great.  So I think we should ensure EC is always off for event logs.
> IIUC there is *not* a problem with applications which die without properly 
> closing the output stream.  It'll take a while for the NN to realize the 
> client is gone and finish the block, but the data should get there eventually.
> Also related are SPARK-24787 & SPARK-19531.
> The space savings from EC would be nice as the event logs can get somewhat 
> large, but I think other factors outweigh this.
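
For illustration, a rough sketch of how the event log writer could opt its files out of erasure coding on HDFS; it assumes Hadoop 3's HdfsDataOutputStreamBuilder with its replicate() flag and is not the actual Spark change:

{code:scala}
import org.apache.hadoop.fs.{FSDataOutputStream, FileSystem, Path}
import org.apache.hadoop.hdfs.DistributedFileSystem

// Force plain replication for the event log file even if its directory carries an
// erasure-coding policy, so hflush()/hsync() are not no-ops (assumes Hadoop 3 APIs).
def createNonEcEventLogFile(fs: FileSystem, path: Path): FSDataOutputStream = fs match {
  case dfs: DistributedFileSystem =>
    dfs.createFile(path).replicate().build()  // opt this single file out of EC
  case _ =>
    fs.create(path)                           // other file systems: nothing special needed
}
{code}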



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25856) Remove AverageLike and CountLike classes.

2018-10-26 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25856:


Assignee: (was: Apache Spark)

> Remove AverageLike and CountLike classes.
> -
>
> Key: SPARK-25856
> URL: https://issues.apache.org/jira/browse/SPARK-25856
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.1
>Reporter: Dilip Biswal
>Priority: Minor
>
> These two classes were added for regr_ expression support (SPARK-23907). Those 
> expressions have since been removed, and hence we can remove these base 
> classes and inline the logic in the concrete classes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25856) Remove AverageLike and CountLike classes.

2018-10-26 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665649#comment-16665649
 ] 

Apache Spark commented on SPARK-25856:
--

User 'dilipbiswal' has created a pull request for this issue:
https://github.com/apache/spark/pull/22856

> Remove AverageLike and CountLike classes.
> -
>
> Key: SPARK-25856
> URL: https://issues.apache.org/jira/browse/SPARK-25856
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.1
>Reporter: Dilip Biswal
>Priority: Minor
>
> These two classes were added for regr_ expression support (SPARK-23907). Those 
> expressions have since been removed, and hence we can remove these base 
> classes and inline the logic in the concrete classes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25856) Remove AverageLike and CountLike classes.

2018-10-26 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25856:


Assignee: Apache Spark

> Remove AverageLike and CountLike classes.
> -
>
> Key: SPARK-25856
> URL: https://issues.apache.org/jira/browse/SPARK-25856
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.1
>Reporter: Dilip Biswal
>Assignee: Apache Spark
>Priority: Minor
>
> These two classes were added for regr_ expression support (SPARK-23907). Those 
> expressions have since been removed, and hence we can remove these base 
> classes and inline the logic in the concrete classes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25856) Remove AverageLike and CountLike classes.

2018-10-26 Thread Dilip Biswal (JIRA)
Dilip Biswal created SPARK-25856:


 Summary: Remove AverageLike and CountLike classes.
 Key: SPARK-25856
 URL: https://issues.apache.org/jira/browse/SPARK-25856
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.1
Reporter: Dilip Biswal


These two classes were added for regr_ expression support (SPARK-23907). Those 
expressions have since been removed, and hence we can remove these base 
classes and inline the logic in the concrete classes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25816) Functions does not resolve Columns correctly

2018-10-26 Thread Brian Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Zhang updated SPARK-25816:

Attachment: final_allDatatypes_Spark.avro

> Functions does not resolve Columns correctly
> 
>
> Key: SPARK-25816
> URL: https://issues.apache.org/jira/browse/SPARK-25816
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0, 2.3.1
>Reporter: Brian Zhang
>Priority: Critical
> Attachments: final_allDatatypes_Spark.avro, source.snappy.parquet
>
>
> When there is a duplicate column name between the current DataFrame and the 
> original DataFrame it was selected from, Spark 2.3.0 and 2.3.1 do 
> not resolve the column correctly when it is used in an expression, which 
> causes a casting issue. The same code works in Spark 2.2.1.
> Please see the code below to reproduce the issue:
> import org.apache.spark._
> import org.apache.spark.rdd._
> import org.apache.spark.storage.StorageLevel._
> import org.apache.spark.sql._
> import org.apache.spark.sql.DataFrame
> import org.apache.spark.sql.types._
> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.catalyst.expressions._
> import org.apache.spark.sql.Column
> val v0 = spark.read.parquet("/data/home/bzinfa/bz/source.snappy.parquet")
> val v00 = v0.toDF(v0.schema.fields.indices.view.map("" + _):_*)
> val v5 = v00.select($"13".as("0"),$"14".as("1"),$"15".as("2"))
> val v5_2 = $"2"
> v5.where(lit(500).<(v5_2(new Column(new MapKeys(v5_2.expr))(lit(0)
> // v00's 3rd column is binary and its 16th is a map
> Error:
> org.apache.spark.sql.AnalysisException: cannot resolve 'map_keys(`2`)' due to 
> data type mismatch: argument 1 requires map type, however, '`2`' is of binary 
> type.;
>  
>  'Project [0#1591, 1#1592, 2#1593] +- 'Filter (500 < 
> 2#1593[map_keys(2#1561)[0]]) +- 
> Project [13#1572 AS 0#1591, 14#1573 AS 1#1592, 15#1574 AS 2#1593, 2#1561] +- 
> Project [c_bytes#1527 AS 0#1559, c_union#1528 AS 1#1560, c_fixed#1529 AS 
> 2#1561, c_boolean#1530 AS 3#1562, c_float#1531 AS 4#1563, c_double#1532 AS 
> 5#1564, c_int#1533 AS 6#1565, c_long#1534L AS 7#1566L, c_string#1535 AS 
> 8#1567, c_decimal_18_2#1536 AS 9#1568, c_decimal_28_2#1537 AS 10#1569, 
> c_decimal_38_2#1538 AS 11#1570, c_date#1539 AS 12#1571, simple_struct#1540 AS 
> 13#1572, simple_array#1541 AS 14#1573, simple_map#1542 AS 15#1574] +- 
> Relation[c_bytes#1527,c_union#1528,c_fixed#1529,c_boolean#1530,c_float#1531,c_double#1532,c_int#1533,c_long#1534L,c_string#1535,c_decimal_18_2#1536,c_decimal_28_2#1537,c_decimal_38_2#1538,c_date#1539,simple_struct#1540,simple_array#1541,simple_map#1542]
>  parquet



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25816) Functions does not resolve Columns correctly

2018-10-26 Thread Brian Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665637#comment-16665637
 ] 

Brian Zhang commented on SPARK-25816:
-

Here is another reproduction that appears to be related to the same issue:

val v0 = sqlContext.read.avro("final_allDatatypes_Spark.avro");
val v00 = v0.toDF(v0.schema.fields.indices.view.map("" + _):_*)
val v001 = v00.select($"0".as("0"), 
$"1".as("1"),$"2".as("2"),$"3".as("3"),$"4".as("4"),$"5".as("5"),$"6".as("6"),$"7".as("7"),$"8".as("8"))
val v013 = $"8"
val v010 = map(v013, v013)
 
v001.where(map(v013, v010)(v013)(v013)==="dummy")

 

org.apache.spark.sql.AnalysisException: Reference '8' is ambiguous, could be: 
8, 8.;
 at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:213)
 at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveChildren(LogicalPlan.scala:97)
 at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$36.apply(Analyzer.scala:822)
 at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$36.apply(Analyzer.scala:824)
 at 
org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:53)
 at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveReferences$$resolve(Analyzer.scala:821)
 at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveReferences$$resolve$2.apply(Analyzer.scala:830)
 at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveReferences$$resolve$2.apply(Analyzer.scala:830)
 at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
 at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
 at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
 at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveReferences$$resolve(Analyzer.scala:830)
 at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveReferences$$resolve$2.apply(Analyzer.scala:830)
 at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveReferences$$resolve$2.apply(Analyzer.scala:830)
 at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
 at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
 at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
 at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveReferences$$resolve(Analyzer.scala:830)
 at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$9$$anonfun$applyOrElse$36.apply(Analyzer.scala:891)
 at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$9$$anonfun$applyOrElse$36.apply(Analyzer.scala:891)
 at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107)
 at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107)
 at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
 at 
org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:106)
 at 
org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:118)
 at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:127)
 at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
 at 
org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:127)
 at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$9.applyOrElse(Analyzer.scala:891)
 at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$9.applyOrElse(Analyzer.scala:833)
 at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
 at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
 at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
 at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288)
 at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
 at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
 at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
 at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
 

[jira] [Commented] (SPARK-23206) Additional Memory Tuning Metrics

2018-10-26 Thread Edwina Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665628#comment-16665628
 ] 

Edwina Lu commented on SPARK-23206:
---

[~irashid], yes, I am planning to work on the other tasks for adding the 
metrics at the stage level and in the UI. I want to see how the final 
APIs will look with SPARK-23206, and to include these metrics at the stage 
level and in the UI as well.

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorsTab.png, ExecutorsTab2.png, 
> MemoryTuningMetricsDesignDoc.pdf, SPARK-23206 Design Doc.pdf, StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used for caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.
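
To make the sizing arithmetic above concrete, a small illustrative calculation with hypothetical numbers (not taken from the clusters mentioned above):

{code:scala}
// Hypothetical application: 100 executors, 10 GB each, peak JVM used memory of 3.5 GB,
// matching the ~35% average utilization described above.
val numExecutors = 100
val executorMemoryGb = 10.0      // spark.executor.memory
val maxJvmUsedGb = 3.5           // peak JVM used memory across executors

val usedFraction = maxJvmUsedGb / executorMemoryGb               // 0.35
val unusedGb = numExecutors * (executorMemoryGb - maxJvmUsedGb)  // 650 GB of headroom
println(f"used ${usedFraction * 100}%.0f%% of executor memory, ~$unusedGb%.0f GB unused")
{code}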



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24793) Make spark-submit more useful with k8s

2018-10-26 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665578#comment-16665578
 ] 

Stavros Kontopoulos edited comment on SPARK-24793 at 10/26/18 7:48 PM:
---

From a quick glance you can't just use the k8s backend to check the status of the 
driver. Standalone and Mesos mode can support this because they use the 
REST client, which is a common API always available in Spark core. We can't add a 
k8s dependency by default at that point of the code. You then either use 
reflection, if a k8s master is passed, to load a class from the backend side, or 
query the K8s API server by extending that REST client and mapping pod status 
to driver status to keep the UX the same. I will try the reflection approach, as it 
is used elsewhere as well, especially for the YARN stuff.


was (Author: skonto):
From a quick glance you can't just use the k8s backend to check the status of the 
driver. Standalone and Mesos mode can support this because they use the 
REST client, which is a common API always available in Spark core. We can't add a 
k8s dependency by default at that point of the code. You then either use 
reflection, if a k8s master is passed, to load a class from the backend side, or 
query the K8s API server by extending that REST client and mapping pod status 
to driver status to keep the UX the same.

> Make spark-submit more useful with k8s
> --
>
> Key: SPARK-24793
> URL: https://issues.apache.org/jira/browse/SPARK-24793
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Anirudh Ramanathan
>Assignee: Anirudh Ramanathan
>Priority: Major
>
> Support controlling the lifecycle of Spark Application through spark-submit. 
> For example:
> {{ 
>   --kill app_name   If given, kills the driver specified.
>   --status app_name  If given, requests the status of the driver 
> specified.
> }}
> Potentially also --list to list all spark drivers running.
> Given that our submission client can actually launch jobs into many different 
> namespaces, we'll need an additional specification of the namespace through a 
> --namespace flag potentially.
> I think this is pretty useful to have instead of forcing a user to use 
> kubectl to manage the lifecycle of any k8s Spark Application.
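
A minimal sketch of the reflection-based approach mentioned in the comments above, with a hypothetical class and method name; it only illustrates keeping spark-core free of a compile-time kubernetes dependency:

{code:scala}
// Sketch only: KubernetesDriverStatusClient and reportStatus are hypothetical names.
// The idea is that spark-core loads the k8s status client reflectively when a
// k8s:// master is used, so it needs no compile-time dependency on the kubernetes module.
def requestDriverStatus(master: String, submissionId: String): Unit = {
  if (master.startsWith("k8s://")) {
    val clazz = Class.forName("org.apache.spark.deploy.k8s.submit.KubernetesDriverStatusClient")
    val client = clazz.getDeclaredConstructor().newInstance()
    clazz.getMethod("reportStatus", classOf[String]).invoke(client, submissionId)
  } else {
    // Standalone and Mesos keep using the existing REST submission client path.
  }
}
{code}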



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-23206) Additional Memory Tuning Metrics

2018-10-26 Thread Imran Rashid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665594#comment-16665594
 ] 

Imran Rashid edited comment on SPARK-23206 at 10/26/18 7:34 PM:


Hi [~elu], just wondering are you still planning on working on the other tasks 
here related to getting these metrics in the UI?


was (Author: irashid):
Hi [~elu], just wondering are you still planning on working on the other tasks 
here related to get these metrics in the UI?

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorsTab.png, ExecutorsTab2.png, 
> MemoryTuningMetricsDesignDoc.pdf, SPARK-23206 Design Doc.pdf, StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used for caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23206) Additional Memory Tuning Metrics

2018-10-26 Thread Imran Rashid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665594#comment-16665594
 ] 

Imran Rashid commented on SPARK-23206:
--

Hi [~elu], just wondering are you still planning on working on the other tasks 
here related to get these metrics in the UI?

> Additional Memory Tuning Metrics
> 
>
> Key: SPARK-23206
> URL: https://issues.apache.org/jira/browse/SPARK-23206
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
> Attachments: ExecutorsTab.png, ExecutorsTab2.png, 
> MemoryTuningMetricsDesignDoc.pdf, SPARK-23206 Design Doc.pdf, StageTab.png
>
>
> At LinkedIn, we have multiple clusters, running thousands of Spark 
> applications, and these numbers are growing rapidly. We need to ensure that 
> these Spark applications are well tuned – cluster resources, including 
> memory, should be used efficiently so that the cluster can support running 
> more applications concurrently, and applications should run quickly and 
> reliably.
> Currently there is limited visibility into how much memory executors are 
> using, and users are guessing numbers for executor and driver memory sizing. 
> These estimates are often much larger than needed, leading to memory wastage. 
> Examining the metrics for one cluster for a month, the average percentage of 
> used executor memory (max JVM used memory across executors /  
> spark.executor.memory) is 35%, leading to an average of 591GB unused memory 
> per application (number of executors * (spark.executor.memory - max JVM used 
> memory)). Spark has multiple memory regions (user memory, execution memory, 
> storage memory, and overhead memory), and to understand how memory is being 
> used and fine-tune allocation between regions, it would be useful to have 
> information about how much memory is being used for the different regions.
> To improve visibility into memory usage for the driver and executors and 
> different memory regions, the following additional memory metrics can be 
> tracked for each executor and driver:
>  * JVM used memory: the JVM heap size for the executor/driver.
>  * Execution memory: memory used for computation in shuffles, joins, sorts 
> and aggregations.
>  * Storage memory: memory used for caching and propagating internal data across 
> the cluster.
>  * Unified memory: sum of execution and storage memory.
> The peak values for each memory metric can be tracked for each executor, and 
> also per stage. This information can be shown in the Spark UI and the REST 
> APIs. Information for peak JVM used memory can help with determining 
> appropriate values for spark.executor.memory and spark.driver.memory, and 
> information about the unified memory region can help with determining 
> appropriate values for spark.memory.fraction and 
> spark.memory.storageFraction. Stage memory information can help identify 
> which stages are most memory intensive, and users can look into the relevant 
> code to determine if it can be optimized.
> The memory metrics can be gathered by adding the current JVM used memory, 
> execution memory and storage memory to the heartbeat. SparkListeners are 
> modified to collect the new metrics for the executors, stages and Spark 
> history log. Only interesting values (peak values per stage per executor) are 
> recorded in the Spark history log, to minimize the amount of additional 
> logging.
> We have attached our design documentation with this ticket and would like to 
> receive feedback from the community for this proposal.
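The listener-based collection the description mentions can be illustrated with a
small sketch. This is only a rough illustration, not the proposed patch: it reuses
the per-task peak execution memory that Spark already reports, whereas the proposal
adds JVM used, storage and unified memory carried on the executor heartbeat.

{code:scala}
import scala.collection.mutable

import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Rough sketch only: track the peak of an existing metric (per-task execution
// memory) per (stage, executor). The proposal extends this idea to JVM used,
// storage and unified memory reported through the heartbeat.
class PeakMemoryListener extends SparkListener {
  // (stageId, executorId) -> peak execution memory seen so far, in bytes
  private val peaks = mutable.Map.empty[(Int, String), Long]

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = synchronized {
    val metrics = taskEnd.taskMetrics
    if (metrics != null) {
      val key = (taskEnd.stageId, taskEnd.taskInfo.executorId)
      peaks(key) = math.max(peaks.getOrElse(key, 0L), metrics.peakExecutionMemory)
    }
  }

  def report(): Unit = synchronized {
    peaks.foreach { case ((stage, exec), peak) =>
      println(s"stage=$stage executor=$exec peakExecutionMemory=$peak bytes")
    }
  }
}

// Register with: sc.addSparkListener(new PeakMemoryListener)
{code}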



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24793) Make spark-submit more useful with k8s

2018-10-26 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665578#comment-16665578
 ] 

Stavros Kontopoulos edited comment on SPARK-24793 at 10/26/18 7:15 PM:
---

From a quick glance you can't just use the k8s backend to check the status of 
the driver. Standalone and Mesos mode can support this because they use the 
REST client, which is a common API always available in Spark core. We can't 
add a k8s dependency by default at that point in the code. You then either use 
reflection, if a k8s master is passed, to load a class from the backend side, 
or query the K8s API server by extending that REST client and mapping pod 
status to driver status to keep the UX the same.


was (Author: skonto):
From a quick glance you can't just use the k8s backend to check the status of 
the driver. Standalone and Mesos mode can support this because they use the 
REST client, which is a common API always available in Spark core. We can't 
add a k8s dependency by default at that point in the code. You then either use 
reflection or hit the API server with a REST call.

> Make spark-submit more useful with k8s
> --
>
> Key: SPARK-24793
> URL: https://issues.apache.org/jira/browse/SPARK-24793
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Anirudh Ramanathan
>Assignee: Anirudh Ramanathan
>Priority: Major
>
> Support controlling the lifecycle of Spark Application through spark-submit. 
> For example:
> {{ 
>   --kill app_name   If given, kills the driver specified.
>   --status app_name  If given, requests the status of the driver 
> specified.
> }}
> Potentially also --list to list all spark drivers running.
> Given that our submission client can actually launch jobs into many different 
> namespaces, we'll need an additional specification of the namespace through a 
> --namespace flag potentially.
> I think this is pretty useful to have instead of forcing a user to use 
> kubectl to manage the lifecycle of any k8s Spark Application.
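As a rough sketch of the comment above about mapping pod status onto a driver
status for a --status style query (names and the phase-to-state mapping below are
illustrative assumptions, not Spark's actual API; it assumes the fabric8
kubernetes-client is on the classpath):

{code:scala}
import io.fabric8.kubernetes.client.DefaultKubernetesClient

// Hypothetical helper: derive a coarse driver state from the driver pod's phase.
// The mapping is an illustration, not the real spark-submit behaviour.
object K8sDriverStatus {
  def statusOf(namespace: String, driverPodName: String): String = {
    val client = new DefaultKubernetesClient()
    try {
      val pod = client.pods().inNamespace(namespace).withName(driverPodName).get()
      if (pod == null) {
        "NOT_FOUND"
      } else {
        pod.getStatus.getPhase match {
          case "Pending"   => "SUBMITTED"
          case "Running"   => "RUNNING"
          case "Succeeded" => "FINISHED"
          case "Failed"    => "FAILED"
          case other       => other.toUpperCase
        }
      }
    } finally {
      client.close()
    }
  }
}
{code}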



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24793) Make spark-submit more useful with k8s

2018-10-26 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665578#comment-16665578
 ] 

Stavros Kontopoulos commented on SPARK-24793:
-

From a quick glance you can't just use the k8s backend to check the status of 
the driver. Standalone and Mesos mode can support this because they use the 
REST client, which is a common API always available in Spark core. We can't 
add a k8s dependency by default at that point in the code. You then either use 
reflection or hit the API server with a REST call.

> Make spark-submit more useful with k8s
> --
>
> Key: SPARK-24793
> URL: https://issues.apache.org/jira/browse/SPARK-24793
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Anirudh Ramanathan
>Assignee: Anirudh Ramanathan
>Priority: Major
>
> Support controlling the lifecycle of Spark Application through spark-submit. 
> For example:
> {{ 
>   --kill app_name   If given, kills the driver specified.
>   --status app_name  If given, requests the status of the driver 
> specified.
> }}
> Potentially also --list to list all spark drivers running.
> Given that our submission client can actually launch jobs into many different 
> namespaces, we'll need an additional specification of the namespace through a 
> --namespace flag potentially.
> I think this is pretty useful to have instead of forcing a user to use 
> kubectl to manage the lifecycle of any k8s Spark Application.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25839) Implement use of KryoPool in KryoSerializer

2018-10-26 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25839:


Assignee: (was: Apache Spark)

> Implement use of KryoPool in KryoSerializer
> ---
>
> Key: SPARK-25839
> URL: https://issues.apache.org/jira/browse/SPARK-25839
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.0.2, 2.3.1, 2.3.2
>Reporter: Patrick Brown
>Priority: Minor
>
> The current implementation of KryoSerializer does not use KryoPool, which is 
> recommended by Kryo due to the creation of a Kryo instance being slow.
>  
> The current implementation references the KryoSerializerInstance private 
> variable cachedKryo as effectively being a pool of size 1. However (in my 
> admittedly somewhat limited research) it seems that frequently (such as in 
> the ClosureCleaner ensureSerializable method) a new instance of 
> KryoSerializerInstance is created, which in turn forces a new instance of 
> Kryo itself to be created; this instance is then dropped from scope, causing 
> the "pool" not to be re-used.
>  
> I have a small set of proposed changes we have been using on an internal 
> production application (running 24x7 for 6+ months, processing 10k+ jobs a 
> day) which implements using a KryoPool inside KryoSerializer which is then 
> used by each KryoSerializerInstance to borrow a Kryo instance.
>  
> I believe this is mainly a performance improvement for applications 
> processing a large number of small jobs, where the cost of instantiating Kryo 
> instances is a larger portion of execution time compared to larger jobs.
>  
> I have discussed this proposed change in the dev mailing list and it was 
> suggested I create this issue and a PR. It was also suggested I accompany 
> that with some performance metrics, which it is my plan to do.
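For reference, the pooling pattern the description refers to looks roughly like the
sketch below. This is a simplified illustration, not the actual Spark change; the
class registration inside create() is left as a placeholder.

{code:scala}
import com.esotericsoftware.kryo.Kryo
import com.esotericsoftware.kryo.pool.{KryoFactory, KryoPool}

// Simplified sketch of Kryo's recommended pooling: borrow/release instances
// from a shared pool instead of constructing a new Kryo per serializer instance.
object PooledKryo {
  private val factory: KryoFactory = new KryoFactory {
    def create(): Kryo = {
      val kryo = new Kryo()
      // register application/Spark classes here, mirroring KryoSerializer.newKryo()
      kryo
    }
  }

  private val pool: KryoPool = new KryoPool.Builder(factory).softReferences().build()

  def withKryo[T](body: Kryo => T): T = {
    val kryo = pool.borrow()
    try body(kryo) finally pool.release(kryo)
  }
}
{code}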



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25839) Implement use of KryoPool in KryoSerializer

2018-10-26 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25839:


Assignee: Apache Spark

> Implement use of KryoPool in KryoSerializer
> ---
>
> Key: SPARK-25839
> URL: https://issues.apache.org/jira/browse/SPARK-25839
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.0.2, 2.3.1, 2.3.2
>Reporter: Patrick Brown
>Assignee: Apache Spark
>Priority: Minor
>
> The current implementation of KryoSerializer does not use KryoPool, which is 
> recommended by Kryo due to the creation of a Kryo instance being slow.
>  
> The current implementation references the KryoSerializerInstance private 
> variable cachedKryo as effectively being a pool of size 1. However (in my 
> admittedly somewhat limited research) it seems that frequently (such as in 
> the ClosureCleaner ensureSerializable method) a new instance of 
> KryoSerializerInstance is created, which in turn forces a new instance of 
> Kryo itself to be created; this instance is then dropped from scope, causing 
> the "pool" not to be re-used.
>  
> I have a small set of proposed changes we have been using on an internal 
> production application (running 24x7 for 6+ months, processing 10k+ jobs a 
> day) which implements using a KryoPool inside KryoSerializer which is then 
> used by each KryoSerializerInstance to borrow a Kryo instance.
>  
> I believe this is mainly a performance improvement for applications 
> processing a large number of small jobs, where the cost of instantiating Kryo 
> instances is a larger portion of execution time compared to larger jobs.
>  
> I have discussed this proposed change in the dev mailing list and it was 
> suggested I create this issue and a PR. It was also suggested I accompany 
> that with some performance metrics, which it is my plan to do.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25839) Implement use of KryoPool in KryoSerializer

2018-10-26 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665574#comment-16665574
 ] 

Apache Spark commented on SPARK-25839:
--

User 'patrickbrownsync' has created a pull request for this issue:
https://github.com/apache/spark/pull/22855

> Implement use of KryoPool in KryoSerializer
> ---
>
> Key: SPARK-25839
> URL: https://issues.apache.org/jira/browse/SPARK-25839
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.0.2, 2.3.1, 2.3.2
>Reporter: Patrick Brown
>Priority: Minor
>
> The current implementation of KryoSerializer does not use KryoPool, which is 
> recommended by Kryo due to the creation of a Kryo instance being slow.
>  
> The current implementation references the KryoSerializerInstance private 
> variable cachedKryo as effectively being a pool of size 1. However (in my 
> admittedly somewhat limited research) it seems that frequently (such as in 
> the ClosureCleaner ensureSerializable method) a new instance of 
> KryoSerializerInstance is created, which in turn forces a new instance of 
> Kryo itself to be created; this instance is then dropped from scope, causing 
> the "pool" not to be re-used.
>  
> I have a small set of proposed changes we have been using on an internal 
> production application (running 24x7 for 6+ months, processing 10k+ jobs a 
> day) which implements using a KryoPool inside KryoSerializer which is then 
> used by each KryoSerializerInstance to borrow a Kryo instance.
>  
> I believe this is mainly a performance improvement for applications 
> processing a large number of small jobs, where the cost of instantiating Kryo 
> instances is a larger portion of execution time compared to larger jobs.
>  
> I have discussed this proposed change in the dev mailing list and it was 
> suggested I create this issue and a PR. It was also suggested I accompany 
> that with some performance metrics, which it is my plan to do.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25855) Don't use Erasure Coding for event log files

2018-10-26 Thread Xiao Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665500#comment-16665500
 ] 

Xiao Chen commented on SPARK-25855:
---

+1 to the idea. For Spark event log behavior to stay compatible (i.e. to keep 
relying on hflush'ing / hsync'ing), event logs should not use EC.

If the file ends up being large, one can do a post-processing to convert it to 
EC after the file is closed.

> Don't use Erasure Coding for event log files
> 
>
> Key: SPARK-25855
> URL: https://issues.apache.org/jira/browse/SPARK-25855
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Imran Rashid
>Priority: Major
>
> While testing spark with hdfs erasure coding (new in hadoop 3), we ran into a 
> bug with the event logs.  The main issue was a bug in hdfs (HDFS-14027), but 
> it did make us wonder whether Spark should be using EC for event log files in 
> general.  It's a poor choice because EC currently implements {{hflush()}} or 
> {{hsync()}} as no-ops, which means you won't see anything in your event logs 
> until the app is complete.  That isn't necessarily a bug, but isn't really 
> great.  So I think we should ensure EC is always off for event logs.
> IIUC there is *not* a problem with applications which die without properly 
> closing the outputstream.  It'll take a while for the NN to realize the 
> client is gone and finish the block, but the data should get there eventually.
> Also related are SPARK-24787 & SPARK-19531.
> The space savings from EC would be nice as the event logs can get somewhat 
> large, but I think other factors outweigh this.
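A small sketch of what checking and clearing the policy could look like from the
client side with the Hadoop 3 API (illustrative only, not the proposed Spark
change; note that a directory without an explicitly set policy still inherits its
parent's, so unsetting is only part of the story):

{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.hdfs.DistributedFileSystem

// Illustrative sketch: drop any erasure coding policy explicitly set on the
// event log directory so hflush()/hsync() on in-progress logs behave as expected.
object EventLogEcCheck {
  def dropExplicitEcPolicy(eventLogDir: String): Unit = {
    val path = new Path(eventLogDir)
    FileSystem.get(path.toUri, new Configuration()) match {
      case dfs: DistributedFileSystem =>
        val policy = dfs.getErasureCodingPolicy(path)  // null means plain replication
        if (policy != null) {
          println(s"event log dir uses EC policy ${policy.getName}, removing it")
          dfs.unsetErasureCodingPolicy(path)
        }
      case _ =>  // not HDFS, nothing to do
    }
  }
}
{code}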



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25855) Don't use Erasure Coding for event log files

2018-10-26 Thread Imran Rashid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665487#comment-16665487
 ] 

Imran Rashid commented on SPARK-25855:
--

cc [~tgraves] [~ste...@apache.org] [~vanzin] who might be interested in this.

Also [~xiaochen] as he helped explain the hdfs side to me and to make sure I 
didn't make a mistake.

I'll post a pr shortly but would appreciate opinions on whether or not this is 
a good idea.

> Don't use Erasure Coding for event log files
> 
>
> Key: SPARK-25855
> URL: https://issues.apache.org/jira/browse/SPARK-25855
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Imran Rashid
>Priority: Major
>
> While testing spark with hdfs erasure coding (new in hadoop 3), we ran into a 
> bug with the event logs.  The main issue was a bug in hdfs (HDFS-14027), but 
> it did make us wonder whether Spark should be using EC for event log files in 
> general.  It's a poor choice because EC currently implements {{hflush()}} or 
> {{hsync()}} as no-ops, which means you won't see anything in your event logs 
> until the app is complete.  That isn't necessarily a bug, but isn't really 
> great.  So I think we should ensure EC is always off for event logs.
> IIUC there is *not* a problem with applications which die without properly 
> closing the outputstream.  It'll take a while for the NN to realize the 
> client is gone and finish the block, but the data should get there eventually.
> Also related are SPARK-24787 & SPARK-19531.
> The space savings from EC would be nice as the event logs can get somewhat 
> large, but I think other factors outweigh this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25855) Don't use Erasure Coding for event log files

2018-10-26 Thread Imran Rashid (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Imran Rashid updated SPARK-25855:
-
Description: 
While testing spark with hdfs erasure coding (new in hadoop 3), we ran into a 
bug with the event logs.  The main issue was a bug in hdfs (HDFS-14027), but it 
did make us wonder whether Spark should be using EC for event log files in 
general.  It's a poor choice because EC currently implements {{hflush()}} or 
{{hsync()}} as no-ops, which means you won't see anything in your event logs 
until the app is complete.  That isn't necessarily a bug, but isn't really 
great.  So I think we should ensure EC is always off for event logs.

IIUC there is *not* a problem with applications which die without properly 
closing the outputstream.  It'll take a while for the NN to realize the client 
is gone and finish the block, but the data should get there eventually.

Also related are SPARK-24787 & SPARK-19531.

The space savings from EC would be nice as the event logs can get somewhat 
large, but I think other factors outweigh this.

  was:
While testing spark with hdfs erasure coding (new in hadoop 3), we ran into a 
bug with the event logs.  The main issue was a bug in hdfs (HDFS-14027), but it 
did make us wonder whether Spark should be using EC for event log files in 
general.  It's a poor choice because EC currently implements {{hflush()}} or 
{{hsync()}} as no-ops, which means you won't see anything in your event logs 
until the app is complete.  That isn't necessarily a bug, but isn't really 
great.  So I think we should ensure EC is always off for event logs.

Also related are SPARK-24787 & SPARK-19531.

The space savings from EC would be nice as the event logs can get somewhat 
large, but I think other factors outweigh this.


> Don't use Erasure Coding for event log files
> 
>
> Key: SPARK-25855
> URL: https://issues.apache.org/jira/browse/SPARK-25855
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Imran Rashid
>Priority: Major
>
> While testing spark with hdfs erasure coding (new in hadoop 3), we ran into a 
> bug with the event logs.  The main issue was a bug in hdfs (HDFS-14027), but 
> it did make us wonder whether Spark should be using EC for event log files in 
> general.  It's a poor choice because EC currently implements {{hflush()}} or 
> {{hsync()}} as no-ops, which means you won't see anything in your event logs 
> until the app is complete.  That isn't necessarily a bug, but isn't really 
> great.  So I think we should ensure EC is always off for event logs.
> IIUC there is *not* a problem with applications which die without properly 
> closing the outputstream.  It'll take a while for the NN to realize the 
> client is gone and finish the block, but the data should get there eventually.
> Also related are SPARK-24787 & SPARK-19531.
> The space savings from EC would be nice as the event logs can get somewhat 
> large, but I think other factors outweigh this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25855) Don't use Erasure Coding for event log files

2018-10-26 Thread Imran Rashid (JIRA)
Imran Rashid created SPARK-25855:


 Summary: Don't use Erasure Coding for event log files
 Key: SPARK-25855
 URL: https://issues.apache.org/jira/browse/SPARK-25855
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 2.4.0
Reporter: Imran Rashid


While testing spark with hdfs erasure coding (new in hadoop 3), we ran into a 
bug with the event logs.  The main issue was a bug in hdfs (HDFS-14027), but it 
did make us wonder whether Spark should be using EC for event log files in 
general.  It's a poor choice because EC currently implements {{hflush()}} or 
{{hsync()}} as no-ops, which means you won't see anything in your event logs 
until the app is complete.  That isn't necessarily a bug, but isn't really 
great.  So I think we should ensure EC is always off for event logs.

Also related are SPARK-24787 & SPARK-19531.

The space savings from EC would be nice as the event logs can get somewhat 
large, but I think other factors outweigh this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25804) JDOPersistenceManager leak when query via JDBC

2018-10-26 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-25804:

Attachment: image-2018-10-27-01-44-07-972.png

> JDOPersistenceManager leak when query via JDBC
> --
>
> Key: SPARK-25804
> URL: https://issues.apache.org/jira/browse/SPARK-25804
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: pin_zhang
>Priority: Major
> Attachments: image-2018-10-27-01-44-07-972.png
>
>
> 1. start-thriftserver.sh under SPARK2.3.1
> 2. Create Table and insert values
>      create table test_leak (id string, index int);
>      insert into test_leak values('id1',1)
> 3. Create JDBC Client query the table
> import java.sql.*;
> public class HiveClient {
>     public static void main(String[] args) throws Exception {
>         String driverName = "org.apache.hive.jdbc.HiveDriver";
>         Class.forName(driverName);
>         Connection con = DriverManager.getConnection(
>                 "jdbc:hive2://localhost:1/default", "test", "test");
>         Statement stmt = con.createStatement();
>         String sql = "select * from test_leak";
>         int loop = 100;
>         while (loop-- > 0) {
>             ResultSet rs = stmt.executeQuery(sql);
>             rs.next();
>             System.out.println(new java.sql.Timestamp(System.currentTimeMillis()) + " : " + rs.getString(1));
>             rs.close();
>             if (loop % 100 == 0) {
>                 Thread.sleep(1);
>             }
>         }
>         con.close();
>     }
> }
> 4. Dump HS2 heap org.datanucleus.api.jdo.JDOPersistenceManager instances keep 
> increasing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25255) Add getActiveSession to SparkSession in PySpark

2018-10-26 Thread holdenk (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

holdenk resolved SPARK-25255.
-
Resolution: Fixed

Thanks for the PR and fixing this issue :)

> Add getActiveSession to SparkSession in PySpark
> ---
>
> Key: SPARK-25255
> URL: https://issues.apache.org/jira/browse/SPARK-25255
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.4.0
>Reporter: holdenk
>Assignee: Huaxin Gao
>Priority: Trivial
>  Labels: starter
> Fix For: 3.0.0
>
>
> Add getActiveSession to PySpark session API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25255) Add getActiveSession to SparkSession in PySpark

2018-10-26 Thread holdenk (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

holdenk updated SPARK-25255:

Fix Version/s: 3.0.0

> Add getActiveSession to SparkSession in PySpark
> ---
>
> Key: SPARK-25255
> URL: https://issues.apache.org/jira/browse/SPARK-25255
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.4.0
>Reporter: holdenk
>Priority: Trivial
>  Labels: starter
> Fix For: 3.0.0
>
>
> Add getActiveSession to PySpark session API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25255) Add getActiveSession to SparkSession in PySpark

2018-10-26 Thread holdenk (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

holdenk reassigned SPARK-25255:
---

Assignee: Huaxin Gao

> Add getActiveSession to SparkSession in PySpark
> ---
>
> Key: SPARK-25255
> URL: https://issues.apache.org/jira/browse/SPARK-25255
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.4.0
>Reporter: holdenk
>Assignee: Huaxin Gao
>Priority: Trivial
>  Labels: starter
> Fix For: 3.0.0
>
>
> Add getActiveSession to PySpark session API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25854) mvn helper script always exits w/1, causing mvn builds to fail

2018-10-26 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665351#comment-16665351
 ] 

Apache Spark commented on SPARK-25854:
--

User 'shaneknapp' has created a pull request for this issue:
https://github.com/apache/spark/pull/22854

> mvn helper script always exits w/1, causing mvn builds to fail
> --
>
> Key: SPARK-25854
> URL: https://issues.apache.org/jira/browse/SPARK-25854
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.2.2, 2.3.2, 2.4.1
>Reporter: shane knapp
>Assignee: shane knapp
>Priority: Critical
>
> the final line in the mvn helper script in build/ attempts to shut down the 
> zinc server.  due to the zinc server being set up w/a 30min timeout, by the 
> time the mvn test instantiation finishes, the server times out.
> this means that when the mvn script tries to shut down zinc, it returns w/an 
> exit code of 1.  this will then automatically fail the entire build (even if 
> the build passes).
> i propose the following:
> 1) up the timeout to 3h
> 2) put some logic at the end of the script to better handle killing the zinc 
> server
> PR coming now.
> [~srowen] [~cloud_fan] [~joshrosen]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25854) mvn helper script always exits w/1, causing mvn builds to fail

2018-10-26 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665350#comment-16665350
 ] 

Apache Spark commented on SPARK-25854:
--

User 'shaneknapp' has created a pull request for this issue:
https://github.com/apache/spark/pull/22854

> mvn helper script always exits w/1, causing mvn builds to fail
> --
>
> Key: SPARK-25854
> URL: https://issues.apache.org/jira/browse/SPARK-25854
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.2.2, 2.3.2, 2.4.1
>Reporter: shane knapp
>Assignee: shane knapp
>Priority: Critical
>
> the final line in the mvn helper script in build/ attempts to shut down the 
> zinc server.  due to the zinc server being set up w/a 30min timeout, by the 
> time the mvn test instantiation finishes, the server times out.
> this means that when the mvn script tries to shut down zinc, it returns w/an 
> exit code of 1.  this will then automatically fail the entire build (even if 
> the build passes).
> i propose the following:
> 1) up the timeout to 3h
> 2) put some logic at the end of the script to better handle killing the zinc 
> server
> PR coming now.
> [~srowen] [~cloud_fan] [~joshrosen]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25854) mvn helper script always exits w/1, causing mvn builds to fail

2018-10-26 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25854:


Assignee: shane knapp  (was: Apache Spark)

> mvn helper script always exits w/1, causing mvn builds to fail
> --
>
> Key: SPARK-25854
> URL: https://issues.apache.org/jira/browse/SPARK-25854
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.2.2, 2.3.2, 2.4.1
>Reporter: shane knapp
>Assignee: shane knapp
>Priority: Critical
>
> the final line in the mvn helper script in build/ attempts to shut down the 
> zinc server.  due to the zinc server being set up w/a 30min timeout, by the 
> time the mvn test instantiation finishes, the server times out.
> this means that when the mvn script tries to shut down zinc, it returns w/an 
> exit code of 1.  this will then automatically fail the entire build (even if 
> the build passes).
> i propose the following:
> 1) up the timeout to 3h
> 2) put some logic at the end of the script to better handle killing the zinc 
> server
> PR coming now.
> [~srowen] [~cloud_fan] [~joshrosen]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25854) mvn helper script always exits w/1, causing mvn builds to fail

2018-10-26 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25854:


Assignee: Apache Spark  (was: shane knapp)

> mvn helper script always exits w/1, causing mvn builds to fail
> --
>
> Key: SPARK-25854
> URL: https://issues.apache.org/jira/browse/SPARK-25854
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.2.2, 2.3.2, 2.4.1
>Reporter: shane knapp
>Assignee: Apache Spark
>Priority: Critical
>
> the final line in the mvn helper script in build/ attempts to shut down the 
> zinc server.  due to the zinc server being set up w/a 30min timeout, by the 
> time the mvn test instantiation finishes, the server times out.
> this means that when the mvn script tries to shut down zinc, it returns w/an 
> exit code of 1.  this will then automatically fail the entire build (even if 
> the build passes).
> i propose the following:
> 1) up the timeout to 3h
> 2) put some logic at the end of the script to better handle killing the zinc 
> server
> PR coming now.
> [~srowen] [~cloud_fan] [~joshrosen]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25854) mvn helper script always exits w/1, causing mvn builds to fail

2018-10-26 Thread shane knapp (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665349#comment-16665349
 ] 

shane knapp commented on SPARK-25854:
-

https://github.com/apache/spark/pull/22854

> mvn helper script always exits w/1, causing mvn builds to fail
> --
>
> Key: SPARK-25854
> URL: https://issues.apache.org/jira/browse/SPARK-25854
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.2.2, 2.3.2, 2.4.1
>Reporter: shane knapp
>Assignee: shane knapp
>Priority: Critical
>
> the final line in the mvn helper script in build/ attempts to shut down the 
> zinc server.  due to the zinc server being set up w/a 30min timeout, by the 
> time the mvn test instantiation finishes, the server times out.
> this means that when the mvn script tries to shut down zinc, it returns w/an 
> exit code of 1.  this will then automatically fail the entire build (even if 
> the build passes).
> i propose the following:
> 1) up the timeout to 3h
> 2) put some logic at the end of the script to better handle killing the zinc 
> server
> PR coming now.
> [~srowen] [~cloud_fan] [~joshrosen]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25854) mvn helper script always exits w/1, causing mvn builds to fail

2018-10-26 Thread shane knapp (JIRA)
shane knapp created SPARK-25854:
---

 Summary: mvn helper script always exits w/1, causing mvn builds to 
fail
 Key: SPARK-25854
 URL: https://issues.apache.org/jira/browse/SPARK-25854
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 2.3.2, 2.2.2, 2.4.1
Reporter: shane knapp
Assignee: shane knapp


the final line in the mvn helper script in build/ attempts to shut down the 
zinc server.  due to the zinc server being set up w/a 30min timeout, by the 
time the mvn test instantiation finishes, the server times out.

this means that when the mvn script tries to shut down zinc, it returns w/an 
exit code of 1.  this will then automatically fail the entire build (even if 
the build passes).

i propose the following:

1) up the timeout to 3h

2) put some logic at the end of the script to better handle killing the zinc 
server

PR coming now.

[~srowen] [~cloud_fan] [~joshrosen]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25845) Fix MatchError for calendar interval type in rangeBetween

2018-10-26 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665330#comment-16665330
 ] 

Apache Spark commented on SPARK-25845:
--

User 'jiangxb1987' has created a pull request for this issue:
https://github.com/apache/spark/pull/22853

> Fix MatchError for calendar interval type in rangeBetween
> -
>
> Key: SPARK-25845
> URL: https://issues.apache.org/jira/browse/SPARK-25845
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Reynold Xin
>Priority: Major
>
> WindowSpecDefinition checks start < end, but CalendarIntervalType is not 
> comparable, so it would throw the following exception at runtime:
>  
>  
> {noformat}
>  scala.MatchError: CalendarIntervalType (of class 
> org.apache.spark.sql.types.CalendarIntervalType$)  at 
> org.apache.spark.sql.catalyst.util.TypeUtils$.getInterpretedOrdering(TypeUtils.scala:58)
>  at 
> org.apache.spark.sql.catalyst.expressions.BinaryComparison.ordering$lzycompute(predicates.scala:592)
>  at 
> org.apache.spark.sql.catalyst.expressions.BinaryComparison.ordering(predicates.scala:592)
>  at 
> org.apache.spark.sql.catalyst.expressions.GreaterThan.nullSafeEval(predicates.scala:797)
>  at 
> org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:496)
>  at 
> org.apache.spark.sql.catalyst.expressions.SpecifiedWindowFrame.isGreaterThan(windowExpressions.scala:245)
>  at 
> org.apache.spark.sql.catalyst.expressions.SpecifiedWindowFrame.checkInputDataTypes(windowExpressions.scala:216)
>  at 
> org.apache.spark.sql.catalyst.expressions.Expression.resolved$lzycompute(Expression.scala:171)
>  at 
> org.apache.spark.sql.catalyst.expressions.Expression.resolved(Expression.scala:171)
>  at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$childrenResolved$1.apply(Expression.scala:183)
>  at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$childrenResolved$1.apply(Expression.scala:183)
>  at 
> scala.collection.IndexedSeqOptimized$class.prefixLengthImpl(IndexedSeqOptimized.scala:38)
>  at 
> scala.collection.IndexedSeqOptimized$class.forall(IndexedSeqOptimized.scala:43)
>  at scala.collection.mutable.ArrayBuffer.forall(ArrayBuffer.scala:48) at 
> org.apache.spark.sql.catalyst.expressions.Expression.childrenResolved(Expression.scala:183)
>  at 
> org.apache.spark.sql.catalyst.expressions.WindowSpecDefinition.resolved$lzycompute(windowExpressions.scala:48)
>  at 
> org.apache.spark.sql.catalyst.expressions.WindowSpecDefinition.resolved(windowExpressions.scala:48)
>  at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$childrenResolved$1.apply(Expression.scala:183)
>  at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$childrenResolved$1.apply(Expression.scala:183)
>  at 
> scala.collection.LinearSeqOptimized$class.forall(LinearSeqOptimized.scala:83) 
>{noformat}
>  
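For context, the kind of query that can reach this check is one where both frame
boundaries are calendar-interval literals, along these lines (a sketch only,
assuming a SparkSession named spark as in spark-shell; the table and columns are
made up, and the exact query shape that triggers the MatchError may differ):

{code:scala}
// Hypothetical repro sketch: both RANGE boundaries are interval literals,
// so the start/end check ends up comparing CalendarIntervalType values.
spark.sql(
  """
    |SELECT id,
    |       sum(v) OVER (ORDER BY ts
    |                    RANGE BETWEEN INTERVAL 1 DAY PRECEDING
    |                              AND INTERVAL 2 DAYS FOLLOWING) AS s
    |FROM events
  """.stripMargin).show()
{code}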



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25845) Fix MatchError for calendar interval type in rangeBetween

2018-10-26 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665329#comment-16665329
 ] 

Apache Spark commented on SPARK-25845:
--

User 'jiangxb1987' has created a pull request for this issue:
https://github.com/apache/spark/pull/22853

> Fix MatchError for calendar interval type in rangeBetween
> -
>
> Key: SPARK-25845
> URL: https://issues.apache.org/jira/browse/SPARK-25845
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Reynold Xin
>Priority: Major
>
> WindowSpecDefinition checks start < end, but CalendarIntervalType is not 
> comparable, so it would throw the following exception at runtime:
>  
>  
> {noformat}
>  scala.MatchError: CalendarIntervalType (of class 
> org.apache.spark.sql.types.CalendarIntervalType$)  at 
> org.apache.spark.sql.catalyst.util.TypeUtils$.getInterpretedOrdering(TypeUtils.scala:58)
>  at 
> org.apache.spark.sql.catalyst.expressions.BinaryComparison.ordering$lzycompute(predicates.scala:592)
>  at 
> org.apache.spark.sql.catalyst.expressions.BinaryComparison.ordering(predicates.scala:592)
>  at 
> org.apache.spark.sql.catalyst.expressions.GreaterThan.nullSafeEval(predicates.scala:797)
>  at 
> org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:496)
>  at 
> org.apache.spark.sql.catalyst.expressions.SpecifiedWindowFrame.isGreaterThan(windowExpressions.scala:245)
>  at 
> org.apache.spark.sql.catalyst.expressions.SpecifiedWindowFrame.checkInputDataTypes(windowExpressions.scala:216)
>  at 
> org.apache.spark.sql.catalyst.expressions.Expression.resolved$lzycompute(Expression.scala:171)
>  at 
> org.apache.spark.sql.catalyst.expressions.Expression.resolved(Expression.scala:171)
>  at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$childrenResolved$1.apply(Expression.scala:183)
>  at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$childrenResolved$1.apply(Expression.scala:183)
>  at 
> scala.collection.IndexedSeqOptimized$class.prefixLengthImpl(IndexedSeqOptimized.scala:38)
>  at 
> scala.collection.IndexedSeqOptimized$class.forall(IndexedSeqOptimized.scala:43)
>  at scala.collection.mutable.ArrayBuffer.forall(ArrayBuffer.scala:48) at 
> org.apache.spark.sql.catalyst.expressions.Expression.childrenResolved(Expression.scala:183)
>  at 
> org.apache.spark.sql.catalyst.expressions.WindowSpecDefinition.resolved$lzycompute(windowExpressions.scala:48)
>  at 
> org.apache.spark.sql.catalyst.expressions.WindowSpecDefinition.resolved(windowExpressions.scala:48)
>  at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$childrenResolved$1.apply(Expression.scala:183)
>  at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$childrenResolved$1.apply(Expression.scala:183)
>  at 
> scala.collection.LinearSeqOptimized$class.forall(LinearSeqOptimized.scala:83) 
>{noformat}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25845) Fix MatchError for calendar interval type in rangeBetween

2018-10-26 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25845:


Assignee: Apache Spark

> Fix MatchError for calendar interval type in rangeBetween
> -
>
> Key: SPARK-25845
> URL: https://issues.apache.org/jira/browse/SPARK-25845
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Reynold Xin
>Assignee: Apache Spark
>Priority: Major
>
> WindowSpecDefinition checks start < end, but CalendarIntervalType is not 
> comparable, so it would throw the following exception at runtime:
>  
>  
> {noformat}
>  scala.MatchError: CalendarIntervalType (of class 
> org.apache.spark.sql.types.CalendarIntervalType$)  at 
> org.apache.spark.sql.catalyst.util.TypeUtils$.getInterpretedOrdering(TypeUtils.scala:58)
>  at 
> org.apache.spark.sql.catalyst.expressions.BinaryComparison.ordering$lzycompute(predicates.scala:592)
>  at 
> org.apache.spark.sql.catalyst.expressions.BinaryComparison.ordering(predicates.scala:592)
>  at 
> org.apache.spark.sql.catalyst.expressions.GreaterThan.nullSafeEval(predicates.scala:797)
>  at 
> org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:496)
>  at 
> org.apache.spark.sql.catalyst.expressions.SpecifiedWindowFrame.isGreaterThan(windowExpressions.scala:245)
>  at 
> org.apache.spark.sql.catalyst.expressions.SpecifiedWindowFrame.checkInputDataTypes(windowExpressions.scala:216)
>  at 
> org.apache.spark.sql.catalyst.expressions.Expression.resolved$lzycompute(Expression.scala:171)
>  at 
> org.apache.spark.sql.catalyst.expressions.Expression.resolved(Expression.scala:171)
>  at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$childrenResolved$1.apply(Expression.scala:183)
>  at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$childrenResolved$1.apply(Expression.scala:183)
>  at 
> scala.collection.IndexedSeqOptimized$class.prefixLengthImpl(IndexedSeqOptimized.scala:38)
>  at 
> scala.collection.IndexedSeqOptimized$class.forall(IndexedSeqOptimized.scala:43)
>  at scala.collection.mutable.ArrayBuffer.forall(ArrayBuffer.scala:48) at 
> org.apache.spark.sql.catalyst.expressions.Expression.childrenResolved(Expression.scala:183)
>  at 
> org.apache.spark.sql.catalyst.expressions.WindowSpecDefinition.resolved$lzycompute(windowExpressions.scala:48)
>  at 
> org.apache.spark.sql.catalyst.expressions.WindowSpecDefinition.resolved(windowExpressions.scala:48)
>  at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$childrenResolved$1.apply(Expression.scala:183)
>  at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$childrenResolved$1.apply(Expression.scala:183)
>  at 
> scala.collection.LinearSeqOptimized$class.forall(LinearSeqOptimized.scala:83) 
>{noformat}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25845) Fix MatchError for calendar interval type in rangeBetween

2018-10-26 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25845:


Assignee: (was: Apache Spark)

> Fix MatchError for calendar interval type in rangeBetween
> -
>
> Key: SPARK-25845
> URL: https://issues.apache.org/jira/browse/SPARK-25845
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Reynold Xin
>Priority: Major
>
> WindowSpecDefinition checks start < end, but CalendarIntervalType is not 
> comparable, so it would throw the following exception at runtime:
>  
>  
> {noformat}
>  scala.MatchError: CalendarIntervalType (of class 
> org.apache.spark.sql.types.CalendarIntervalType$)  at 
> org.apache.spark.sql.catalyst.util.TypeUtils$.getInterpretedOrdering(TypeUtils.scala:58)
>  at 
> org.apache.spark.sql.catalyst.expressions.BinaryComparison.ordering$lzycompute(predicates.scala:592)
>  at 
> org.apache.spark.sql.catalyst.expressions.BinaryComparison.ordering(predicates.scala:592)
>  at 
> org.apache.spark.sql.catalyst.expressions.GreaterThan.nullSafeEval(predicates.scala:797)
>  at 
> org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:496)
>  at 
> org.apache.spark.sql.catalyst.expressions.SpecifiedWindowFrame.isGreaterThan(windowExpressions.scala:245)
>  at 
> org.apache.spark.sql.catalyst.expressions.SpecifiedWindowFrame.checkInputDataTypes(windowExpressions.scala:216)
>  at 
> org.apache.spark.sql.catalyst.expressions.Expression.resolved$lzycompute(Expression.scala:171)
>  at 
> org.apache.spark.sql.catalyst.expressions.Expression.resolved(Expression.scala:171)
>  at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$childrenResolved$1.apply(Expression.scala:183)
>  at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$childrenResolved$1.apply(Expression.scala:183)
>  at 
> scala.collection.IndexedSeqOptimized$class.prefixLengthImpl(IndexedSeqOptimized.scala:38)
>  at 
> scala.collection.IndexedSeqOptimized$class.forall(IndexedSeqOptimized.scala:43)
>  at scala.collection.mutable.ArrayBuffer.forall(ArrayBuffer.scala:48) at 
> org.apache.spark.sql.catalyst.expressions.Expression.childrenResolved(Expression.scala:183)
>  at 
> org.apache.spark.sql.catalyst.expressions.WindowSpecDefinition.resolved$lzycompute(windowExpressions.scala:48)
>  at 
> org.apache.spark.sql.catalyst.expressions.WindowSpecDefinition.resolved(windowExpressions.scala:48)
>  at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$childrenResolved$1.apply(Expression.scala:183)
>  at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$childrenResolved$1.apply(Expression.scala:183)
>  at 
> scala.collection.LinearSeqOptimized$class.forall(LinearSeqOptimized.scala:83) 
>{noformat}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12172) Consider removing SparkR internal RDD APIs

2018-10-26 Thread Shivaram Venkataraman (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-12172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665254#comment-16665254
 ] 

Shivaram Venkataraman commented on SPARK-12172:
---

+1 - I think if spark.lapply uses only one or two functions we could even 
inline them

> Consider removing SparkR internal RDD APIs
> --
>
> Key: SPARK-12172
> URL: https://issues.apache.org/jira/browse/SPARK-12172
> Project: Spark
>  Issue Type: Task
>  Components: SparkR
>Reporter: Felix Cheung
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25023) Clarify Spark security documentation

2018-10-26 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-25023:
--
  Priority: Minor  (was: Major)
Issue Type: Improvement  (was: Bug)

> Clarify Spark security documentation
> 
>
> Key: SPARK-25023
> URL: https://issues.apache.org/jira/browse/SPARK-25023
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 2.2.2
>Reporter: Thomas Graves
>Priority: Minor
>
> I was reading through our deployment docs and security docs, and it's not clear 
> at all which deployment modes support what for security.  I think we should 
> clarify that security is off by default on all deployments.  We may also want 
> to clarify the types of communication used that would need to be secured.  We 
> may also want to clarify multi-tenant safety vs. other concerns; standalone 
> mode, for instance, is in my opinion just not secure: we do talk about using 
> spark.authenticate for a secret, but all applications would use the same 
> secret.
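To make the standalone-mode point concrete: spark.authenticate relies on a single
shared secret, so every application configured with it is mutually trusted. The
snippet below is illustrative only; the secret value is obviously made up.

{code:scala}
import org.apache.spark.SparkConf

// Every application on the standalone cluster must use this same secret,
// so authentication here does not isolate applications from one another.
val conf = new SparkConf()
  .set("spark.authenticate", "true")
  .set("spark.authenticate.secret", "shared-cluster-secret")
{code}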



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25836) (Temporarily) disable automatic build/test of kubernetes-integration-tests

2018-10-26 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-25836.
---
  Resolution: Duplicate
   Fix Version/s: 2.4.0
Target Version/s:   (was: 2.4.0)

> (Temporarily) disable automatic build/test of kubernetes-integration-tests
> --
>
> Key: SPARK-25836
> URL: https://issues.apache.org/jira/browse/SPARK-25836
> Project: Spark
>  Issue Type: Task
>  Components: Build, Kubernetes
>Affects Versions: 2.4.0
>Reporter: Sean Owen
>Priority: Minor
> Fix For: 2.4.0
>
>
> During 2.4.0 RC4 testing, we noticed an issue with 
> kubernetes-integration-tests and Scala 2.12 (SPARK-25835), and that the build 
> was actually publishing kubernetes-integration-tests. The tests are also 
> complicated in some ways and require some setup to run. This is being 
> simplified in SPARK-25809 for later.
> These tests, it seems, can be instead run ad hoc manually for now, given the 
> above. A quick fix is to not enable this module even when the kubernetes 
> profile is active.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25835) Propagate scala 2.12 profile in k8s integration tests

2018-10-26 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-25835.
---
   Resolution: Fixed
Fix Version/s: 2.4.0

Issue resolved by pull request 22838
[https://github.com/apache/spark/pull/22838]

> Propagate scala 2.12 profile in k8s integration tests
> -
>
> Key: SPARK-25835
> URL: https://issues.apache.org/jira/browse/SPARK-25835
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Stavros Kontopoulos
>Assignee: Stavros Kontopoulos
>Priority: Minor
> Fix For: 2.4.0
>
>
> The 
> [line|https://github.com/apache/spark/blob/master/resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh#L106]
>  that calls k8s integration tests ignores the scala version: 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25835) Propagate scala 2.12 profile in k8s integration tests

2018-10-26 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-25835:
-

Assignee: Stavros Kontopoulos

> Propagate scala 2.12 profile in k8s integration tests
> -
>
> Key: SPARK-25835
> URL: https://issues.apache.org/jira/browse/SPARK-25835
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Stavros Kontopoulos
>Assignee: Stavros Kontopoulos
>Priority: Minor
> Fix For: 2.4.0
>
>
> The 
> [line|https://github.com/apache/spark/blob/master/resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh#L106]
>  that calls k8s integration tests ignores the scala version: 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25023) Clarify Spark security documentation

2018-10-26 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25023:


Assignee: (was: Apache Spark)

> Clarify Spark security documentation
> 
>
> Key: SPARK-25023
> URL: https://issues.apache.org/jira/browse/SPARK-25023
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 2.2.2
>Reporter: Thomas Graves
>Priority: Major
>
> I was reading through our deployment docs and security docs, and it's not clear 
> at all which deployment modes support what for security.  I think we should 
> clarify that security is off by default on all deployments.  We may also want 
> to clarify the types of communication used that would need to be secured.  We 
> may also want to clarify multi-tenant safety vs. other concerns; standalone 
> mode, for instance, is in my opinion just not secure: we do talk about using 
> spark.authenticate for a secret, but all applications would use the same 
> secret.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25023) Clarify Spark security documentation

2018-10-26 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665197#comment-16665197
 ] 

Apache Spark commented on SPARK-25023:
--

User 'tgravescs' has created a pull request for this issue:
https://github.com/apache/spark/pull/22852

> Clarify Spark security documentation
> 
>
> Key: SPARK-25023
> URL: https://issues.apache.org/jira/browse/SPARK-25023
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 2.2.2
>Reporter: Thomas Graves
>Priority: Major
>
> I was reading through our deployment docs and security docs, and it's not clear 
> at all which deployment modes support what for security.  I think we should 
> clarify that security is off by default on all deployments.  We may also want 
> to clarify the types of communication used that would need to be secured.  We 
> may also want to clarify multi-tenant safety vs. other concerns; standalone 
> mode, for instance, is in my opinion just not secure: we do talk about using 
> spark.authenticate for a secret, but all applications would use the same 
> secret.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25023) Clarify Spark security documentation

2018-10-26 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25023:


Assignee: Apache Spark

> Clarify Spark security documentation
> 
>
> Key: SPARK-25023
> URL: https://issues.apache.org/jira/browse/SPARK-25023
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 2.2.2
>Reporter: Thomas Graves
>Assignee: Apache Spark
>Priority: Major
>
> I was reading through our deployment docs and security docs, and it's not clear 
> at all which deployment modes support what for security.  I think we should 
> clarify that security is off by default on all deployments.  We may also want 
> to clarify the types of communication used that would need to be secured.  We 
> may also want to clarify multi-tenant safety vs. other concerns; standalone 
> mode, for instance, is in my opinion just not secure: we do talk about using 
> spark.authenticate for a secret, but all applications would use the same 
> secret.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25852) we should filter the workOffers of which freeCores>0 for better performance

2018-10-26 Thread zuotingbing (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zuotingbing updated SPARK-25852:

Priority: Major  (was: Minor)

> we should filter the workOffers of which freeCores>0 for better performance
> ---
>
> Key: SPARK-25852
> URL: https://issues.apache.org/jira/browse/SPARK-25852
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler
>Affects Versions: 2.3.2
>Reporter: zuotingbing
>Priority: Major
> Attachments: 2018-10-26_162822.png
>
>
> We should filter the workOffers of which freeCores=0 for better performance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25852) we should filter the workOffers of which freeCores>0 for better performance

2018-10-26 Thread zuotingbing (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zuotingbing updated SPARK-25852:

Description: We should filter the workOffers of which freeCores=0 for 
better performance.  (was: We should filter the workOffers of which freeCores=0 
when make fake resource offers on all executors.)

> we should filter the workOffers of which freeCores>0 for better performance
> ---
>
> Key: SPARK-25852
> URL: https://issues.apache.org/jira/browse/SPARK-25852
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler
>Affects Versions: 2.3.2
>Reporter: zuotingbing
>Priority: Minor
> Attachments: 2018-10-26_162822.png
>
>
> We should filter the workOffers of which freeCores=0 for better performance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25852) we should filter the workOffers of which freeCores>0 for better performance

2018-10-26 Thread zuotingbing (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zuotingbing updated SPARK-25852:

Summary: we should filter the workOffers of which freeCores>0 for better 
performance  (was: we should filter the workOffers of which freeCores>0 when 
make fake resource offers on all executors)

> we should filter the workOffers of which freeCores>0 for better performance
> ---
>
> Key: SPARK-25852
> URL: https://issues.apache.org/jira/browse/SPARK-25852
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler
>Affects Versions: 2.3.2
>Reporter: zuotingbing
>Priority: Minor
> Attachments: 2018-10-26_162822.png
>
>
> We should filter the workOffers of which freeCores=0 when make fake resource 
> offers on all executors.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25797) Views created via 2.1 cannot be read via 2.2+

2018-10-26 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665072#comment-16665072
 ] 

Apache Spark commented on SPARK-25797:
--

User 'seancxmao' has created a pull request for this issue:
https://github.com/apache/spark/pull/22851

> Views created via 2.1 cannot be read via 2.2+
> -
>
> Key: SPARK-25797
> URL: https://issues.apache.org/jira/browse/SPARK-25797
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1, 2.3.2
>Reporter: Chenxiao Mao
>Priority: Major
>
> We ran into this issue when we updated our Spark from 2.1 to 2.3. Below is a 
> simple example to reproduce the issue.
> Create views via Spark 2.1
> {code:sql}
> create view v1 as
> select (cast(1 as decimal(18,0)) + cast(1 as decimal(18,0))) c1;
> {code}
> Query views via Spark 2.3
> {code:sql}
> select * from v1;
> Error in query: Cannot up cast `c1` from decimal(20,0) to c1#3906: 
> decimal(19,0) as it may truncate
> {code}
> After investigation, we found that this is because when a view is created via 
> Spark 2.1, the expanded text is saved instead of the original text. 
> Unfortunately, the expanded text below is buggy.
> {code:sql}
> spark-sql> desc extended v1;
> c1 decimal(19,0) NULL
> Detailed Table Information
> Database default
> Table v1
> Type VIEW
> View Text SELECT `gen_attr_0` AS `c1` FROM (SELECT (CAST(CAST(1 AS 
> DECIMAL(18,0)) AS DECIMAL(19,0)) + CAST(CAST(1 AS DECIMAL(18,0)) AS 
> DECIMAL(19,0))) AS `gen_attr_0`) AS gen_subquery_0
> {code}
> We can see that c1 is decimal(19,0), however in the expanded text there is 
> decimal(19,0) + decimal(19,0) which results in decimal(20,0). Since Spark 
> 2.2, decimal(20,0) in query is not allowed to cast to view definition column 
> decimal(19,0). ([https://github.com/apache/spark/pull/16561])
> I further tested other decimal calculations. Only add/subtract has this issue.
> Create views via 2.1:
> {code:sql}
> create view v1 as
> select (cast(1 as decimal(18,0)) + cast(1 as decimal(18,0))) c1;
> create view v2 as
> select (cast(1 as decimal(18,0)) - cast(1 as decimal(18,0))) c1;
> create view v3 as
> select (cast(1 as decimal(18,0)) * cast(1 as decimal(18,0))) c1;
> create view v4 as
> select (cast(1 as decimal(18,0)) / cast(1 as decimal(18,0))) c1;
> create view v5 as
> select (cast(1 as decimal(18,0)) % cast(1 as decimal(18,0))) c1;
> create view v6 as
> select cast(1 as decimal(18,0)) c1
> union
> select cast(1 as decimal(19,0)) c1;
> {code}
> Query views via Spark 2.3
> {code:sql}
> select * from v1;
> Error in query: Cannot up cast `c1` from decimal(20,0) to c1#3906: 
> decimal(19,0) as it may truncate
> select * from v2;
> Error in query: Cannot up cast `c1` from decimal(20,0) to c1#3909: 
> decimal(19,0) as it may truncate
> select * from v3;
> 1
> select * from v4;
> 1
> select * from v5;
> 0
> select * from v6;
> 1
> {code}
> Views created via Spark 2.2+ don't have this issue because Spark 2.2+ does 
> not generate expanded text for view 
> (https://issues.apache.org/jira/browse/SPARK-18209).
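To make the precision arithmetic behind this report concrete, here is a small sketch (expected Spark behavior, not code from the linked PR): adding two decimal(18,0) values widens to decimal(19,0), and adding two decimal(19,0) values widens again to decimal(20,0), which is why the expanded view text can no longer be up-cast to the view's decimal(19,0) column.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("decimal-sketch").getOrCreate()

// decimal(18,0) + decimal(18,0) widens to decimal(19,0) ...
spark.range(1)
  .selectExpr("cast(1 as decimal(18,0)) + cast(1 as decimal(18,0)) as c1")
  .printSchema()   // c1: decimal(19,0)

// ... while decimal(19,0) + decimal(19,0), as in the expanded view text,
// widens to decimal(20,0).
spark.range(1)
  .selectExpr("cast(1 as decimal(19,0)) + cast(1 as decimal(19,0)) as c1")
  .printSchema()   // c1: decimal(20,0)
```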



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25853) Parts of spark components (DAG Visualization and executors page) not available in Internet Explorer

2018-10-26 Thread aastha (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

aastha updated SPARK-25853:
---
Summary: Parts of spark components (DAG Visualization and executors page) 
not available in Internet Explorer  (was: Parts of spark components not 
available in Internet Explorer)

> Parts of spark components (DAG Visualization and executors page) not available 
> in Internet Explorer
> --
>
> Key: SPARK-25853
> URL: https://issues.apache.org/jira/browse/SPARK-25853
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.2.0, 2.3.2
>Reporter: aastha
>Priority: Major
> Fix For: 2.3.3
>
> Attachments: dag_error_ie.png, dag_not_rendered_ie.png, 
> dag_on_chrome.png, execuotrs_not_rendered_ie.png, executors_error_ie.png, 
> executors_on_chrome.png
>
>
> Spark UI has some limitations when working with Internet Explorer. The DAG 
> component as well as the Executors page does not render; both work on Firefox 
> and Chrome. I have tested on recent Internet Explorer 11.483.15063.0. Since it 
> works on Chrome and Firefox, their versions should not matter.
> For the Executors page, the root cause is that the document.baseURI property is 
> undefined in Internet Explorer. When I debug by providing the property 
> myself, it shows up fine.
> For the DAG component, developer tools haven't helped. 
> Attaching screenshots for the Chrome and IE UI and debug console messages. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25853) Parts of spark components not available in Internet Explorer

2018-10-26 Thread aastha (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

aastha updated SPARK-25853:
---
Attachment: executors_on_chrome.png
executors_error_ie.png
execuotrs_not_rendered_ie.png
dag_on_chrome.png
dag_not_rendered_ie.png
dag_error_ie.png

> Parts of spark components not available in Internet Explorer
> 
>
> Key: SPARK-25853
> URL: https://issues.apache.org/jira/browse/SPARK-25853
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.2.0, 2.3.2
>Reporter: aastha
>Priority: Major
> Fix For: 2.3.3
>
> Attachments: dag_error_ie.png, dag_not_rendered_ie.png, 
> dag_on_chrome.png, execuotrs_not_rendered_ie.png, executors_error_ie.png, 
> executors_on_chrome.png
>
>
> Spark UI has some limitations when working with Internet Explorer. The DAG 
> component as well as the Executors page does not render; both work on Firefox 
> and Chrome. I have tested on recent Internet Explorer 11.483.15063.0. Since it 
> works on Chrome and Firefox, their versions should not matter.
> For the Executors page, the root cause is that the document.baseURI property is 
> undefined in Internet Explorer. When I debug by providing the property 
> myself, it shows up fine.
> For the DAG component, developer tools haven't helped. 
> Attaching screenshots for the Chrome and IE UI and debug console messages. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25853) Parts of spark components not available in Internet Explorer

2018-10-26 Thread aastha (JIRA)
aastha created SPARK-25853:
--

 Summary: Parts of spark components not available in Internet 
Explorer
 Key: SPARK-25853
 URL: https://issues.apache.org/jira/browse/SPARK-25853
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 2.3.2, 2.2.0
Reporter: aastha
 Fix For: 2.3.3


Spark UI has some limitations when working with Internet Explorer. The DAG 
component as well as the Executors page does not render; both work on Firefox 
and Chrome. I have tested on recent Internet Explorer 11.483.15063.0. Since it 
works on Chrome and Firefox, their versions should not matter.
For the Executors page, the root cause is that the document.baseURI property is 
undefined in Internet Explorer. When I debug by providing the property myself, 
it shows up fine.
For the DAG component, developer tools haven't helped. 
Attaching screenshots for the Chrome and IE UI and debug console messages. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25852) we should filter the workOffers of which freeCores>0 when make fake resource offers on all executors

2018-10-26 Thread zuotingbing (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zuotingbing updated SPARK-25852:

Component/s: (was: Spark Core)
 Scheduler

> we should filter the workOffers of which freeCores>0 when make fake resource 
> offers on all executors
> 
>
> Key: SPARK-25852
> URL: https://issues.apache.org/jira/browse/SPARK-25852
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler
>Affects Versions: 2.3.2
>Reporter: zuotingbing
>Priority: Minor
> Attachments: 2018-10-26_162822.png
>
>
> We should filter the workOffers of which freeCores=0 when make fake resource 
> offers on all executors.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25852) we should filter the workOffers of which freeCores>0 when make fake resource offers on all executors

2018-10-26 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25852:


Assignee: (was: Apache Spark)

> we should filter the workOffers of which freeCores>0 when make fake resource 
> offers on all executors
> 
>
> Key: SPARK-25852
> URL: https://issues.apache.org/jira/browse/SPARK-25852
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.2
>Reporter: zuotingbing
>Priority: Minor
> Attachments: 2018-10-26_162822.png
>
>
> We should filter the workOffers of which freeCores=0 when make fake resource 
> offers on all executors.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25852) we should filter the workOffers of which freeCores>0 when make fake resource offers on all executors

2018-10-26 Thread zuotingbing (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zuotingbing updated SPARK-25852:

Attachment: 2018-10-26_162822.png

> we should filter the workOffers of which freeCores>0 when make fake resource 
> offers on all executors
> 
>
> Key: SPARK-25852
> URL: https://issues.apache.org/jira/browse/SPARK-25852
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.2
>Reporter: zuotingbing
>Priority: Minor
> Attachments: 2018-10-26_162822.png
>
>
> We should filter the workOffers of which freeCores=0 when make fake resource 
> offers on all executors.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25852) we should filter the workOffers of which freeCores>0 when make fake resource offers on all executors

2018-10-26 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664948#comment-16664948
 ] 

Apache Spark commented on SPARK-25852:
--

User 'zuotingbing' has created a pull request for this issue:
https://github.com/apache/spark/pull/22849

> we should filter the workOffers of which freeCores>0 when make fake resource 
> offers on all executors
> 
>
> Key: SPARK-25852
> URL: https://issues.apache.org/jira/browse/SPARK-25852
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.2
>Reporter: zuotingbing
>Priority: Minor
> Attachments: 2018-10-26_162822.png
>
>
> We should filter the workOffers of which freeCores=0 when make fake resource 
> offers on all executors.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25852) we should filter the workOffers of which freeCores>0 when make fake resource offers on all executors

2018-10-26 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25852:


Assignee: Apache Spark

> we should filter the workOffers of which freeCores>0 when make fake resource 
> offers on all executors
> 
>
> Key: SPARK-25852
> URL: https://issues.apache.org/jira/browse/SPARK-25852
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.2
>Reporter: zuotingbing
>Assignee: Apache Spark
>Priority: Minor
> Attachments: 2018-10-26_162822.png
>
>
> We should filter the workOffers of which freeCores=0 when make fake resource 
> offers on all executors.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25852) we should filter the workOffers of which freeCores>0 when make fake resource offers on all executors

2018-10-26 Thread zuotingbing (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zuotingbing updated SPARK-25852:

Summary: we should filter the workOffers of which freeCores>0 when make 
fake resource offers on all executors  (was: we should filter the workOffers of 
which freeCores=0 when make fake resource offers on all executors)

> we should filter the workOffers of which freeCores>0 when make fake resource 
> offers on all executors
> 
>
> Key: SPARK-25852
> URL: https://issues.apache.org/jira/browse/SPARK-25852
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.2
>Reporter: zuotingbing
>Priority: Minor
>
> We should filter the workOffers of which freeCores=0 when make fake resource 
> offers on all executors.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25852) we should filter the workOffers of which freeCores=0 when make fake resource offers on all executors

2018-10-26 Thread zuotingbing (JIRA)
zuotingbing created SPARK-25852:
---

 Summary: we should filter the workOffers of which freeCores=0 when 
make fake resource offers on all executors
 Key: SPARK-25852
 URL: https://issues.apache.org/jira/browse/SPARK-25852
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 2.3.2
Reporter: zuotingbing


We should filter the workOffers of which freeCores=0 when make fake resource 
offers on all executors.
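A minimal sketch of the proposed filtering, using a simplified stand-in for the scheduler's worker offers (the case class below is an assumption for illustration only, not the actual Spark internals):

```scala
// Simplified stand-in for org.apache.spark.scheduler.WorkerOffer.
case class WorkerOffer(executorId: String, host: String, freeCores: Int)

// Keep only executors that can actually accept a task right now, so the
// scheduler does not build and process offers that have no free cores.
def usableOffers(allOffers: Seq[WorkerOffer]): Seq[WorkerOffer] =
  allOffers.filter(_.freeCores > 0)

val offers = Seq(
  WorkerOffer("exec-1", "host-a", 0),  // dropped: no free cores
  WorkerOffer("exec-2", "host-b", 4)   // kept
)
assert(usableOffers(offers).map(_.executorId) == Seq("exec-2"))
```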



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25851) Fix deprecated API warning in SQLListener

2018-10-26 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25851:


Assignee: (was: Apache Spark)

> Fix deprecated API warning in SQLListener
> -
>
> Key: SPARK-25851
> URL: https://issues.apache.org/jira/browse/SPARK-25851
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gengliang Wang
>Priority: Trivial
>
> In https://github.com/apache/spark/pull/21596, Jackson is upgraded to 2.9.6.
> There are some deprecated API warnings in SQLListener.
> Create a trivial PR to fix them.
> ```
> [warn] SQLListener.scala:92: method uncheckedSimpleType in class TypeFactory 
> is deprecated: see corresponding Javadoc for more information.
> [warn] val objectType = typeFactory.uncheckedSimpleType(classOf[Object])
> [warn] 
> [warn] SQLListener.scala:93: method constructSimpleType in class TypeFactory 
> is deprecated: see corresponding Javadoc for more information.
> [warn] typeFactory.constructSimpleType(classOf[(_, _)], classOf[(_, _)], 
> Array(objectType, objectType))
> [warn] 
> [warn] SQLListener.scala:97: method uncheckedSimpleType in class TypeFactory 
> is deprecated: see corresponding Javadoc for more information.
> [warn] val longType = typeFactory.uncheckedSimpleType(classOf[Long])
> [warn] 
> [warn] SQLListener.scala:98: method constructSimpleType in class TypeFactory 
> is deprecated: see corresponding Javadoc for more information.
> [warn] typeFactory.constructSimpleType(classOf[(_, _)], classOf[(_, _)], 
> Array(longType, longType))
> ```



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25851) Fix deprecated API warning in SQLListener

2018-10-26 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664891#comment-16664891
 ] 

Apache Spark commented on SPARK-25851:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/22848

> Fix deprecated API warning in SQLListener
> -
>
> Key: SPARK-25851
> URL: https://issues.apache.org/jira/browse/SPARK-25851
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gengliang Wang
>Priority: Trivial
>
> In https://github.com/apache/spark/pull/21596, Jackson is upgraded to 2.9.6.
> There are some deprecated API warnings in SQLListener.
> Create a trivial PR to fix them.
> ```
> [warn] SQLListener.scala:92: method uncheckedSimpleType in class TypeFactory 
> is deprecated: see corresponding Javadoc for more information.
> [warn] val objectType = typeFactory.uncheckedSimpleType(classOf[Object])
> [warn] 
> [warn] SQLListener.scala:93: method constructSimpleType in class TypeFactory 
> is deprecated: see corresponding Javadoc for more information.
> [warn] typeFactory.constructSimpleType(classOf[(_, _)], classOf[(_, _)], 
> Array(objectType, objectType))
> [warn] 
> [warn] SQLListener.scala:97: method uncheckedSimpleType in class TypeFactory 
> is deprecated: see corresponding Javadoc for more information.
> [warn] val longType = typeFactory.uncheckedSimpleType(classOf[Long])
> [warn] 
> [warn] SQLListener.scala:98: method constructSimpleType in class TypeFactory 
> is deprecated: see corresponding Javadoc for more information.
> [warn] typeFactory.constructSimpleType(classOf[(_, _)], classOf[(_, _)], 
> Array(longType, longType))
> ```



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25851) Fix deprecated API warning in SQLListener

2018-10-26 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25851:


Assignee: Apache Spark

> Fix deprecated API warning in SQLListener
> -
>
> Key: SPARK-25851
> URL: https://issues.apache.org/jira/browse/SPARK-25851
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Trivial
>
> In https://github.com/apache/spark/pull/21596, Jackson is upgraded to 2.9.6.
> There are some deprecated API warnings in SQLListener.
> Create a trivial PR to fix them.
> ```
> [warn] SQLListener.scala:92: method uncheckedSimpleType in class TypeFactory 
> is deprecated: see corresponding Javadoc for more information.
> [warn] val objectType = typeFactory.uncheckedSimpleType(classOf[Object])
> [warn] 
> [warn] SQLListener.scala:93: method constructSimpleType in class TypeFactory 
> is deprecated: see corresponding Javadoc for more information.
> [warn] typeFactory.constructSimpleType(classOf[(_, _)], classOf[(_, _)], 
> Array(objectType, objectType))
> [warn] 
> [warn] SQLListener.scala:97: method uncheckedSimpleType in class TypeFactory 
> is deprecated: see corresponding Javadoc for more information.
> [warn] val longType = typeFactory.uncheckedSimpleType(classOf[Long])
> [warn] 
> [warn] SQLListener.scala:98: method constructSimpleType in class TypeFactory 
> is deprecated: see corresponding Javadoc for more information.
> [warn] typeFactory.constructSimpleType(classOf[(_, _)], classOf[(_, _)], 
> Array(longType, longType))
> ```



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25851) Fix deprecated API warning in SQLListener

2018-10-26 Thread Gengliang Wang (JIRA)
Gengliang Wang created SPARK-25851:
--

 Summary: Fix deprecated API warning in SQLListener
 Key: SPARK-25851
 URL: https://issues.apache.org/jira/browse/SPARK-25851
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Gengliang Wang


In https://github.com/apache/spark/pull/21596, Jackson is upgraded to 2.9.6.
There are some deprecated API warnings in SQLListener.
Create a trivial PR to fix them.

```
[warn] SQLListener.scala:92: method uncheckedSimpleType in class TypeFactory is 
deprecated: see corresponding Javadoc for more information.
[warn] val objectType = typeFactory.uncheckedSimpleType(classOf[Object])
[warn] 
[warn] SQLListener.scala:93: method constructSimpleType in class TypeFactory is 
deprecated: see corresponding Javadoc for more information.
[warn] typeFactory.constructSimpleType(classOf[(_, _)], classOf[(_, _)], 
Array(objectType, objectType))
[warn] 
[warn] SQLListener.scala:97: method uncheckedSimpleType in class TypeFactory is 
deprecated: see corresponding Javadoc for more information.
[warn] val longType = typeFactory.uncheckedSimpleType(classOf[Long])
[warn] 
[warn] SQLListener.scala:98: method constructSimpleType in class TypeFactory is 
deprecated: see corresponding Javadoc for more information.
[warn] typeFactory.constructSimpleType(classOf[(_, _)], classOf[(_, _)], 
Array(longType, longType))
```
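A sketch of the kind of replacement these warnings point at (an assumption about the shape of the fix, not the actual patch): Jackson's non-deprecated constructType / constructParametricType can stand in for the deprecated uncheckedSimpleType / constructSimpleType calls.

```scala
import com.fasterxml.jackson.databind.`type`.TypeFactory

val typeFactory = TypeFactory.defaultInstance()

// Stands in for typeFactory.uncheckedSimpleType(classOf[Object]).
val objectType = typeFactory.constructType(classOf[Object])
// Stands in for the deprecated constructSimpleType(classOf[(_, _)], ...) call.
val objectPairType =
  typeFactory.constructParametricType(classOf[(_, _)], objectType, objectType)

// Same pattern for the Long-based pair type.
val longType = typeFactory.constructType(classOf[Long])
val longPairType =
  typeFactory.constructParametricType(classOf[(_, _)], longType, longType)
```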



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25850) Make the split threshold for the code generated method configurable

2018-10-26 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664857#comment-16664857
 ] 

Apache Spark commented on SPARK-25850:
--

User 'yucai' has created a pull request for this issue:
https://github.com/apache/spark/pull/22847

> Make the split threshold for the code generated method configurable
> ---
>
> Key: SPARK-25850
> URL: https://issues.apache.org/jira/browse/SPARK-25850
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: yucai
>Priority: Major
>
> As per the discussion in 
> [https://github.com/apache/spark/pull/22823/files#r228400706], add a new 
> configuration spark.sql.codegen.methodSplitThreshold to make the split 
> threshold for the code generated method configurable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25850) Make the split threshold for the code generated method configurable

2018-10-26 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25850:


Assignee: Apache Spark

> Make the split threshold for the code generated method configurable
> ---
>
> Key: SPARK-25850
> URL: https://issues.apache.org/jira/browse/SPARK-25850
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: yucai
>Assignee: Apache Spark
>Priority: Major
>
> As per the discussion in 
> [https://github.com/apache/spark/pull/22823/files#r228400706], add a new 
> configuration spark.sql.codegen.methodSplitThreshold to make the split 
> threshold for the code generated method configurable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25850) Make the split threshold for the code generated method configurable

2018-10-26 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25850:


Assignee: (was: Apache Spark)

> Make the split threshold for the code generated method configurable
> ---
>
> Key: SPARK-25850
> URL: https://issues.apache.org/jira/browse/SPARK-25850
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: yucai
>Priority: Major
>
> As per the discussion in 
> [https://github.com/apache/spark/pull/22823/files#r228400706], add a new 
> configuration spark.sql.codegen.methodSplitThreshold to make the split 
> threshold for the code generated method configurable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25850) Make the split threshold for the code generated method configurable

2018-10-26 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664854#comment-16664854
 ] 

Apache Spark commented on SPARK-25850:
--

User 'yucai' has created a pull request for this issue:
https://github.com/apache/spark/pull/22847

> Make the split threshold for the code generated method configurable
> ---
>
> Key: SPARK-25850
> URL: https://issues.apache.org/jira/browse/SPARK-25850
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: yucai
>Priority: Major
>
> As per the discussion in 
> [https://github.com/apache/spark/pull/22823/files#r228400706], add a new 
> configuration spark.sql.codegen.methodSplitThreshold to make the split 
> threshold for the code generated method configurable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25850) Make the split threshold for the code generated method configurable

2018-10-26 Thread yucai (JIRA)
yucai created SPARK-25850:
-

 Summary: Make the split threshold for the code generated method 
configurable
 Key: SPARK-25850
 URL: https://issues.apache.org/jira/browse/SPARK-25850
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: yucai


As per the discussion in 
[https://github.com/apache/spark/pull/22823/files#r228400706], add a new 
configuration spark.sql.codegen.methodSplitThreshold to make the split 
threshold for the code generated method configurable.
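A sketch of how such a session-level SQL configuration would typically be set from user code, assuming the key lands as spark.sql.codegen.methodSplitThreshold as proposed (the value 1024 is only an example, not a recommendation):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("codegen-split-sketch").getOrCreate()

// Proposed knob for the generated-method split threshold; 1024L is an
// arbitrary illustrative value.
spark.conf.set("spark.sql.codegen.methodSplitThreshold", 1024L)

// Queries planned after this point in the session would pick up the setting.
spark.range(10).selectExpr("id * 2 AS doubled").show()
```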



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25797) Views created via 2.1 cannot be read via 2.2+

2018-10-26 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25797:


Assignee: (was: Apache Spark)

> Views created via 2.1 cannot be read via 2.2+
> -
>
> Key: SPARK-25797
> URL: https://issues.apache.org/jira/browse/SPARK-25797
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1, 2.3.2
>Reporter: Chenxiao Mao
>Priority: Major
>
> We ran into this issue when we updated our Spark from 2.1 to 2.3. Below is a 
> simple example to reproduce the issue.
> Create views via Spark 2.1
> {code:sql}
> create view v1 as
> select (cast(1 as decimal(18,0)) + cast(1 as decimal(18,0))) c1;
> {code}
> Query views via Spark 2.3
> {code:sql}
> select * from v1;
> Error in query: Cannot up cast `c1` from decimal(20,0) to c1#3906: 
> decimal(19,0) as it may truncate
> {code}
> After investigation, we found that this is because when a view is created via 
> Spark 2.1, the expanded text is saved instead of the original text. 
> Unfortunately, the expanded text below is buggy.
> {code:sql}
> spark-sql> desc extended v1;
> c1 decimal(19,0) NULL
> Detailed Table Information
> Database default
> Table v1
> Type VIEW
> View Text SELECT `gen_attr_0` AS `c1` FROM (SELECT (CAST(CAST(1 AS 
> DECIMAL(18,0)) AS DECIMAL(19,0)) + CAST(CAST(1 AS DECIMAL(18,0)) AS 
> DECIMAL(19,0))) AS `gen_attr_0`) AS gen_subquery_0
> {code}
> We can see that c1 is decimal(19,0), however in the expanded text there is 
> decimal(19,0) + decimal(19,0) which results in decimal(20,0). Since Spark 
> 2.2, decimal(20,0) in query is not allowed to cast to view definition column 
> decimal(19,0). ([https://github.com/apache/spark/pull/16561])
> I further tested other decimal calculations. Only add/subtract has this issue.
> Create views via 2.1:
> {code:sql}
> create view v1 as
> select (cast(1 as decimal(18,0)) + cast(1 as decimal(18,0))) c1;
> create view v2 as
> select (cast(1 as decimal(18,0)) - cast(1 as decimal(18,0))) c1;
> create view v3 as
> select (cast(1 as decimal(18,0)) * cast(1 as decimal(18,0))) c1;
> create view v4 as
> select (cast(1 as decimal(18,0)) / cast(1 as decimal(18,0))) c1;
> create view v5 as
> select (cast(1 as decimal(18,0)) % cast(1 as decimal(18,0))) c1;
> create view v6 as
> select cast(1 as decimal(18,0)) c1
> union
> select cast(1 as decimal(19,0)) c1;
> {code}
> Query views via Spark 2.3
> {code:sql}
> select * from v1;
> Error in query: Cannot up cast `c1` from decimal(20,0) to c1#3906: 
> decimal(19,0) as it may truncate
> select * from v2;
> Error in query: Cannot up cast `c1` from decimal(20,0) to c1#3909: 
> decimal(19,0) as it may truncate
> select * from v3;
> 1
> select * from v4;
> 1
> select * from v5;
> 0
> select * from v6;
> 1
> {code}
> Views created via Spark 2.2+ don't have this issue because Spark 2.2+ does 
> not generate expanded text for view 
> (https://issues.apache.org/jira/browse/SPARK-18209).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25797) Views created via 2.1 cannot be read via 2.2+

2018-10-26 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25797:


Assignee: Apache Spark

> Views created via 2.1 cannot be read via 2.2+
> -
>
> Key: SPARK-25797
> URL: https://issues.apache.org/jira/browse/SPARK-25797
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1, 2.3.2
>Reporter: Chenxiao Mao
>Assignee: Apache Spark
>Priority: Major
>
> We ran into this issue when we updated our Spark from 2.1 to 2.3. Below is a 
> simple example to reproduce the issue.
> Create views via Spark 2.1
> {code:sql}
> create view v1 as
> select (cast(1 as decimal(18,0)) + cast(1 as decimal(18,0))) c1;
> {code}
> Query views via Spark 2.3
> {code:sql}
> select * from v1;
> Error in query: Cannot up cast `c1` from decimal(20,0) to c1#3906: 
> decimal(19,0) as it may truncate
> {code}
> After investigation, we found that this is because when a view is created via 
> Spark 2.1, the expanded text is saved instead of the original text. 
> Unfortunately, the expanded text below is buggy.
> {code:sql}
> spark-sql> desc extended v1;
> c1 decimal(19,0) NULL
> Detailed Table Information
> Database default
> Table v1
> Type VIEW
> View Text SELECT `gen_attr_0` AS `c1` FROM (SELECT (CAST(CAST(1 AS 
> DECIMAL(18,0)) AS DECIMAL(19,0)) + CAST(CAST(1 AS DECIMAL(18,0)) AS 
> DECIMAL(19,0))) AS `gen_attr_0`) AS gen_subquery_0
> {code}
> We can see that c1 is decimal(19,0), however in the expanded text there is 
> decimal(19,0) + decimal(19,0) which results in decimal(20,0). Since Spark 
> 2.2, decimal(20,0) in query is not allowed to cast to view definition column 
> decimal(19,0). ([https://github.com/apache/spark/pull/16561])
> I further tested other decimal calculations. Only add/subtract has this issue.
> Create views via 2.1:
> {code:sql}
> create view v1 as
> select (cast(1 as decimal(18,0)) + cast(1 as decimal(18,0))) c1;
> create view v2 as
> select (cast(1 as decimal(18,0)) - cast(1 as decimal(18,0))) c1;
> create view v3 as
> select (cast(1 as decimal(18,0)) * cast(1 as decimal(18,0))) c1;
> create view v4 as
> select (cast(1 as decimal(18,0)) / cast(1 as decimal(18,0))) c1;
> create view v5 as
> select (cast(1 as decimal(18,0)) % cast(1 as decimal(18,0))) c1;
> create view v6 as
> select cast(1 as decimal(18,0)) c1
> union
> select cast(1 as decimal(19,0)) c1;
> {code}
> Query views via Spark 2.3
> {code:sql}
> select * from v1;
> Error in query: Cannot up cast `c1` from decimal(20,0) to c1#3906: 
> decimal(19,0) as it may truncate
> select * from v2;
> Error in query: Cannot up cast `c1` from decimal(20,0) to c1#3909: 
> decimal(19,0) as it may truncate
> select * from v3;
> 1
> select * from v4;
> 1
> select * from v5;
> 0
> select * from v6;
> 1
> {code}
> Views created via Spark 2.2+ don't have this issue because Spark 2.2+ does 
> not generate expanded text for view 
> (https://issues.apache.org/jira/browse/SPARK-18209).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25797) Views created via 2.1 cannot be read via 2.2+

2018-10-26 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664798#comment-16664798
 ] 

Apache Spark commented on SPARK-25797:
--

User 'seancxmao' has created a pull request for this issue:
https://github.com/apache/spark/pull/22846

> Views created via 2.1 cannot be read via 2.2+
> -
>
> Key: SPARK-25797
> URL: https://issues.apache.org/jira/browse/SPARK-25797
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1, 2.3.2
>Reporter: Chenxiao Mao
>Priority: Major
>
> We ran into this issue when we updated our Spark from 2.1 to 2.3. Below is a 
> simple example to reproduce the issue.
> Create views via Spark 2.1
> {code:sql}
> create view v1 as
> select (cast(1 as decimal(18,0)) + cast(1 as decimal(18,0))) c1;
> {code}
> Query views via Spark 2.3
> {code:sql}
> select * from v1;
> Error in query: Cannot up cast `c1` from decimal(20,0) to c1#3906: 
> decimal(19,0) as it may truncate
> {code}
> After investigation, we found that this is because when a view is created via 
> Spark 2.1, the expanded text is saved instead of the original text. 
> Unfortunately, the expanded text below is buggy.
> {code:sql}
> spark-sql> desc extended v1;
> c1 decimal(19,0) NULL
> Detailed Table Information
> Database default
> Table v1
> Type VIEW
> View Text SELECT `gen_attr_0` AS `c1` FROM (SELECT (CAST(CAST(1 AS 
> DECIMAL(18,0)) AS DECIMAL(19,0)) + CAST(CAST(1 AS DECIMAL(18,0)) AS 
> DECIMAL(19,0))) AS `gen_attr_0`) AS gen_subquery_0
> {code}
> We can see that c1 is decimal(19,0), however in the expanded text there is 
> decimal(19,0) + decimal(19,0) which results in decimal(20,0). Since Spark 
> 2.2, decimal(20,0) in query is not allowed to cast to view definition column 
> decimal(19,0). ([https://github.com/apache/spark/pull/16561])
> I further tested other decimal calculations. Only add/subtract has this issue.
> Create views via 2.1:
> {code:sql}
> create view v1 as
> select (cast(1 as decimal(18,0)) + cast(1 as decimal(18,0))) c1;
> create view v2 as
> select (cast(1 as decimal(18,0)) - cast(1 as decimal(18,0))) c1;
> create view v3 as
> select (cast(1 as decimal(18,0)) * cast(1 as decimal(18,0))) c1;
> create view v4 as
> select (cast(1 as decimal(18,0)) / cast(1 as decimal(18,0))) c1;
> create view v5 as
> select (cast(1 as decimal(18,0)) % cast(1 as decimal(18,0))) c1;
> create view v6 as
> select cast(1 as decimal(18,0)) c1
> union
> select cast(1 as decimal(19,0)) c1;
> {code}
> Query views via Spark 2.3
> {code:sql}
> select * from v1;
> Error in query: Cannot up cast `c1` from decimal(20,0) to c1#3906: 
> decimal(19,0) as it may truncate
> select * from v2;
> Error in query: Cannot up cast `c1` from decimal(20,0) to c1#3909: 
> decimal(19,0) as it may truncate
> select * from v3;
> 1
> select * from v4;
> 1
> select * from v5;
> 0
> select * from v6;
> 1
> {code}
> Views created via Spark 2.2+ don't have this issue because Spark 2.2+ does 
> not generate expanded text for view 
> (https://issues.apache.org/jira/browse/SPARK-18209).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23084) Add unboundedPreceding(), unboundedFollowing() and currentRow() to PySpark

2018-10-26 Thread Reynold Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-23084.
-
   Resolution: Won't Fix
Fix Version/s: (was: 2.4.0)

This was merged but then reverted due to 
https://issues.apache.org/jira/browse/SPARK-25842

 

 

> Add unboundedPreceding(), unboundedFollowing() and currentRow() to PySpark 
> ---
>
> Key: SPARK-23084
> URL: https://issues.apache.org/jira/browse/SPARK-23084
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Assignee: Huaxin Gao
>Priority: Major
>
> Add the new APIs (introduced by https://github.com/apache/spark/pull/18814) 
> to PySpark. Also update the rangeBetween API
> {noformat}
> /**
>  * Window function: returns the special frame boundary that represents the 
> first row in the
>  * window partition.
>  *
>  * @group window_funcs
>  * @since 2.3.0
>  */
>  def unboundedPreceding(): Column = Column(UnboundedPreceding)
> /**
>  * Window function: returns the special frame boundary that represents the 
> last row in the
>  * window partition.
>  *
>  * @group window_funcs
>  * @since 2.3.0
>  */
>  def unboundedFollowing(): Column = Column(UnboundedFollowing)
> /**
>  * Window function: returns the special frame boundary that represents the 
> current row in the
>  * window partition.
>  *
>  * @group window_funcs
>  * @since 2.3.0
>  */
>  def currentRow(): Column = Column(CurrentRow)
> {noformat}
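For reference, a sketch of how the equivalent frame boundaries are already expressed from Scala via the existing Window constants (existing API shown for context only; this is not the reverted PySpark change itself):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.sum

val spark = SparkSession.builder().master("local[*]").appName("window-sketch").getOrCreate()
import spark.implicits._

val df = Seq(("a", 1), ("a", 2), ("b", 3)).toDF("k", "v")

// Window.unboundedPreceding and Window.currentRow are the Long-valued
// frame-boundary constants; rowsBetween builds the frame from them.
val w = Window.partitionBy($"k").orderBy($"v")
  .rowsBetween(Window.unboundedPreceding, Window.currentRow)

df.withColumn("running_sum", sum($"v").over(w)).show()
```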



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-23084) Add unboundedPreceding(), unboundedFollowing() and currentRow() to PySpark

2018-10-26 Thread Reynold Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin reopened SPARK-23084:
-

> Add unboundedPreceding(), unboundedFollowing() and currentRow() to PySpark 
> ---
>
> Key: SPARK-23084
> URL: https://issues.apache.org/jira/browse/SPARK-23084
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Assignee: Huaxin Gao
>Priority: Major
>
> Add the new APIs (introduced by https://github.com/apache/spark/pull/18814) 
> to PySpark. Also update the rangeBetween API
> {noformat}
> /**
>  * Window function: returns the special frame boundary that represents the 
> first row in the
>  * window partition.
>  *
>  * @group window_funcs
>  * @since 2.3.0
>  */
>  def unboundedPreceding(): Column = Column(UnboundedPreceding)
> /**
>  * Window function: returns the special frame boundary that represents the 
> last row in the
>  * window partition.
>  *
>  * @group window_funcs
>  * @since 2.3.0
>  */
>  def unboundedFollowing(): Column = Column(UnboundedFollowing)
> /**
>  * Window function: returns the special frame boundary that represents the 
> current row in the
>  * window partition.
>  *
>  * @group window_funcs
>  * @since 2.3.0
>  */
>  def currentRow(): Column = Column(CurrentRow)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org