[jira] [Assigned] (SPARK-28285) Convert and port 'outer-join.sql' into UDF test base
[ https://issues.apache.org/jira/browse/SPARK-28285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-28285: Assignee: Huaxin Gao (was: Apache Spark) > Convert and port 'outer-join.sql' into UDF test base > > > Key: SPARK-28285 > URL: https://issues.apache.org/jira/browse/SPARK-28285 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL, Tests >Affects Versions: 3.0.0 >Reporter: Hyukjin Kwon >Assignee: Huaxin Gao >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28277) Convert and port 'except.sql' into UDF test base
[ https://issues.apache.org/jira/browse/SPARK-28277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-28277. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 25101 [https://github.com/apache/spark/pull/25101] > Convert and port 'except.sql' into UDF test base > > > Key: SPARK-28277 > URL: https://issues.apache.org/jira/browse/SPARK-28277 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL, Tests >Affects Versions: 3.0.0 >Reporter: Hyukjin Kwon >Assignee: Huaxin Gao >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28277) Convert and port 'except.sql' into UDF test base
[ https://issues.apache.org/jira/browse/SPARK-28277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-28277: Assignee: Huaxin Gao > Convert and port 'except.sql' into UDF test base > > > Key: SPARK-28277 > URL: https://issues.apache.org/jira/browse/SPARK-28277 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL, Tests >Affects Versions: 3.0.0 >Reporter: Hyukjin Kwon >Assignee: Huaxin Gao >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24632) Allow 3rd-party libraries to use pyspark.ml abstractions for Java wrappers for persistence
[ https://issues.apache.org/jira/browse/SPARK-24632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888495#comment-16888495 ] Mathew Wicks commented on SPARK-24632: -- I have an elegant solution for this: you can include a separate Python package which mirrors the class address of the Java objects you wrap. For example, in the PySpark API for XGBoost I created the following package for objects under *ml.dmlc.xgboost4j.scala.spark._* {code:java} ml/__init__.py ml/dmlc/__init__.py ml/dmlc/xgboost4j/__init__.py ml/dmlc/xgboost4j/scala/__init__.py ml/dmlc/xgboost4j/scala/spark/__init__.py {code} With every __init__.py empty except the final one, which contained: {code:java} import sys from sparkxgb import xgboost # Allows Pipeline()/PipelineModel() with XGBoost stages to be loaded from disk. # Needed because they try to import Python objects from their Java location. sys.modules['ml.dmlc.xgboost4j.scala.spark'] = xgboost {code} My actual Python wrapper classes live under *sparkxgb.xgboost*. This works because PySpark will try to import from the Java address of the class, even though it's in Python. For more context, you can find [the initial PR here|https://github.com/dmlc/xgboost/pull/4656]. > Allow 3rd-party libraries to use pyspark.ml abstractions for Java wrappers > for persistence > -- > > Key: SPARK-24632 > URL: https://issues.apache.org/jira/browse/SPARK-24632 > Project: Spark > Issue Type: Improvement > Components: ML, PySpark >Affects Versions: 3.0.0 >Reporter: Joseph K. Bradley >Priority: Major > > This is a follow-up for [SPARK-17025], which allowed users to implement > Python PipelineStages in 3rd-party libraries, include them in Pipelines, and > use Pipeline persistence. This task is to make it easier for 3rd-party > libraries to have PipelineStages written in Java and then to use pyspark.ml > abstractions to create wrappers around those Java classes. This is currently > possible, except that users hit bugs around persistence. > I spent a bit of time thinking about this and wrote up thoughts and a proposal in the > doc linked below. Summary of proposal: > Require that 3rd-party libraries whose Java classes have Python wrappers > implement a trait which provides the corresponding Python classpath in some > field: > {code} > trait PythonWrappable { > def pythonClassPath: String = … > } > MyJavaType extends PythonWrappable > {code} > This will not be required for MLlib wrappers, which we can handle specially. > One issue for this task will be that we may have trouble writing unit tests. > They would ideally test a Java class + Python wrapper class pair sitting > outside of pyspark. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
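For reference, the trait proposed in the description might look like the following in a 3rd-party library — a sketch only (the trait does not exist in Spark; the class name and Python path are illustrative):
{code:scala}
// Hypothetical trait from the proposal: a Java/Scala PipelineStage advertises
// where its Python wrapper lives, so pyspark.ml persistence can import it on load.
trait PythonWrappable {
  // Fully-qualified Python class path of the wrapper, e.g. the sparkxgb case above.
  def pythonClassPath: String
}

// Illustrative 3rd-party stage pointing at its (hypothetical) Python wrapper.
class MyJavaTransformer extends PythonWrappable {
  override def pythonClassPath: String = "mylib.wrappers.MyTransformer"
}
{code}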
[jira] [Updated] (SPARK-28436) Throw better exception when datasource's schema is not equal to user-specific schema
[ https://issues.apache.org/jira/browse/SPARK-28436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-28436: - Summary: Throw better exception when datasource's schema is not equal to user-specific schema (was: [SQL] Throw better exception when datasource's schema is not equal to user-specific schema) > Throw better exception when datasource's schema is not equal to user-specific > schema > --- > > Key: SPARK-28436 > URL: https://issues.apache.org/jira/browse/SPARK-28436 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0, 2.4.3 >Reporter: ShuMing Li >Priority: Minor > > When this exception is thrown, users cannot tell the difference > between the datasource's original schema and the user-specified schema, and may be very > confused when they meet the exception below. > {code:java} > org.apache.spark.sql.AnalysisException: org.apache.spark.odps.datasource does > not allow user-specified schemas. > at > org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:347) > at > org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:78) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79) > at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190) > at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190) > at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3270) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:83) > at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3269) > at org.apache.spark.sql.Dataset.(Dataset.scala:190) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:75) > at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:653) > at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:714) > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
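A minimal sketch of the clearer error this ticket asks for (mirroring the snippet quoted in the related SPARK-28438 report below, not the actual fix): include both schemas in the message so the difference is visible.
{code:scala}
// DataSource.scala (sketch): report both schemas instead of only the class name.
case (dataSource: RelationProvider, Some(schema)) =>
  val baseRelation =
    dataSource.createRelation(sparkSession.sqlContext, caseInsensitiveOptions)
  if (baseRelation.schema != schema) {
    throw new AnalysisException(
      s"$className does not allow user-specified schemas. " +
      s"Source schema: ${baseRelation.schema}; user-specified schema: $schema")
  }
{code}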
[jira] [Resolved] (SPARK-28438) Ignore metadata's(comments) difference when comparing datasource's schema and user-specific schema
[ https://issues.apache.org/jira/browse/SPARK-28438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-28438. -- Resolution: Duplicate > Ignore metadata's(comments) difference when comparing datasource's schema and > user-specific schema > -- > > Key: SPARK-28438 > URL: https://issues.apache.org/jira/browse/SPARK-28438 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: ShuMing Li >Priority: Minor > > When users register a datasource table in Spark, Spark currently only supports exact > schema equality between the datasource's original schema and the user-specified schema. > However, the datasource's original schema may be slightly different from > the user-specified schema: the diff may be a `column's comment` or other metadata > info. > Can we ignore a column's comment or metadata info when comparing? > {code:java} > // DataSource.scala > case (dataSource: RelationProvider, Some(schema)) => > val baseRelation = > dataSource.createRelation(sparkSession.sqlContext, caseInsensitiveOptions) > if (baseRelation.schema != schema) { > throw new AnalysisException(s"$className does not allow user-specified > schemas, " + > s"source schema: ${baseRelation.schema}, user-specific schema: ${schema}") > } > // StructType.scala > override def equals(that: Any): Boolean = that match { > case StructType(otherFields) => java.util.Arrays.equals( > fields.asInstanceOf[Array[AnyRef]], otherFields.asInstanceOf[Array[AnyRef]]) > case _ => false > } > {code} > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28438) Ignore metadata's(comments) difference when comparing datasource's schema and user-specific schema
[ https://issues.apache.org/jira/browse/SPARK-28438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-28438: - Summary: Ignore metadata's(comments) difference when comparing datasource's schema and user-specific schema (was: [SQL] Ignore metadata's(comments) difference when comparing datasource's schema and user-specific schema) > Ignore metadata's(comments) difference when comparing datasource's schema and > user-specific schema > -- > > Key: SPARK-28438 > URL: https://issues.apache.org/jira/browse/SPARK-28438 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: ShuMing Li >Priority: Minor > > When users register a datasource table in Spark, Spark currently only supports exact > schema equality between the datasource's original schema and the user-specified schema. > However, the datasource's original schema may be slightly different from > the user-specified schema: the diff may be a `column's comment` or other metadata > info. > Can we ignore a column's comment or metadata info when comparing? > {code:java} > // DataSource.scala > case (dataSource: RelationProvider, Some(schema)) => > val baseRelation = > dataSource.createRelation(sparkSession.sqlContext, caseInsensitiveOptions) > if (baseRelation.schema != schema) { > throw new AnalysisException(s"$className does not allow user-specified > schemas, " + > s"source schema: ${baseRelation.schema}, user-specific schema: ${schema}") > } > // StructType.scala > override def equals(that: Any): Boolean = that match { > case StructType(otherFields) => java.util.Arrays.equals( > fields.asInstanceOf[Array[AnyRef]], otherFields.asInstanceOf[Array[AnyRef]]) > case _ => false > } > {code} > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28442) Potentially persist data without leadership
[ https://issues.apache.org/jira/browse/SPARK-28442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888478#comment-16888478 ] Hyukjin Kwon commented on SPARK-28442: -- [~Tison], please provide a reproducer and/or console output if you have faced this problem. It's difficult to see what the issue is from the current JIRA description alone. > Potentially persist data without leadership > --- > > Key: SPARK-28442 > URL: https://issues.apache.org/jira/browse/SPARK-28442 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 2.4.3 >Reporter: TisonKun >Priority: Major > > Spark Master could potentially persist data via > {{ZooKeeperPersistenceEngine}} even if it is not the leader. See the > execution order below. > 1. master-1 became the leader. > 2. master-1 received a message and wanted to addApplication (or addWorker). > 3. master-1 got stuck because of a full GC. > 4. master-1 lost leadership on ZK. master-2 became the leader. > 5. master-1 received the {{RevokedLeadership}} message, but the message was > pending. > 6. master-1 finished persisting data. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28285) Convert and port 'outer-join.sql' into UDF test base
[ https://issues.apache.org/jira/browse/SPARK-28285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-28285. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 25103 [https://github.com/apache/spark/pull/25103] > Convert and port 'outer-join.sql' into UDF test base > > > Key: SPARK-28285 > URL: https://issues.apache.org/jira/browse/SPARK-28285 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL, Tests >Affects Versions: 3.0.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28443) Spark sql add exception when create field type NullType
[ https://issues.apache.org/jira/browse/SPARK-28443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ulysses you updated SPARK-28443: Environment: (was: ) > Spark sql add exception when create field type NullType > > > Key: SPARK-28443 > URL: https://issues.apache.org/jira/browse/SPARK-28443 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3 >Reporter: ulysses you >Priority: Major > > [SPARK-28313|https://issues.apache.org/jira/browse/SPARK-28313] changed a behavior: > `Add rule to throw exception when catalog.create NullType StructField`. > This ticket is to discuss the details. > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28443) Spark sql add exception when create field type NullType
[ https://issues.apache.org/jira/browse/SPARK-28443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ulysses you updated SPARK-28443: Description: [SPARK-28313|https://issues.apache.org/jira/browse/SPARK-28313] changed a behavior: `Add rule to throw exception when catalog.create NullType StructField`. This ticket is to discuss the details. was: SPARK-28313 changed a behavior: `Add rule to throw exception when catalog.create NullType StructField`. This ticket is to discuss the details. > Spark sql add exception when create field type NullType > > > Key: SPARK-28443 > URL: https://issues.apache.org/jira/browse/SPARK-28443 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3 > Environment: > >Reporter: ulysses you >Priority: Major > > [SPARK-28313|https://issues.apache.org/jira/browse/SPARK-28313] changed a behavior: > `Add rule to throw exception when catalog.create NullType StructField`. > This ticket is to discuss the details. > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28443) Spark sql add exception when create field type NullType
[ https://issues.apache.org/jira/browse/SPARK-28443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ulysses you updated SPARK-28443: Environment: [SPARK-28313|https://issues.apache.org/jira/browse/SPARK-28313] changed a behavior: `Add rule to throw exception when catalog.create NullType StructField`. This ticket is to discuss the details. was: SPARK-28313 changed a behavior: `Add rule to throw exception when catalog.create NullType StructField`. This ticket is to discuss the details. > Spark sql add exception when create field type NullType > > > Key: SPARK-28443 > URL: https://issues.apache.org/jira/browse/SPARK-28443 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3 > Environment: [SPARK-28313|https://issues.apache.org/jira/browse/SPARK-28313] > changed a behavior: > `Add rule to throw exception when catalog.create NullType StructField`. > This ticket is to discuss the details. > >Reporter: ulysses you >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28443) Spark sql add exception when create field type NullType
[ https://issues.apache.org/jira/browse/SPARK-28443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ulysses you updated SPARK-28443: Description: SPARK-28313 changed a behavior: `Add rule to throw exception when catalog.create NullType StructField`. This ticket is to discuss the details. > Spark sql add exception when create field type NullType > > > Key: SPARK-28443 > URL: https://issues.apache.org/jira/browse/SPARK-28443 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3 > Environment: > >Reporter: ulysses you >Priority: Major > > SPARK-28313 > changed a behavior: > `Add rule to throw exception when catalog.create NullType StructField`. > This ticket is to discuss the details. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28443) Spark sql add exception when create field type NullType
[ https://issues.apache.org/jira/browse/SPARK-28443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ulysses you updated SPARK-28443: Environment: was: [SPARK-28313|https://issues.apache.org/jira/browse/SPARK-28313] changed a behavior: `Add rule to throw exception when catalog.create NullType StructField`. This ticket is to discuss the details. > Spark sql add exception when create field type NullType > > > Key: SPARK-28443 > URL: https://issues.apache.org/jira/browse/SPARK-28443 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3 > Environment: > >Reporter: ulysses you >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28443) Spark sql add exception when create field type NullType
ulysses you created SPARK-28443: --- Summary: Spark sql add exception when create field type NullType Key: SPARK-28443 URL: https://issues.apache.org/jira/browse/SPARK-28443 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.3 Environment: [SPARK-28313|https://issues.apache.org/jira/browse/SPARK-28313] changed a behavior when creating a table with a NullType column, so this ticket is to discuss the details. Reporter: ulysses you -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28443) Spark sql add exception when create field type NullType
[ https://issues.apache.org/jira/browse/SPARK-28443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ulysses you updated SPARK-28443: Environment: SPARK-28313 changed a behavior: `Add rule to throw exception when catalog.create NullType StructField`. This ticket is to discuss the details. was: [SPARK-28313|https://issues.apache.org/jira/browse/SPARK-28313] changed a behavior when creating a table with a NullType column, so this ticket is to discuss the details. > Spark sql add exception when create field type NullType > > > Key: SPARK-28443 > URL: https://issues.apache.org/jira/browse/SPARK-28443 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3 > Environment: SPARK-28313 > changed a behavior: > `Add rule to throw exception when catalog.create NullType StructField`. > This ticket is to discuss the details. >Reporter: ulysses you >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
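To make the behavior under discussion concrete, here is an illustration of the rule referenced above (SPARK-28313) as this ticket understands it — expected behavior, not verified output:
{code:scala}
// Creating a table with an untyped NULL column should now raise an exception...
spark.sql("CREATE TABLE t_bad AS SELECT null AS v")
// ...while giving the column a real type via an explicit cast keeps working.
spark.sql("CREATE TABLE t_ok AS SELECT CAST(null AS STRING) AS v")
{code}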
[jira] [Resolved] (SPARK-28287) Convert and port 'udaf.sql' into UDF test base
[ https://issues.apache.org/jira/browse/SPARK-28287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-28287. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 25194 [https://github.com/apache/spark/pull/25194] > Convert and port 'udaf.sql' into UDF test base > -- > > Key: SPARK-28287 > URL: https://issues.apache.org/jira/browse/SPARK-28287 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL, Tests >Affects Versions: 3.0.0 >Reporter: Hyukjin Kwon >Assignee: Vinod KC >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28287) Convert and port 'udaf.sql' into UDF test base
[ https://issues.apache.org/jira/browse/SPARK-28287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-28287: Assignee: Vinod KC > Convert and port 'udaf.sql' into UDF test base > -- > > Key: SPARK-28287 > URL: https://issues.apache.org/jira/browse/SPARK-28287 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL, Tests >Affects Versions: 3.0.0 >Reporter: Hyukjin Kwon >Assignee: Vinod KC >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28442) Potentially persist data without leadership
[ https://issues.apache.org/jira/browse/SPARK-28442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TisonKun updated SPARK-28442: - Description: Spark Master could potentially persist data via {{ZooKeeperPersistenceEngine}} even if it is not the leader. See the execution order below. 1. master-1 became the leader. 2. master-1 received a message and wanted to addApplication (or addWorker). 3. master-1 got stuck because of a full GC. 4. master-1 lost leadership on ZK. master-2 became the leader. 5. master-1 received the {{RevokedLeadership}} message, but the message was pending. 6. master-1 finished persisting data. was: Spark Master could potentially persist data via {{ZooKeeperPersistenceEngine}} even if it is not the leader. See the execution order below. 1. master-1 became the leader. 2. master-1 received a message and wanted to addApplication (or addWorker). 3. master-1 got stuck because of a full GC. 4. master-1 lost leadership on ZK. master-2 became the leader. master-1 received the {{RevokedLeadership}} message, but the message was pending. 5. master-1 finished persisting data. > Potentially persist data without leadership > --- > > Key: SPARK-28442 > URL: https://issues.apache.org/jira/browse/SPARK-28442 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 2.4.3 >Reporter: TisonKun >Priority: Major > > Spark Master could potentially persist data via > {{ZooKeeperPersistenceEngine}} even if it is not the leader. See the > execution order below. > 1. master-1 became the leader. > 2. master-1 received a message and wanted to addApplication (or addWorker). > 3. master-1 got stuck because of a full GC. > 4. master-1 lost leadership on ZK. master-2 became the leader. > 5. master-1 received the {{RevokedLeadership}} message, but the message was > pending. > 6. master-1 finished persisting data. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28442) Potentially persist data without leadership
[ https://issues.apache.org/jira/browse/SPARK-28442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TisonKun updated SPARK-28442: - Description: Spark Master could potentially persist data via {{ZooKeeperPersistenceEngine}} even if it is not the leader. See the execution order below. 1. master-1 became the leader. 2. master-1 received a message and wanted to addApplication (or addWorker). 3. master-1 got stuck because of a full GC. 4. master-1 lost leadership on ZK. master-2 became the leader. master-1 received the {{RevokedLeadership}} message, but the message was pending. 5. master-1 finished persisting data. was: Spark Master could potentially persist data via {{ZooKeeperPersistenceEngine}} even if it is not the leader. See the execution order below. 1. master-1 became the leader. 2. master-1 received a message and wanted to addApplication (or addWorker). 3. master-1 got stuck because of a full GC. 4. master-1 lost leadership on ZK, received the {{RevokedLeadership}} message, but it was pending. 5. master-1 finished persisting data. > Potentially persist data without leadership > --- > > Key: SPARK-28442 > URL: https://issues.apache.org/jira/browse/SPARK-28442 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 2.4.3 >Reporter: TisonKun >Priority: Major > > Spark Master could potentially persist data via > {{ZooKeeperPersistenceEngine}} even if it is not the leader. See the > execution order below. > 1. master-1 became the leader. > 2. master-1 received a message and wanted to addApplication (or addWorker). > 3. master-1 got stuck because of a full GC. > 4. master-1 lost leadership on ZK. master-2 became the leader. master-1 > received the {{RevokedLeadership}} message, but the message was pending. > 5. master-1 finished persisting data. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28442) Potentially persist data without leadership
[ https://issues.apache.org/jira/browse/SPARK-28442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] TisonKun updated SPARK-28442: - Component/s: (was: Documentation) Deploy > Potentially persist data without leadership > --- > > Key: SPARK-28442 > URL: https://issues.apache.org/jira/browse/SPARK-28442 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 2.4.3 >Reporter: TisonKun >Priority: Major > > Spark Master could potentially persist data via > {{ZooKeeperPersistenceEngine}} even if it is not the leader. See the > execution order below. > 1. master-1 became the leader. > 2. master-1 received a message and wanted to addApplication (or addWorker). > 3. master-1 got stuck because of a full GC. > 4. master-1 lost leadership on ZK, received the {{RevokedLeadership}} message, but > it was pending. > 5. master-1 finished persisting data. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28442) Potentially persist data without leadership
TisonKun created SPARK-28442: Summary: Potentially persist data without leadership Key: SPARK-28442 URL: https://issues.apache.org/jira/browse/SPARK-28442 Project: Spark Issue Type: Bug Components: Documentation Affects Versions: 2.4.3 Reporter: TisonKun Spark Master could potentially persist data via {{ZooKeeperPersistenceEngine}} even if it is not the leader. See the execution order below. 1. master-1 became the leader. 2. master-1 received a message and wanted to addApplication (or addWorker). 3. master-1 got stuck because of a full GC. 4. master-1 lost leadership on ZK, received the {{RevokedLeadership}} message, but it was pending. 5. master-1 finished persisting data. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
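One mitigation sketch, with assumed names for the Master internals ({{state}}, {{persistenceEngine}}, {{RecoveryState}}): re-check leadership immediately before writing. Note this only narrows the window; the race above can still occur because the check itself may be stale after a GC pause.
{code:scala}
def persistApplication(app: ApplicationInfo): Unit = {
  if (state == RecoveryState.ALIVE) {  // this master still believes it is the leader
    persistenceEngine.addApplication(app)
  } else {
    logWarning(s"Skipping persistence for ${app.id}: leadership was revoked")
  }
}
{code}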
[jira] [Created] (SPARK-28441) udf(max(udf(column))) throws java.lang.UnsupportedOperationException: Cannot evaluate expression: udf(null)
Huaxin Gao created SPARK-28441: -- Summary: udf(max(udf(column))) throws java.lang.UnsupportedOperationException: Cannot evaluate expression: udf(null) Key: SPARK-28441 URL: https://issues.apache.org/jira/browse/SPARK-28441 Project: Spark Issue Type: Bug Components: PySpark, SQL Affects Versions: 3.0.0 Reporter: Huaxin Gao I found this when doing https://issues.apache.org/jira/browse/SPARK-28277 {code:java} >>> from pyspark.sql.functions import pandas_udf, PandasUDFType >>> @pandas_udf("string", PandasUDFType.SCALAR) ... def noop(x): ... return x.apply(str) ... >>> spark.udf.register("udf", noop) >>> spark.sql("CREATE OR REPLACE TEMPORARY VIEW t1 as select * from values (\"one\", 1), (\"two\", 2), (\"three\", 3), (\"one\", NULL) as t1(k, v)") DataFrame[] >>> spark.sql("CREATE OR REPLACE TEMPORARY VIEW t2 as select * from values (\"one\", 1), (\"two\", 22), (\"one\", 5), (\"one\", NULL), (NULL, 5) as t2(k, v)") DataFrame[] >>> spark.sql("SELECT t1.k FROM t1 WHERE t1.v <= (SELECT udf(max(udf(t2.v))) FROM t2 WHERE udf(t2.k) = udf(t1.k))").show() py4j.protocol.Py4JJavaError: An error occurred while calling o65.showString. : java.lang.UnsupportedOperationException: Cannot evaluate expression: udf(null) at org.apache.spark.sql.catalyst.expressions.Unevaluable.eval(Expression.scala:296) at org.apache.spark.sql.catalyst.expressions.Unevaluable.eval$(Expression.scala:295) at org.apache.spark.sql.catalyst.expressions.PythonUDF.eval(PythonUDF.scala:52) {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28416) Use java.time API in timestampAddInterval
[ https://issues.apache.org/jira/browse/SPARK-28416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-28416. --- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 25173 [https://github.com/apache/spark/pull/25173] > Use java.time API in timestampAddInterval > - > > Key: SPARK-28416 > URL: https://issues.apache.org/jira/browse/SPARK-28416 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.3 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Minor > Fix For: 3.0.0 > > > Implement the timestampAddInterval method of DateTimeUtils by using the > plusMonths() and plus() methods of ZonedDateTime from the Java 8 time API. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
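A sketch of the described approach (simplified signature, not the exact DateTimeUtils code): apply the months component of the interval via plusMonths() and the sub-month remainder via plus(), in the session time zone.
{code:scala}
import java.time.{Instant, ZoneId, ZonedDateTime}
import java.time.temporal.ChronoUnit

def timestampAddInterval(ts: Instant, months: Int, micros: Long, zone: ZoneId): Instant =
  ZonedDateTime.ofInstant(ts, zone)
    .plusMonths(months)               // calendar-aware month arithmetic
    .plus(micros, ChronoUnit.MICROS)  // exact sub-month duration
    .toInstant
{code}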
[jira] [Assigned] (SPARK-28416) Use java.time API in timestampAddInterval
[ https://issues.apache.org/jira/browse/SPARK-28416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-28416: - Assignee: Maxim Gekk > Use java.time API in timestampAddInterval > - > > Key: SPARK-28416 > URL: https://issues.apache.org/jira/browse/SPARK-28416 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.3 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Minor > > Implement the timestampAddInterval method of DateTimeUtils by using the > plusMonths() and plus() methods of ZonedDateTime from the Java 8 time API. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28312) Add numeric.sql
[ https://issues.apache.org/jira/browse/SPARK-28312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-28312: - Assignee: Yuming Wang > Add numeric.sql > --- > > Key: SPARK-28312 > URL: https://issues.apache.org/jira/browse/SPARK-28312 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > > In this ticket, we plan to add the regression test cases of > https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/numeric.sql. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28312) Add numeric.sql
[ https://issues.apache.org/jira/browse/SPARK-28312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-28312. --- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 25092 [https://github.com/apache/spark/pull/25092] > Add numeric.sql > --- > > Key: SPARK-28312 > URL: https://issues.apache.org/jira/browse/SPARK-28312 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.0.0 > > > In this ticket, we plan to add the regression test cases of > https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/numeric.sql. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28432) Date/Time Functions: make_date/make_timestamp
[ https://issues.apache.org/jira/browse/SPARK-28432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888324#comment-16888324 ] Maxim Gekk commented on SPARK-28432: I am working on this > Date/Time Functions: make_date/make_timestamp > - > > Key: SPARK-28432 > URL: https://issues.apache.org/jira/browse/SPARK-28432 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > ||Function||Return Type||Description||Example||Result|| > |{{make_date(_year_ }}{{int}}{{, _month_ }}{{int}}{{, _day_ > }}{{int}}{{)}}|{{date}}|Create date from year, month and day > fields|{{make_date(2013, 7, 15)}}|{{2013-07-15}}| > |{{make_timestamp(_year_ }}{{int}}{{, _month_ }}{{int}}{{, _day_ }}{{int}}{{, > _hour_ }}{{int}}{{, _min_ }}{{int}}{{, _sec_}}{{double > precision}}{{)}}|{{timestamp}}|Create timestamp from year, month, day, hour, > minute and seconds fields|{{make_timestamp(2013, 7, 15, 8, 15, > 23.5)}}|{{2013-07-15 08:15:23.5}}| > https://www.postgresql.org/docs/11/functions-datetime.html -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
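If these are implemented with the PostgreSQL semantics above, usage would look like this sketch (the exact output formatting is an assumption):
{code:scala}
// Expected behavior per the PostgreSQL reference table above.
spark.sql("SELECT make_date(2013, 7, 15)").show()                    // 2013-07-15
spark.sql("SELECT make_timestamp(2013, 7, 15, 8, 15, 23.5)").show()  // 2013-07-15 08:15:23.5
{code}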
[jira] [Resolved] (SPARK-28430) Some stage table rows render wrong number of columns if tasks are missing metrics
[ https://issues.apache.org/jira/browse/SPARK-28430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-28430. --- Resolution: Fixed Fix Version/s: 2.4.4 2.3.4 3.0.0 Issue resolved by pull request 25183 [https://github.com/apache/spark/pull/25183] > Some stage table rows render wrong number of columns if tasks are missing > metrics > -- > > Key: SPARK-28430 > URL: https://issues.apache.org/jira/browse/SPARK-28430 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.4.0, 3.0.0 >Reporter: Josh Rosen >Assignee: Josh Rosen >Priority: Major > Fix For: 3.0.0, 2.3.4, 2.4.4 > > Attachments: ui-screenshot.png > > > The Spark UI's stages table renders too few columns for some tasks if a > subset of the tasks are missing their metrics. This is due to an > inconsistency in how we render certain columns: some columns gracefully > handle this case, but others do not. See attached screenshot below > !ui-screenshot.png! -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28439) pyspark.sql.functions.array_repeat should support Column as count argument
[ https://issues.apache.org/jira/browse/SPARK-28439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-28439: - Assignee: Maciej Szymkiewicz > pyspark.sql.functions.array_repeat should support Column as count argument > -- > > Key: SPARK-28439 > URL: https://issues.apache.org/jira/browse/SPARK-28439 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 2.4.0, 3.0.0 >Reporter: Maciej Szymkiewicz >Assignee: Maciej Szymkiewicz >Priority: Minor > > In Scala, Spark supports > (https://github.com/apache/spark/blob/c3e32bf06c35ba2580d46150923abfa795b4446a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L3777) > > {code:java} > (Column, Column) => Column > {code} > variant of array_repeat, however PySpark doesn't > {code:java} > >>> import pyspark > >>> from pyspark.sql import functions as f > >>> pyspark.__version__ > '3.0.0.dev0' > > >>> f.array_repeat(f.col("foo"), f.col("bar")) > ... > TypeError: Column is not iterable > {code} > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28439) pyspark.sql.functions.array_repeat should support Column as count argument
[ https://issues.apache.org/jira/browse/SPARK-28439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-28439. --- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 25193 [https://github.com/apache/spark/pull/25193] > pyspark.sql.functions.array_repeat should support Column as count argument > -- > > Key: SPARK-28439 > URL: https://issues.apache.org/jira/browse/SPARK-28439 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 2.4.0, 3.0.0 >Reporter: Maciej Szymkiewicz >Assignee: Maciej Szymkiewicz >Priority: Minor > Fix For: 3.0.0 > > > In Scala, Spark supports > (https://github.com/apache/spark/blob/c3e32bf06c35ba2580d46150923abfa795b4446a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L3777) > > {code:java} > (Column, Column) => Column > {code} > variant of array_repeat, however PySpark doesn't > {code:java} > >>> import pyspark > >>> from pyspark.sql import functions as f > >>> pyspark.__version__ > '3.0.0.dev0' > > >>> f.array_repeat(f.col("foo"), f.col("bar")) > ... > TypeError: Column is not iterable > {code} > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
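As the linked functions.scala shows, the Scala API already accepts a Column count; a quick sketch of the Scala call, plus the usual PySpark workaround of routing through SQL until the Python signature is fixed:
{code:scala}
import org.apache.spark.sql.functions.{array_repeat, col}

// array_repeat requires an integer count column, hence the cast.
val df = spark.range(3).toDF("foo").withColumn("bar", (col("foo") + 1).cast("int"))
df.select(array_repeat(col("foo"), col("bar"))).show()
// In PySpark, an equivalent workaround is: df.select(expr("array_repeat(foo, bar)"))
{code}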
[jira] [Commented] (SPARK-28427) Support more Postgres JSON functions
[ https://issues.apache.org/jira/browse/SPARK-28427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888258#comment-16888258 ] Maxim Gekk commented on SPARK-28427: Probably, we can switch the flag spark.sql.legacy.sizeOfNull, or even remove it in Spark 3.0? > Support more Postgres JSON functions > > > Key: SPARK-28427 > URL: https://issues.apache.org/jira/browse/SPARK-28427 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Josh Rosen >Priority: Major > > Postgres features a number of JSON functions that are missing in Spark: > https://www.postgresql.org/docs/9.3/functions-json.html > Redshift's JSON functions > (https://docs.aws.amazon.com/redshift/latest/dg/json-functions.html) have > partial overlap with the Postgres list. > Some of these functions can be expressed in terms of compositions of existing > Spark functions. For example, I think that {{json_array_length}} can be > expressed with {{cardinality}} and {{from_json}}, but there's a caveat > related to legacy Hive compatibility (see the demo notebook at > https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/5796212617691211/45530874214710/4901752417050771/latest.html > for more details). > I'm filing this ticket so that we can triage the list of Postgres JSON > features and decide which ones make sense to support in Spark. After we've > done that, we can create individual tickets for specific functions and > features. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
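For instance, the {{json_array_length}} composition mentioned above can be sketched as follows; note the caveat from the description that with the legacy setting {{spark.sql.legacy.sizeOfNull=true}}, the size/cardinality of a null array is -1 rather than null:
{code:scala}
// json_array_length('[1, 2, 3]') expressed with existing Spark functions.
spark.sql("SELECT cardinality(from_json('[1, 2, 3]', 'array<int>'))").show()  // 3
{code}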
[jira] [Commented] (SPARK-28382) Array Functions: unnest
[ https://issues.apache.org/jira/browse/SPARK-28382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888248#comment-16888248 ] Maxim Gekk commented on SPARK-28382: Is it just explode()? > Array Functions: unnest > --- > > Key: SPARK-28382 > URL: https://issues.apache.org/jira/browse/SPARK-28382 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > ||Function||Return Type||Description||Example||Result|| > |{{unnest}}({{anyarray}})|set of anyelement|expand an array to a set of > rows|unnest(ARRAY[1,2])|1 > 2 > (2 rows)| > > https://www.postgresql.org/docs/11/functions-array.html > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
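For the common one-dimensional case, that seems right — a sketch below (though Postgres {{unnest}} also fully flattens multidimensional arrays, which {{explode}} does not):
{code:scala}
// Spark's closest built-in to Postgres unnest(ARRAY[1,2]).
spark.sql("SELECT explode(array(1, 2))").show()  // two rows: 1 and 2
{code}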
[jira] [Commented] (SPARK-14543) SQL/Hive insertInto has unexpected results
[ https://issues.apache.org/jira/browse/SPARK-14543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888246#comment-16888246 ] Alexander Tronchin-James commented on SPARK-14543: -- OK, thanks! > SQL/Hive insertInto has unexpected results > -- > > Key: SPARK-14543 > URL: https://issues.apache.org/jira/browse/SPARK-14543 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Ryan Blue >Assignee: Ryan Blue >Priority: Major > > *Updated description* > There should be an option to match input data to output columns by name. The > API allows operations on tables, which hide the column resolution problem. > It's easy to copy from one table to another without listing the columns, and > in the API it is common to work with columns by name rather than by position. > I think the API should add a way to match columns by name, which is closer to > what users expect. I propose adding something like this: > {code} > CREATE TABLE src (id: bigint, count: int, total: bigint) > CREATE TABLE dst (id: bigint, total: bigint, count: int) > sqlContext.table("src").write.byName.insertInto("dst") > {code} > *Original description* > The Hive write path adds a pre-insertion cast (projection) to reconcile > incoming data columns with the outgoing table schema. Columns are matched by > position and casts are inserted to reconcile the two column schemas. > When columns aren't correctly aligned, this causes unexpected results. I ran > into this by not using a correct {{partitionBy}} call (addressed by > SPARK-14459), which caused an error message that an int could not be cast to > an array. However, if the columns are vaguely compatible, for example string > and float, then no error or warning is produced and data is written to the > wrong columns using unexpected casts (string -> bigint -> float). > A real-world use case that will hit this is when a table definition changes > by adding a column in the middle of a table. Spark SQL statements that copied > from that table to a destination table will then map the columns differently > but insert casts that mask the problem. The last column's data will be > dropped without a reliable warning for the user. > This highlights a few problems: > * Too many or too few incoming data columns should cause an AnalysisException > to be thrown > * Only "safe" casts should be inserted automatically, like int -> long, using > UpCast > * Pre-insertion casts currently ignore extra columns by using zip > * The pre-insertion cast logic differs between Hive's MetastoreRelation and > LogicalRelation > Also, I think there should be an option to match input data to output columns > by name. The API allows operations on tables, which hide the column > resolution problem. It's easy to copy from one table to another without > listing the columns, and in the API it is common to work with columns by name > rather than by position. I think the API should add a way to match columns by > name, which is closer to what users expect. I propose adding something like > this: > {code} > CREATE TABLE src (id: bigint, count: int, total: bigint) > CREATE TABLE dst (id: bigint, total: bigint, count: int) > sqlContext.table("src").write.byName.insertInto("dst") > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28440) Use TestingUtils to compare floating point values
[ https://issues.apache.org/jira/browse/SPARK-28440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28440: -- Component/s: (was: ML) MLlib > Use TestingUtils to compare floating point values > - > > Key: SPARK-28440 > URL: https://issues.apache.org/jira/browse/SPARK-28440 > Project: Spark > Issue Type: Improvement > Components: MLlib, Tests >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Priority: Minor > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28440) Use TestingUtils to compare floating point values
Dongjoon Hyun created SPARK-28440: - Summary: Use TestingUtils to compare floating point values Key: SPARK-28440 URL: https://issues.apache.org/jira/browse/SPARK-28440 Project: Spark Issue Type: Improvement Components: ML, Tests Affects Versions: 3.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14543) SQL/Hive insertInto has unexpected results
[ https://issues.apache.org/jira/browse/SPARK-14543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888222#comment-16888222 ] Ryan Blue commented on SPARK-14543: --- {{byName}} was never added to Apache Spark. The change was rejected, so it is only available in Netflix's Spark branch. I resolved this with "later" because we are including by-name resolution in the DSv2 work. The replacement for {{DataFrameWriter}} will default to name-based resolution. > SQL/Hive insertInto has unexpected results > -- > > Key: SPARK-14543 > URL: https://issues.apache.org/jira/browse/SPARK-14543 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Ryan Blue >Assignee: Ryan Blue >Priority: Major > > *Updated description* > There should be an option to match input data to output columns by name. The > API allows operations on tables, which hide the column resolution problem. > It's easy to copy from one table to another without listing the columns, and > in the API it is common to work with columns by name rather than by position. > I think the API should add a way to match columns by name, which is closer to > what users expect. I propose adding something like this: > {code} > CREATE TABLE src (id: bigint, count: int, total: bigint) > CREATE TABLE dst (id: bigint, total: bigint, count: int) > sqlContext.table("src").write.byName.insertInto("dst") > {code} > *Original description* > The Hive write path adds a pre-insertion cast (projection) to reconcile > incoming data columns with the outgoing table schema. Columns are matched by > position and casts are inserted to reconcile the two column schemas. > When columns aren't correctly aligned, this causes unexpected results. I ran > into this by not using a correct {{partitionBy}} call (addressed by > SPARK-14459), which caused an error message that an int could not be cast to > an array. However, if the columns are vaguely compatible, for example string > and float, then no error or warning is produced and data is written to the > wrong columns using unexpected casts (string -> bigint -> float). > A real-world use case that will hit this is when a table definition changes > by adding a column in the middle of a table. Spark SQL statements that copied > from that table to a destination table will then map the columns differently > but insert casts that mask the problem. The last column's data will be > dropped without a reliable warning for the user. > This highlights a few problems: > * Too many or too few incoming data columns should cause an AnalysisException > to be thrown > * Only "safe" casts should be inserted automatically, like int -> long, using > UpCast > * Pre-insertion casts currently ignore extra columns by using zip > * The pre-insertion cast logic differs between Hive's MetastoreRelation and > LogicalRelation > Also, I think there should be an option to match input data to output columns > by name. The API allows operations on tables, which hide the column > resolution problem. It's easy to copy from one table to another without > listing the columns, and in the API it is common to work with columns by name > rather than by position. I think the API should add a way to match columns by > name, which is closer to what users expect. I propose adding something like > this: > {code} > CREATE TABLE src (id: bigint, count: int, total: bigint) > CREATE TABLE dst (id: bigint, total: bigint, count: int) > sqlContext.table("src").write.byName.insertInto("dst") > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
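Until by-name resolution is available, a common user-side workaround is to reorder the input columns into the target table's column order before the positional {{insertInto}} — a sketch using the src/dst tables from the proposal above:
{code:scala}
import org.apache.spark.sql.functions.col

// insertInto matches columns by position, so select dst's columns in dst's order first.
val dstCols = spark.table("dst").columns.map(col)
spark.table("src").select(dstCols: _*).write.insertInto("dst")
{code}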
[jira] [Commented] (SPARK-14948) Exception when joining DataFrames derived form the same DataFrame
[ https://issues.apache.org/jira/browse/SPARK-14948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888146#comment-16888146 ] Wenchen Fan commented on SPARK-14948: - There is an ongoing effort to detect this case and fail instead of fixing it: https://github.com/apache/spark/pull/25107 > Exception when joining DataFrames derived form the same DataFrame > - > > Key: SPARK-14948 > URL: https://issues.apache.org/jira/browse/SPARK-14948 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Saurabh Santhosh >Priority: Major > > h2. Spark Analyser is throwing the following exception in a specific scenario > : > h2. Exception : > org.apache.spark.sql.AnalysisException: resolved attribute(s) F1#3 missing > from asd#5,F2#4,F1#6,F2#7 in operator !Project [asd#5,F1#3]; > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38) > h2. Code : > {code:title=SparkClient.java|borderStyle=solid} > StructField[] fields = new StructField[2]; > fields[0] = new StructField("F1", DataTypes.StringType, true, > Metadata.empty()); > fields[1] = new StructField("F2", DataTypes.StringType, true, > Metadata.empty()); > JavaRDD rdd = > > sparkClient.getJavaSparkContext().parallelize(Arrays.asList(RowFactory.create("a", > "b"))); > DataFrame df = sparkClient.getSparkHiveContext().createDataFrame(rdd, new > StructType(fields)); > sparkClient.getSparkHiveContext().registerDataFrameAsTable(df, "t1"); > DataFrame aliasedDf = sparkClient.getSparkHiveContext().sql("select F1 as > asd, F2 from t1"); > sparkClient.getSparkHiveContext().registerDataFrameAsTable(aliasedDf, > "t2"); > sparkClient.getSparkHiveContext().registerDataFrameAsTable(df, "t3"); > > DataFrame join = aliasedDf.join(df, > aliasedDf.col("F2").equalTo(df.col("F2")), "inner"); > DataFrame select = join.select(aliasedDf.col("asd"), df.col("F1")); > select.collect(); > {code} > h2. Observations : > * This issue is related to the Data Type of Fields of the initial Data > Frame.(If the Data Type is not String, it will work.) > * It works fine if the data frame is registered as a temporary table and an > sql (select a.asd,b.F1 from t2 a inner join t3 b on a.F2=b.F2) is written. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
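A frequently used workaround for this resolution failure is to alias both sides of the join so the shared lineage yields distinct attribute references — a sketch in Scala (the report's code is Java), reusing {{aliasedDf}} and {{df}} from the reproduction above:
{code:scala}
import org.apache.spark.sql.functions.col

val left = aliasedDf.alias("l")
val right = df.alias("r")
left.join(right, col("l.F2") === col("r.F2"), "inner")
  .select(col("l.asd"), col("r.F1"))
  .collect()
{code}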
[jira] [Commented] (SPARK-27416) UnsafeMapData & UnsafeArrayData Kryo serialization breaks when two machines have different Oops size
[ https://issues.apache.org/jira/browse/SPARK-27416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888130#comment-16888130 ] Wenchen Fan commented on SPARK-27416: - yea let's backport it! > UnsafeMapData & UnsafeArrayData Kryo serialization breaks when two machines > have different Oops size > > > Key: SPARK-27416 > URL: https://issues.apache.org/jira/browse/SPARK-27416 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.1 >Reporter: peng bo >Assignee: peng bo >Priority: Major > Fix For: 3.0.0 > > > Actually this is a follow-up for > https://issues.apache.org/jira/browse/SPARK-27406 and > https://issues.apache.org/jira/browse/SPARK-10914. > This issue is to fix the UnsafeMapData & UnsafeArrayData Kryo serialization > issue when two machines have different Oops sizes. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28388) Port select_implicit.sql
[ https://issues.apache.org/jira/browse/SPARK-28388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-28388. --- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 25152 [https://github.com/apache/spark/pull/25152] > Port select_implicit.sql > > > Key: SPARK-28388 > URL: https://issues.apache.org/jira/browse/SPARK-28388 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.0.0 > > > In this ticket, we plan to add the regression test cases of > https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/select_implicit.sql. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28388) Port select_implicit.sql
[ https://issues.apache.org/jira/browse/SPARK-28388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-28388: - Assignee: Yuming Wang > Port select_implicit.sql > > > Key: SPARK-28388 > URL: https://issues.apache.org/jira/browse/SPARK-28388 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > > In this ticket, we plan to add the regression test cases of > https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/select_implicit.sql. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28138) Add timestamp.sql
[ https://issues.apache.org/jira/browse/SPARK-28138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-28138: - Assignee: Yuming Wang > Add timestamp.sql > - > > Key: SPARK-28138 > URL: https://issues.apache.org/jira/browse/SPARK-28138 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > > In this ticket, we plan to add the regression test cases of > [https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/timestamp.sql]. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28138) Add timestamp.sql
[ https://issues.apache.org/jira/browse/SPARK-28138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-28138. --- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 25181 [https://github.com/apache/spark/pull/25181] > Add timestamp.sql > - > > Key: SPARK-28138 > URL: https://issues.apache.org/jira/browse/SPARK-28138 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.0.0 > > > In this ticket, we plan to add the regression test cases of > [https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/timestamp.sql]. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28286) Convert and port 'pivot.sql' into UDF test base
[ https://issues.apache.org/jira/browse/SPARK-28286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-28286. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 25122 [https://github.com/apache/spark/pull/25122] > Convert and port 'pivot.sql' into UDF test base > --- > > Key: SPARK-28286 > URL: https://issues.apache.org/jira/browse/SPARK-28286 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL, Tests >Affects Versions: 3.0.0 >Reporter: Hyukjin Kwon >Assignee: Chitral Verma >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28286) Convert and port 'pivot.sql' into UDF test base
[ https://issues.apache.org/jira/browse/SPARK-28286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-28286: Assignee: Chitral Verma > Convert and port 'pivot.sql' into UDF test base > --- > > Key: SPARK-28286 > URL: https://issues.apache.org/jira/browse/SPARK-28286 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL, Tests >Affects Versions: 3.0.0 >Reporter: Hyukjin Kwon >Assignee: Chitral Verma >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28438) [SQL] Ignore metadata (comments) differences when comparing a datasource's schema and a user-specified schema
[ https://issues.apache.org/jira/browse/SPARK-28438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ShuMing Li updated SPARK-28438: --- Description: When users register a datasource table with Spark, Spark currently requires complete schema equality between the datasource's original schema and the user-specified schema. However, the datasource's original schema may differ slightly from the user-specified one: the difference may be only a column's comment or other metadata. Can we ignore a column's comment and other metadata when comparing?
{code:java}
// DataSource.scala
case (dataSource: RelationProvider, Some(schema)) =>
  val baseRelation =
    dataSource.createRelation(sparkSession.sqlContext, caseInsensitiveOptions)
  if (baseRelation.schema != schema) {
    throw new AnalysisException(s"$className does not allow user-specified schemas, " +
      s"source schema: ${baseRelation.schema}, user-specific schema: ${schema}")
  }

// StructType.scala
override def equals(that: Any): Boolean = {
  that match {
    case StructType(otherFields) =>
      java.util.Arrays.equals(
        fields.asInstanceOf[Array[AnyRef]], otherFields.asInstanceOf[Array[AnyRef]])
    case _ => false
  }
}
{code}
> [SQL] Ignore metadata (comments) differences when comparing a datasource's > schema and a user-specified schema > > > Key: SPARK-28438 > URL: https://issues.apache.org/jira/browse/SPARK-28438 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: ShuMing Li >Priority: Minor > > When users register a datasource table with Spark, Spark currently requires > complete schema equality between the datasource's original schema and the > user-specified schema. However, the datasource's original schema may differ > slightly from the user-specified one: the difference may be only a column's > comment or other metadata. Can we ignore a column's comment and other metadata > when comparing? > {code:java} > // DataSource.scala > case (dataSource: RelationProvider, Some(schema)) => > val baseRelation = > dataSource.createRelation(sparkSession.sqlContext, caseInsensitiveOptions) > if (baseRelation.schema != schema) { > throw new AnalysisException(s"$className does not allow user-specified schemas, " + > s"source schema: ${baseRelation.schema}, user-specific schema: ${schema}") > } > // StructType.scala > override def equals(that: Any): Boolean = { > that match { > case StructType(otherFields) => > java.util.Arrays.equals( > fields.asInstanceOf[Array[AnyRef]], otherFields.asInstanceOf[Array[AnyRef]]) > case _ => false > } > } > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28438) [SQL] Ignore metadata (comments) differences when comparing a datasource's schema and a user-specified schema
[ https://issues.apache.org/jira/browse/SPARK-28438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ShuMing Li updated SPARK-28438: --- Description: When users register a datasource table with Spark, Spark currently requires complete schema equality between the datasource's original schema and the user-specified schema. However, the datasource's original schema may differ slightly from the user-specified one: the difference may be only a column's comment or other metadata. Can we ignore a column's comment and other metadata when comparing?
{code:java}
// DataSource.scala
case (dataSource: RelationProvider, Some(schema)) =>
  val baseRelation =
    dataSource.createRelation(sparkSession.sqlContext, caseInsensitiveOptions)
  if (baseRelation.schema != schema) {
    throw new AnalysisException(s"$className does not allow user-specified schemas, " +
      s"source schema: ${baseRelation.schema}, user-specific schema: ${schema}")
  }

// StructType.scala
override def equals(that: Any): Boolean = {
  that match {
    case StructType(otherFields) =>
      java.util.Arrays.equals(
        fields.asInstanceOf[Array[AnyRef]], otherFields.asInstanceOf[Array[AnyRef]])
    case _ => false
  }
}
{code}
was: (same description, with the code snippets left unfenced) > [SQL] Ignore metadata (comments) differences when comparing a datasource's > schema and a user-specified schema > > > Key: SPARK-28438 > URL: https://issues.apache.org/jira/browse/SPARK-28438 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: ShuMing Li >Priority: Minor > > When users register a datasource table with Spark, Spark currently requires > complete schema equality between the datasource's original schema and the > user-specified schema. However, the datasource's original schema may differ > slightly from the user-specified one: the difference may be only a column's > comment or other metadata. Can we ignore a column's comment and other metadata > when comparing? 
> {code:java} > // DataSource.scala > case (dataSource: RelationProvider, Some(schema)) => > val baseRelation = > dataSource.createRelation(sparkSession.sqlContext, caseInsensitiveOptions) > if (baseRelation.schema != schema) { > throw new AnalysisException(s"$className does not allow user-specified > schemas, " + > s"source schema: ${baseRelation.schema}, user-specific schema: ${schema}") > } > // StructType.scala > override def equals(that: Any): Boolean = { > that match > { case StructType(otherFields) => java.util.Arrays.equals( > fields.asInstanceOf[Array[AnyRef]], otherFields.asInstanceOf[Array[AnyRef]]) > case _ => false } > } > {code} > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
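For readers following the SPARK-28438 proposal above, here is a minimal sketch (my illustration, not the patch under review; SchemaCompare is a hypothetical name) of an equality check that strips per-column metadata, which is where column comments live, before comparing. Note it is shallow: fields of nested structs keep their metadata, so a complete solution would need to recurse.
{code:scala}
import org.apache.spark.sql.types.{Metadata, StructType}

object SchemaCompare {
  // Compare two schemas while ignoring each top-level column's metadata
  // (comments are stored in StructField.metadata).
  def sameIgnoringMetadata(a: StructType, b: StructType): Boolean = {
    def strip(s: StructType): StructType =
      StructType(s.fields.map(_.copy(metadata = Metadata.empty)))
    strip(a) == strip(b)
  }
}
{code}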
[jira] [Updated] (SPARK-28439) pyspark.sql.functions.array_repeat should support Column as count argument
[ https://issues.apache.org/jira/browse/SPARK-28439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz updated SPARK-28439: --- Description: In Scala, Spark supports (https://github.com/apache/spark/blob/c3e32bf06c35ba2580d46150923abfa795b4446a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L3777) {code:java} (Column, Column) => Column {code} variant of array_repeat, however PySpark doesn't {code:java} >>> import pyspark >>> from pyspark.sql import functions as f >>> pyspark.__version__ '3.0.0.dev0' >>> f.array_repeat(f.col("foo"), f.col("bar")) ... TypeError: Column is not iterable {code} was: In Scala Spark supports {code:java} (Column, Column) => Column {code} variant of array_repeat, however PySpark doesn't {code:java} >>> import pyspark >>> from pyspark.sql import functions as f >>> pyspark.__version__ '3.0.0.dev0' >>> f.array_repeat(f.col("foo"), f.col("bar")) ... TypeError: Column is not iterable {code} > pyspark.sql.functions.array_repeat should support Column as count argument > -- > > Key: SPARK-28439 > URL: https://issues.apache.org/jira/browse/SPARK-28439 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 2.4.0, 3.0.0 >Reporter: Maciej Szymkiewicz >Priority: Minor > > In Scala, Spark supports > (https://github.com/apache/spark/blob/c3e32bf06c35ba2580d46150923abfa795b4446a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L3777) > > {code:java} > (Column, Column) => Column > {code} > variant of array_repeat, however PySpark doesn't > {code:java} > >>> import pyspark > >>> from pyspark.sql import functions as f > >>> pyspark.__version__ > '3.0.0.dev0' > > >>> f.array_repeat(f.col("foo"), f.col("bar")) > ... > TypeError: Column is not iterable > {code} > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28439) pyspark.sql.functions.array_repeat should support Column as count argument
[ https://issues.apache.org/jira/browse/SPARK-28439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz updated SPARK-28439: --- Description: In Scala Spark supports {code:java} (Column, Column) => Column {code} variant of array_repeat, however PySpark doesn't {code:java} >>> import pyspark >>> from pyspark.sql import functions as f >>> pyspark.__version__ '3.0.0.dev0' >>> f.array_repeat(f.col("foo"), f.col("bar")) ... TypeError: Column is not iterable {code} was: In Scala Spark supports {code:java} (Column, Column) => Column {code} variant of array_repeat, however PySpark doesn't {code:java} >>> import pyspark >>> from pyspark.sql import functions as f >>> pyspark.__version__ '3.0.0.dev0' >>> f.array_repeat(f.col("foo"), f.col("bar")) ... TypeError: Column is not iterable {code} > pyspark.sql.functions.array_repeat should support Column as count argument > -- > > Key: SPARK-28439 > URL: https://issues.apache.org/jira/browse/SPARK-28439 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 2.4.0, 3.0.0 >Reporter: Maciej Szymkiewicz >Priority: Minor > > In Scala Spark supports > > {code:java} > (Column, Column) => Column > {code} > variant of array_repeat, however PySpark doesn't > {code:java} > >>> import pyspark > >>> from pyspark.sql import functions as f > >>> pyspark.__version__ > '3.0.0.dev0' > > >>> f.array_repeat(f.col("foo"), f.col("bar")) > ... > TypeError: Column is not iterable > {code} > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28439) pyspark.sql.functions.array_repeat should support Column as count argument
[ https://issues.apache.org/jira/browse/SPARK-28439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz updated SPARK-28439: --- Description: In Scala Spark supports {code:java} (Column, Column) => Column {code} variant of array_repeat, however PySpark doesn't {code:java} >>> import pyspark >>> from pyspark.sql import functions as f >>> pyspark.__version__ '3.0.0.dev0' >>> f.array_repeat(f.col("foo"), f.col("bar")) ... TypeError: Column is not iterable {code} was: (same description; the earlier paste of the REPL transcript contained duplicated '>>>' prompt lines) > pyspark.sql.functions.array_repeat should support Column as count argument > -- > > Key: SPARK-28439 > URL: https://issues.apache.org/jira/browse/SPARK-28439 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 2.4.0, 3.0.0 >Reporter: Maciej Szymkiewicz >Priority: Minor > > In Scala Spark supports > > {code:java} > (Column, Column) => Column > {code} > variant of array_repeat, however PySpark doesn't > {code:java} > >>> import pyspark > >>> from pyspark.sql import functions as f > >>> pyspark.__version__ > '3.0.0.dev0' > > >>> f.array_repeat(f.col("foo"), f.col("bar")) > ... > TypeError: Column is not iterable > {code} > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28439) pyspark.sql.functions.array_repeat should support Column as count argument
[ https://issues.apache.org/jira/browse/SPARK-28439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz updated SPARK-28439: --- Description: In Scala Spark supports {code:java} (Column, Column) => Column {code} variant of array_repeat, however PySpark doesn't {code:java} >>> import pyspark >>> from pyspark.sql import functions as f >>> pyspark.__version__ '3.0.0.dev0' >>> f.array_repeat(f.col("foo"), f.col("bar")) Traceback (most recent call last): ... TypeError: Column is not iterable {code} was: (same description, with additional duplicated '>>>' prompt lines in the pasted REPL transcript) > pyspark.sql.functions.array_repeat should support Column as count argument > -- > > Key: SPARK-28439 > URL: https://issues.apache.org/jira/browse/SPARK-28439 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 2.4.0, 3.0.0 >Reporter: Maciej Szymkiewicz >Priority: Minor > > In Scala Spark supports > > {code:java} > (Column, Column) => Column > {code} > variant of array_repeat, however PySpark doesn't > {code:java} > >>> import pyspark > >>> from pyspark.sql import functions as f > >>> pyspark.__version__ > '3.0.0.dev0' > >>> f.array_repeat(f.col("foo"), f.col("bar")) > Traceback (most recent call last): > ... > TypeError: Column is not iterable > {code} > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28439) pyspark.sql.functions.array_repeat should support Column as count argument
Maciej Szymkiewicz created SPARK-28439: -- Summary: pyspark.sql.functions.array_repeat should support Column as count argument Key: SPARK-28439 URL: https://issues.apache.org/jira/browse/SPARK-28439 Project: Spark Issue Type: Bug Components: PySpark, SQL Affects Versions: 2.4.0, 3.0.0 Reporter: Maciej Szymkiewicz In Scala Spark supports {code:java} (Column, Column) => Column {code} variant of array_repeat, however PySpark doesn't
{code:java}
>>> import pyspark
>>> from pyspark.sql import functions as f
>>> pyspark.__version__
'3.0.0.dev0'
>>> f.array_repeat(f.col("foo"), f.col("bar"))
Traceback (most recent call last):
...
TypeError: Column is not iterable
{code}
-- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
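As the report notes, the two-Column overload already exists on the Scala side; the sketch below (hypothetical ArrayRepeatDemo object, local SparkSession assumed) shows it working. Until PySpark exposes the same signature, one workaround, my suggestion rather than anything from the ticket, is to route through SQL with f.expr("array_repeat(foo, bar)").
{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.array_repeat

object ArrayRepeatDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("array-repeat").getOrCreate()
    import spark.implicits._

    // Repeat the value in `foo` a per-row number of times taken from `bar`.
    val df = Seq((1, 2), (7, 3)).toDF("foo", "bar")
    df.select(array_repeat($"foo", $"bar").as("repeated")).show(false)
    // [1, 1]
    // [7, 7, 7]
    spark.stop()
  }
}
{code}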
[jira] [Created] (SPARK-28438) [SQL] Ignore metadata (comments) differences when comparing a datasource's schema and a user-specified schema
ShuMing Li created SPARK-28438: -- Summary: [SQL] Ignore metadata (comments) differences when comparing a datasource's schema and a user-specified schema Key: SPARK-28438 URL: https://issues.apache.org/jira/browse/SPARK-28438 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.3.0 Reporter: ShuMing Li -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25788) Elastic net penalties for GLMs
[ https://issues.apache.org/jira/browse/SPARK-25788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887874#comment-16887874 ] shahid commented on SPARK-25788: [~pralabhkumar] yeah. Please go ahead. I don't think I have enough bandwidth to look into the issue. > Elastic net penalties for GLMs > --- > > Key: SPARK-25788 > URL: https://issues.apache.org/jira/browse/SPARK-25788 > Project: Spark > Issue Type: New Feature > Components: ML >Affects Versions: 2.3.2 >Reporter: Christian Lorentzen >Priority: Major > > Currently, both LinearRegression and LogisticRegression support an elastic > net penalty (setElasticNetParam), i.e. L1 and L2 penalties. This feature > could and should also be added to GeneralizedLinearRegression. > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
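For readers unfamiliar with the request, the sketch below (hypothetical ElasticNetToday object) contrasts what exists today with what the ticket asks for: the elastic net mixing parameter is available on LinearRegression and LogisticRegression, while GeneralizedLinearRegression currently exposes only plain L2 regularization through setRegParam.
{code:scala}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.regression.GeneralizedLinearRegression

object ElasticNetToday {
  // Available today: mix L1 and L2 penalties on (logistic) linear models.
  val logit = new LogisticRegression()
    .setRegParam(0.1)        // overall regularization strength
    .setElasticNetParam(0.5) // 0.0 = pure L2 (ridge), 1.0 = pure L1 (lasso)

  // GLMs today: only L2 via setRegParam; a setElasticNetParam counterpart
  // is exactly what SPARK-25788 requests.
  val glr = new GeneralizedLinearRegression()
    .setFamily("poisson")
    .setRegParam(0.1)
}
{code}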
[jira] [Assigned] (SPARK-28278) Convert and port 'except-all.sql' into UDF test base
[ https://issues.apache.org/jira/browse/SPARK-28278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-28278: Assignee: Terry Kim > Convert and port 'except-all.sql' into UDF test base > > > Key: SPARK-28278 > URL: https://issues.apache.org/jira/browse/SPARK-28278 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL, Tests >Affects Versions: 3.0.0 >Reporter: Hyukjin Kwon >Assignee: Terry Kim >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28278) Convert and port 'except-all.sql' into UDF test base
[ https://issues.apache.org/jira/browse/SPARK-28278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-28278. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 25090 [https://github.com/apache/spark/pull/25090] > Convert and port 'except-all.sql' into UDF test base > > > Key: SPARK-28278 > URL: https://issues.apache.org/jira/browse/SPARK-28278 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL, Tests >Affects Versions: 3.0.0 >Reporter: Hyukjin Kwon >Assignee: Terry Kim >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28283) Convert and port 'intersect-all.sql' into UDF test base
[ https://issues.apache.org/jira/browse/SPARK-28283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-28283: Assignee: Terry Kim > Convert and port 'intersect-all.sql' into UDF test base > --- > > Key: SPARK-28283 > URL: https://issues.apache.org/jira/browse/SPARK-28283 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL, Tests >Affects Versions: 3.0.0 >Reporter: Hyukjin Kwon >Assignee: Terry Kim >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28283) Convert and port 'intersect-all.sql' into UDF test base
[ https://issues.apache.org/jira/browse/SPARK-28283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-28283. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 25119 [https://github.com/apache/spark/pull/25119] > Convert and port 'intersect-all.sql' into UDF test base > --- > > Key: SPARK-28283 > URL: https://issues.apache.org/jira/browse/SPARK-28283 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL, Tests >Affects Versions: 3.0.0 >Reporter: Hyukjin Kwon >Assignee: Terry Kim >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28276) Convert and port 'cross-join.sql' into UDF test base
[ https://issues.apache.org/jira/browse/SPARK-28276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-28276: Assignee: Liang-Chi Hsieh (was: Hyukjin Kwon) > Convert and port 'cross-join.sql' into UDF test base > > > Key: SPARK-28276 > URL: https://issues.apache.org/jira/browse/SPARK-28276 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL, Tests >Affects Versions: 3.0.0 >Reporter: Hyukjin Kwon >Assignee: Liang-Chi Hsieh >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28276) Convert and port 'cross-join.sql' into UDF test base
[ https://issues.apache.org/jira/browse/SPARK-28276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-28276. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 25168 [https://github.com/apache/spark/pull/25168] > Convert and port 'cross-join.sql' into UDF test base > > > Key: SPARK-28276 > URL: https://issues.apache.org/jira/browse/SPARK-28276 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL, Tests >Affects Versions: 3.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28276) Convert and port 'cross-join.sql' into UDF test base
[ https://issues.apache.org/jira/browse/SPARK-28276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-28276: Assignee: Hyukjin Kwon > Convert and port 'cross-join.sql' into UDF test base > > > Key: SPARK-28276 > URL: https://issues.apache.org/jira/browse/SPARK-28276 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL, Tests >Affects Versions: 3.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25788) Elastic net penalties for GLMs
[ https://issues.apache.org/jira/browse/SPARK-25788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887858#comment-16887858 ] pralabhkumar commented on SPARK-25788: -- [~shahid] I can work on this. Please let me know if that's OK. > Elastic net penalties for GLMs > --- > > Key: SPARK-25788 > URL: https://issues.apache.org/jira/browse/SPARK-25788 > Project: Spark > Issue Type: New Feature > Components: ML >Affects Versions: 2.3.2 >Reporter: Christian Lorentzen >Priority: Major > > Currently, both LinearRegression and LogisticRegression support an elastic > net penalty (setElasticNetParam), i.e. L1 and L2 penalties. This feature > could and should also be added to GeneralizedLinearRegression. > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28437) Different format when casting interval type to string type
Yuming Wang created SPARK-28437: --- Summary: Different format when casting interval type to string type Key: SPARK-28437 URL: https://issues.apache.org/jira/browse/SPARK-28437 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Yuming Wang *Spark SQL*:
{code:sql}
spark-sql> select cast(INTERVAL '10' SECOND as string);
interval 10 seconds
{code}
*PostgreSQL*:
{code:sql}
postgres=# select substr(version(), 0, 16), cast(INTERVAL '10' SECOND as text);
     substr      |   text
-----------------+----------
 PostgreSQL 11.3 | 00:00:10
(1 row)
{code}
*Vertica*:
{code:sql}
dbadmin=> select version(), cast(INTERVAL '10' SECOND as varchar(255));
              version               | ?column?
------------------------------------+----------
 Vertica Analytic Database v9.1.1-0 | 10
(1 row)
{code}
*Presto*:
{code:sql}
presto> select cast(INTERVAL '10' SECOND as varchar(255));
     _col0
----------------
 0 00:00:10.000
(1 row)
{code}
*Oracle*:
{code:sql}
SQL> select cast(INTERVAL '10' SECOND as varchar(255)) from dual;
CAST(INTERVAL'10'SECONDASVARCHAR(255))
--------------------------------------
INTERVAL'+00 00:00:10.00'DAY TO SECOND
{code}
-- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-26796) Testcases failing with "org.apache.hadoop.fs.ChecksumException" error
[ https://issues.apache.org/jira/browse/SPARK-26796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anuja Jakhade reopened SPARK-26796: --- > Testcases failing with "org.apache.hadoop.fs.ChecksumException" error > - > > Key: SPARK-26796 > URL: https://issues.apache.org/jira/browse/SPARK-26796 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 2.3.2, 2.4.0 > Environment: Ubuntu 16.04 > Java Version > openjdk version "1.8.0_192" > OpenJDK Runtime Environment (build 1.8.0_192-b12_openj9) > Eclipse OpenJ9 VM (build openj9-0.11.0, JRE 1.8.0 Compressed References > 20181107_80 (JIT enabled, AOT enabled) > OpenJ9 - 090ff9dcd > OMR - ea548a66 > JCL - b5a3affe73 based on jdk8u192-b12) > > Hadoop Version > Hadoop 2.7.1 > Subversion Unknown -r Unknown > Compiled by test on 2019-01-29T09:09Z > Compiled with protoc 2.5.0 > From source with checksum 5e94a235f9a71834e2eb73fb36ee873f > This command was run using > /home/test/hadoop-release-2.7.1/hadoop-dist/target/hadoop-2.7.1/share/hadoop/common/hadoop-common-2.7.1.jar > > > >Reporter: Anuja Jakhade >Priority: Major > > Observing test case failures due to a checksum error. Below is the error log: > [ERROR] checkpointAndComputation(test.org.apache.spark.JavaAPISuite) Time > elapsed: 1.232 s <<< ERROR! > org.apache.spark.SparkException: > Job aborted due to stage failure: Task 0 in stage 2.0 failed 1 times, most > recent failure: Lost task 0.0 in stage 2.0 (TID 2, localhost, executor > driver): org.apache.hadoop.fs.ChecksumException: Checksum error: > file:/home/test/spark/core/target/tmp/1548319689411-0/fd0ba388-539c-49aa-bf76-e7d50aa2d1fc/rdd-0/part-0 > at 0 exp: 222499834 got: 1400184476 > at org.apache.hadoop.fs.FSInputChecker.verifySums(FSInputChecker.java:323) > at > org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:279) > at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:214) > at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:232) > at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:196) > at java.io.DataInputStream.read(DataInputStream.java:149) > at > java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2769) > at > java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2785) > at > java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3262) > at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:968) > at java.io.ObjectInputStream.<init>(ObjectInputStream.java:390) > at > org.apache.spark.serializer.JavaDeserializationStream$$anon$1.<init>(JavaSerializer.scala:63) > at > org.apache.spark.serializer.JavaDeserializationStream.<init>(JavaSerializer.scala:63) > at > org.apache.spark.serializer.JavaSerializerInstance.deserializeStream(JavaSerializer.scala:122) > at > org.apache.spark.rdd.ReliableCheckpointRDD$.readCheckpointFile(ReliableCheckpointRDD.scala:300) > at > org.apache.spark.rdd.ReliableCheckpointRDD.compute(ReliableCheckpointRDD.scala:100) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:322) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) > at org.apache.spark.scheduler.Task.run(Task.scala:109) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:813) > Driver stacktrace: > at > test.org.apache.spark.JavaAPISuite.checkpointAndComputation(JavaAPISuite.java:1243) > Caused by: org.apache.hadoop.fs.ChecksumException: Checksum error: > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28436) [SQL] Throw better exception when datasource's schema is not equal to user-specified schema
[ https://issues.apache.org/jira/browse/SPARK-28436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ShuMing Li updated SPARK-28436: --- Affects Version/s: 2.3.0 > [SQL] Throw better exception when datasource's schema is not equal to > user-specified schema > - > > Key: SPARK-28436 > URL: https://issues.apache.org/jira/browse/SPARK-28436 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0, 2.4.3 >Reporter: ShuMing Li >Priority: Minor > > When this exception is thrown, users cannot tell what the difference is > between the datasource's original schema and the user-specified schema, and > may be very confused when they meet the exception below. > {code:java} > org.apache.spark.sql.AnalysisException: org.apache.spark.odps.datasource does > not allow user-specified schemas. > at > org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:347) > at > org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:78) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79) > at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190) > at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190) > at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3270) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:83) > at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3269) > at org.apache.spark.sql.Dataset.<init>(Dataset.scala:190) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:75) > at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:653) > at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:714) > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28436) [SQL] Throw better exception when datasource's schema is not equal to user-specified schema
ShuMing Li created SPARK-28436: -- Summary: [SQL] Throw better exception when datasource's schema is not equal to user-specified schema Key: SPARK-28436 URL: https://issues.apache.org/jira/browse/SPARK-28436 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.3 Reporter: ShuMing Li When this exception is thrown, users cannot tell what the difference is between the datasource's original schema and the user-specified schema, and may be very confused when they meet the exception below.
{code:java}
org.apache.spark.sql.AnalysisException: org.apache.spark.odps.datasource does not allow user-specified schemas.;
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:347)
  at org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:78)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
  at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
  at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
  at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3270)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:83)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3269)
  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:190)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:75)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:653)
  at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:714)
{code}
-- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28436) [SQL] Throw better exception when datasource's schema is not equal to user-specified schema
[ https://issues.apache.org/jira/browse/SPARK-28436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ShuMing Li updated SPARK-28436: --- Description: When this exception is thrown, users cannot tell what the difference is between the datasource's original schema and the user-specified schema, and may be very confused when they meet the exception below.
{code:java}
org.apache.spark.sql.AnalysisException: org.apache.spark.odps.datasource does not allow user-specified schemas.
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:347)
  at org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:78)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
  at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
  at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
  at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3270)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:83)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3269)
  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:190)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:75)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:653)
  at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:714)
{code}
was: (same description, with the stack trace in markdown-style ``` fences instead of {code} fences) > [SQL] Throw better exception when datasource's schema is not equal to > user-specified schema > - > > Key: SPARK-28436 > URL: https://issues.apache.org/jira/browse/SPARK-28436 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3 >Reporter: ShuMing Li >Priority: Minor > > When this exception is thrown, users cannot tell what the difference is > between the datasource's original schema and the user-specified schema, and > may be very confused when they meet the exception below. 
> {code:java} > org.apache.spark.sql.AnalysisException: org.apache.spark.odps.datasource does > not allow user-specified schemas. > at > org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:347) > at > org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:78) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79) > at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190) > at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190) > at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3270) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:83) > at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3269) > at org.apache.spark.sql.Dataset.<init>(Dataset.scala:190) > at
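A minimal sketch of the kind of message improvement SPARK-28436 asks for (SchemaDiff and describeMismatch are hypothetical names; the real change would live around DataSource.resolveRelation): report which fields differ instead of only rejecting the user-specified schema outright.
{code:scala}
import org.apache.spark.sql.types.StructType

object SchemaDiff {
  // Summarize how two schemas differ so the AnalysisException can say more
  // than "does not allow user-specified schemas".
  def describeMismatch(source: StructType, userSpecified: StructType): String = {
    val src = source.fields.map(f => f.name -> f.dataType).toMap
    val usr = userSpecified.fields.map(f => f.name -> f.dataType).toMap
    val onlyInSource = (src.keySet -- usr.keySet).mkString(", ")
    val onlyInUser = (usr.keySet -- src.keySet).mkString(", ")
    val typeDiffs = (src.keySet & usr.keySet)
      .filter(k => src(k) != usr(k))
      .map(k => s"$k: ${src(k)} vs ${usr(k)}")
      .mkString(", ")
    s"only in source schema: [$onlyInSource]; only in user-specified schema: " +
      s"[$onlyInUser]; type mismatches: [$typeDiffs]"
  }
}
{code}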
[jira] [Commented] (SPARK-28424) Improve interval input
[ https://issues.apache.org/jira/browse/SPARK-28424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887754#comment-16887754 ] Yuming Wang commented on SPARK-28424: - I'm working on it. > Improve interval input > --- > > Key: SPARK-28424 > URL: https://issues.apache.org/jira/browse/SPARK-28424 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > Example: > {code:sql} > INTERVAL '1 day 2:03:04' > {code} > https://www.postgresql.org/docs/11/datatype-datetime.html -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28411) insertInto with overwrite inconsistent behaviour Python/Scala
[ https://issues.apache.org/jira/browse/SPARK-28411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887725#comment-16887725 ] Maria Rebelka commented on SPARK-28411: --- Great, thank you! > insertInto with overwrite inconsistent behaviour Python/Scala > - > > Key: SPARK-28411 > URL: https://issues.apache.org/jira/browse/SPARK-28411 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 2.2.1, 2.4.0 >Reporter: Maria Rebelka >Assignee: Huaxin Gao >Priority: Minor > Fix For: 3.0.0 > > > df.write.mode("overwrite").insertInto("table") behaves inconsistently > between Scala and Python. In Python, insertInto ignores the "mode" parameter > and appends by default; only when the syntax is changed to > df.write.insertInto("table", overwrite=True) do we get the expected > behaviour. > This is native Spark syntax and is expected to behave the same across > languages... > Also, in other write methods, like saveAsTable or write.parquet, "mode" seems > to be respected. > Reproduce, Python, "overwrite" ignored: > {code:java} > df = spark.createDataFrame(sc.parallelize([(1, 2),(3,4)]),['i','j']) > # create the table and load data > df.write.saveAsTable("spark_overwrite_issue") > # insert overwrite, expected result - 2 rows > df.write.mode("overwrite").insertInto("spark_overwrite_issue") > spark.sql("select * from spark_overwrite_issue").count() > # result - 4 rows, insertInto appended the data instead of overwriting{code} > Reproduce, Scala, works as expected: > {code:java} > val df = Seq((1, 2),(3,4)).toDF("i","j") > df.write.mode("overwrite").insertInto("spark_overwrite_issue") > spark.sql("select * from spark_overwrite_issue").count() > // result - 2 rows{code} > Tested on Spark 2.2.1 (EMR) and 2.4.0 (Databricks) -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-28429) SQL Datetime util function being cast to double instead of timestamp
[ https://issues.apache.org/jira/browse/SPARK-28429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-28429: Comment: was deleted (was: I'm working on it.) > SQL Datetime util function being cast to double instead of timestamp > -- > > Key: SPARK-28429 > URL: https://issues.apache.org/jira/browse/SPARK-28429 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Dylan Guedes >Priority: Major > > In the code below, the '100 days' in now() + '100 days' is cast to double > and then an error is thrown: > {code:sql} > CREATE TEMP VIEW v_window AS > SELECT i, min(i) over (order by i range between '1 day' preceding and '10 > days' following) as min_i > FROM range(now(), now()+'100 days', '1 hour') i; > {code} > Error: > {code:sql} > cannot resolve '(current_timestamp() + CAST('100 days' AS DOUBLE))' due to > data type mismatch: differing types in '(current_timestamp() + CAST('100 > days' AS DOUBLE))' (timestamp and double).;{code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28288) Convert and port 'window.sql' into UDF test base
[ https://issues.apache.org/jira/browse/SPARK-28288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887707#comment-16887707 ] Liang-Chi Hsieh commented on SPARK-28288: - Those errors can be found in the original window.sql, so this seems fine. > Convert and port 'window.sql' into UDF test base > > > Key: SPARK-28288 > URL: https://issues.apache.org/jira/browse/SPARK-28288 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL, Tests >Affects Versions: 3.0.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28194) [SQL] A NoSuchElementException may be thrown during EnsureRequirements
[ https://issues.apache.org/jira/browse/SPARK-28194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-28194. -- Resolution: Cannot Reproduce So, this is likely a duplicate of another JIRA. Let's find the JIRA that fixed this issue and see if we can backport. > [SQL] A NoSuchElementException may be thrown during EnsureRequirements > -- > > Key: SPARK-28194 > URL: https://issues.apache.org/jira/browse/SPARK-28194 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.2 >Reporter: feiwang >Priority: Major > > {code:java} > java.util.NoSuchElementException: None.get > at scala.None$.get(Option.scala:347) > at scala.None$.get(Option.scala:345) > at > org.apache.spark.sql.execution.exchange.EnsureRequirements$$anonfun$reorder$1.apply(EnsureRequirements.scala:239) > at > org.apache.spark.sql.execution.exchange.EnsureRequirements$$anonfun$reorder$1.apply(EnsureRequirements.scala:234) > at scala.collection.immutable.List.foreach(List.scala:381) > at > org.apache.spark.sql.execution.exchange.EnsureRequirements.reorder(EnsureRequirements.scala:234) > at > org.apache.spark.sql.execution.exchange.EnsureRequirements.reorderJoinKeys(EnsureRequirements.scala:257) > at > org.apache.spark.sql.execution.exchange.EnsureRequirements.org$apache$spark$sql$execution$exchange$EnsureRequirements$$reorderJoinPredicates(EnsureRequirements.scala:297) > at > org.apache.spark.sql.execution.exchange.EnsureRequirements$$anonfun$apply$1.applyOrElse(EnsureRequirements.scala:312) > at > org.apache.spark.sql.execution.exchange.EnsureRequirements$$anonfun$apply$1.applyOrElse(EnsureRequirements.scala:304) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$2.apply(TreeNode.scala:293) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$2.apply(TreeNode.scala:293) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:292) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286) > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28429) SQL Datetime util function being cast to double instead of timestamp
[ https://issues.apache.org/jira/browse/SPARK-28429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887673#comment-16887673 ] Yuming Wang commented on SPARK-28429: - I'm working on it. > SQL Datetime util function being cast to double instead of timestamp > -- > > Key: SPARK-28429 > URL: https://issues.apache.org/jira/browse/SPARK-28429 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Dylan Guedes >Priority: Major > > In the code below, the '100 days' in now() + '100 days' is cast to double > and then an error is thrown: > {code:sql} > CREATE TEMP VIEW v_window AS > SELECT i, min(i) over (order by i range between '1 day' preceding and '10 > days' following) as min_i > FROM range(now(), now()+'100 days', '1 hour') i; > {code} > Error: > {code:sql} > cannot resolve '(current_timestamp() + CAST('100 days' AS DOUBLE))' due to > data type mismatch: differing types in '(current_timestamp() + CAST('100 > days' AS DOUBLE))' (timestamp and double).;{code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
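The failure above comes from implicit coercion: there is no rule for adding a plain string to a timestamp, so Spark falls back to casting '100 days' to double, which cannot be added to a timestamp either. An explicit interval literal avoids the fallback; a sketch (hypothetical IntervalAddDemo object, local SparkSession assumed, and a workaround for the addition only, not for the range() call in the report):
{code:scala}
import org.apache.spark.sql.SparkSession

object IntervalAddDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("interval-add").getOrCreate()
    // A string offset would be cast to double and fail; an explicit
    // interval literal adds cleanly to a timestamp.
    spark.sql("SELECT current_timestamp() + INTERVAL 100 DAYS AS later").show(false)
    spark.stop()
  }
}
{code}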
[jira] [Updated] (SPARK-28435) Support cast StringType to IntervalType for SQL interface
[ https://issues.apache.org/jira/browse/SPARK-28435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-28435: Summary: Support cast StringType to IntervalType for SQL interface (was: Support cast string to interval for SQL interface) > Support cast StringType to IntervalType for SQL interface > - > > Key: SPARK-28435 > URL: https://issues.apache.org/jira/browse/SPARK-28435 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > The Scala interface supports casting a string to an interval: > {code:scala} > import org.apache.spark.sql.types._ > import org.apache.spark.sql.catalyst.expressions._ > Cast(Literal("interval 3 month 1 hours"), CalendarIntervalType).eval() > res0: Any = interval 3 months 1 hours > {code} > But the SQL interface does not support it: > {code:sql} > scala> spark.sql("SELECT CAST('interval 3 month 1 hour' AS interval)").show > org.apache.spark.sql.catalyst.parser.ParseException: > DataType interval is not supported.(line 1, pos 41) > == SQL == > SELECT CAST('interval 3 month 1 hour' AS interval) > -----------------------------------------^^^ > at > org.apache.spark.sql.catalyst.parser.AstBuilder.$anonfun$visitPrimitiveDataType$1(AstBuilder.scala:1931) > at > org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:108) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.visitPrimitiveDataType(AstBuilder.scala:1909) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.visitPrimitiveDataType(AstBuilder.scala:52) > at > org.apache.spark.sql.catalyst.parser.SqlBaseParser$PrimitiveDataTypeContext.accept(SqlBaseParser.java:15397) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.typedVisit(AstBuilder.scala:58) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.visitSparkDataType(AstBuilder.scala:1903) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.$anonfun$visitCast$1(AstBuilder.scala:1334) > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28435) Support cast string to interval for SQL interface
Yuming Wang created SPARK-28435: --- Summary: Support cast string to interval for SQL interface Key: SPARK-28435 URL: https://issues.apache.org/jira/browse/SPARK-28435 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Yuming Wang The Scala interface supports casting a string to an interval:
{code:scala}
import org.apache.spark.sql.types._
import org.apache.spark.sql.catalyst.expressions._

Cast(Literal("interval 3 month 1 hours"), CalendarIntervalType).eval()
res0: Any = interval 3 months 1 hours
{code}
But the SQL interface does not support it:
{code:sql}
scala> spark.sql("SELECT CAST('interval 3 month 1 hour' AS interval)").show
org.apache.spark.sql.catalyst.parser.ParseException:
DataType interval is not supported.(line 1, pos 41)

== SQL ==
SELECT CAST('interval 3 month 1 hour' AS interval)
-----------------------------------------^^^

  at org.apache.spark.sql.catalyst.parser.AstBuilder.$anonfun$visitPrimitiveDataType$1(AstBuilder.scala:1931)
  at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:108)
  at org.apache.spark.sql.catalyst.parser.AstBuilder.visitPrimitiveDataType(AstBuilder.scala:1909)
  at org.apache.spark.sql.catalyst.parser.AstBuilder.visitPrimitiveDataType(AstBuilder.scala:52)
  at org.apache.spark.sql.catalyst.parser.SqlBaseParser$PrimitiveDataTypeContext.accept(SqlBaseParser.java:15397)
  at org.apache.spark.sql.catalyst.parser.AstBuilder.typedVisit(AstBuilder.scala:58)
  at org.apache.spark.sql.catalyst.parser.AstBuilder.visitSparkDataType(AstBuilder.scala:1903)
  at org.apache.spark.sql.catalyst.parser.AstBuilder.$anonfun$visitCast$1(AstBuilder.scala:1334)
{code}
-- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
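Until CAST(... AS interval) parses, interval literals are already accepted by the SQL interface, so the value in the example can be produced there today; a sketch under the assumption of a local SparkSession (IntervalLiteralDemo is a hypothetical name):
{code:scala}
import org.apache.spark.sql.SparkSession

object IntervalLiteralDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("interval-literal").getOrCreate()
    // The parser rejects CAST('...' AS interval) but accepts interval literals.
    spark.sql("SELECT INTERVAL 3 MONTHS 1 HOUR AS iv").show(false)
    spark.stop()
  }
}
{code}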