[jira] [Updated] (SPARK-24669) Managed table was not cleared of path after drop database cascade

2018-06-27 Thread Dong Jiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Jiang updated SPARK-24669:
-------------------------------
Description: 
I can do the following in sequence
# Create a managed table using path options
# Drop the table via dropping the parent database cascade
# Re-create the database and table with a different path
# The new table shows data from the old path, not the new path

{code}
echo "first" > /tmp/first.csv
echo "second" > /tmp/second.csv
spark-shell
spark.version
res0: String = 2.3.0
spark.sql("create database foo")
spark.sql("create table foo.first (id string) using csv options 
(path='/tmp/first.csv')")
spark.table("foo.first").show()
+-+
|   id|
+-+
|first|
+-+
spark.sql("drop database foo cascade")
spark.sql("create database foo")
spark.sql("create table foo.first (id string) using csv options 
(path='/tmp/second.csv')")
"note, the path is different now, pointing to second.csv, but still showing 
data from first file"
spark.table("foo.first").show()
+-+
|   id|
+-+
|first|
+-+
"now, if I drop the table explicitly, instead of via dropping database cascade, 
then it will be the correct result"
spark.sql("drop table foo.first")
spark.sql("create table foo.first (id string) using csv options 
(path='/tmp/second.csv')")
spark.table("foo.first").show()
+--+
|id|
+--+
|second|
+--+
{code}

The same sequence fails in 2.3.1 as well.
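
A minimal diagnostic sketch, assuming only public Spark 2.3 APIs: check the location the catalog reports for the re-created table, and try refreshing it, which may work around the stale result if the old path is held in the session's relation cache.
{code}
// Show the path the catalog records for the re-created table.
spark.sql("describe formatted foo.first").filter("col_name = 'Location'").show(false)
// Possible workaround: invalidate cached metadata/data for the table, then re-read.
spark.catalog.refreshTable("foo.first")
spark.table("foo.first").show()
{code}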

  was:
I can do the following in sequence
# Create a managed table using path options
# Drop the table via dropping the parent database cascade
# Re-create the database and table with a different path
# The new table shows data from the old path, not the new path

{code}
echo "first" > /tmp/first.csv
echo "second" > /tmp/second.csv
spark-shell
spark.version
res0: String = 2.3.0
spark.sql("create database foo")
spark.sql("create table foo.first (id string) using csv options 
(path='/tmp/first.csv')")
spark.table("foo.first").show()
+-+
|   id|
+-+
|first|
+-+
spark.sql("drop database foo cascade")
spark.sql("create database foo")
spark.sql("create table foo.first (id string) using csv options 
(path='/tmp/second.csv')")
"note, the path is different now, pointing to second.csv, but still showing 
data from first file"
spark.table("foo.first").show()
+-+
|   id|
+-+
|first|
+-+
"now, if I drop the table explicitly, instead of via dropping database cascade, 
then it will be the correct result"
spark.sql("drop table foo.first")
spark.sql("create table foo.first (id string) using csv options 
(path='/tmp/second.csv')")
spark.table("foo.first").show()
+--+
|id|
+--+
|second|
+--+
{code}


> Managed table was not cleared of path after drop database cascade
> ------------------------------------------------------------------
>
> Key: SPARK-24669
> URL: https://issues.apache.org/jira/browse/SPARK-24669
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0, 2.3.1
>Reporter: Dong Jiang
>Priority: Major
>
> I can do the following in sequence
> # Create a managed table using path options
> # Drop the table via dropping the parent database cascade
> # Re-create the database and table with a different path
> # The new table shows data from the old path, not the new path
> {code}
> echo "first" > /tmp/first.csv
> echo "second" > /tmp/second.csv
> spark-shell
> spark.version
> res0: String = 2.3.0
> spark.sql("create database foo")
> spark.sql("create table foo.first (id string) using csv options 
> (path='/tmp/first.csv')")
> spark.table("foo.first").show()
> +-+
> |   id|
> +-+
> |first|
> +-+
> spark.sql("drop database foo cascade")
> spark.sql("create database foo")
> spark.sql("create table foo.first (id string) using csv options 
> (path='/tmp/second.csv')")
> "note, the path is different now, pointing to second.csv, but still showing 
> data from first file"
> spark.table("foo.first").show()
> +-+
> |   id|
> +-+
> |first|
> +-+
> "now, if I drop the table explicitly, instead of via dropping database 
> cascade, then it will be the correct result"
> spark.sql("drop table foo.first")
> spark.sql("create table foo.first (id string) using csv options 
> (path='/tmp/second.csv')")
> spark.table("foo.first").show()
> +--+
> |id|
> +--+
> |second|
> +--+
> {code}
> The same sequence fails in 2.3.1 as well.





[jira] [Updated] (SPARK-24669) Managed table was not cleared of path after drop database cascade

2018-06-27 Thread Dong Jiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Jiang updated SPARK-24669:
-------------------------------
Affects Version/s: 2.3.1

> Managed table was not cleared of path after drop database cascade
> ------------------------------------------------------------------
>
> Key: SPARK-24669
> URL: https://issues.apache.org/jira/browse/SPARK-24669
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0, 2.3.1
>Reporter: Dong Jiang
>Priority: Major
>
> I can do the following in sequence
> # Create a managed table using path options
> # Drop the table via dropping the parent database cascade
> # Re-create the database and table with a different path
> # The new table shows data from the old path, not the new path
> {code}
> echo "first" > /tmp/first.csv
> echo "second" > /tmp/second.csv
> spark-shell
> spark.version
> res0: String = 2.3.0
> spark.sql("create database foo")
> spark.sql("create table foo.first (id string) using csv options 
> (path='/tmp/first.csv')")
> spark.table("foo.first").show()
> +-+
> |   id|
> +-+
> |first|
> +-+
> spark.sql("drop database foo cascade")
> spark.sql("create database foo")
> spark.sql("create table foo.first (id string) using csv options 
> (path='/tmp/second.csv')")
> "note, the path is different now, pointing to second.csv, but still showing 
> data from first file"
> spark.table("foo.first").show()
> +-+
> |   id|
> +-+
> |first|
> +-+
> "now, if I drop the table explicitly, instead of via dropping database 
> cascade, then it will be the correct result"
> spark.sql("drop table foo.first")
> spark.sql("create table foo.first (id string) using csv options 
> (path='/tmp/second.csv')")
> spark.table("foo.first").show()
> +--+
> |id|
> +--+
> |second|
> +--+
> {code}





[jira] [Updated] (SPARK-24669) Managed table was not cleared of path after drop database cascade

2018-06-27 Thread Dong Jiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Jiang updated SPARK-24669:
-------------------------------
Description: 
I can do the following in sequence
# Create a managed table using path options
# Drop the table via dropping the parent database cascade
# Re-create the database and table with a different path
# The new table shows data from the old path, not the new path

{code}
echo "first" > /tmp/first.csv
echo "second" > /tmp/second.csv
spark-shell
spark.version
res0: String = 2.3.0
spark.sql("create database foo")
spark.sql("create table foo.first (id string) using csv options 
(path='/tmp/first.csv')")
spark.table("foo.first").show()
+-+
|   id|
+-+
|first|
+-+
spark.sql("drop database foo cascade")
spark.sql("create database foo")
spark.sql("create table foo.first (id string) using csv options 
(path='/tmp/second.csv')")
"note, the path is different now, pointing to second.csv, but still showing 
data from first file"
spark.table("foo.first").show()
+-+
|   id|
+-+
|first|
+-+
"now, if I drop the table explicitly, instead of via dropping database cascade, 
then it will be the correct result"
spark.sql("drop table foo.first")
spark.sql("create table foo.first (id string) using csv options 
(path='/tmp/second.csv')")
spark.table("foo.first").show()
+--+
|id|
+--+
|second|
+--+
{code}

  was:
I can do the following in sequence
# Create a managed table using path options
# Drop the table via dropping the parent database cascade
# Re-create the database and table with a different path
# The new table shows data from the old path, not the new path

{code}
echo "first" > /tmp/first.csv
echo "second" > /tmp/second.csv
spark-shell
spark.version
res0: String = 2.3.0
spark.sql("create database foo")
spark.sql("create table foo.first (id string) using csv options 
(path='/tmp/first.csv')")
spark.table("foo.first").show()
+-+
|   id|
+-+
|first|
+-+
spark.sql("drop database foo cascade")
spark.sql("create database foo")
spark.sql("create table foo.first (id string) using csv options 
(path='/tmp/second.csv')")
"note, the path is different now, pointing to second.csv, but still showing 
data from first file"
spark.table("foo.first").show()
+-+
|   id|
+-+
|first|
+-+
"now, if I drop the table explicitly, then it will be correct"
spark.sql("drop table foo.first")
spark.sql("create table foo.first (id string) using csv options 
(path='/tmp/second.csv')")
spark.table("foo.first").show()
+--+
|id|
+--+
|second|
+--+
{code}


> Managed table was not cleared of path after drop database cascade
> ------------------------------------------------------------------
>
> Key: SPARK-24669
> URL: https://issues.apache.org/jira/browse/SPARK-24669
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Dong Jiang
>Priority: Major
>
> I can do the following in sequence
> # Create a managed table using path options
> # Drop the table via dropping the parent database cascade
> # Re-create the database and table with a different path
> # The new table shows data from the old path, not the new path
> {code}
> echo "first" > /tmp/first.csv
> echo "second" > /tmp/second.csv
> spark-shell
> spark.version
> res0: String = 2.3.0
> spark.sql("create database foo")
> spark.sql("create table foo.first (id string) using csv options 
> (path='/tmp/first.csv')")
> spark.table("foo.first").show()
> +-+
> |   id|
> +-+
> |first|
> +-+
> spark.sql("drop database foo cascade")
> spark.sql("create database foo")
> spark.sql("create table foo.first (id string) using csv options 
> (path='/tmp/second.csv')")
> "note, the path is different now, pointing to second.csv, but still showing 
> data from first file"
> spark.table("foo.first").show()
> +-+
> |   id|
> +-+
> |first|
> +-+
> "now, if I drop the table explicitly, instead of via dropping database 
> cascade, then it will be the correct result"
> spark.sql("drop table foo.first")
> spark.sql("create table foo.first (id string) using csv options 
> (path='/tmp/second.csv')")
> spark.table("foo.first").show()
> +--+
> |id|
> +--+
> |second|
> +--+
> {code}





[jira] [Updated] (SPARK-24669) Managed table was not cleared of path after drop database cascade

2018-06-27 Thread Dong Jiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Jiang updated SPARK-24669:
-------------------------------
Description: 
I can do the following in sequence
# Create a managed table using path options
# Drop the table via dropping the parent database cascade
# Re-create the database and table with a different path
# The new table shows data from the old path, not the new path

{code}
echo "first" > /tmp/first.csv
echo "second" > /tmp/second.csv
spark-shell
spark.version
res0: String = 2.3.0
spark.sql("create database foo")
spark.sql("create table foo.first (id string) using csv options 
(path='/tmp/first.csv')")
spark.table("foo.first").show()
+-+
|   id|
+-+
|first|
+-+
spark.sql("drop database foo cascade")
spark.sql("create database foo")
spark.sql("create table foo.first (id string) using csv options 
(path='/tmp/second.csv')")
"note, the path is different now, pointing to second.csv, but still showing 
data from first file"
spark.table("foo.first").show()
+-+
|   id|
+-+
|first|
+-+
"now, if I drop the table explicitly, then it will be correct"
spark.sql("drop table foo.first")
spark.sql("create table foo.first (id string) using csv options 
(path='/tmp/second.csv')")
spark.table("foo.first").show()
+--+
|id|
+--+
|second|
+--+
{code}

  was:
I can do the following in sequence
# Create a managed table using path options
# Drop the table via dropping the parent database cascade
# Re-create the database and table with a different path
# The new table shows data from the old path, not the new path

{code}
echo "first" > /tmp/first.csv
echo "second" > /tmp/second.csv
spark-shell
spark.version
res0: String = 2.3.0
spark.sql("create database foo")
spark.sql("create table foo.first (id string) using csv options 
(path='/tmp/first.csv')")
spark.table("foo.first").show()
+-+
|   id|
+-+
|first|
+-+
spark.sql("drop database foo cascade")
spark.sql("create database foo")
spark.sql("create table foo.first (id string) using csv options 
(path='/tmp/second.csv')")
"note, the path is different now, pointing to second.csv, but still showing 
data from first file"
spark.table("foo.first").show()
+-+
|   id|
+-+
|first|
+-+
{code}


> Managed table was not cleared of path after drop database cascade
> ------------------------------------------------------------------
>
> Key: SPARK-24669
> URL: https://issues.apache.org/jira/browse/SPARK-24669
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Dong Jiang
>Priority: Major
>
> I can do the following in sequence
> # Create a managed table using path options
> # Drop the table via dropping the parent database cascade
> # Re-create the database and table with a different path
> # The new table shows data from the old path, not the new path
> {code}
> echo "first" > /tmp/first.csv
> echo "second" > /tmp/second.csv
> spark-shell
> spark.version
> res0: String = 2.3.0
> spark.sql("create database foo")
> spark.sql("create table foo.first (id string) using csv options 
> (path='/tmp/first.csv')")
> spark.table("foo.first").show()
> +-+
> |   id|
> +-+
> |first|
> +-+
> spark.sql("drop database foo cascade")
> spark.sql("create database foo")
> spark.sql("create table foo.first (id string) using csv options 
> (path='/tmp/second.csv')")
> "note, the path is different now, pointing to second.csv, but still showing 
> data from first file"
> spark.table("foo.first").show()
> +-+
> |   id|
> +-+
> |first|
> +-+
> "now, if I drop the table explicitly, then it will be correct"
> spark.sql("drop table foo.first")
> spark.sql("create table foo.first (id string) using csv options 
> (path='/tmp/second.csv')")
> spark.table("foo.first").show()
> +--+
> |id|
> +--+
> |second|
> +--+
> {code}





[jira] [Updated] (SPARK-24669) Managed table was not cleared of path after drop database cascade

2018-06-27 Thread Dong Jiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Jiang updated SPARK-24669:
-------------------------------
Description: 
I can do the following in sequence
# Create a managed table using path options
# Drop the table via dropping the parent database cascade
# Re-create the database and table with a different path
# The new table shows data from the old path, not the new path

{code}
echo "first" > /tmp/first.csv
echo "second" > /tmp/second.csv
spark-shell
spark.version
res0: String = 2.3.0
spark.sql("create database foo")
spark.sql("create table foo.first (id string) using csv options 
(path='/tmp/first.csv')")
spark.table("foo.first").show()
+-+
|   id|
+-+
|first|
+-+
spark.sql("drop database foo cascade")
spark.sql("create database foo")
spark.sql("create table foo.first (id string) using csv options 
(path='/tmp/second.csv')")
"note, the path is different now, pointing to second.csv, but still showing 
data from first file"
spark.table("foo.first").show()
+-+
|   id|
+-+
|first|
+-+
{code}

  was:
I can do the following in sequence
# Create a managed table using path options
# Drop the table via dropping the parent database cascade
# Re-create the database and table with a different path
# The new table shows data from the old path, not the new path

{code}
echo "first" > /tmp/first.csv
echo "second" > /tmp/second.csv
spark-shell
spark.version
res0: String = 2.3.0
spark.sql("create database foo")
spark.sql("create table foo.first (id string) using csv options 
(path='/tmp/first.csv')")
spark.table("foo.first").show()
+-+
|   id|
+-+
|first|
+-+
spark.sql("drop database foo cascade")
spark.sql("create database foo")
spark.sql("create table foo.first (id string) using csv options 
(path='/tmp/second.csv')")
"note, the path is different now, pointing to second.csv, but still showing 
data from first file"
spark.table("foo.second").show()
spark.table("foo.first").show()
+-+
|   id|
+-+
|first|
+-+
{code}


> Managed table was not cleared of path after drop database cascade
> ------------------------------------------------------------------
>
> Key: SPARK-24669
> URL: https://issues.apache.org/jira/browse/SPARK-24669
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Dong Jiang
>Priority: Major
>
> I can do the following in sequence
> # Create a managed table using path options
> # Drop the table via dropping the parent database cascade
> # Re-create the database and table with a different path
> # The new table shows data from the old path, not the new path
> {code}
> echo "first" > /tmp/first.csv
> echo "second" > /tmp/second.csv
> spark-shell
> spark.version
> res0: String = 2.3.0
> spark.sql("create database foo")
> spark.sql("create table foo.first (id string) using csv options 
> (path='/tmp/first.csv')")
> spark.table("foo.first").show()
> +-+
> |   id|
> +-+
> |first|
> +-+
> spark.sql("drop database foo cascade")
> spark.sql("create database foo")
> spark.sql("create table foo.first (id string) using csv options 
> (path='/tmp/second.csv')")
> "note, the path is different now, pointing to second.csv, but still showing 
> data from first file"
> spark.table("foo.first").show()
> +-+
> |   id|
> +-+
> |first|
> +-+
> {code}





[jira] [Created] (SPARK-24669) Managed table was not cleared of path after drop database cascade

2018-06-27 Thread Dong Jiang (JIRA)
Dong Jiang created SPARK-24669:
-----------------------------------

 Summary: Managed table was not cleared of path after drop database 
cascade
 Key: SPARK-24669
 URL: https://issues.apache.org/jira/browse/SPARK-24669
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.0
Reporter: Dong Jiang


I can do the following in sequence
# Create a managed table using path options
# Drop the table via dropping the parent database cascade
# Re-create the database and table with a different path
# The new table shows data from the old path, not the new path

{code}
echo "first" > /tmp/first.csv
echo "second" > /tmp/second.csv
spark-shell
spark.version
res0: String = 2.3.0
spark.sql("create database foo")
spark.sql("create table foo.first (id string) using csv options 
(path='/tmp/first.csv')")
spark.table("foo.first").show()
+-+
|   id|
+-+
|first|
+-+
spark.sql("drop database foo cascade")
spark.sql("create database foo")
spark.sql("create table foo.first (id string) using csv options 
(path='/tmp/second.csv')")
"note, the path is different now, pointing to second.csv, but still showing 
data from first file"
spark.table("foo.second").show()
spark.table("foo.first").show()
+-+
|   id|
+-+
|first|
+-+
{code}





[jira] [Created] (SPARK-23866) Extend ALTER TABLE DROP PARTITION syntax to use all comparators

2018-04-04 Thread Dong Jiang (JIRA)
Dong Jiang created SPARK-23866:
-----------------------------------

 Summary: Extend ALTER TABLE DROP PARTITION syntax to use all 
comparators
 Key: SPARK-23866
 URL: https://issues.apache.org/jira/browse/SPARK-23866
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.3.0
Reporter: Dong Jiang


Please add SQL support for dropping multiple partitions with operators other than =, essentially the equivalent of https://issues.apache.org/jira/browse/HIVE-2908:

"To drop a partition from a Hive table, this works:

ALTER TABLE foo DROP PARTITION(ds = 'date')

...but it should also work to drop all partitions prior to date.

ALTER TABLE foo DROP PARTITION(ds < 'date')

This task is to implement ALTER TABLE DROP PARTITION for all of the comparators, < > <= >= <> = !=, instead of just for =."
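
A workaround sketch until such syntax exists, assuming only public Spark 2.3 APIs (the table name "events" and partition column "ds" are hypothetical): enumerate the partitions, filter with the desired comparator in Scala, and issue one equality-based DROP PARTITION per match.
{code}
// List partitions (rows look like "ds=2017-12-31"), keep those before a
// cutoff, and drop each with the '=' form that Spark already supports.
val cutoff = "2018-01-01"
spark.sql("show partitions events").collect()
  .map(_.getString(0).stripPrefix("ds="))
  .filter(_ < cutoff)
  .foreach(d => spark.sql(s"alter table events drop partition (ds = '$d')"))
{code}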





[jira] [Commented] (SPARK-23549) Spark SQL unexpected behavior when comparing timestamp to date

2018-03-08 Thread Dong Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16391351#comment-16391351
 ] 

Dong Jiang commented on SPARK-23549:


[~kiszk], I expect your query to return false, as Presto/Athena does.
A date in SQL is typically thought of as equivalent to a timestamp at 00:00:00.
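
A workaround sketch, assuming the Spark 2.2 behavior shown in the description: cast the date bounds to timestamp explicitly so the comparison never falls back to strings.
{code}
// Explicit date -> timestamp casts ('2017-02-28' becomes 2017-02-28 00:00:00),
// so the BETWEEN evaluates on timestamps and returns true, matching Presto/Athena.
spark.sql("""
  select cast('2017-03-01 00:00:00' as timestamp)
         between cast(cast('2017-02-28' as date) as timestamp)
             and cast(cast('2017-03-01' as date) as timestamp)
""").show()
{code}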

> Spark SQL unexpected behavior when comparing timestamp to date
> ---------------------------------------------------------------
>
> Key: SPARK-23549
> URL: https://issues.apache.org/jira/browse/SPARK-23549
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.1
>Reporter: Dong Jiang
>Priority: Major
>
> {code:java}
> scala> spark.version
> res1: String = 2.2.1
> scala> spark.sql("select cast('2017-03-01 00:00:00' as timestamp) between cast('2017-02-28' as date) and cast('2017-03-01' as date)").show
> +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> |((CAST(CAST(2017-03-01 00:00:00 AS TIMESTAMP) AS STRING) >= CAST(CAST(2017-02-28 AS DATE) AS STRING)) AND (CAST(CAST(2017-03-01 00:00:00 AS TIMESTAMP) AS STRING) <= CAST(CAST(2017-03-01 AS DATE) AS STRING)))|
> +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> |                                                                                                                                                                                                           false|
> +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+{code}
> As shown above, when a timestamp is compared to a date in Spark SQL, both the timestamp and the date are downcast to string, leading to an unexpected result. If I run the same SQL in Presto/Athena, I get the expected result:
> {code:java}
> select cast('2017-03-01 00:00:00' as timestamp) between cast('2017-02-28' as date) and cast('2017-03-01' as date)
>   _col0
> 1 true
> {code}
> Is this a bug for Spark or a feature?





[jira] [Commented] (SPARK-23549) Spark SQL unexpected behavior when comparing timestamp to date

2018-03-02 Thread Dong Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16384226#comment-16384226
 ] 

Dong Jiang commented on SPARK-23549:


Tested in Spark 2.3.0; same behavior.

> Spark SQL unexpected behavior when comparing timestamp to date
> ---------------------------------------------------------------
>
> Key: SPARK-23549
> URL: https://issues.apache.org/jira/browse/SPARK-23549
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.1
>Reporter: Dong Jiang
>Priority: Major
>
> {code:java}
> scala> spark.version
> res1: String = 2.2.1
> scala> spark.sql("select cast('2017-03-01 00:00:00' as timestamp) between cast('2017-02-28' as date) and cast('2017-03-01' as date)").show
> +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> |((CAST(CAST(2017-03-01 00:00:00 AS TIMESTAMP) AS STRING) >= CAST(CAST(2017-02-28 AS DATE) AS STRING)) AND (CAST(CAST(2017-03-01 00:00:00 AS TIMESTAMP) AS STRING) <= CAST(CAST(2017-03-01 AS DATE) AS STRING)))|
> +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> |                                                                                                                                                                                                           false|
> +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+{code}
> As shown above, when a timestamp is compared to a date in Spark SQL, both the timestamp and the date are downcast to string, leading to an unexpected result. If I run the same SQL in Presto/Athena, I get the expected result:
> {code:java}
> select cast('2017-03-01 00:00:00' as timestamp) between cast('2017-02-28' as date) and cast('2017-03-01' as date)
>   _col0
> 1 true
> {code}
> Is this a bug for Spark or a feature?





[jira] [Created] (SPARK-23549) Spark SQL unexpected behavior when comparing timestamp to date

2018-03-01 Thread Dong Jiang (JIRA)
Dong Jiang created SPARK-23549:
-----------------------------------

 Summary: Spark SQL unexpected behavior when comparing timestamp to 
date
 Key: SPARK-23549
 URL: https://issues.apache.org/jira/browse/SPARK-23549
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.2.1
Reporter: Dong Jiang


{code:java}
scala> spark.version
res1: String = 2.2.1

scala> spark.sql("select cast('2017-03-01 00:00:00' as timestamp) between cast('2017-02-28' as date) and cast('2017-03-01' as date)").show
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|((CAST(CAST(2017-03-01 00:00:00 AS TIMESTAMP) AS STRING) >= CAST(CAST(2017-02-28 AS DATE) AS STRING)) AND (CAST(CAST(2017-03-01 00:00:00 AS TIMESTAMP) AS STRING) <= CAST(CAST(2017-03-01 AS DATE) AS STRING)))|
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                                                                                                                                                           false|
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+{code}
As shown above, when a timestamp is compared to a date in Spark SQL, both the timestamp and the date are downcast to string, leading to an unexpected result. If I run the same SQL in Presto/Athena, I get the expected result:
{code:java}
select cast('2017-03-01 00:00:00' as timestamp) between cast('2017-02-28' as date) and cast('2017-03-01' as date)
    _col0
1   true{code}





[jira] [Updated] (SPARK-23549) Spark SQL unexpected behavior when comparing timestamp to date

2018-03-01 Thread Dong Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Jiang updated SPARK-23549:
-------------------------------
Description: 
{code:java}
scala> spark.version
res1: String = 2.2.1

scala> spark.sql("select cast('2017-03-01 00:00:00' as timestamp) between cast('2017-02-28' as date) and cast('2017-03-01' as date)").show
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|((CAST(CAST(2017-03-01 00:00:00 AS TIMESTAMP) AS STRING) >= CAST(CAST(2017-02-28 AS DATE) AS STRING)) AND (CAST(CAST(2017-03-01 00:00:00 AS TIMESTAMP) AS STRING) <= CAST(CAST(2017-03-01 AS DATE) AS STRING)))|
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                                                                                                                                                           false|
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+{code}
As shown above, when a timestamp is compared to a date in Spark SQL, both the timestamp and the date are downcast to string, leading to an unexpected result. If I run the same SQL in Presto/Athena, I get the expected result:
{code:java}
select cast('2017-03-01 00:00:00' as timestamp) between cast('2017-02-28' as date) and cast('2017-03-01' as date)
    _col0
1   true
{code}

Is this a bug for Spark or a feature?

  was:
{code:java}
scala> spark.version
res1: String = 2.2.1

scala> spark.sql("select cast('2017-03-01 00:00:00' as timestamp) between cast('2017-02-28' as date) and cast('2017-03-01' as date)").show
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|((CAST(CAST(2017-03-01 00:00:00 AS TIMESTAMP) AS STRING) >= CAST(CAST(2017-02-28 AS DATE) AS STRING)) AND (CAST(CAST(2017-03-01 00:00:00 AS TIMESTAMP) AS STRING) <= CAST(CAST(2017-03-01 AS DATE) AS STRING)))|
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                                                                                                                                                           false|
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+{code}
As shown above, when a timestamp is compared to a date in Spark SQL, both the timestamp and the date are downcast to string, leading to an unexpected result. If I run the same SQL in Presto/Athena, I get the expected result:
{code:java}
select cast('2017-03-01 00:00:00' as timestamp) between cast('2017-02-28' as date) and cast('2017-03-01' as date)
    _col0
1   true{code}


> Spark SQL unexpected behavior when comparing timestamp to date
> ---------------------------------------------------------------
>
> Key: SPARK-23549
> URL: https://issues.apache.org/jira/browse/SPARK-23549
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.1
>Reporter: Dong Jiang
>Priority: Major
>
> {code:java}
> scala> spark.version
> res1: String = 2.2.1
> scala> spark.sql("select cast('2017-03-01 00:00:00' as timestamp) between cast('2017-02-28' as date) and cast('2017-03-01' as date)").show
> +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> |((CAST(CAST(2017-03-01 00:00:00 AS TIMESTAMP) AS STRING) >= CAST(CAST(2017-02-28 AS DATE) AS STRING)) AND (CAST(CAST(2017-03-01 00:00:00 AS TIMESTAMP) AS STRING) <= CAST(CAST(2017-03-01 AS DATE) AS STRING)))|
> +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
> |                                                                                                                                                                                                           false|
> +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+{code}
> As shown above, when a timestamp is compared to a date in Spark SQL, both the timestamp and the date are downcast to string, leading to an unexpected result. If I run the same SQL in Presto/Athena, I get the expected result:
> {code:java}
> select cast('2017-03-01 00:00:00' as timestamp) between cast('2017-02-28' as date) and cast('2017-03-01' as date)
>   _col0
> 1 true
> {code}
> Is this a bug for Spark or a feature?

[jira] [Commented] (SPARK-13127) Upgrade Parquet to 1.9 (Fixes parquet sorting)

2017-12-28 Thread Dong Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305691#comment-16305691
 ] 

Dong Jiang commented on SPARK-13127:


[~gaurav24], it looks like you, like me, are waiting for this ticket to be worked on.
If you would like, please comment on this thread on the developer list to advocate for resolving this issue in the Spark 2.3 release:
http://apache-spark-developers-list.1001551.n3.nabble.com/Timeline-for-Spark-2-3-td22793.html

> Upgrade Parquet to 1.9 (Fixes parquet sorting)
> ----------------------------------------------
>
> Key: SPARK-13127
> URL: https://issues.apache.org/jira/browse/SPARK-13127
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1
>Reporter: Justin Pihony
>
> Currently, when you write a sorted DataFrame to Parquet, reading the data back out is not sorted by default. [This is due to a bug in Parquet|https://issues.apache.org/jira/browse/PARQUET-241] that was fixed in 1.9.
> There is a workaround: read the file back in using a file glob (filepath/*).





[jira] [Commented] (SPARK-17647) SQL LIKE does not handle backslashes correctly

2017-12-18 Thread Dong Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16295261#comment-16295261
 ] 

Dong Jiang commented on SPARK-17647:


Are we sure this issue is resolved? I tested the following in spark-shell 2.2.0:
{code}
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_25)
Type in expressions to have them evaluated.
Type :help for more information.

scala> spark.sql("select '\\\\' like '%\\\\%'").show
+----------+
|\ LIKE %\%|
+----------+
|     false|
+----------+
{code}
The same happens in spark-sql:
{code}
spark-sql> select '\\' like '%\\%';
false
Time taken: 2.296 seconds, Fetched 1 row(s)
{code}
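
One way to sanity-check this outside SQL literal escaping entirely (a sketch, assuming a plain spark-shell session): build the strings at the Scala level and use the DataFrame API, where only one layer of escaping applies.
{code}
// "\\" is a single backslash at the Scala level; Column.contains is a plain
// substring test, so this prints one row if backslash handling itself is sound.
import spark.implicits._
import org.apache.spark.sql.functions.col
Seq("\\").toDF("s").filter(col("s").contains("\\")).show()
{code}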


> SQL LIKE does not handle backslashes correctly
> ----------------------------------------------
>
> Key: SPARK-17647
> URL: https://issues.apache.org/jira/browse/SPARK-17647
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
>  Labels: correctness
> Fix For: 2.1.1, 2.2.0
>
>
> Try the following in SQL shell:
> {code}
> select '\\' like '%\\%';
> {code}
> It returned false, which is wrong.
> cc: [~yhuai] [~joshrosen]
> A false-negative considered previously:
> {code}
> select '\\' rlike '.*\\\\.*';
> {code}
> It returned true, which is correct if we assume that the pattern is treated 
> as a Java string but not raw string.





[jira] [Commented] (SPARK-13127) Upgrade Parquet to 1.9 (Fixes parquet sorting)

2017-11-13 Thread Dong Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16250512#comment-16250512
 ] 

Dong Jiang commented on SPARK-13127:


[~igozali], I think you are referring to this Parquet ticket: https://issues.apache.org/jira/browse/PARQUET-686
The Parquet ticket indicates the fix is in 1.9.0, so we still need Spark to upgrade Parquet to 1.9.0.
I have examined a Parquet file generated by Spark 2.2; the string column doesn't have min/max statistics in the footer. I believe they are disabled.
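
For reference, a sketch of the glob workaround mentioned in the ticket, assuming a directory of part files written by a sorted DataFrame (the /tmp path is illustrative): reading through a glob preserves per-file order rather than relying on footer metadata.
{code}
// Write sorted output, then read it back via a glob over the part files.
val df = spark.range(0, 1000).toDF("id")
df.sort("id").write.parquet("/tmp/sorted_out")
val back = spark.read.parquet("/tmp/sorted_out/*")
{code}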

> Upgrade Parquet to 1.9 (Fixes parquet sorting)
> ----------------------------------------------
>
> Key: SPARK-13127
> URL: https://issues.apache.org/jira/browse/SPARK-13127
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1
>Reporter: Justin Pihony
>
> Currently, when you write a sorted DataFrame to Parquet, reading the data back out is not sorted by default. [This is due to a bug in Parquet|https://issues.apache.org/jira/browse/PARQUET-241] that was fixed in 1.9.
> There is a workaround: read the file back in using a file glob (filepath/*).





[jira] [Comment Edited] (SPARK-13127) Upgrade Parquet to 1.9 (Fixes parquet sorting)

2017-11-13 Thread Dong Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16250512#comment-16250512
 ] 

Dong Jiang edited comment on SPARK-13127 at 11/13/17 11:56 PM:
---------------------------------------------------------------

[~igozali], I think you are referring to this Parquet ticket: https://issues.apache.org/jira/browse/PARQUET-686
The Parquet ticket indicates the fix is in 1.9.0, so we still need Spark to upgrade Parquet to 1.9.0.
I have examined a Parquet file generated by Spark 2.2; the string column doesn't have min/max statistics in the footer. I believe they are disabled.
Do we have any progress on this issue? Will it be included in Spark 2.3?


was (Author: djiangxu):
[~igozali], I think you are referring to this Parquet ticket: https://issues.apache.org/jira/browse/PARQUET-686
The Parquet ticket indicates the fix is in 1.9.0, so we still need Spark to upgrade Parquet to 1.9.0.
I have examined a Parquet file generated by Spark 2.2; the string column doesn't have min/max statistics in the footer. I believe they are disabled.

> Upgrade Parquet to 1.9 (Fixes parquet sorting)
> ----------------------------------------------
>
> Key: SPARK-13127
> URL: https://issues.apache.org/jira/browse/SPARK-13127
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1
>Reporter: Justin Pihony
>
> Currently, when you write a sorted DataFrame to Parquet, reading the data back out is not sorted by default. [This is due to a bug in Parquet|https://issues.apache.org/jira/browse/PARQUET-241] that was fixed in 1.9.
> There is a workaround: read the file back in using a file glob (filepath/*).





[jira] [Updated] (SPARK-16806) from_unixtime function gives wrong answer

2016-07-29 Thread Dong Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Jiang updated SPARK-16806:
-------------------------------
Description: 
The following is from 2.0; for the same epoch, the function with a format argument generates a different result for the year.
spark-sql> select from_unixtime(100), from_unixtime(100, 'YYYY-MM-dd HH:mm:ss');
1969-12-31 19:01:40	1970-12-31 19:01:40



  was:
The following is from 2.0; for the same epoch, the function with a format argument generates a different result.
spark-sql> select from_unixtime(100), from_unixtime(100, 'YYYY-MM-dd HH:mm:ss');
1969-12-31 19:01:40	1970-12-31 19:01:40




> from_unixtime function gives wrong answer
> -----------------------------------------
>
> Key: SPARK-16806
> URL: https://issues.apache.org/jira/browse/SPARK-16806
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Dong Jiang
>
> The following is from 2.0; for the same epoch, the function with a format argument generates a different result for the year.
> spark-sql> select from_unixtime(100), from_unixtime(100, 'YYYY-MM-dd HH:mm:ss');
> 1969-12-31 19:01:40	1970-12-31 19:01:40
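
A note on the likely cause (my reading, not stated in this thread): in Java's SimpleDateFormat, capital 'YYYY' denotes the week-based year, and 1969-12-31 falls in week 1 of 1970, which would explain the second value. A sketch of the check with the calendar-year pattern:
{code}
// Lowercase 'yyyy' is the calendar year and agrees with the default format.
spark.sql("select from_unixtime(100), from_unixtime(100, 'yyyy-MM-dd HH:mm:ss')").show()
{code}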





[jira] [Created] (SPARK-16806) from_unixtime function gives wrong answer

2016-07-29 Thread Dong Jiang (JIRA)
Dong Jiang created SPARK-16806:
-----------------------------------

 Summary: from_unixtime function gives wrong answer
 Key: SPARK-16806
 URL: https://issues.apache.org/jira/browse/SPARK-16806
 Project: Spark
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Dong Jiang


The following is from 2.0; for the same epoch, the function with a format argument generates a different result.
spark-sql> select from_unixtime(100), from_unixtime(100, 'YYYY-MM-dd HH:mm:ss');
1969-12-31 19:01:40	1970-12-31 19:01:40




