[jira] [Comment Edited] (SPARK-19496) to_date with format has weird behavior

2017-02-07 Thread Song Jun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857541#comment-15857541
 ] 

Song Jun edited comment on SPARK-19496 at 2/8/17 7:18 AM:
--

mysql: select str_to_date('2014-12-31','%Y-%d-%m')   also return null

that is mysql both return null when the date is invalidate or the formate is 
invalidate.

and hive will transform the invalidate  date to  valid, e.g 2014-31-12 -> 31/12 
= 2 -> 2014+2=2016
, 31 - 12*2=7 -> 2016-07-12

currently spark can handle wrong format / wrong date  when to_date has the 
format parameter (like hive's transform), what about we also make to_date 
without format parameter follow its action, that is replace null with a 
transformed date to return


was (Author: windpiger):
mysql: select str_to_date('2014-12-31','%Y-%d-%m')   also return null

that is mysql both return null when the date is invalidate or the formate is 
invalidate.

and hive will transform the invalidate  date to  valid, e.g 2014-31-12 -> 31/12 
= 2 -> 2014+2=2016
, 31 - 12*2=7 -> 2016-07-12

> to_date with format has weird behavior
> --
>
> Key: SPARK-19496
> URL: https://issues.apache.org/jira/browse/SPARK-19496
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Wenchen Fan
>
> Today, if we run
> {code}
> SELECT to_date('2015-07-22', '-dd-MM')
> {code}
> will result to `2016-10-07`, while running
> {code}
> SELECT to_date('2014-31-12')   # default format
> {code}
> will return null.
> this behavior is weird and we should check other systems like hive to see if 
> this is expected.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-19496) to_date with format has weird behavior

2017-02-07 Thread Song Jun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857541#comment-15857541
 ] 

Song Jun edited comment on SPARK-19496 at 2/8/17 7:11 AM:
--

mysql: select str_to_date('2014-12-31','%Y-%d-%m')   also return null

that is mysql both return null when the date is invalidate or the formate is 
invalidate.

and hive will transform the invalidate  date to  valid, e.g 2014-31-12 -> 31/12 
= 2 -> 2014+2=2016
, 31 - 12*2=7 -> 2016-07-12


was (Author: windpiger):
mysql: select str_to_date('2014-12-31','%Y-%d-%m')   also return null

mysql both return null when the date is invalidate or the formate is invalidate

> to_date with format has weird behavior
> --
>
> Key: SPARK-19496
> URL: https://issues.apache.org/jira/browse/SPARK-19496
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Wenchen Fan
>
> Today, if we run
> {code}
> SELECT to_date('2015-07-22', '-dd-MM')
> {code}
> will result to `2016-10-07`, while running
> {code}
> SELECT to_date('2014-31-12')   # default format
> {code}
> will return null.
> this behavior is weird and we should check other systems like hive to see if 
> this is expected.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-19496) to_date with format has weird behavior

2017-02-07 Thread Song Jun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857541#comment-15857541
 ] 

Song Jun edited comment on SPARK-19496 at 2/8/17 7:09 AM:
--

mysql: select str_to_date('2014-12-31','%Y-%d-%m')   also return null

mysql both return null when the date is invalidate or the formate is invalidate


was (Author: windpiger):
mysql: select str_to_date('2014-12-31','%Y-%d-%m')   also return null


> to_date with format has weird behavior
> --
>
> Key: SPARK-19496
> URL: https://issues.apache.org/jira/browse/SPARK-19496
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Wenchen Fan
>
> Today, if we run
> {code}
> SELECT to_date('2015-07-22', '-dd-MM')
> {code}
> will result to `2016-10-07`, while running
> {code}
> SELECT to_date('2014-31-12')   # default format
> {code}
> will return null.
> this behavior is weird and we should check other systems like hive to see if 
> this is expected.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-19496) to_date with format has weird behavior

2017-02-07 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857459#comment-15857459
 ] 

Hyukjin Kwon edited comment on SPARK-19496 at 2/8/17 6:16 AM:
--

- Hive

{code}
hive> SELECT to_date('2014-31-12');
2016-07-12
...

hive> SELECT to_date('2014-12-32');
2015-01-01
...

hive> SELECT to_date('2014-12-31');
2014-12-31
{code}

{code}
hive> SELECT to_date('2015-07-22', '-dd-MM')
org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:7 Arguments length 
mismatch ''-dd-MM'': to_date() requires 1 argument, got 2
...
{code}

- Postgres

{code}
postgres=# SELECT to_date('2014-12-31');
ERROR:  function to_date(unknown) does not exist
LINE 1: SELECT to_date('2014-12-31');
   ^
HINT:  No function matches the given name and argument types. You might need to 
add explicit type casts.
{code}

{code}
postgres=# SELECT to_date('2014-12-31', '-MM-dd');
  to_date

 2014-12-31
(1 row)

postgres=# SELECT to_date('2014-13-31', '-MM-dd');
  to_date

 2015-01-31
(1 row)
{code}


- Spark

{code}
spark-sql> SELECT to_date('2014-31-12');
NULL
...

spark-sql> SELECT to_date('2014-12-32');
NULL
...

spark-sql> SELECT to_date('2014-12-31');
2014-12-31
...
{code}

{code}
spark-sql> SELECT to_date('2015-07-22', '-dd-MM')
2016-10-07
...
{code}

- MySQL

{code}
mysql> SELECT str_to_date('2014-12-31');
ERROR 1582 (42000): Incorrect parameter count in the call to native function 
'str_to_date'
{code}

{code}
mysql> SELECT str_to_date('2014-12-31', '%Y-%m-%d');
+---+
| str_to_date('2014-12-31', '%Y-%m-%d') |
+---+
| 2014-12-31|
+---+
1 row in set (0.00 sec)

mysql> SELECT str_to_date('2014-13-31', '%Y-%m-%d');
+---+
| str_to_date('2014-13-31', '%Y-%m-%d') |
+---+
| NULL  |
+---+
1 row in set, 1 warning (0.00 sec)
{code}


MySQL/SparkSQL - it seems returning {{NULL}}. MySQL seems not supporting 
omitting the format.

Hive/Postgres - it seems returning calculated dates. Hive seems not supporting 
the format. Postgres seems not supporting omitting the format.


was (Author: hyukjin.kwon):
- Hive

{code}
hive> SELECT to_date('2014-31-12');
2016-07-12
...

hive> SELECT to_date('2014-12-32');
2015-01-01
...

hive> SELECT to_date('2014-12-31');
2014-12-31
{code}

{code}
hive> SELECT to_date('2015-07-22', '-dd-MM')
org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:7 Arguments length 
mismatch ''-dd-MM'': to_date() requires 1 argument, got 2
...
{code}

- Postgres

{code}
postgres=# SELECT to_date('2014-12-31');
ERROR:  function to_date(unknown) does not exist
LINE 1: SELECT to_date('2014-12-31');
   ^
HINT:  No function matches the given name and argument types. You might need to 
add explicit type casts.
{code}

{code}
postgres=# SELECT to_date('2014-12-31');
ERROR:  function to_date(unknown) does not exist
LINE 1: SELECT to_date('2014-12-31');
   ^
HINT:  No function matches the given name and argument types. You might need to 
add explicit type casts.
postgres=# SELECT to_date('2014-12-31', '-MM-dd');
  to_date

 2014-12-31
(1 row)

postgres=# SELECT to_date('2014-13-31', '-MM-dd');
  to_date

 2015-01-31
(1 row)
{code}


- Spark

{code}
spark-sql> SELECT to_date('2014-31-12');
NULL
...

spark-sql> SELECT to_date('2014-12-32');
NULL
...

spark-sql> SELECT to_date('2014-12-31');
2014-12-31
...
{code}

{code}
spark-sql> SELECT to_date('2015-07-22', '-dd-MM')
2016-10-07
...
{code}

- MySQL

{code}
mysql> SELECT str_to_date('2014-12-31');
ERROR 1582 (42000): Incorrect parameter count in the call to native function 
'str_to_date'
{code}

{code}
mysql> SELECT str_to_date('2014-12-31', '%Y-%m-%d');
+---+
| str_to_date('2014-12-31', '%Y-%m-%d') |
+---+
| 2014-12-31|
+---+
1 row in set (0.00 sec)

mysql> SELECT str_to_date('2014-13-31', '%Y-%m-%d');
+---+
| str_to_date('2014-13-31', '%Y-%m-%d') |
+---+
| NULL  |
+---+
1 row in set, 1 warning (0.00 sec)
{code}


MySQL/SparkSQL - it seems returning {{NULL}}. MySQL seems not supporting 
omitting the format.

Hive/Postgres - it seems returning calculated dates. Hive seems not supporting 
the format. Postgres seems not supporting omitting the format.

> to_date with format has weird behavior
> --
>
> Key: SPARK-19496
> URL: 

[jira] [Comment Edited] (SPARK-19496) to_date with format has weird behavior

2017-02-07 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857459#comment-15857459
 ] 

Hyukjin Kwon edited comment on SPARK-19496 at 2/8/17 6:16 AM:
--

- Hive

{code}
hive> SELECT to_date('2014-31-12');
2016-07-12
...

hive> SELECT to_date('2014-12-32');
2015-01-01
...

hive> SELECT to_date('2014-12-31');
2014-12-31
{code}

{code}
hive> SELECT to_date('2015-07-22', '-dd-MM')
org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:7 Arguments length 
mismatch ''-dd-MM'': to_date() requires 1 argument, got 2
...
{code}

- Postgres

{code}
postgres=# SELECT to_date('2014-12-31');
ERROR:  function to_date(unknown) does not exist
LINE 1: SELECT to_date('2014-12-31');
   ^
HINT:  No function matches the given name and argument types. You might need to 
add explicit type casts.
{code}

{code}
postgres=# SELECT to_date('2014-12-31');
ERROR:  function to_date(unknown) does not exist
LINE 1: SELECT to_date('2014-12-31');
   ^
HINT:  No function matches the given name and argument types. You might need to 
add explicit type casts.
postgres=# SELECT to_date('2014-12-31', '-MM-dd');
  to_date

 2014-12-31
(1 row)

postgres=# SELECT to_date('2014-13-31', '-MM-dd');
  to_date

 2015-01-31
(1 row)
{code}


- Spark

{code}
spark-sql> SELECT to_date('2014-31-12');
NULL
...

spark-sql> SELECT to_date('2014-12-32');
NULL
...

spark-sql> SELECT to_date('2014-12-31');
2014-12-31
...
{code}

{code}
spark-sql> SELECT to_date('2015-07-22', '-dd-MM')
2016-10-07
...
{code}

- MySQL

{code}
mysql> SELECT str_to_date('2014-12-31');
ERROR 1582 (42000): Incorrect parameter count in the call to native function 
'str_to_date'
{code}

{code}
mysql> SELECT str_to_date('2014-12-31', '%Y-%m-%d');
+---+
| str_to_date('2014-12-31', '%Y-%m-%d') |
+---+
| 2014-12-31|
+---+
1 row in set (0.00 sec)

mysql> SELECT str_to_date('2014-13-31', '%Y-%m-%d');
+---+
| str_to_date('2014-13-31', '%Y-%m-%d') |
+---+
| NULL  |
+---+
1 row in set, 1 warning (0.00 sec)
{code}


MySQL/SparkSQL - it seems returning {{NULL}}. MySQL seems not supporting 
omitting the format.

Hive/Postgres - it seems returning calculated dates. Hive seems not supporting 
the format. Postgres seems not supporting omitting the format.


was (Author: hyukjin.kwon):
- Hive

{code}
hive> SELECT to_date('2014-31-12');
2016-07-12
...

hive> SELECT to_date('2014-12-32');
2015-01-01
...

hive> SELECT to_date('2014-12-31');
2014-12-31
{code}

{code}
hive> SELECT to_date('2015-07-22', '-dd-MM')
org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:7 Arguments length 
mismatch ''-dd-MM'': to_date() requires 1 argument, got 2
...
{code}

- Postgres

{code}
postgres=# SELECT to_date('2014-12-31');
ERROR:  function to_date(unknown) does not exist
LINE 1: SELECT to_date('2014-12-31');
   ^
HINT:  No function matches the given name and argument types. You might need to 
add explicit type casts.
postgres=# SELECT to_date('2014-12-31', '-MM-dd');
  to_date

 2014-12-31
(1 row)

postgres=# SELECT to_date('2014-13-31', '-MM-dd');
  to_date

 2015-01-31
(1 row)
{code}


- Spark

{code}
spark-sql> SELECT to_date('2014-31-12');
NULL
...

spark-sql> SELECT to_date('2014-12-32');
NULL
...

spark-sql> SELECT to_date('2014-12-31');
2014-12-31
...
{code}

{code}
spark-sql> SELECT to_date('2015-07-22', '-dd-MM')
2016-10-07
...
{code}

- MySQL

{code}
mysql> SELECT str_to_date('2014-12-31', '%Y-%m-%d');
+---+
| str_to_date('2014-12-31', '%Y-%m-%d') |
+---+
| 2014-12-31|
+---+
1 row in set (0.00 sec)

mysql> SELECT str_to_date('2014-13-31', '%Y-%m-%d');
+---+
| str_to_date('2014-13-31', '%Y-%m-%d') |
+---+
| NULL  |
+---+
1 row in set, 1 warning (0.00 sec)
{code}


MySQL/SparkSQL - it seems returning {{NULL}}. MySQL seems not supporting 
omitting the format.

Hive/Postgres - it seems returning calculated dates. Hive seems not supporting 
the format. Postgres seems not supporting omitting the format.

> to_date with format has weird behavior
> --
>
> Key: SPARK-19496
> URL: https://issues.apache.org/jira/browse/SPARK-19496
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>

[jira] [Comment Edited] (SPARK-19496) to_date with format has weird behavior

2017-02-07 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857459#comment-15857459
 ] 

Hyukjin Kwon edited comment on SPARK-19496 at 2/8/17 6:12 AM:
--

- Hive

{code}
hive> SELECT to_date('2014-31-12');
2016-07-12
...

hive> SELECT to_date('2014-12-32');
2015-01-01
...

hive> SELECT to_date('2014-12-31');
2014-12-31
{code}

{code}
hive> SELECT to_date('2015-07-22', '-dd-MM')
org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:7 Arguments length 
mismatch ''-dd-MM'': to_date() requires 1 argument, got 2
...
{code}

- Postgres

{code}
postgres=# SELECT to_date('2014-12-31');
ERROR:  function to_date(unknown) does not exist
LINE 1: SELECT to_date('2014-12-31');
   ^
HINT:  No function matches the given name and argument types. You might need to 
add explicit type casts.
postgres=# SELECT to_date('2014-12-31', '-MM-dd');
  to_date

 2014-12-31
(1 row)

postgres=# SELECT to_date('2014-13-31', '-MM-dd');
  to_date

 2015-01-31
(1 row)
{code}


- Spark

{code}
spark-sql> SELECT to_date('2014-31-12');
NULL
...

spark-sql> SELECT to_date('2014-12-32');
NULL
...

spark-sql> SELECT to_date('2014-12-31');
2014-12-31
...
{code}

{code}
spark-sql> SELECT to_date('2015-07-22', '-dd-MM')
2016-10-07
...
{code}

- MySQL

{code}
mysql> SELECT str_to_date('2014-12-31', '%Y-%m-%d');
+---+
| str_to_date('2014-12-31', '%Y-%m-%d') |
+---+
| 2014-12-31|
+---+
1 row in set (0.00 sec)

mysql> SELECT str_to_date('2014-13-31', '%Y-%m-%d');
+---+
| str_to_date('2014-13-31', '%Y-%m-%d') |
+---+
| NULL  |
+---+
1 row in set, 1 warning (0.00 sec)
{code}


MySQL/SparkSQL - it seems returning {{NULL}}. MySQL seems not supporting 
omitting the format.

Hive/Postgres - it seems returning calculated dates. Hive seems not supporting 
the format. Postgres seems not supporting omitting the format.


was (Author: hyukjin.kwon):
- Hive

{code}
hive> SELECT to_date('2014-31-12');
2016-07-12
...

hive> SELECT to_date('2014-12-32');
2015-01-01
...

hive> SELECT to_date('2014-12-31');
2014-12-31
{code}


- Postgres

{code}
postgres=# SELECT to_date('2014-12-31');
ERROR:  function to_date(unknown) does not exist
LINE 1: SELECT to_date('2014-12-31');
   ^
HINT:  No function matches the given name and argument types. You might need to 
add explicit type casts.
postgres=# SELECT to_date('2014-12-31', '-MM-dd');
  to_date

 2014-12-31
(1 row)

postgres=# SELECT to_date('2014-13-31', '-MM-dd');
  to_date

 2015-01-31
(1 row)
{code}


- Spark

{code}
spark-sql> SELECT to_date('2014-31-12');
NULL
...

spark-sql> SELECT to_date('2014-12-32');
NULL
...

spark-sql> SELECT to_date('2014-12-31');
2014-12-31
...
{code}


- MySQL

{code}
mysql> SELECT str_to_date('2014-12-31', '%Y-%m-%d');
+---+
| str_to_date('2014-12-31', '%Y-%m-%d') |
+---+
| 2014-12-31|
+---+
1 row in set (0.00 sec)

mysql> SELECT str_to_date('2014-13-31', '%Y-%m-%d');
+---+
| str_to_date('2014-13-31', '%Y-%m-%d') |
+---+
| NULL  |
+---+
1 row in set, 1 warning (0.00 sec)
{code}


MySQL/SparkSQL - it seems returning {{NULL}}. 

Hive/Postgres - it seems returning calculated dates.

> to_date with format has weird behavior
> --
>
> Key: SPARK-19496
> URL: https://issues.apache.org/jira/browse/SPARK-19496
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Wenchen Fan
>
> Today, if we run
> {code}
> SELECT to_date('2015-07-22', '-dd-MM')
> {code}
> will result to `2016-10-07`, while running
> {code}
> SELECT to_date('2014-31-12')   # default format
> {code}
> will return null.
> this behavior is weird and we should check other systems like hive to see if 
> this is expected.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org