[jira] [Commented] (SPARK-20152) Time zone is not respected while parsing csv for timeStampFormat "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"

2017-03-31 Thread Navya Krishnappa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950745#comment-15950745
 ] 

Navya Krishnappa commented on SPARK-20152:
--

[~srowen] & [~hyukjin.kwon] Thank you for your comments. 

> Time zone is not respected while parsing csv for timeStampFormat 
> "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"
> --
>
> Key: SPARK-20152
> URL: https://issues.apache.org/jira/browse/SPARK-20152
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Navya Krishnappa
>
> When reading the time value below with "timestampFormat" set to 
> "MM-dd-yyyy'T'HH:mm:ss.SSSZZ", the time zone is ignored.
> Source File: 
> TimeColumn
> 03-21-2017T03:30:02Z
> Source code1:
> Dataset dataset = getSqlContext().read()
> .option(PARSER_LIB, "commons")
> .option(INFER_SCHEMA, "true")
> .option(DELIMITER, ",")
> .option(QUOTE, "\"")
> .option(ESCAPE, "\\")
> .option("timestampFormat", "MM-dd-yyyy'T'HH:mm:ss.SSSZZ")
> .option(MODE, Mode.PERMISSIVE)
> .csv(sourceFile);
> Result: TimeColumn is inferred as StringType with value "03-21-2017T03:30:02Z", 
> but the expected result is that TimeColumn is of TimestampType and that the 
> time zone is taken into account.
> Source code2: 
> Dataset dataset = getSqlContext().read() 
> .option(PARSER_LIB, "commons") 
> .option(INFER_SCHEMA, "true") 
> .option(DELIMITER, ",") 
> .option(QUOTE, "\"") 
> .option(ESCAPE, "\\") 
> .option("timestampFormat", "MM-dd-yyyy'T'HH:mm:ss")
> .option(MODE, Mode.PERMISSIVE) 
> .csv(sourceFile); 
> Result: TimeColumn is of TimestampType with value "2017-03-21 03:30:02.0", but 
> the expected result is that the time zone is taken into account.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20152) Time zone is not respected while parsing csv for timeStampFormat "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"

2017-03-30 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950149#comment-15950149
 ] 

Hyukjin Kwon commented on SPARK-20152:
--

I think the correct usage is as below:

{code}
scala> new java.text.SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSXXX").parse("2017-03-21T00:00:00.000Z")
res15: java.util.Date = Tue Mar 21 09:00:00 KST 2017
{code}

Maybe I should have left some comments there. When I introduced this in 
SPARK-16216, I used {{ZZ}} as specified in {{FastDateFormat}} to support "ISO 
8601 extended format time zones" (see 
https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/time/FastDateFormat.html).
I am sorry; I tend to trust the Apache libraries more. Maybe I should have used 
{{SimpleDateFormat}} with a thread-local instead.
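
A minimal sketch of the thread-local alternative mentioned above, using only the JDK (no Spark, no commons-lang3). {{SimpleDateFormat}} keeps mutable parsing state, so one instance should not be shared between threads; a {{ThreadLocal}} gives each thread its own copy. The class name, method name, pattern, and the use of {{Locale.ROOT}} are illustrative assumptions, not Spark's actual code.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical helper, not Spark code: one SimpleDateFormat per thread,
// because SimpleDateFormat instances are not thread-safe.
public class ThreadLocalTimestampParser {

    // Illustrative pattern with an ISO 8601 offset (XXX). Locale.ROOT keeps
    // the Gregorian calendar and ASCII digits regardless of the JVM locale.
    static final ThreadLocal<SimpleDateFormat> FORMAT = ThreadLocal.withInitial(
        () -> new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSXXX", Locale.ROOT));

    // Parses with the calling thread's own formatter instance.
    static Date parse(String value) {
        try {
            return FORMAT.get().parse(value);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable timestamp: " + value, e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Two threads parse concurrently, each with its own formatter.
        Runnable task = () -> System.out.println(
            Thread.currentThread().getName() + " -> "
                + parse("2017-03-21T00:00:00.000Z").getTime());
        Thread first = new Thread(task, "worker-1");
        Thread second = new Thread(task, "worker-2");
        first.start();
        second.start();
        first.join();
        second.join();
    }
}
```

Each thread lazily creates its formatter on first use. By contrast, {{FastDateFormat}} is documented as immutable and thread-safe, so a single instance can be shared directly.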

After this got merged, I realised that {{FastDateFormat}} seems to have a bug in 
supporting the {{XXX}} format specified in 
https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html 
(https://issues.apache.org/jira/browse/LANG-1101), and it seems to be fixed in 3.4.

IIRC, I used the {{ZZ}} format for that reason, since the commons-lang3 version was 
3.3.2 at that time. A few months later, commons-lang3 was upgraded as part of 
SPARK-17985, so this should now be fixed, and I think you can use {{XXX}} as below:

{code}
scala> import org.apache.commons.lang3.time.FastDateFormat
import org.apache.commons.lang3.time.FastDateFormat

scala> FastDateFormat.getInstance("yyyy-MM-dd'T'HH:mm:ss.SSSXXX").parse("2017-03-21T00:00:00.000Z")
res0: java.util.Date = Tue Mar 21 09:00:00 KST 2017
{code}

The related test was added to commons-lang here: 
https://github.com/apache/commons-lang/commit/bdb074610c87a210ea4c0d91d579cb4558f4b19f

In short, I think this issue is resolvable, and we can now change the default 
format to use {{XXX}} instead of {{ZZ}}, which, to my knowledge, is specific to 
{{FastDateFormat}}.
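
A minimal JDK-only sketch of the {{XXX}} suggestion applied to the reporter's data. It uses {{java.text.SimpleDateFormat}}, not {{FastDateFormat}}, so it only shows how the pattern letter behaves in the JDK; whether the same pattern works inside Spark 2.1's CSV reader depends on the bundled commons-lang3 version, as discussed above. The pattern also drops {{.SSS}}, an adjustment assumed here because "03-21-2017T03:30:02Z" has no milliseconds. The class and method names are hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

// Hypothetical helper: shows that the ISO 8601 zone letter X accepts both
// a literal "Z" (UTC) and a colon-separated offset such as "+01:00".
public class IsoZoneParsing {

    // The reporter's field order, with XXX instead of ZZ and no ".SSS",
    // because "03-21-2017T03:30:02Z" carries no milliseconds.
    static final String PATTERN = "MM-dd-yyyy'T'HH:mm:ssXXX";

    // Returns epoch milliseconds; the explicit offset in the input means the
    // result does not depend on the JVM's default time zone.
    static long parseEpochMillis(String value) {
        try {
            // A fresh instance per call, since SimpleDateFormat is not thread-safe.
            return new SimpleDateFormat(PATTERN, Locale.ROOT).parse(value).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable timestamp: " + value, e);
        }
    }

    public static void main(String[] args) {
        // "Z" is read as UTC.
        System.out.println(parseEpochMillis("03-21-2017T03:30:02Z"));
        // "+01:00" denotes the instant one hour earlier than the line above.
        System.out.println(parseEpochMillis("03-21-2017T03:30:02+01:00"));
    }
}
```

In the reporter's Spark code this would correspond to something like {{.option("timestampFormat", "MM-dd-yyyy'T'HH:mm:ssXXX")}}, subject to the commons-lang3 caveat above.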





[jira] [Commented] (SPARK-20152) Time zone is not respected while parsing csv for timeStampFormat "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"

2017-03-30 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949486#comment-15949486
 ] 

Sean Owen commented on SPARK-20152:
---

That does not seem to parse, even in plain Java/Scala:
{code}
scala> new java.text.SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSZZ").parse("2017-03-21T00:00:00Z")
java.text.ParseException: Unparseable date: "2017-03-21T00:00:00Z"
  at java.text.DateFormat.parse(DateFormat.java:366)
  ... 29 elided
{code}




[jira] [Commented] (SPARK-20152) Time zone is not respected while parsing csv for timeStampFormat "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"

2017-03-30 Thread Navya Krishnappa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949399#comment-15949399
 ] 

Navya Krishnappa commented on SPARK-20152:
--

But if we specify timestampFormat "yyyy-MM-dd'T'HH:mm:ss.SSSZZ" and parse 
"2017-03-21T00:00:00Z", it works fine. The same does not apply when parsing 
"03-21-2017T03:30:02Z" with the "MM-dd-yyyy'T'HH:mm:ss.SSSZZ" format. Let 
me know if my inputs are wrong.




[jira] [Commented] (SPARK-20152) Time zone is not respected while parsing csv for timeStampFormat "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"

2017-03-30 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949028#comment-15949028
 ] 

Sean Owen commented on SPARK-20152:
---

Ah OK, the format is valid; it just means the time zone must be in "+hhmm" format. 
Your input doesn't match the ".SSS" or "ZZ" part of your pattern, though: 
"03-21-2017T03:30:02.000+0100" parses correctly with your pattern. See 
https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html again 
for how time zones are parsed with this pattern; "Z" isn't accepted.
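
A minimal JDK-only sketch of the point above, using {{java.text.SimpleDateFormat}} with the reporter's original pattern; the class and method names are hypothetical. {{parse(String, ParsePosition)}} is used so that a non-matching input yields {{null}} instead of an exception, and the helper additionally rejects inputs with unparsed trailing characters.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical check of the reporter's original pattern against two inputs.
public class Rfc822ZoneParsing {

    // The pattern from the issue; ZZ is SimpleDateFormat's RFC 822 zone
    // letter, which expects an offset such as "+0100".
    static final String PATTERN = "MM-dd-yyyy'T'HH:mm:ss.SSSZZ";

    // Returns epoch millis, or null when the whole input cannot be parsed.
    static Long tryParse(String value) {
        SimpleDateFormat format = new SimpleDateFormat(PATTERN, Locale.ROOT);
        ParsePosition pos = new ParsePosition(0);
        Date date = format.parse(value, pos);
        // Reject outright failures as well as partial matches.
        if (date == null || pos.getIndex() != value.length()) {
            return null;
        }
        return date.getTime();
    }

    public static void main(String[] args) {
        // Matches: has ".SSS" and a "+hhmm" offset.
        System.out.println(tryParse("03-21-2017T03:30:02.000+0100"));
        // Does not match: parsing stops at the missing ".SSS" part.
        System.out.println(tryParse("03-21-2017T03:30:02Z"));
    }
}
```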




[jira] [Commented] (SPARK-20152) Time zone is not respected while parsing csv for timeStampFormat "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"

2017-03-30 Thread Navya Krishnappa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948989#comment-15948989
 ] 

Navya Krishnappa commented on SPARK-20152:
--

According to Spark, "yyyy-MM-dd'T'HH:mm:ss.SSSZZ" is the default timestamp 
format. In my examples, I have only swapped the date fields, and I'm using valid 
pattern letters in my format.




[jira] [Commented] (SPARK-20152) Time zone is not respected while parsing csv for timeStampFormat "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"

2017-03-30 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948739#comment-15948739
 ] 

Sean Owen commented on SPARK-20152:
---

That does not look like a valid timezone format: 
https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html

> Time zone is not respected while parsing csv for timeStampFormat 
> "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"
> --
>
> Key: SPARK-20152
> URL: https://issues.apache.org/jira/browse/SPARK-20152
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Navya Krishnappa
>
> When reading the time value below with "timestampFormat" set to 
> "MM-dd-yyyy'T'HH:mm:ss.SSSZZ", the time zone is ignored.
> Source File: 
> TimeColumn
> 03-21-2017T03:30:02Z
> Source code1:
> Dataset dataset = getSqlContext().read()
> .option(DAWBConstant.PARSER_LIB, "commons")
> .option(INFER_SCHEMA, "true")
> .option(DAWBConstant.DELIMITER, ",")
> .option(DAWBConstant.QUOTE, "\"")
> .option(DAWBConstant.ESCAPE, "\\")
> .option("timestampFormat", "MM-dd-yyyy'T'HH:mm:ss.SSSZZ")
> .option(DAWBConstant.MODE, Mode.PERMISSIVE)
> .csv(sourceFile);
> Result: TimeColumn is inferred as StringType with value "03-21-2017T03:30:02Z", 
> but the expected result is that TimeColumn is of TimestampType and that the 
> time zone is taken into account.
> Source code2:
> Dataset dataset = getSqlContext().read()
> .option(DAWBConstant.PARSER_LIB, "commons")
> .option(INFER_SCHEMA, "true")
> .option(DAWBConstant.DELIMITER, ",")
> .option(DAWBConstant.QUOTE, "\"")
> .option(DAWBConstant.ESCAPE, "\\")
> .option("timestampFormat", "MM-dd-yyyy'T'HH:mm:ss")
> .option(DAWBConstant.MODE, Mode.PERMISSIVE)
> .csv(sourceFile);
> Result: TimeColumn is of TimestampType with value "2017-04-22 03:30:02.0", but 
> the expected result is that the time zone is taken into account.


