[jira] [Commented] (SPARK-20152) Time zone is not respected while parsing csv for timeStampFormat "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"
[ https://issues.apache.org/jira/browse/SPARK-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950745#comment-15950745 ]

Navya Krishnappa commented on SPARK-20152:
------------------------------------------

[~srowen] & [~hyukjin.kwon] Thank you for your comments.

> Time zone is not respected while parsing csv for timeStampFormat "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"
> ----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-20152
>                 URL: https://issues.apache.org/jira/browse/SPARK-20152
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Navya Krishnappa
>
> When reading the time value below with "timestampFormat": "MM-dd-yyyy'T'HH:mm:ss.SSSZZ", the time zone is ignored.
> Source File:
> TimeColumn
> 03-21-2017T03:30:02Z
> Source code1:
> Dataset dataset = getSqlContext().read()
> .option(PARSER_LIB, "commons")
> .option(INFER_SCHEMA, "true")
> .option(DELIMITER, ",")
> .option(QUOTE, "\"")
> .option(ESCAPE, "\\")
> .option("timestampFormat" , "MM-dd-yyyy'T'HH:mm:ss.SSSZZ")
> .option(MODE, Mode.PERMISSIVE)
> .csv(sourceFile);
> Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z", but the expected result is that TimeColumn should be of "TimestampType" and should consider the time zone for manipulation
> Source code2:
> Dataset dataset = getSqlContext().read()
> .option(PARSER_LIB, "commons")
> .option(INFER_SCHEMA, "true")
> .option(DELIMITER, ",")
> .option(QUOTE, "\"")
> .option(ESCAPE, "\\")
> .option("timestampFormat" , "MM-dd-yyyy'T'HH:mm:ss")
> .option(MODE, Mode.PERMISSIVE)
> .csv(sourceFile);
> Result: TimeColumn [ TimestampType] and value is "2017-03-21 03:30:02.0", but the expected result is that TimeColumn should consider the time zone for manipulation

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
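One plausible reading of the "Source code2" result is sketched below. It uses the JDK's java.text.SimpleDateFormat as a stand-in; Spark 2.1 itself goes through FastDateFormat, so treating the two as equivalent here is an assumption. The class name ZonelessPatternDemo and the helper parseInZone are hypothetical. The point it illustrates: a pattern with no zone letter stops reading before the trailing "Z", so the resulting instant depends only on the formatter's own time zone.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

public class ZonelessPatternDemo {
    // Hypothetical helper: parse with a zone-less pattern, interpreting the
    // wall-clock fields in the given formatter time zone.
    static long parseInZone(String text, String zoneId) throws ParseException {
        SimpleDateFormat fmt = new SimpleDateFormat("MM-dd-yyyy'T'HH:mm:ss", Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone(zoneId));
        // DateFormat.parse(String) is not required to consume the whole input,
        // so the trailing "Z" is simply left unread.
        return fmt.parse(text).getTime();
    }

    public static void main(String[] args) throws ParseException {
        String text = "03-21-2017T03:30:02Z";
        long utc = parseInZone(text, "UTC");
        long seoul = parseInZone(text, "Asia/Seoul");
        // Same input text, different instants: the "Z" played no part in parsing.
        System.out.println("UTC formatter:   " + utc);
        System.out.println("Seoul formatter: " + seoul);
        System.out.println("Difference in hours: " + (utc - seoul) / 3_600_000L);
    }
}
```

If the zone designator has to take effect, the pattern needs a zone letter that matches the input.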
[jira] [Commented] (SPARK-20152) Time zone is not respected while parsing csv for timeStampFormat "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"
[ https://issues.apache.org/jira/browse/SPARK-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950149#comment-15950149 ]

Hyukjin Kwon commented on SPARK-20152:
--------------------------------------

I think the correct usage is as below:

{code}
scala> new java.text.SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSXXX").parse("2017-03-21T00:00:00.000Z")
res15: java.util.Date = Tue Mar 21 09:00:00 KST 2017
{code}

Maybe I should have left some comments there. When I introduced this in SPARK-16216, I used {{ZZ}} as specified in {{FastDateFormat}} to support "ISO 8601 extended format time zones" (see https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/time/FastDateFormat.html). I am sorry, I tend to trust the Apache ones more ... maybe I should have used {{SimpleDateFormat}} with a thread-local instead.

After this got merged, I realised that {{FastDateFormat}} seems to have had a bug in supporting the {{XXX}} format specified in https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html - https://issues.apache.org/jira/browse/LANG-1101 and it seems to be fixed in 3.4. IIRC, I used {{ZZ}} for that reason, and the commons-lang3 version was 3.3.2 at that time. A few months later, in SPARK-17985, it was bumped up, so it should be fixed now and I think you can use {{XXX}} as below:

{code}
scala> import org.apache.commons.lang3.time.FastDateFormat
import org.apache.commons.lang3.time.FastDateFormat

scala> FastDateFormat.getInstance("yyyy-MM-dd'T'HH:mm:ss.SSSXXX").parse("2017-03-21T00:00:00.000Z")
res0: java.util.Date = Tue Mar 21 09:00:00 KST 2017
{code}

The related test was added in commons here - https://github.com/apache/commons-lang/commit/bdb074610c87a210ea4c0d91d579cb4558f4b19f

To cut this short, I think this issue is resolvable, and I think we can now change the default format to use {{XXX}} instead of {{ZZ}}, which is {{FastDateFormat}}-specific as far as I know.
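As a rough Java counterpart to the REPL sessions above, the sketch below applies the {{XXX}} zone letter to the reporter's month-first layout. It uses java.text.SimpleDateFormat only; the class name IsoZonePatternDemo, the helper parse, and the pattern "MM-dd-yyyy'T'HH:mm:ssXXX" are illustrative assumptions, not something proposed in the thread. Whether Spark's CSV reader behaves the same way depends on the commons-lang3 version it bundles, as discussed above.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.util.Locale;

public class IsoZonePatternDemo {
    // Hypothetical helper: parse with the given pattern and return the instant.
    static Instant parse(String pattern, String text) throws ParseException {
        return new SimpleDateFormat(pattern, Locale.ROOT).parse(text).toInstant();
    }

    public static void main(String[] args) throws ParseException {
        // Default-style layout with milliseconds, as in the REPL example.
        System.out.println(parse("yyyy-MM-dd'T'HH:mm:ss.SSSXXX", "2017-03-21T00:00:00.000Z"));

        // Reporter's layout: month first, no fraction, "Z" designator.
        System.out.println(parse("MM-dd-yyyy'T'HH:mm:ssXXX", "03-21-2017T03:30:02Z"));

        // XXX also accepts an explicit "+hh:mm" offset.
        System.out.println(parse("MM-dd-yyyy'T'HH:mm:ssXXX", "03-21-2017T03:30:02+09:00"));
    }
}
```

Note that the pattern drops ".SSS" because the reporter's sample value carries no fractional seconds; a pattern has to match the input field for field.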
[jira] [Commented] (SPARK-20152) Time zone is not respected while parsing csv for timeStampFormat "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"
[ https://issues.apache.org/jira/browse/SPARK-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949486#comment-15949486 ]

Sean Owen commented on SPARK-20152:
-----------------------------------

That does not seem to parse, not even in Java/Scala:

{code}
scala> new java.text.SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSZZ").parse("2017-03-21T00:00:00Z")
java.text.ParseException: Unparseable date: "2017-03-21T00:00:00Z"
  at java.text.DateFormat.parse(DateFormat.java:366)
  ... 29 elided
{code}
[jira] [Commented] (SPARK-20152) Time zone is not respected while parsing csv for timeStampFormat "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"
[ https://issues.apache.org/jira/browse/SPARK-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949399#comment-15949399 ]

Navya Krishnappa commented on SPARK-20152:
------------------------------------------

But if we specify timestampFormat: "yyyy-MM-dd'T'HH:mm:ss.SSSZZ" and parse "2017-03-21T00:00:00Z", it works fine. The same does not hold when parsing "03-21-2017T03:30:02Z" with the "MM-dd-yyyy'T'HH:mm:ss.SSSZZ" format. Let me know if my inputs are wrong.
[jira] [Commented] (SPARK-20152) Time zone is not respected while parsing csv for timeStampFormat "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"
[ https://issues.apache.org/jira/browse/SPARK-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949028#comment-15949028 ]

Sean Owen commented on SPARK-20152:
-----------------------------------

Ah OK, the format is fine; it just means the timezone must be in "+hhmm" format. Your input doesn't match the ".SSS" or "ZZ" part of your pattern, though. "03-21-2017T03:30:02.000+0100" parses correctly according to your pattern. See https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html again to see how timezones are parsed in this pattern. "Z" isn't accepted.
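The point about the ".SSS" and "ZZ" parts can be checked with a small Java sketch using java.text.SimpleDateFormat. The class name Rfc822ZonePatternDemo and the helper tryParse are hypothetical; the two inputs are the ones quoted in this thread.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

public class Rfc822ZonePatternDemo {
    static final String PATTERN = "MM-dd-yyyy'T'HH:mm:ss.SSSZZ";

    // Hypothetical helper: epoch milliseconds on success, null when the
    // input does not match the pattern.
    static Long tryParse(String text) {
        try {
            return new SimpleDateFormat(PATTERN, Locale.ROOT).parse(text).getTime();
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Has both the millisecond fraction and a numeric "+hhmm" offset.
        System.out.println(tryParse("03-21-2017T03:30:02.000+0100"));

        // The reporter's value: no fraction and no numeric offset, so the
        // pattern cannot match and the helper returns null.
        System.out.println(tryParse("03-21-2017T03:30:02Z"));
    }
}
```

A null here corresponds to the ParseException that SimpleDateFormat raises when the input does not fit the pattern.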
[jira] [Commented] (SPARK-20152) Time zone is not respected while parsing csv for timeStampFormat "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"
[ https://issues.apache.org/jira/browse/SPARK-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948989#comment-15948989 ]

Navya Krishnappa commented on SPARK-20152:
------------------------------------------

According to Spark, "yyyy-MM-dd'T'HH:mm:ss.SSSZZ" is the default timestamp format. In my examples, I have only swapped the date fields, and I'm using valid letters in my format.
[jira] [Commented] (SPARK-20152) Time zone is not respected while parsing csv for timeStampFormat "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"
[ https://issues.apache.org/jira/browse/SPARK-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948739#comment-15948739 ]

Sean Owen commented on SPARK-20152:
-----------------------------------

That does not look like a valid timezone format: https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html