[jira] [Commented] (SPARK-14428) [SQL] Allow more flexibility when parsing dates and timestamps in json datasources
[ https://issues.apache.org/jira/browse/SPARK-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15577994#comment-15577994 ]

Hyukjin Kwon commented on SPARK-14428:
--------------------------------------

For 1. I guess this was fixed in https://github.com/apache/spark/pull/14279 so we should define the format for read/write.

> [SQL] Allow more flexibility when parsing dates and timestamps in json datasources
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-14428
>                 URL: https://issues.apache.org/jira/browse/SPARK-14428
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.6.1
>            Reporter: Michel Lemay
>            Priority: Minor
>              Labels: date, features, json, timestamp
>
> Reading a json with dates and timestamps is limited to predetermined string formats or long values.
> 1) Should be able to set an option on json datasource to parse dates and timestamps using custom string format.
> 2) Should be able to change the interpretation of long values since epoch. It could support different precisions like days, seconds, milliseconds, microseconds and nanoseconds.
> Something along the lines of:
> {code}
> object Precision extends Enumeration {
>   val days, seconds, milliseconds, microseconds, nanoseconds = Value
> }
> def convertWithPrecision(time: Long, from: Precision.Value, to: Precision.Value): Long = ...
> ...
> val dateFormat = parameters.getOrElse("dateFormat", "").trim
> val timestampFormat = parameters.getOrElse("timestampFormat", "").trim
> val longDatePrecision = parameters.getOrElse("longDatePrecision", "days")
> val longTimestampPrecision = parameters.getOrElse("longTimestampPrecision", "milliseconds")
> {code}
> and
> {code}
> case (VALUE_STRING, DateType) =>
>   val stringValue = parser.getText
>   val days = if (configOptions.dateFormat.nonEmpty) {
>     // User-defined format; make sure it complies with the SQL DATE format (number of days).
>     val sdf = new SimpleDateFormat(configOptions.dateFormat) // Not thread safe.
>     DateTimeUtils.convertWithPrecision(sdf.parse(stringValue).getTime, Precision.milliseconds, Precision.days)
>   } else if (stringValue.forall(_.isDigit)) {
>     DateTimeUtils.convertWithPrecision(stringValue.toLong, configOptions.longDatePrecision, Precision.days)
>   } else {
>     // The format of this string will probably be "yyyy-mm-dd".
>     DateTimeUtils.convertWithPrecision(DateTimeUtils.stringToTime(parser.getText).getTime, Precision.milliseconds, Precision.days)
>   }
>   days.toInt
> case (VALUE_NUMBER_INT, DateType) =>
>   DateTimeUtils.convertWithPrecision(parser.getLongValue, configOptions.longDatePrecision, Precision.days).toInt
> {code}
> With similar handling for Timestamps.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
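The issue description leaves the body of convertWithPrecision elided. As a rough illustration only, one way such a helper could be written is to map each Precision value onto java.util.concurrent.TimeUnit and let the JDK do the arithmetic. The object name PrecisionSketch and the helper toTimeUnit are hypothetical and are not part of Spark or of the proposal; Precision is copied from the quoted snippet.

```scala
import java.util.concurrent.TimeUnit

// Copied from the quoted proposal.
object Precision extends Enumeration {
  val days, seconds, milliseconds, microseconds, nanoseconds = Value
}

// Hypothetical container; the proposal places the helper in DateTimeUtils.
object PrecisionSketch {
  // Assumption: each precision corresponds directly to a JDK TimeUnit.
  private def toTimeUnit(p: Precision.Value): TimeUnit = p match {
    case Precision.days         => TimeUnit.DAYS
    case Precision.seconds      => TimeUnit.SECONDS
    case Precision.milliseconds => TimeUnit.MILLISECONDS
    case Precision.microseconds => TimeUnit.MICROSECONDS
    case Precision.nanoseconds  => TimeUnit.NANOSECONDS
  }

  // Re-expresses `time`, counted in units of `from`, in units of `to`.
  def convertWithPrecision(time: Long, from: Precision.Value, to: Precision.Value): Long =
    toTimeUnit(to).convert(time, toTimeUnit(from))
}
```

Note that TimeUnit.convert truncates toward zero and saturates on overflow, so a negative millisecond value (a timestamp before 1970) converted to days rounds toward the epoch rather than down. A real implementation would need to decide whether that is acceptable.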
[jira] [Commented] (SPARK-14428) [SQL] Allow more flexibility when parsing dates and timestamps in json datasources
[ https://issues.apache.org/jira/browse/SPARK-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229506#comment-15229506 ]

Hyukjin Kwon commented on SPARK-14428:
--------------------------------------

I can work on this if it is decided to be supported. (I am working on the CSV one for this, https://github.com/apache/spark/pull/11550)
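The proposal in the issue description flags that SimpleDateFormat is not thread safe. As a sketch of one possible alternative, and not what Spark or the proposal actually does, the user-supplied pattern could be handled with java.time.format.DateTimeFormatter, which is immutable and thread safe. This assumes Java 8 or later; the names DateParsingSketch and parseToEpochDays are hypothetical.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Hypothetical helper, for illustration only.
object DateParsingSketch {
  // Parses `text` using the user-supplied `pattern` and returns the number
  // of days since 1970-01-01, the unit used by the SQL DATE type.
  def parseToEpochDays(text: String, pattern: String): Int = {
    // DateTimeFormatter instances are immutable, so one could be built once
    // per option set and shared across threads; it is rebuilt here for brevity.
    val formatter = DateTimeFormatter.ofPattern(pattern)
    Math.toIntExact(LocalDate.parse(text, formatter).toEpochDay)
  }
}
```

For example, DateParsingSketch.parseToEpochDays("02/01/1970", "dd/MM/yyyy") yields 1. Pattern letters differ slightly between SimpleDateFormat and DateTimeFormatter, so existing format strings would not necessarily carry over unchanged.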