[jira] [Commented] (SPARK-14428) [SQL] Allow more flexibility when parsing dates and timestamps in json datasources

2016-10-15 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15577994#comment-15577994
 ] 

Hyukjin Kwon commented on SPARK-14428:
--

Regarding item 1: I believe this was fixed in 
https://github.com/apache/spark/pull/14279, so we should be able to define the 
format for both read and write.
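
As a rough illustration of item 1 outside Spark, the sketch below parses a date string with a user-supplied pattern and returns the number of days since the epoch. The names `DateFormatSketch` and `parseToEpochDays` are hypothetical and are not taken from the pull request; the sketch uses `java.time` rather than `SimpleDateFormat` because `DateTimeFormatter` is immutable and thread-safe.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Hypothetical helper, not part of Spark: illustrates custom-format date parsing.
object DateFormatSketch {
  // Parse `text` using the caller's pattern and return days since 1970-01-01.
  // Throws DateTimeParseException if the text does not match the pattern.
  def parseToEpochDays(text: String, pattern: String): Int = {
    val formatter = DateTimeFormatter.ofPattern(pattern) // immutable, safe to share
    // toEpochDay returns a Long; toIntExact fails loudly instead of overflowing.
    Math.toIntExact(LocalDate.parse(text, formatter).toEpochDay)
  }
}
```

For example, `DateFormatSketch.parseToEpochDays("02/01/1970", "dd/MM/yyyy")` returns 1. In real code the formatter would probably be built once per option set instead of once per value.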

> [SQL] Allow more flexibility when parsing dates and timestamps in json 
> datasources
> --
>
> Key: SPARK-14428
> URL: https://issues.apache.org/jira/browse/SPARK-14428
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.1
>Reporter: Michel Lemay
>Priority: Minor
>  Labels: date, features, json, timestamp
>
> Reading JSON with dates and timestamps is limited to predetermined string 
> formats or long values.
> 1) It should be possible to set an option on the JSON datasource to parse 
> dates and timestamps using a custom string format.
> 2) It should be possible to change the interpretation of long values since 
> the epoch. It could support different precisions such as days, seconds, 
> milliseconds, microseconds and nanoseconds.
> Something along the lines of:
> {code}
> object Precision extends Enumeration {
>   val days, seconds, milliseconds, microseconds, nanoseconds = Value
> }
> def convertWithPrecision(time: Long, from: Precision.Value, to: Precision.Value): Long = ...
> ...
>   val dateFormat = parameters.getOrElse("dateFormat", "").trim
>   val timestampFormat = parameters.getOrElse("timestampFormat", "").trim
>   // Resolve the option strings to Precision values so they can be passed to convertWithPrecision.
>   val longDatePrecision = Precision.withName(parameters.getOrElse("longDatePrecision", "days"))
>   val longTimestampPrecision = Precision.withName(parameters.getOrElse("longTimestampPrecision", "milliseconds"))
> {code}
> and 
> {code}
>   case (VALUE_STRING, DateType) =>
>     val stringValue = parser.getText
>     val days = if (configOptions.dateFormat.nonEmpty) {
>       // User-defined format; convert the result to the SQL DATE representation (number of days).
>       val sdf = new SimpleDateFormat(configOptions.dateFormat) // Not thread safe.
>       DateTimeUtils.convertWithPrecision(sdf.parse(stringValue).getTime, Precision.milliseconds, Precision.days)
>     } else if (stringValue.forall(_.isDigit)) {
>       // All digits: treat the string as a long value in the configured precision.
>       DateTimeUtils.convertWithPrecision(stringValue.toLong, configOptions.longDatePrecision, Precision.days)
>     } else {
>       // The format of this string will probably be "yyyy-mm-dd".
>       DateTimeUtils.convertWithPrecision(DateTimeUtils.stringToTime(stringValue).getTime, Precision.milliseconds, Precision.days)
>     }
>     days.toInt
>   case (VALUE_NUMBER_INT, DateType) =>
>     DateTimeUtils.convertWithPrecision(parser.getLongValue, configOptions.longDatePrecision, Precision.days).toInt
> {code}
> With similar handling for Timestamps.
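
The body of `convertWithPrecision` is left elided in the proposal above. One possible reading, offered only as a sketch and not as Spark's implementation, maps each proposed precision onto `java.util.concurrent.TimeUnit` and delegates the arithmetic to `TimeUnit.convert`. The wrapper object `PrecisionSketch` and the `toTimeUnit` helper are hypothetical names introduced for this illustration.

```scala
import java.util.concurrent.TimeUnit

// Mirrors the Precision enumeration from the proposal; not part of Spark.
object Precision extends Enumeration {
  val days, seconds, milliseconds, microseconds, nanoseconds = Value
}

// Hypothetical container for the sketch.
object PrecisionSketch {
  // Map each proposed precision onto the corresponding JDK TimeUnit.
  private def toTimeUnit(p: Precision.Value): TimeUnit = p match {
    case Precision.days         => TimeUnit.DAYS
    case Precision.seconds      => TimeUnit.SECONDS
    case Precision.milliseconds => TimeUnit.MILLISECONDS
    case Precision.microseconds => TimeUnit.MICROSECONDS
    case Precision.nanoseconds  => TimeUnit.NANOSECONDS
  }

  // Convert a count of `from` units since the epoch into `to` units.
  // TimeUnit.convert truncates toward zero when converting to a coarser unit
  // and saturates at Long.MinValue / Long.MaxValue on overflow.
  def convertWithPrecision(time: Long, from: Precision.Value, to: Precision.Value): Long =
    toTimeUnit(to).convert(time, toTimeUnit(from))
}
```

For example, `PrecisionSketch.convertWithPrecision(86400000L, Precision.milliseconds, Precision.days)` returns 1. Because the conversion truncates toward zero, a value before the epoch that falls partway through a day lands on the later day; a real implementation would probably want floor division for such inputs.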



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14428) [SQL] Allow more flexibility when parsing dates and timestamps in json datasources

2016-04-06 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229506#comment-15229506
 ] 

Hyukjin Kwon commented on SPARK-14428:
--

I can work on this if we decide to support it. (I am working on the CSV 
equivalent in https://github.com/apache/spark/pull/11550.)
