Re: Dataframe reader does not read microseconds, but TimestampType supports microseconds

2018-07-02 Thread Jörn Franke
How do you read the files? Do you have some source code? It could be related
to the JSON data source.

What Spark version do you use?

> On 2. Jul 2018, at 09:03, Colin Williams wrote:
> 
> I'm confused as to why Spark's DataFrame reader does not read JSON (or 
> similar) microsecond timestamps at microsecond precision, but instead 
> truncates them to milliseconds.
> 
> This seems strange when TimestampType supports microseconds.
> 
> [...]


Dataframe reader does not read microseconds, but TimestampType supports microseconds

2018-07-02 Thread Colin Williams
I'm confused as to why Spark's DataFrame reader does not read JSON (or
similar) microsecond timestamps at microsecond precision, but instead
truncates them to milliseconds.

This seems strange when TimestampType supports microseconds.

For example, create a schema for a JSON object with a column of
TimestampType, then read data into that column from timestamps that carry
microseconds, like

2018-05-13 20:25:34.153712

2018-05-13T20:25:37.348006

You will end up with timestamps with millisecond precision.

E.g. 2018-05-13 20:25:34.153
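A plausible cause (an assumption on my part, not confirmed above): as far as I can tell, Spark 2.x's JSON datasource parses timestamps with a SimpleDateFormat-style pattern via its timestampFormat option, and in that pattern language SSS means milliseconds, full stop. Extra fraction digits are not kept as microseconds; with lenient parsing they are misread as a larger millisecond count. A minimal sketch of that ceiling, independent of Spark:

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class MillisOnly {
    public static void main(String[] args) throws Exception {
        // SSS is *milliseconds*; the pattern language cannot express microseconds.
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
        // Lenient parsing (the default) reads all six fraction digits as a
        // millisecond count: 153712 ms = 2 min 33.712 s, silently shifting the time.
        Date d = fmt.parse("2018-05-13 20:25:34.153712");
        System.out.println(fmt.format(d)); // prints 2018-05-13 20:28:07.712
    }
}
```

So a format-based reader either truncates the fraction or, worse, shifts the timestamp; either way the microseconds are lost before they ever reach TimestampType.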



When reading the TimestampType documentation: "The data type representing
java.sql.Timestamp values. Please use the singleton DataTypes.TimestampType."


java.sql.Timestamp provides a method that parses such timestamps, e.g.
Timestamp.valueOf("2018-05-13 20:25:37.348006"), including the microseconds.
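Indeed, java.sql.Timestamp carries nanosecond precision, so the microseconds round-trip cleanly through parsing and toString — a quick stdlib check:

```java
import java.sql.Timestamp;

public class MicrosKept {
    public static void main(String[] args) {
        // valueOf accepts up to nine fractional digits (JDBC escape format).
        Timestamp ts = Timestamp.valueOf("2018-05-13 20:25:37.348006");
        System.out.println(ts.getNanos()); // 348006000 — microseconds preserved
        System.out.println(ts);            // 2018-05-13 20:25:37.348006
    }
}
```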

So why does Spark's DataFrame reader drop the ball on this?