GitHub user felixcheung commented on the pull request:
https://github.com/apache/spark/pull/3820#issuecomment-70450284
I've tested this PR, but the result seems to be off.
The Parquet file was generated from Hive, with the timestamp values set by
from_utc_timestamp('1970-01-01 08:00:00','PST')
Here is what I see with this PR:
scala> t.take(10).foreach(println(_))
...
15/01/18 22:06:41 INFO NewHadoopRDD: Input split: ParquetInputSplit{part:
file:/users/x/parquetwithtimestamp start: 0 end: 25448 length: 25448 hosts: []
requestedSchema: message root {
optional binary code (UTF8);
optional binary description (UTF8);
optional int32 total_emp;
optional int32 salary;
optional int96 timestamp;
}
readSupportMetadata:
{org.apache.spark.sql.parquet.row.metadata={"type":"struct","fields":[{"name":"code","type":"string","nullable":true,"metadata":{}},{"name":"description","type":"string","nullable":true,"metadata":{}},{"name":"total_emp","type":"integer","nullable":true,"metadata":{}},{"name":"salary","type":"integer","nullable":true,"metadata":{}},{"name":"timestamp","type":"timestamp","nullable":true,"metadata":{}}]},
org.apache.spark.sql.parquet.row.requested_schema={"type":"struct","fields":[{"name":"code","type":"string","nullable":true,"metadata":{}},{"name":"description","type":"string","nullable":true,"metadata":{}},{"name":"total_emp","type":"integer","nullable":true,"metadata":{}},{"name":"salary","type":"integer","nullable":true,"metadata":{}},{"name":"timestamp","type":"timestamp","nullable":true,"metadata":{}}]}}}
15/01/18 22:06:41 WARN ParquetRecordReader: Can not initialize counter due
to context is not a instance of TaskInputOutputContext, but is
org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
15/01/18 22:06:41 INFO InternalParquetRecordReader: RecordReader
initialized will read a total of 823 records.
15/01/18 22:06:41 INFO InternalParquetRecordReader: at row 0. reading next
block
15/01/18 22:06:41 INFO CodecPool: Got brand-new decompressor [.snappy]
15/01/18 22:06:41 INFO InternalParquetRecordReader: block read in memory in
27 ms. row count = 823
[00-0000,All Occupations,134354250,40690,1974-01-07 17:58:00.000008896]
[11-0000,Management occupations,6003930,96150,1974-01-07 17:58:00.000008896]
Expected: 1970-01-01 08:00:00
Actual:   1974-01-07 17:58:00.000008896
Any idea?
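
For anyone comparing the values, here is a minimal sketch of how a Parquet INT96 timestamp is commonly decoded. It assumes the layout written by Hive and Impala: 8 bytes of nanoseconds within the day followed by a 4-byte Julian day number, both little-endian. The object and method names are hypothetical, and this is not the code from the PR; it is only a reference point for checking the decoding.

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Hypothetical helper, not part of Spark or of this PR.
object Int96Sketch {
  // Julian day number of 1970-01-01 (the Unix epoch).
  val JulianDayOfEpoch = 2440588L
  val MillisPerDay = 86400000L

  /** Decode 12 raw INT96 bytes into milliseconds since the Unix epoch (UTC). */
  def toEpochMillis(bytes: Array[Byte]): Long = {
    require(bytes.length == 12, "INT96 values are 12 bytes")
    val buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN)
    val nanosOfDay = buf.getLong() // assumed: first 8 bytes
    val julianDay = buf.getInt()   // assumed: last 4 bytes
    (julianDay - JulianDayOfEpoch) * MillisPerDay + nanosOfDay / 1000000L
  }

  /** Inverse of toEpochMillis, used here only to build sample input. */
  def fromEpochMillis(millis: Long): Array[Byte] = {
    val julianDay = (Math.floorDiv(millis, MillisPerDay) + JulianDayOfEpoch).toInt
    val nanosOfDay = Math.floorMod(millis, MillisPerDay) * 1000000L
    ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN)
      .putLong(nanosOfDay).putInt(julianDay).array()
  }
}
```

As a usage example, 28800000L is 1970-01-01 08:00:00 UTC in epoch milliseconds, so Int96Sketch.toEpochMillis(Int96Sketch.fromEpochMillis(28800000L)) should round-trip to the same number. Time zone handling is deliberately left out of the sketch.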