Hi Team,
I want to store a timestamp (e.g. 2020-10-13 13:01:05) in Parquet, using an Avro
schema, so that it can be loaded into Hive.
But I don't see any option to do so. I currently pass it as a string, and Hive
fails to read it because it expects a timestamp.
I am using ParquetIO from Apache Beam to write the output files in Parquet format.
Avro schema:
{
  "type": "record",
  "name": "TestRecord",
  "namespace": "com.test-v2.avro",
  "fields": [
    { "name": "reason", "type": "string" },
    { "name": "event", "type": "string" },
    { "name": "timestamp", "type": "string" }
  ]
}
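For context, Avro already supports a timestamp-millis logical type annotating a long field, which Parquet and Hive can map to a native timestamp. A sketch of the schema with that change (assuming millisecond precision is acceptable) might look like:

```json
{
  "type": "record",
  "name": "TestRecord",
  "namespace": "com.test-v2.avro",
  "fields": [
    { "name": "reason", "type": "string" },
    { "name": "event", "type": "string" },
    { "name": "timestamp",
      "type": { "type": "long", "logicalType": "timestamp-millis" } }
  ]
}
```

With this schema the producer would write epoch milliseconds (a long) instead of a formatted string.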
Output event:
{
  "reason": "Invalid",
  "event": "dropped",
  "timestamp": "2020-10-13 13:01:05"
}
Hive schema:
  reason    -> varchar(100),
  event     -> varchar(100),
  timestamp -> timestamp
But when querying Hive, reading the timestamp column fails because the input
event stores it as a string.
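If the schema were switched to a long with the timestamp-millis logical type, the string timestamps would need to be converted to epoch milliseconds before writing. A minimal sketch, assuming the timestamps are UTC (the class and method names here are illustrative, not part of any existing code):

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class TimestampToMillis {

    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Parse a "yyyy-MM-dd HH:mm:ss" string into epoch milliseconds,
    // the long value a timestamp-millis logical type field expects.
    // Assumes the wall-clock time is UTC.
    static long toEpochMillis(String ts) {
        return LocalDateTime.parse(ts, FMT)
                .toInstant(ZoneOffset.UTC)
                .toEpochMilli();
    }

    public static void main(String[] args) {
        // 2020-10-13 13:01:05 UTC
        System.out.println(toEpochMillis("2020-10-13 13:01:05"));
    }
}
```

If the source timestamps are actually in a local zone rather than UTC, `ZoneOffset.UTC` would need to be replaced with the appropriate `ZoneId` offset.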
I see this JIRA in place: https://issues.apache.org/jira/browse/AVRO-2924
And I believe the code change could be made here:
https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/LogicalTypes.java
Correct me if I am wrong.
Thanks,
Julius