Hi Team,

I want to store a timestamp (e.g. 2020-10-13 13:01:05) in Parquet using an Avro 
schema, so the files can be loaded into Hive.
I don't see any option to do so, so I pass it as a string, but Hive fails to 
read it because it is expecting a timestamp.
I am using ParquetIO from Apache Beam to write the output files in Parquet format.

Avro schema:
{
  "type": "record",
  "name": "TestRecord",
  "namespace": "com.test-v2.avro",
  "fields": [
    { "name": "reason", "type": "string" },
    { "name": "event", "type": "string" },
    { "name": "timestamp", "type": "string" }
  ]
}
Output event:
{
  "reason": "Invalid",
  "event": "dropped",
  "timestamp": "2020-10-13 13:01:05"
}
The Hive schema is:
reason    -> varchar(100)
event     -> varchar(100)
timestamp -> timestamp

But when querying Hive, it fails to read the timestamp column, since the input 
event declares it as a string type.
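As a possible workaround (a sketch only; I have not confirmed that ParquetIO and Hive handle this end-to-end in every version), the field could be declared with Avro's timestamp-millis logical type on a long instead of a string, so that Parquet stores it as an INT64 annotated as a timestamp:

```json
{
  "name": "timestamp",
  "type": { "type": "long", "logicalType": "timestamp-millis" }
}
```

With this, the value would be written as epoch milliseconds rather than a formatted string.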

I see this Jira tracking the issue: https://issues.apache.org/jira/browse/AVRO-2924

And I believe the code change would go here:
https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/LogicalTypes.java

Correct me if I am wrong.
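For completeness, if the schema used timestamp-millis on a long, the string value would need converting to epoch milliseconds before writing. A minimal sketch (assuming the timestamps are in UTC; the helper name `toEpochMillis` is mine, not part of any library):

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class TimestampToMillis {

    // Parse a "yyyy-MM-dd HH:mm:ss" string and convert it to epoch
    // milliseconds, treating the value as UTC (an assumption; adjust
    // the ZoneOffset if the source data is in another zone).
    static long toEpochMillis(String ts) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
        return LocalDateTime.parse(ts, fmt)
                .toInstant(ZoneOffset.UTC)
                .toEpochMilli();
    }

    public static void main(String[] args) {
        // The example event's timestamp as epoch millis
        System.out.println(toEpochMillis("2020-10-13 13:01:05")); // 1602594065000
    }
}
```

The resulting long would then be set on the Avro record in place of the string field.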

Thanks,
Julius
