Hi Prabhakar
From what I recall, Drill won't consider a provided schema when
querying Parquet because Parquet files bundle their own schema. You
might need to use a SQL function like COALESCE(TRAN_AMOUNT, 1.11) and
possibly put that in a SQL view for reuse.
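Something along these lines is what I have in mind. Treat it as an untested sketch: the view name `executions_v` and the `dfs.tmp` workspace are placeholders I made up (any writable workspace should do), and the CAST is my assumption to keep both COALESCE arguments the same type, since Drill has no type information for the column in the file where it is missing.

```sql
-- Sketch only: the view name and the dfs.tmp workspace are placeholders.
CREATE OR REPLACE VIEW dfs.tmp.`executions_v` AS
SELECT
  EXEC_ID,
  CUST_ID,
  CELL_ID,
  -- Substitute 1.11 wherever TRAN_AMOUNT is NULL, including rows that come
  -- from the file that lacks the column. The CAST is an assumption, added
  -- so that both COALESCE arguments share one type.
  COALESCE(CAST(TRAN_AMOUNT AS DOUBLE), 1.11) AS TRAN_AMOUNT
FROM archive.`default`.`executions`;
```

You would then query dfs.tmp.`executions_v` instead of the directory.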
Regards
James
On 2023/07/11 18:40, Prabhakar Bhosale wrote:
Hi Team,
I am using Drill 1.20.1 with Parquet files.
I have two Parquet files in a directory, and one column is missing from one
of the files. When I query the directory, that column is NULL for every row
that comes from the file where the column is missing.
I want a specific default value for that column instead of NULL, so I
created the schema given below. Even with the schema in place, the query
still returns NULL. Please let me know what is going wrong.
I have also ensured that store.table.use_schema_file=true is set at the
system level.
The files are stored on a Linux mount point.
The name of the missing column is "TRAN_AMOUNT".
The schema is as follows:
{
  "table" : "archive.default.`executions`",
  "schema" : {
    "type" : "tuple_schema",
    "columns" : [
      {
        "name" : "EXEC_ID",
        "type" : "VARCHAR",
        "mode" : "OPTIONAL"
      },
      {
        "name" : "CUST_ID",
        "type" : "VARCHAR",
        "mode" : "OPTIONAL"
      },
      {
        "name" : "CELL_ID",
        "type" : "VARCHAR",
        "mode" : "OPTIONAL"
      },
      {
        "name" : "TRAN_AMOUNT",
        "type" : "FLOAT",
        "mode" : "REQUIRED",
        "properties" : {
          "drill.default" : "1.11"
        }
      }
    ]
  },
  "version" : 1
}