Re: table schema for parquet file is not working

James Turton Wed, 12 Jul 2023 04:03:57 -0700

Hi Prabhakar

From what I recall, Drill won't consider a provided schema when querying Parquet because Parquet files bundle their own schema. You might need to use a SQL function like COALESCE(TRAN_AMOUNT, 1.11) and possibly put that in a SQL view for reuse.
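As a rough sketch of that approach, the default could live in a view. This assumes a writable workspace such as dfs.tmp is available for the view, and the view name executions_v is hypothetical:

```sql
-- Hypothetical view name; dfs.tmp is assumed to be a writable workspace.
-- COALESCE returns TRAN_AMOUNT when it is present and 1.11 when it is NULL,
-- which covers the rows read from the file that lacks the column.
CREATE OR REPLACE VIEW dfs.tmp.`executions_v` AS
SELECT EXEC_ID,
       CUST_ID,
       CELL_ID,
       COALESCE(TRAN_AMOUNT, 1.11) AS TRAN_AMOUNT
FROM archive.default.`executions`;
```

Queries would then select from dfs.tmp.`executions_v` instead of the directory, so the default is applied in one place rather than repeated in every query.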


Regards
James

On 2023/07/11 18:40, Prabhakar Bhosale wrote:

Hi Team,
I am using Drill 1.20.1 with Parquet files.

I have two Parquet files in a directory, and one column is missing from one
of them. When I query the directory, it returns NULL for that column in all
rows that come from the file where the column is missing.

But I want a specific value for that column instead of NULL, so I created
the schema given below. Even after creating it, the query still returns
NULL. Please let me know what is going wrong.

I have also set storage.table.user_schema_file=true at the system
level.

The files are stored on a Linux mount point.
The name of the missing column is "TRAN_AMOUNT".



The schema is as follows:

{
   "table" : "archive.default.`executions`",
   "schema" : {
     "type" : "tuple_schema",
     "columns" : [
       {
         "name" : "EXEC_ID",
         "type" : "VARCHAR",
         "mode" : "OPTIONAL"
       },
       {
         "name" : "CUST_ID",
         "type" : "VARCHAR",
         "mode" : "OPTIONAL"
       },
       {
         "name" : "CELL_ID",
         "type" : "VARCHAR",
         "mode" : "OPTIONAL"
       },
       {
         "name" : "TRAN_AMOUNT",
         "type" : "FLOAT",
         "mode" : "REQUIRED",
          "properties" : {
            "drill.default" : "1.11"
          }
        }
     ]
   },
   "version" : 1
}
