Hi Deepak,

Drill is "schema-free" (really, schema-on-read). This means Drill behaves 
differently for missing columns than traditional schema-on-write databases. In 
particular, if a reader does not find the requested column in the input file, 
Drill will invent a nullable int column and fill it with nulls. (Drill is 
optimistic; it hopes that some of your files has a value for the column, even 
if some files do not.)

In your first example, you can see the nullable int in the "NullableIntHolder". 
Apparently that type conflicts with an expected type of an untyped null holder. 
(The untyped null holder must be new; I hadn't seen that previously.) Not sure 
where the "UntypedNullHolder" came from.

I would file a JIRA ticket for the first case. The documentation says you can 
provide any scalar type to COALESCE. I'd suppose they have to be of the same 
type, but the documentation does not say so. Certainly COALESCE should accept 
an untyped null. The output of your example should have been a null value of 
type nullable int.

In the second case, the cast converts the nullable int to a nullable varchar. 
The output you see is expected: the NULL value of type nullable int is 
converted to a null value of type nullable varchar, and, according to the 
documentation, "If all expressions evaluate to null, then the COALESCE function 
returns null." 
Thanks,
- Paul

 

    On Thursday, August 23, 2018, 5:14:02 PM PDT, Deepak Kapoor 
<dkapoor...@gmail.com> wrote:  
 
 Would you expect coalesce to return a result when none of the columns in 
coalesce exist in the parquet file ??

Query 1:
select coalesce(cert_statusdd,cert_status33) from dfs.`/tmp/parquetfiles`;
Error: SYSTEM ERROR: CompileException: Line 56, Column 27: Assignment 
conversion not possible from type 
“org.apache.drill.exec.expr.holders.NullableIntHolder” to type 
“org.apache.drill.exec.vector.UntypedNullHolder”

Fragment 1:0

[Error Id: 7b9193fb-289b-4fbf-a52a-2b93b01f0cd0 on dkvm2c:31010] (state=,code=0)

In above case both the columns do not exist.  Is the error reported a valid 
error or is it a bug.

Another observation is that if you use CAST on the column that does not exist, 
Coalesce will not error out instead return nulls. 
Query 2:
select coalesce(cast(cert_statusdd as varchar),cast(cert_statusee as varchar)) 
from dfs.`/tmp/parquetfiles` limit 3;
+---------+
| EXPR$0  |
+---------+
| null    |
| null    |
| null    |
+————+

Note: Databases like postgresql and mysql will not allow COALESCE on non 
existing column, w or without CAST. 
Drill however being schema free, do we expect COALESCE to succeed and give null 
as result ?? In which case the SYSTEM ERROR: CompileException with the 1st 
Quert  be a bug. ??

Thanks.
- Deepak (dkap...@mapr.com <mailto:dkap...@mapr.com>)
  

Reply via email to