Re: Avro - Schema is good - Schema validation is bad

Stefán Baxter Mon, 14 Dec 2015 09:36:36 -0800

Hi,

This simply cannot be the desired behavior!


This prevents us from using a field from a changing schema together with
dir0 sub-selection (directory pruning): the altered/full schema is never
part of the query, so the query fails.
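
To make the problem concrete, here is a minimal Python sketch of the two
behaviors. It is a toy model, not Drill code: the directory names, the
FILES layout, and the select() helper are all made up for illustration.
The only thing taken from the real system is the error message text.

```python
# Hypothetical model, not Drill's implementation: each directory holds
# records written with the schema that was current at that time.
FILES = {
    "2015-11": {"schema": ["id", "name"],
                "rows": [{"id": 1, "name": "a"}]},
    "2015-12": {"schema": ["id", "name", "some_col"],
                "rows": [{"id": 2, "name": "b", "some_col": "x"}]},
}


class ValidationError(Exception):
    """Raised when a column is missing from every targeted schema."""


def select(columns, dirs, strict):
    # Directory pruning: only the selected directories take part in the query.
    targeted = [FILES[d] for d in dirs]
    if strict:
        # Up-front check against the schemas of the targeted directories only.
        known = {c for f in targeted for c in f["schema"]}
        for col in columns:
            if col not in known:
                raise ValidationError(f"Column '{col}' not found in any table")
    # Lenient path: a field missing from a row simply yields None (null).
    return [{c: row.get(c) for c in columns}
            for f in targeted for row in f["rows"]]
```

With strict=False, select(["id", "some_col"], ["2015-11"], strict=False)
returns a row with some_col set to None. With strict=True the same call
raises ValidationError, even though some_col exists in "2015-12", because
the pruned directory's schema is the only one that is consulted.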

Drill should, IMO, never have rules that are dependent on the underlying
storage type. If the query runs with JSON and Parquet then it should work
for Avro as well.

I'm hoping this strict schema validation is all just a misunderstanding.

Regards,
 -Stefán

On Mon, Dec 14, 2015 at 3:28 PM, Kamesh <[email protected]> wrote:

> For Avro files, we first construct the schema, and this schema is used for
> validating queries. So, if there are any errors in the query (like
> invalid field references), it will fail fast. As of now, for other file
> formats, query validation (checking for invalid field references) does not
> happen; the schema is constructed at run time, and hence invalid fields
> return nulls.
>
>
> On Mon, Dec 14, 2015 at 2:36 PM, Stefán Baxter <[email protected]>
> wrote:
>
> > Hi,
> >
> > I'm getting the following error when querying Avro files:
> >
> > Error: VALIDATION ERROR: From line 1, column 48 to line 1, column 57:
> > Column 'some_col' not found in any table
> >
> > It's true that the field is in none of the tables I'm targeting in that
> > particular query, but that does not mean that it is in none of the
> > possible files I could be querying.
> >
> > We use Avro to get the benefits of the schema, but I never expected Drill
> > to enforce it this way.
> >
> > Why do unresolved columns not return null?
> >
> > This makes no sense to me, as I think a fundamental trait of Drill, when
> > trying to eliminate ETL, is to return null for any missing fields.
> >
> > Please advise.
> >
> > Regards,
> >  -Stefán
> >
>
>
>
> --
> Kamesh.
>
