Sigh of relief is premature.  Nobody has committed to carrying this
interpretation forward.



On Mon, Dec 14, 2015 at 11:44 AM, Stefán Baxter <[email protected]>
wrote:

> /me sighs of relief
>
> On Mon, Dec 14, 2015 at 7:28 PM, Ted Dunning <[email protected]>
> wrote:
>
> > Actually, even without multiple storage types, this could be radically
> > confusing.
> >
> > If I have many avro files that are partitioned into directories, then
> > queries that use the partitioning to limit the files that I see could
> > include or exclude more recent files that have added a new field.
> >
> > That means that a query would succeed or fail according to which date
> > range I use for the query.
> >
> > That seems pretty radically bad.
> >
> >
> >
> >
> > On Mon, Dec 14, 2015 at 9:33 AM, Stefán Baxter <[email protected]>
> > wrote:
> >
> > > Hi,
> > >
> > > This simply cannot be the desired behavior!
> > >
> > > It prevents using a field from a changing schema together with dir0
> > > sub-selection (directory pruning), because the altered/full schema is
> > > never part of the query, which then fails.
> > >
> > > Drill should, IMO, never have rules that depend on the underlying
> > > storage type. If the query runs with JSON and Parquet, then it should
> > > work for Avro as well.
> > >
> > > I'm hoping this strict schema validation is all just a misunderstanding.
> > >
> > > Regards,
> > >  -Stefán
> > >
> > > On Mon, Dec 14, 2015 at 3:28 PM, Kamesh <[email protected]> wrote:
> > >
> > > > For Avro files, we first construct the schema, and this schema is used
> > > > for validating queries. So, if there are any errors in the query (like
> > > > invalid field references), it will fail fast. As of now, for other file
> > > > formats, query validation (checking for invalid field references) does
> > > > not happen; the schema is constructed at run time, and hence invalid
> > > > fields come back as nulls.
> > > >
> > > >
> > > > On Mon, Dec 14, 2015 at 2:36 PM, Stefán Baxter <[email protected]>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I'm getting the following error when querying Avro files:
> > > > >
> > > > > Error: VALIDATION ERROR: From line 1, column 48 to line 1, column 57:
> > > > > Column 'some_col' not found in any table
> > > > >
> > > > > It's true that the field is in none of the tables I'm targeting in
> > > > > that particular query, but that does not mean that it is in none of
> > > > > the possible files I could be querying.
> > > > >
> > > > > We use Avro to get the benefits of the schema, but I never expected
> > > > > Drill to enforce it this way.
> > > > >
> > > > > Why do unresolved columns not return null?
> > > > >
> > > > > This makes no sense to me, as I think a fundamental trait of Drill,
> > > > > when trying to eliminate ETL, is to return null for any missing fields.
> > > > >
> > > > > Please advise.
> > > > >
> > > > > Regards,
> > > > >  -Stefán
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Kamesh.
> > > >
> > >
> >
>
