Interesting.  I'd be curious how this would translate into the creation of
Parquet files too, considering Parquet as a format doesn't support embedded
types (as far as I know).  In our implementations, we have ended up manually
checking schemas before creating Parquet files (i.e. splitting the data into
separate Parquet files per schema).

Do you have any thoughts on that?
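
For reference, here's roughly the kind of pre-write check we've been doing,
as a minimal Python sketch assuming pyarrow is available (field names and the
grouping key are illustrative, not our actual code): group incoming JSON
records by their inferred schema and write each group to its own Parquet file.

import json
from collections import defaultdict

import pyarrow as pa
import pyarrow.parquet as pq

def schema_signature(record):
    # Group records by their (field name, value type) shape so every
    # group is schema-consistent before it ever reaches Parquet.
    return tuple(sorted((k, type(v).__name__) for k, v in record.items()))

def split_and_write(json_lines, out_prefix):
    groups = defaultdict(list)
    for line in json_lines:
        record = json.loads(line)
        groups[schema_signature(record)].append(record)

    # One Parquet file per distinct schema; pyarrow infers the Arrow
    # schema from each homogeneous group.
    for i, records in enumerate(groups.values()):
        table = pa.Table.from_pylist(records)
        pq.write_table(table, f"{out_prefix}_{i}.parquet")

# Example: the second record's "price" changed from a number to a string,
# so it lands in a separate file instead of breaking the first one.
split_and_write(
    ['{"id": 1, "price": 9.99}', '{"id": 2, "price": "9.99"}'],
    "orders",
)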

On Thu, Jul 30, 2015 at 12:45 PM, Jacques Nadeau <[email protected]> wrote:

> Well the "good news" is that this is such an important issue that we
> recreated it:
>
> https://issues.apache.org/jira/browse/DRILL-3228
>
> :)
>
> We're starting discussions about it now.  Realistically, it will take a
> little time to get right.  Simple promotion is easier to achieve but it
> would only work as long as it was done in the first batch until we fix
> schema change in all the operators.  (This might be good enough for your
> use cases... and could be a good start to this work).
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Wed, Jul 29, 2015 at 6:44 PM, Adam Gilmore <[email protected]>
> wrote:
>
> > Wanted to touch base to see what the status was of DRILL-1257.
> >
> > We've run into a few instances where JSON/Mongo data is changing types
> > and Drill is unable to query it (e.g. a numeric type becomes a string
> > type).
> >
> > I know this is a pretty massive change with a lot of tough decisions to
> > make on how to handle that, but wanted to see what the roadmap looked
> > like - that is, is it in the near future?
> >
> > At the moment I'm trying to work out some sort of temporary fix (i.e.
> > "upgrading" vectors, e.g. converting a float vector to a varchar vector
> > in my above example).
> >
> > As we're allowing users to run aggregations etc. against their data
> > without having knowledge of the schema, we can't really use
> > "all_text_mode" and do our own casting (apart from the huge performance
> > degradation associated with it).
> >
>
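
On the vector "upgrading" / simple promotion idea in the quoted messages,
here's a rough sketch of the behaviour I have in mind, in plain Python rather
than Drill's actual ValueVector API (names and promotion rules are purely
illustrative): once a string value shows up in a column, the whole column is
re-encoded as varchar.

def promote_column(values):
    """Promote a column to the widest type seen so far.

    Mirrors the idea of 'upgrading' a float vector to a varchar vector:
    if any value is a string, every value is re-encoded as a string.
    Illustration only, not Drill's ValueVector API.
    """
    if any(isinstance(v, str) for v in values):
        # Promotion target: varchar-like. Re-encode earlier numeric
        # values as strings so the column stays homogeneous.
        return [None if v is None else str(v) for v in values], "varchar"
    if any(isinstance(v, float) for v in values):
        return [None if v is None else float(v) for v in values], "float8"
    return list(values), "bigint"

# A batch that starts numeric and then turns into strings gets promoted
# to varchar as a whole, which is what a first-batch-only promotion
# (as described above) would allow.
values, kind = promote_column([1.5, 2.0, "3.25"])
print(kind, values)   # varchar ['1.5', '2.0', '3.25']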
