RE: what's the differenct between drill and optiq

Andrew Brust Thu, 28 May 2015 08:09:08 -0700

Absolutely nothing to apologize for, and the below explanation is very helpful.

FWIW, I certainly understood that Hive's use of Calcite offered relatively 
little in the way of type flexibility/late binding, compared to Drill.  I get 
that Drill's entire raison d'etre is around this and never thought that Hive 
"had it too."  It was more a question of my being surprised that the query 
planners had any common technology at all.  I have never coded in Scala or 
Haskell, but I have coded plenty in C#, Pascal and VB, and I can appreciate 
the analogy just by having experience with one half of it.

It's part of the reason I think Drill is so cool, and part of the reason why 
MapR did so well in one of Gigaom's last Sector Roadmaps.

My "ponder question" is whether mainstream RDBMSes like Oracle and SQL Server 
will one day add Drill-like late binding functionality.

-----Original Message-----
From: Ted Dunning [mailto:[email protected]] 
Sent: Thursday, May 28, 2015 1:10 AM
To: [email protected]
Subject: Re: what's the differenct between drill and optiq

Andrew,

Sorry for being cryptic.  Hanifi is more clear.  My point was directed at the 
differences between where Hive may ultimately go and where Drill is now.  
Hanifi was providing a good summary of where Drill is now.

As he said, Calcite does query parsing and planning.  Ultimately, it will do 
the same for Hive.  Even so, Drill has extended Calcite's planning capabilities 
in ways which are not used by Hive.  These extensions allow Calcite to produce 
plans for the Drill execution engine.  That execution engine is what Hanifi 
meant by flexible distributed columnar execution with late binding.

SQL is not normally a late binding language.  Instead, it shows its long 
heritage by being a very statically typed language.  That static typing is a 
problem in the modern world of flexible data and dealing with this problem is a 
key goal of Drill.

The key technological advance in Drill that enables it to address late typing 
problems is something called the ANY type.  This is essentially a way for the 
parser to punt the problem of resolving the type of some value until the query 
is actually running.  At that point, Drill has an empirical schema available 
for each record batch which can be used to do final code generation and 
optimization.  If the empirical schema changes due to changes in the data being 
processed, that code can be regenerated as needed.

This is a huge philosophical and design change that is hard to just paste onto 
an existing engine.  Just as it would be next to impossible to modify a Pascal 
or Fortran execution environment to do the type inferencing and lazy execution 
that Scala or Haskell do, it is going to be hard to extend Hive's entire 
execution environment to deal with type dynamism.  Simply passing around 
dynamic types will not give performance anywhere near what Drill does because 
of the inevitable cost of type tag dispatching.
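
To make the dispatch cost concrete, here is a minimal Python sketch. It is not Drill or Hive code; the function names and the tagged-tuple representation are hypothetical, chosen only to contrast checking a type tag on every value with resolving the type once per batch.

```python
# Hypothetical illustration only; neither function is part of Drill or Hive.

def add_one_tagged(values):
    """Dynamic types passed around per value: every record pays for a tag check."""
    out = []
    for tag, v in values:
        if tag == "int":
            out.append(v + 1)
        elif tag == "float":
            out.append(v + 1.0)
        else:
            raise TypeError(f"unsupported type tag: {tag}")
    return out


# One specialized kernel per concrete type, with no tag check inside the loop.
KERNELS = {
    "int": lambda column: [v + 1 for v in column],
    "float": lambda column: [v + 1.0 for v in column],
}


def add_one_batch(batch_type, column):
    """Type resolved once per batch; the inner loop is uniformly typed."""
    return KERNELS[batch_type](column)
```

In the second form the type decision is made once for the whole batch, which is the property the paragraph above attributes to deferring code generation until the data's actual types are known.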

To give just the simplest example, suppose you have data that used a column 
named X to hold an integer for a long while and then switched to using a column 
named Y to hold a floating point number.  To deal with this, you might create a 
view which has a case statement that uses the value of X or Y, whichever is 
non-null.  In conventional SQL engines, the query parser and planner would 
generate code for this case statement and it would execute for every record.  
With Drill, almost all record batches would have *either* X or Y.  Drill 
would generate different code for those two different 
patterns of data and that code would be generated with the knowledge that X is 
null, or that Y is null.  As such, the optimizer in the code generator would 
actually just completely remove the case statement by evaluating it at code 
generation time.  By pushing that code generation time very late in the 
execution, Drill would have no perceptible penalty relative to uniformly typed 
code, but it would have the ability to deal with non-uniform data.

My original comment was an indefensible shorthand for all of this.  Things 
should be made as simple as possible, but no simpler, as the great man said.

On Wed, May 27, 2015 at 8:32 PM, Andrew Brust < 
[email protected]> wrote:

> That makes sense.  Just having trouble mapping that back on Ted's 
> comment.  But I tend to think that's me and my ignorance.
>
> -----Original Message-----
> From: Hanifi Gunes [mailto:[email protected]]
> Sent: Wednesday, May 27, 2015 4:48 PM
> To: user
> Subject: Re: what's the differenct between drill and optiq
>
> Calcite does parsing & planning of queries. Drill executes in a very 
> flexible distributed columnar fashion with late binding.
>
> On Wed, May 27, 2015 at 8:34 AM, Ted Dunning <[email protected]>
> wrote:
>
> > Andrew,
> >
> > What Hive does not have is the extensions that Drill has that allow 
> > SQL to be type flexible.  The ANY type and all of its implications, 
> > both in terms of implementation and user impact, are a really big 
> > deal.
> >
> >
> >
> > On Wed, May 27, 2015 at 6:08 AM, Andrew Brust < 
> > [email protected]> wrote:
> >
> > > Thanks!
> > >
> > > Sent from my phone
> > > <insert witty apology for typos here>
> > >
> > > ----- Reply message -----
> > > From: "PHANI KUMAR YADAVILLI" <[email protected]>
> > > To: "[email protected]" <[email protected]>
> > > Subject: what's the differenct between drill and optiq
> > > Date: Wed, May 27, 2015 8:33 AM
> > >
> > > Yes, Hive uses Calcite. You can refer to the Hive documentation.
> > > On May 27, 2015 6:01 PM, "Andrew Brust" < 
> > > [email protected]>
> > > wrote:
> > >
> > > > Folks at Hortonworks told me that Hive now uses Calcite as well.
> > > > Can anyone here confirm or deny that?
> > > >
> > > > -----Original Message-----
> > > > From: Rajkumar Singh [mailto:[email protected]]
> > > > Sent: Wednesday, May 27, 2015 6:52 AM
> > > > To: [email protected]
> > > > Subject: Re: what's the differenct between drill and optiq
> > > >
> > > > Optiq (now known as Calcite) is an API for query parsing, planning, 
> > > > and optimization. Drill uses it for SQL parsing, validation, 
> > > > and optimization. Drill's query planner applies its own custom 
> > > > planner rules to build the logical query plan.
> > > >
> > > > Rajkumar Singh
> > > >
> > > >
> > > >
> > > > > On May 27, 2015, at 12:04 PM, 陈礼剑 <[email protected]> wrote:
> > > > >
> > > > > Hi:
> > > > >
> > > > > I just want to know the difference between drill and optiq.
> > > > >
> > > > >
> > > > > Does Drill just 'extend' Optiq to support many other 
> > > > > 'stores' (Hadoop, MongoDB, ...)?
> > > > >
> > > > >
> > > > > ---from davy
> > > > > Thanks.
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > >
> >
>
