Re: what's the differenct between drill and optiq

Ted Dunning Thu, 28 May 2015 16:09:43 -0700

Flink is very impressive (I helped bring them to Apache and maintain close
contacts with the project founders).


Flink is also very nicely complementary to Drill in that it brings a new
kind of execution environment.  This environment has some very cool
capabilities that might work well in Drill.  It will be exciting to watch
and see what really works in Flink and where they find success.





On Thu, May 28, 2015 at 8:09 AM, Andrew Brust <
[email protected]> wrote:

> Agreed, and very interesting.  Lots of people at Datameer seem impressed
> by Flink.
>
> I have to look up Kylin...
>
> -----Original Message-----
> From: Jacques Nadeau [mailto:[email protected]]
> Sent: Thursday, May 28, 2015 1:20 AM
> To: [email protected]
> Subject: Re: what's the differenct between drill and optiq
>
> Andrew,
>
> As others have pointed out, there are definitely differences in how each
> community project leverages Calcite (remember, Apache Kylin, Phoenix and,
> I believe, Flink also use it).  At its core, Calcite is a developer's
> toolkit that other applications and systems incorporate.  While an end
> user could use Calcite directly, the most common use is as an embedded
> library in a broader system.
>
> The great news is that the community is working together to collaborate
> on an amazing shared library and framework.
>
> -Jacques
>
>
>
> On Wed, May 27, 2015 at 10:10 PM, Ted Dunning <[email protected]>
> wrote:
>
> > Andrew,
> >
> > Sorry for being cryptic.  Hanifi is more clear.  My point was directed
> > at the differences between where Hive may ultimately go and where
> > Drill is now.  Hanifi was providing a good summary of where Drill is now.
> >
> > As he said, Calcite does query parsing and planning.  Ultimately, it
> > will do the same for Hive.  Even so, Drill has extended Calcite's
> > planning capabilities in ways which are not used by Hive.  These
> > extensions allow Calcite to produce plans for the Drill execution
> > engine.  That execution engine is what Hanifi meant by flexible
> > distributed columnar execution with late binding.
> >
> > SQL is not normally a late binding language.  Instead, it shows its
> > long heritage by being a very statically typed language.  That static
> > typing is a problem in the modern world of flexible data and dealing
> > with this problem is a key goal of Drill.
> >
> > The key technological advance in Drill that enables it to address late
> > typing problems is something called the ANY type.  This is essentially
> > a way for the parser to punt the problem of resolving the type of some
> > value until the query is actually running.  At that point, Drill has
> > an empirical schema available for each record batch which can be used
> > to do final code generation and optimization.  If the empirical schema
> > changes due to changes in the data being processed, that code can be
> > regenerated as needed.
> >
> > This is a huge philosophical and design change that is hard to just
> > paste onto an existing engine.  Just as it would be next to impossible
> > to modify a Pascal or Fortran execution environment to do the type
> > inferencing and lazy execution that Scala or Haskell do, it is going
> > to be hard to extend Hive's entire execution environment to deal with
> > type dynamism.  Simply passing around dynamic types will not give
> > performance anywhere near what Drill does because of the inevitable cost
> > of type tag dispatching.
> >
> > To give just the simplest example, suppose you have data that used a
> > column named X to hold an integer for a long while and then switched
> > to using a column named Y to hold a floating point number.  To deal
> > with this, you might create a view which has a case statement that
> > uses the value of X or Y, whichever is non-null.  In conventional SQL
> > engines, the query parser and planner would generate code for this
> > case statement and it would execute for every record.  With Drill,
> > almost all record batches would have *either* X or Y.  Drill would
> > generate different code for those two
> > different patterns of data and that code would be generated with the
> > knowledge that X is null, or that Y is null.  As such, the optimizer
> > in the code generator would actually just completely remove the case
> > statement by evaluating it at code generation time.  By pushing that
> > code generation time very late in the execution, Drill would have no
> > perceptible penalty relative to uniformly typed code, but it would
> > have the ability to deal with non-uniform data.
> >
> >
> > My original comment was an indefensible shorthand for all of this.
> > Things should be made as simple as possible, but no simpler, as the
> > great man said.
> >
> >
> > On Wed, May 27, 2015 at 8:32 PM, Andrew Brust <
> > [email protected]> wrote:
> >
> > > That makes sense.  Just having trouble mapping that back on Ted's
> > > comment.  But I tend to think that's me and my ignorance.
> > >
> > > -----Original Message-----
> > > From: Hanifi Gunes [mailto:[email protected]]
> > > Sent: Wednesday, May 27, 2015 4:48 PM
> > > To: user
> > > Subject: Re: what's the differenct between drill and optiq
> > >
> > > Calcite does parsing & planning of queries. Drill executes in a very
> > > flexible distributed columnar fashion with late binding.
> > >
> > > On Wed, May 27, 2015 at 8:34 AM, Ted Dunning <[email protected]>
> > > wrote:
> > >
> > > > Andrew,
> > > >
> > > > What Hive does not have are the extensions that Drill has that
> > > > allow SQL to be type flexible.  The ANY type and all of its
> > > > implications, both in terms of implementation and user impact,
> > > > are a really big deal.
> > > >
> > > >
> > > >
> > > > On Wed, May 27, 2015 at 6:08 AM, Andrew Brust <
> > > > [email protected]> wrote:
> > > >
> > > > > Thanks!
> > > > >
> > > > > Sent from my phone
> > > > > <insert witty apology for typos here>
> > > > >
> > > > > ----- Reply message -----
> > > > > From: "PHANI KUMAR YADAVILLI" <[email protected]>
> > > > > To: "[email protected]" <[email protected]>
> > > > > Subject: what's the differenct between drill and optiq
> > > > > Date: Wed, May 27, 2015 8:33 AM
> > > > >
> > > > > Yes, Hive uses Calcite. You can refer to the Hive documentation.
> > > > > On May 27, 2015 6:01 PM, "Andrew Brust" <
> > > > > [email protected]>
> > > > > wrote:
> > > > >
> > > > > > Folks at Hortonworks told me that Hive now uses Calcite as well.
> > > > > > Can anyone here confirm or deny that?
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: Rajkumar Singh [mailto:[email protected]]
> > > > > > Sent: Wednesday, May 27, 2015 6:52 AM
> > > > > > To: [email protected]
> > > > > > Subject: Re: what's the differenct between drill and optiq
> > > > > >
> > > > > > Optiq (now known as Calcite) is an API for query parsing,
> > > > > > planning and optimization. Drill uses it for SQL parsing,
> > > > > > validation and optimization. The Drill query planner applies
> > > > > > its own custom planner rules to build the logical query plan.
> > > > > >
> > > > > > Rajkumar Singh
> > > > > >
> > > > > >
> > > > > >
> > > > > > > On May 27, 2015, at 12:04 PM, 陈礼剑 <[email protected]> wrote:
> > > > > > >
> > > > > > > Hi:
> > > > > > >
> > > > > > > I just want to know the difference between drill and optiq.
> > > > > > >
> > > > > > >
> > > > > > > Does Drill just 'extend' Optiq to support many other
> > > > > > > 'stores' (Hadoop, MongoDB, ...)?
> > > > > > >
> > > > > > >
> > > > > > > ---from davy
> > > > > > > Thanks.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
