Agreed, and very interesting. Lots of people at Datameer seem impressed by Flink.
I have to look up Kylin...

-----Original Message-----
From: Jacques Nadeau [mailto:[email protected]]
Sent: Thursday, May 28, 2015 1:20 AM
To: [email protected]
Subject: Re: what's the differenct between drill and optiq

Andrew,

As others have pointed out, there are definitely differences in how each
community project leverages Calcite (remember, Apache Kylin, Phoenix and,
I believe, Flink also use it). Remember, Calcite--at its core--is a
developer's toolkit that other applications/systems incorporate. While an
end user could use Calcite, the most common use is as an embedded library
in a broader system.

The great news is that the community is working together to collaborate
on an amazing shared library and framework.

-Jacques

On Wed, May 27, 2015 at 10:10 PM, Ted Dunning <[email protected]> wrote:

> Andrew,
>
> Sorry for being cryptic. Hanifi is more clear. My point was directed
> at the differences between where Hive may ultimately go and where
> Drill is now. Hanifi was providing a good summary of where Drill is now.
>
> As he said, Calcite does query parsing and planning. Ultimately, it
> will do the same for Hive. Even so, Drill has extended Calcite's
> planning capabilities in ways which are not used by Hive. These
> extensions allow Calcite to produce plans for the Drill execution
> engine. That execution engine is what Hanifi meant by flexible
> distributed columnar execution with late binding.
>
> SQL is not normally a late binding language. Instead, it shows its
> long heritage by being a very statically typed language. That static
> typing is a problem in the modern world of flexible data and dealing
> with this problem is a key goal of Drill.
>
> The key technological advance in Drill that enables it to address late
> typing problems is something called the ANY type. This is essentially
> a way for the parser to punt the problem of resolving the type of some
> value until the query is actually running.
> At that point, Drill has
> an empirical schema available for each record batch which can be used
> to do final code generation and optimization. If the empirical schema
> changes due to changes in the data being processed, that code can be
> regenerated as needed.
>
> This is a huge philosophical and design change that is hard to just
> paste onto an existing engine. Just as it would be next to impossible
> to modify a Pascal or Fortran execution environment to do the type
> inferencing and lazy execution that Scala or Haskell do, it is going
> to be hard to extend Hive's entire execution environment to deal with
> type dynamism. Simply passing around dynamic types will not give
> performance anywhere near what Drill does because of the inevitable
> cost of type tag dispatching.
>
> To give just the simplest example, suppose you have data that used a
> column named X to hold an integer for a long while and then switched
> to using a column named Y to hold a floating point number. To deal
> with this, you might create a view which has a case statement that
> uses the value of X or Y, whichever is non-null. In conventional SQL
> engines, the query parser and planner would generate code for this
> case statement and it would execute for every record. With Drill,
> almost all record batches would have *either* X or Y. Drill would
> generate different code for those two different patterns of data and
> that code would be generated with the knowledge that X is null, or
> that Y is null. As such, the optimizer in the code generator would
> actually just completely remove the case statement by evaluating it
> at code generation time. By pushing that code generation time very
> late in the execution, Drill would have no perceptible penalty
> relative to uniformly typed code, but it would have the ability to
> deal with non-uniform data.
>
>
> My original comment was an indefensible shorthand for all of this.
> Things should be made as simple as possible, but no simpler, as the
> great man said.
>
>
> On Wed, May 27, 2015 at 8:32 PM, Andrew Brust <[email protected]> wrote:
>
> > That makes sense. Just having trouble mapping that back on Ted's
> > comment. But I tend to think that's me and my ignorance.
> >
> > -----Original Message-----
> > From: Hanifi Gunes [mailto:[email protected]]
> > Sent: Wednesday, May 27, 2015 4:48 PM
> > To: user
> > Subject: Re: what's the differenct between drill and optiq
> >
> > Calcite does parsing & planning of queries. Drill executes in a very
> > flexible distributed columnar fashion with late binding.
> >
> > On Wed, May 27, 2015 at 8:34 AM, Ted Dunning <[email protected]> wrote:
> >
> > > Andrew,
> > >
> > > What Hive does not have is the extensions that Drill has that
> > > allow SQL to be type flexible. The ANY type and all of the
> > > implications both in terms of implementation and user impact it
> > > has are a really big deal.
> > >
> > > On Wed, May 27, 2015 at 6:08 AM, Andrew Brust <[email protected]> wrote:
> > >
> > > > Thanks!
> > > >
> > > > Sent from my phone
> > > > <insert witty apology for typos here>
> > > >
> > > > ----- Reply message -----
> > > > From: "PHANI KUMAR YADAVILLI" <[email protected]>
> > > > To: "[email protected]" <[email protected]>
> > > > Subject: what's the differenct between drill and optiq
> > > > Date: Wed, May 27, 2015 8:33 AM
> > > >
> > > > Yes, Hive uses Calcite. You can refer to the Hive documentation.
> > > >
> > > > On May 27, 2015 6:01 PM, "Andrew Brust" <[email protected]> wrote:
> > > >
> > > > > Folks at Hortonworks told me that Hive now uses Calcite as well.
> > > > > Can anyone here confirm or deny that?
> > > > >
> > > > > -----Original Message-----
> > > > > From: Rajkumar Singh [mailto:[email protected]]
> > > > > Sent: Wednesday, May 27, 2015 6:52 AM
> > > > > To: [email protected]
> > > > > Subject: Re: what's the differenct between drill and optiq
> > > > >
> > > > > Optiq (now known as Calcite) is an API for query parsing,
> > > > > planning, and optimization. Drill uses it for SQL parsing,
> > > > > validation, and optimization. Drill's query planner applies its
> > > > > own custom planner rules to build the query's logical plan.
> > > > >
> > > > > Rajkumar Singh
> > > > >
> > > > > > On May 27, 2015, at 12:04 PM, 陈礼剑 <[email protected]> wrote:
> > > > > >
> > > > > > Hi:
> > > > > >
> > > > > > I just want to know the difference between Drill and Optiq.
> > > > > >
> > > > > > Does Drill just 'extend' Optiq to support many other
> > > > > > 'stores' (Hadoop, MongoDB, ...)?
> > > > > >
> > > > > > ---from davy
> > > > > > Thanks.
