RE: [PROPOSAL] Optiq

2014-05-16 Thread Kasper Sørensen
Good section. I agree with what it says and hope we can eventually
help each other out with, e.g., a library of adaptors.

-----Original Message-----
From: Julian Hyde [mailto:julianh...@gmail.com] 
Sent: 8 May 2014 20:03
To: general@incubator.apache.org
Subject: Re: [PROPOSAL] Optiq

The "Relationships with Other Apache Products" section has been updated to 
cover Optiq's functional overlaps with existing Apache projects.

https://wiki.apache.org/incubator/OptiqProposal#Relationships_with_Other_Apache_Products

Julian

On May 2, 2014, at 11:23 AM, Henry Saputra  wrote:

> Ah sorry, I did not mean "asking to update", I meant "proposing to update".
> 
> Thanks,
> 
> - Henry
> 
> On Fri, May 2, 2014 at 11:20 AM, Henry Saputra  
> wrote:
>> Hi Ashutosh,
>> 
>> Since there was a question/ comment about relationship with Apache 
>> MetaModel, I am asking to update the proposal to include this 
>> discussion in either "Relationships with Other Apache Products" or 
>> "Alignment" section before going for a VOTE.
>> 
>> Apache Slider did the same thing with relation to Apache Twill and 
>> Apache Helix projects.
>> 
>> Thanks,
>> 
>> - Henry
>> 
>> On Wed, Apr 30, 2014 at 3:21 PM, Ashutosh Chauhan  
>> wrote:
>>> I would like to propose Optiq as an Apache Incubator project.  I 
>>> have posted the proposal to 
>>> https://wiki.apache.org/incubator/OptiqProposal and posted the text of the 
>>> proposal below.
>>> 
>>> Ashutosh.
>>> 
>>> = Optiq =
>>> == Abstract ==
>>> 
>>> Optiq is a framework that allows efficient translation of queries 
>>> involving heterogeneous and federated data.
>>> 
>>> == Proposal ==
>>> 
>>> Optiq is a highly customizable engine for parsing and planning 
>>> queries on data in a wide variety of formats. It allows 
>>> database-like access, and in particular a SQL interface and advanced 
>>> query optimization, for data not residing in a traditional database.
>>> 
>>> == Background ==
>>> 
>>> Databases were traditionally engineered in a monolithic stack, 
>>> providing a data storage format, data processing algorithms, query 
>>> parser, query planner, built-in functions, metadata repository and 
>>> connectivity layer.
>>> They innovate in some areas but rarely in all.
>>> 
>>> Modern data management systems are decomposing that stack into 
>>> separate components, separating data, processing engine, metadata, 
>>> and query language support. They are highly heterogeneous, with data 
>>> in multiple locations and formats, caching and redundant data, 
>>> different workloads, and processing occurring in different engines.
>>> 
>>> Query planning (sometimes called query optimization) has always been 
>>> a key function of a DBMS, because it allows the implementors to 
>>> introduce new query-processing algorithms, and allows data 
>>> administrators to re-organize the data without affecting 
>>> applications built on that data. In a componentized system, the 
>>> query planner integrates the components (data formats, engines, 
>>> algorithms) without introducing unnecessary coupling or performance
>>> tradeoffs.
>>> 
>>> But building a query planner is hard; many systems muddle along 
>>> without a planner, and indeed a SQL interface, until the demand from 
>>> their customers is overwhelming.
>>> 
>>> There is an opportunity to make this process more efficient by 
>>> creating a re-usable framework.
>>> 
>>> == Rationale ==
>>> 
>>> Optiq allows database-like access, and in particular a SQL interface 
>>> and advanced query optimization, for data not residing in a 
>>> traditional database. It is complementary to many current Hadoop and 
>>> NoSQL systems, which have innovative and performant storage and 
>>> runtime systems but lack a SQL interface and intelligent query translation.
>>> 
>>> Optiq is already in use by several projects, including Apache Drill, 
>>> Apache Hive and Cascading Lingual, and commercial products.
>>> 
>>> Optiq's architecture consists of:
>>> 
>>> An extensible relational algebra.
>>> SPIs (service-provider interfaces) for metadata (schemas and 
>>> tables), planner rules, statistics, cost-estimates, user-defined functions.
>>> Built-in sets of rules for logical transformations a

Re: [PROPOSAL] Optiq

2014-05-16 Thread Henry Saputra
Yes, thanks for updating the proposal. Really appreciate it.

- Henry

On Thu, May 8, 2014 at 11:03 AM, Julian Hyde  wrote:
> The “Relationships with Other Apache Products” section has been updated to 
> cover Optiq’s functional overlaps with existing Apache projects.
>
> https://wiki.apache.org/incubator/OptiqProposal#Relationships_with_Other_Apache_Products
>
> Julian
>
> On May 2, 2014, at 11:23 AM, Henry Saputra  wrote:
>
>> Ah sorry, I did not mean "asking to update", I meant "proposing to update".
>>
>> Thanks,
>>
>> - Henry
>>
>> On Fri, May 2, 2014 at 11:20 AM, Henry Saputra  
>> wrote:
>>> Hi Ashutosh,
>>>
>>> Since there was a question/ comment about relationship with Apache
>>> MetaModel, I am asking to update the proposal to include this
>>> discussion in either "Relationships with Other Apache Products" or
>>> "Alignment" section before going for a VOTE.
>>>
>>> Apache Slider did the same thing with relation to Apache Twill and
>>> Apache Helix projects.
>>>
>>> Thanks,
>>>
>>> - Henry
>>>
>>> On Wed, Apr 30, 2014 at 3:21 PM, Ashutosh Chauhan  
>>> wrote:
>>>> I would like to propose Optiq as an Apache Incubator project. I have
>>>> posted the proposal to https://wiki.apache.org/incubator/OptiqProposal and
>>>> posted the text of the proposal below.
>>>>
>>>> Ashutosh.
>>>>
>>>> = Optiq =
>>>> == Abstract ==
>>>>
>>>> Optiq is a framework that allows efficient translation of queries involving
>>>> heterogeneous and federated data.
>>>>
>>>> == Proposal ==
>>>>
>>>> Optiq is a highly customizable engine for parsing and planning queries on
>>>> data in a wide variety of formats. It allows database-like access, and in
>>>> particular a SQL interface and advanced query optimization, for data not
>>>> residing in a traditional database.
>>>>
>>>> == Background ==
>>>>
>>>> Databases were traditionally engineered in a monolithic stack, providing a
>>>> data storage format, data processing algorithms, query parser, query
>>>> planner, built-in functions, metadata repository and connectivity layer.
>>>> They innovate in some areas but rarely in all.
>>>>
>>>> Modern data management systems are decomposing that stack into separate
>>>> components, separating data, processing engine, metadata, and query
>>>> language support. They are highly heterogeneous, with data in multiple
>>>> locations and formats, caching and redundant data, different workloads, and
>>>> processing occurring in different engines.
>>>>
>>>> Query planning (sometimes called query optimization) has always been a key
>>>> function of a DBMS, because it allows the implementors to introduce new
>>>> query-processing algorithms, and allows data administrators to re-organize
>>>> the data without affecting applications built on that data. In a
>>>> componentized system, the query planner integrates the components (data
>>>> formats, engines, algorithms) without introducing unnecessary coupling or
>>>> performance tradeoffs.
>>>>
>>>> But building a query planner is hard; many systems muddle along without a
>>>> planner, and indeed a SQL interface, until the demand from their customers
>>>> is overwhelming.
>>>>
>>>> There is an opportunity to make this process more efficient by creating a
>>>> re-usable framework.
>>>>
>>>> == Rationale ==
>>>>
>>>> Optiq allows database-like access, and in particular a SQL interface and
>>>> advanced query optimization, for data not residing in a traditional
>>>> database. It is complementary to many current Hadoop and NoSQL systems,
>>>> which have innovative and performant storage and runtime systems but lack a
>>>> SQL interface and intelligent query translation.
>>>>
>>>> Optiq is already in use by several projects, including Apache Drill, Apache
>>>> Hive and Cascading Lingual, and commercial products.
>>>>
>>>> Optiq's architecture consists of:
>>>>
>>>> An extensible relational algebra.
>>>> SPIs (service-provider interfaces) for metadata (schemas and tables),
>>>> planner rules, statistics, cost-estimates, user-defined functions.
>>>> Built-in sets of rules for logical transformations and common data-sources.
>>>> Two query planning engines driven by rules, statistics, etc. One engine is
>>>> cost-based, the other rule-based.
>>>> Optional SQL parser, validator and translator to relational algebra.
>>>> Optional JDBC driver.
>>>> == Initial Goals ==
>>>>
>>>> The initial goals are to move the existing codebase to Apache and
>>>> integrate with the Apache development process. Once this is accomplished,
>>>> we plan for incremental development and releases that follow the Apache
>>>> guidelines.
>>>>
>>>> As we move the code into the org.apache namespace, we will restructure
>>>> components as necessary to allow clients to use just the components of
>>>> Optiq that they need.
>>>>
>>>> A version 1.0 release, including pre-built binaries, will foster wider
>>>> adoption.
>>>>
>>>> == Current Status ==
>>>>
>>>> Optiq has had over a dozen minor releases over the last 18 months. Its core
>>>> SQL par

Re: [PROPOSAL] Optiq

2014-05-11 Thread Julian Hyde
The “Relationships with Other Apache Products” section has been updated to 
cover Optiq’s functional overlaps with existing Apache projects.

https://wiki.apache.org/incubator/OptiqProposal#Relationships_with_Other_Apache_Products

Julian

On May 2, 2014, at 11:23 AM, Henry Saputra  wrote:

> Ah sorry, I did not mean "asking to update", I meant "proposing to update".
> 
> Thanks,
> 
> - Henry
> 
> On Fri, May 2, 2014 at 11:20 AM, Henry Saputra  
> wrote:
>> Hi Ashutosh,
>> 
>> Since there was a question/ comment about relationship with Apache
>> MetaModel, I am asking to update the proposal to include this
>> discussion in either "Relationships with Other Apache Products" or
>> "Alignment" section before going for a VOTE.
>> 
>> Apache Slider did the same thing with relation to Apache Twill and
>> Apache Helix projects.
>> 
>> Thanks,
>> 
>> - Henry
>> 
>> On Wed, Apr 30, 2014 at 3:21 PM, Ashutosh Chauhan  
>> wrote:
>>> I would like to propose Optiq as an Apache Incubator project.  I have
>>> posted the proposal to https://wiki.apache.org/incubator/OptiqProposal and
>>> posted the text of the proposal below.
>>> 
>>> Ashutosh.
>>> 
>>> = Optiq =
>>> == Abstract ==
>>> 
>>> Optiq is a framework that allows efficient translation of queries involving
>>> heterogeneous and federated data.
>>> 
>>> == Proposal ==
>>> 
>>> Optiq is a highly customizable engine for parsing and planning queries on
>>> data in a wide variety of formats. It allows database-like access, and in
>>> particular a SQL interface and advanced query optimization, for data not
>>> residing in a traditional database.
>>> 
>>> == Background ==
>>> 
>>> Databases were traditionally engineered in a monolithic stack, providing a
>>> data storage format, data processing algorithms, query parser, query
>>> planner, built-in functions, metadata repository and connectivity layer.
>>> They innovate in some areas but rarely in all.
>>> 
>>> Modern data management systems are decomposing that stack into separate
>>> components, separating data, processing engine, metadata, and query
>>> language support. They are highly heterogeneous, with data in multiple
>>> locations and formats, caching and redundant data, different workloads, and
>>> processing occurring in different engines.
>>> 
>>> Query planning (sometimes called query optimization) has always been a key
>>> function of a DBMS, because it allows the implementors to introduce new
>>> query-processing algorithms, and allows data administrators to re-organize
>>> the data without affecting applications built on that data. In a
>>> componentized system, the query planner integrates the components (data
>>> formats, engines, algorithms) without introducing unnecessary coupling or
>>> performance tradeoffs.
>>> 
>>> But building a query planner is hard; many systems muddle along without a
>>> planner, and indeed a SQL interface, until the demand from their customers
>>> is overwhelming.
>>> 
>>> There is an opportunity to make this process more efficient by creating a
>>> re-usable framework.
>>> 
>>> == Rationale ==
>>> 
>>> Optiq allows database-like access, and in particular a SQL interface and
>>> advanced query optimization, for data not residing in a traditional
>>> database. It is complementary to many current Hadoop and NoSQL systems,
>>> which have innovative and performant storage and runtime systems but lack a
>>> SQL interface and intelligent query translation.
>>> 
>>> Optiq is already in use by several projects, including Apache Drill, Apache
>>> Hive and Cascading Lingual, and commercial products.
>>> 
>>> Optiq's architecture consists of:
>>> 
>>> An extensible relational algebra.
>>> SPIs (service-provider interfaces) for metadata (schemas and tables),
>>> planner rules, statistics, cost-estimates, user-defined functions.
>>> Built-in sets of rules for logical transformations and common data-sources.
>>> Two query planning engines driven by rules, statistics, etc. One engine is
>>> cost-based, the other rule-based.
>>> Optional SQL parser, validator and translator to relational algebra.
>>> Optional JDBC driver.
>>> == Initial Goals ==
>>> 
>>> The initial goals are to move the existing codebase to Apache and
>>> integrate with the Apache development process. Once this is accomplished,
>>> we plan for incremental development and releases that follow the Apache
>>> guidelines.
>>> 
>>> As we move the code into the org.apache namespace, we will restructure
>>> components as necessary to allow clients to use just the components of
>>> Optiq that they need.
>>> 
>>> A version 1.0 release, including pre-built binaries, will foster wider
>>> adoption.
>>> 
>>> == Current Status ==
>>> 
>>> Optiq has had over a dozen minor releases over the last 18 months. Its core
>>> SQL parser and validator, and its planning engine and core rules, are
>>> mature and robust and are the basis for several production systems; but
>>> other components and SPIs are still undergoing rapid evolution.
>>> 
>>>

Re: [PROPOSAL] Optiq

2014-05-10 Thread Ashutosh Chauhan
Now that the discussion is settling down, I will start a vote thread shortly.


On Mon, May 5, 2014 at 3:22 PM, Ashutosh Chauhan wrote:

> Thanks everyone for the great feedback. With Julian's help I have updated the
> section "Relationships with Other Apache projects" so that folks can get a
> sense of where Optiq stands w.r.t. other projects going on at the ASF.
>
> Thanks,
> Ashutosh
>
>
> On Fri, May 2, 2014 at 11:23 AM, Henry Saputra wrote:
>
>> Ah sorry, I did not mean "asking to update", I meant "proposing to
>> update".
>>
>> Thanks,
>>
>> - Henry
>>
>> On Fri, May 2, 2014 at 11:20 AM, Henry Saputra 
>> wrote:
>> > Hi Ashutosh,
>> >
>> > Since there was a question/ comment about relationship with Apache
>> > MetaModel, I am asking to update the proposal to include this
>> > discussion in either "Relationships with Other Apache Products" or
>> > "Alignment" section before going for a VOTE.
>> >
>> > Apache Slider did the same thing with relation to Apache Twill and
>> > Apache Helix projects.
>> >
>> > Thanks,
>> >
>> > - Henry
>> >
>> > On Wed, Apr 30, 2014 at 3:21 PM, Ashutosh Chauhan 
>> wrote:
>> >> I would like to propose Optiq as an Apache Incubator project.  I have
>> >> posted the proposal to https://wiki.apache.org/incubator/OptiqProposal and
>> >> posted the text of the proposal below.
>> >>
>> >> Ashutosh.
>> >>
>> >> = Optiq =
>> >> == Abstract ==
>> >>
>> >> Optiq is a framework that allows efficient translation of queries
>> involving
>> >> heterogeneous and federated data.
>> >>
>> >> == Proposal ==
>> >>
>> >> Optiq is a highly customizable engine for parsing and planning queries
>> on
>> >> data in a wide variety of formats. It allows database-like access, and
>> in
>> >> particular a SQL interface and advanced query optimization, for data
>> not
>> >> residing in a traditional database.
>> >>
>> >> == Background ==
>> >>
>> >> Databases were traditionally engineered in a monolithic stack,
>> providing a
>> >> data storage format, data processing algorithms, query parser, query
>> >> planner, built-in functions, metadata repository and connectivity
>> layer.
>> >> They innovate in some areas but rarely in all.
>> >>
>> >> Modern data management systems are decomposing that stack into separate
>> >> components, separating data, processing engine, metadata, and query
>> >> language support. They are highly heterogeneous, with data in multiple
>> >> locations and formats, caching and redundant data, different
>> workloads, and
>> >> processing occurring in different engines.
>> >>
>> >> Query planning (sometimes called query optimization) has always been a
>> key
>> >> function of a DBMS, because it allows the implementors to introduce new
>> >> query-processing algorithms, and allows data administrators to
>> re-organize
>> >> the data without affecting applications built on that data. In a
>> >> componentized system, the query planner integrates the components (data
>> >> formats, engines, algorithms) without introducing unnecessary coupling
>> or
>> >> performance tradeoffs.
>> >>
>> >> But building a query planner is hard; many systems muddle along
>> without a
>> >> planner, and indeed a SQL interface, until the demand from their
>> customers
>> >> is overwhelming.
>> >>
>> >> There is an opportunity to make this process more efficient by
>> creating a
>> >> re-usable framework.
>> >>
>> >> == Rationale ==
>> >>
>> >> Optiq allows database-like access, and in particular a SQL interface
>> and
>> >> advanced query optimization, for data not residing in a traditional
>> >> database. It is complementary to many current Hadoop and NoSQL systems,
>> >> which have innovative and performant storage and runtime systems but
>> lack a
>> >> SQL interface and intelligent query translation.
>> >>
>> >> Optiq is already in use by several projects, including Apache Drill,
>> Apache
>> >> Hive and Cascading Lingual, and commercial products.
>> >>
>> >> Optiq's architecture consists of:
>> >>
>> >> An extensible relational algebra.
>> >> SPIs (service-provider interfaces) for metadata (schemas and tables),
>> >> planner rules, statistics, cost-estimates, user-defined functions.
>> >> Built-in sets of rules for logical transformations and common
>> data-sources.
>> >> Two query planning engines driven by rules, statistics, etc. One
>> engine is
>> >> cost-based, the other rule-based.
>> >> Optional SQL parser, validator and translator to relational algebra.
>> >> Optional JDBC driver.
>> >> == Initial Goals ==
>> >>
>> >> The initial goals are to move the existing codebase to Apache and
>> >> integrate with the Apache development process. Once this is
>> accomplished,
>> >> we plan for incremental development and releases that follow the Apache
>> >> guidelines.
>> >>
>> >> As we move the code into the org.apache namespace, we will restructure
>> >> components as necessary to allow clients to use just the components of
>> >> Optiq that they need.
>> >>
>> >> A version 1.0 release, including pre-built binaries, 

Re: [PROPOSAL] Optiq

2014-05-02 Thread Roman Shaposhnik
On Fri, May 2, 2014 at 11:18 AM, Andrew Purtell  wrote:
> All that I suggest is that candidate Apache projects articulate how they
> differ from related projects, and that we consider the strength of this
> argument when evaluating the long term viability of the effort and
> community. It would be good if proposals had a "related work" section done
> with the same diligence and detail as the typical academic publication; I
> haven't seen that, at least recently.

Thanks Andrew for articulating it even more clearly -- this is exactly
the extra bit of info I was suggesting we add to the template.

IOW, an explicit informational section may help bring clarity not
only to casual IPMC members, but also to the folks proposing
a new project in the first place.

Thanks,
Roman.

-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org



Re: [PROPOSAL] Optiq

2014-05-02 Thread Andrew Purtell
I realize this is a discussion about Optiq in particular, so please pardon
the detour. I won't continue this discussion in this thread further.

On the subject of Optiq, I'd be +1 for incubation considering that at least
two Apache projects incorporate it substantially, and a third is
considering it. It would benefit them and Optiq I should hope.

Getting back to the question of admitting projects with a high degree of
overlap. Or even a fork. In my opinion, this should be considered more
carefully and with a less liberal attitude than I've seen. It would be
unfortunate for incubation to serve as a tool for end runs against well
functioning Apache communities, where the differences are commercial
externalities not technical matters or personal issues between individuals.
What the substantive technical differences are is quite important to
determine. Hand waving shouldn't be sufficient. Hypothetically, maybe the
core difference is not technical but instead the initial committer list is
stacked with individuals from a single organization, and the proposal is
for an as yet undeveloped codebase. Or otherwise rooted in the control
freakery of a third party. The Foundation can become a tool for competition
against healthy projects that has nothing to do with code or abstractions
or personal differences. I think this betrays the Apache Way. Maybe I'm in
an ethical minority.



On Fri, May 2, 2014 at 11:18 AM, Andrew Purtell  wrote:

> All that I suggest is that candidate Apache projects articulate how they
> differ from related projects, and that we consider the strength of this
> argument when evaluating the long term viability of the effort and
> community. It would be good if proposals had a "related work" section done
> with the same diligence and detail as the typical academic publication; I
> haven't seen that, at least recently.
>
> Differences in project direction leading to new projects (effectively,
> sanctioned forks) are fine, although regrettable, since they would represent
> an acknowledged failure of the Apache community process. "Creative
> competition" between differing abstractions is fine. Etc. But if I come to
> Apache to set up Apache Foo, with presumably the focus and care on
> community development a motivating factor for that (otherwise why shouldn't
> I just go to GitHub?), then if later the Incubator admits Apache FooBar
> (incubating) and Apache FooBaz (incubating) that significantly overlap and
> duplicate my efforts - overriding my concerns or objections - then I'd be
> inclined to not view Apache as a particularly good steward of my community
> development. The devil is in the details, which takes me back to the point
> made in the above paragraph.
>
>
>
>
>
>
>
> On Fri, May 2, 2014 at 10:52 AM, Chris Douglas wrote:
>
>> On Thu, May 1, 2014 at 2:46 PM, Andrew Purtell 
>> wrote:
>> > If not part of the initial proposal, then
>> > at least making a good case as a criterion for graduation, and writing up
>> > related work and how the new project differentiates could be an initial
>> > task done on JIRA after acceptance along the lines of the trademark
>> search.
>>
>> I see this differently. Project overlap (particularly in the
>> incubator) is neither surprising nor regrettable. Recently we've seen
>> several SQL, streaming, and security projects. While these are all
>> mature domains, the "best practices" are still being explored. Each
>> branch in architecture may accommodate a new project, and each path
>> through those tradeoffs will define those communities. They'll also
>> define each other; by way of illustration, a project that's a subset
>> of another becomes the "lightweight" implementation. If the enthusiasm
>> for a project wanes, that's not a tragedy the incubator can prevent by
>> forcing alignment based on the goal of the project. Rejecting a
>> community will not cause them to join an existing one; they'll just
>> leave Apache.
>>
>> More than losing an opportunity to foster a community, a policy
>> favoring consolidation would actively harm innovation and
>> experimentation. A requirement for uniqueness would reward first
>> movers and leave no outlet for legitimate differences in project
>> direction. Granting existing projects authority over prospective
>> communities _because_ they compete is not an optimization. As we saw
>> with HCatalog, sometimes revolutions don't become distinct communities
>> and the effort is reabsorbed. The incubator should continue to support
>> that natural process.
>>
>> Finally, it's not surprising that the incubator will see projects with
>> similar goals in waves. The need for new abstractions is experienced
>> jointly and solutions are explored concurrently. That's a feature of
>> the incubator, not a bug.
>>
>> Articulating the project's "related work" is a useful exercise, which
>> is why it's a section in the proposal. -C
>>
>> > On Thu, May 1, 2014 at 2:22 PM, Henry Saputra wrote:
>> >
>> >> Unfortunately, similar projects entering A

Re: [PROPOSAL] Optiq

2014-05-02 Thread Chris Douglas
All fair points. However (as your example demonstrates), referring to
this duplication as "failure" instead of evolution biases the
incubator to protect existing projects. Putting new projects on the
defensive is almost always unfair, unless they're literally forking an
existing project.

As you say, the details are more important than the general point, but
the default lamentation over duplication is, in my view, misguided.
More concretely, the proposal is required to fill out a "related work"
section. We don't need new processes, particularly if that section is
fleshed out in threads like this one. -C

On Fri, May 2, 2014 at 11:18 AM, Andrew Purtell  wrote:
> All that I suggest is that candidate Apache projects articulate how they
> differ from related projects, and that we consider the strength of this
> argument when evaluating the long term viability of the effort and
> community. It would be good if proposals had a "related work" section done
> with the same diligence and detail as the typical academic publication; I
> haven't seen that, at least recently.
>
> Differences in project direction leading to new projects (effectively,
> sanctioned forks) are fine, although regrettable, since they would represent
> an acknowledged failure of the Apache community process. "Creative
> competition" between differing abstractions is fine. Etc. But if I come to
> Apache to set up Apache Foo, with presumably the focus and care on
> community development a motivating factor for that (otherwise why shouldn't
> I just go to GitHub?), then if later the Incubator admits Apache FooBar
> (incubating) and Apache FooBaz (incubating) that significantly overlap and
> duplicate my efforts - overriding my concerns or objections - then I'd be
> inclined to not view Apache as a particularly good steward of my community
> development. The devil is in the details, which takes me back to the point
> made in the above paragraph.
>
>
>
>
>
>
>
> On Fri, May 2, 2014 at 10:52 AM, Chris Douglas  wrote:
>
>> On Thu, May 1, 2014 at 2:46 PM, Andrew Purtell 
>> wrote:
>> > If not part of the initial proposal, then
>> > at least making a good case as a criterion for graduation, and writing up
>> > related work and how the new project differentiates could be an initial
>> > task done on JIRA after acceptance along the lines of the trademark
>> search.
>>
>> I see this differently. Project overlap (particularly in the
>> incubator) is neither surprising nor regrettable. Recently we've seen
>> several SQL, streaming, and security projects. While these are all
>> mature domains, the "best practices" are still being explored. Each
>> branch in architecture may accommodate a new project, and each path
>> through those tradeoffs will define those communities. They'll also
>> define each other; by way of illustration, a project that's a subset
>> of another becomes the "lightweight" implementation. If the enthusiasm
>> for a project wanes, that's not a tragedy the incubator can prevent by
>> forcing alignment based on the goal of the project. Rejecting a
>> community will not cause them to join an existing one; they'll just
>> leave Apache.
>>
>> More than losing an opportunity to foster a community, a policy
>> favoring consolidation would actively harm innovation and
>> experimentation. A requirement for uniqueness would reward first
>> movers and leave no outlet for legitimate differences in project
>> direction. Granting existing projects authority over prospective
>> communities _because_ they compete is not an optimization. As we saw
>> with HCatalog, sometimes revolutions don't become distinct communities
>> and the effort is reabsorbed. The incubator should continue to support
>> that natural process.
>>
>> Finally, it's not surprising that the incubator will see projects with
>> similar goals in waves. The need for new abstractions is experienced
>> jointly and solutions are explored concurrently. That's a feature of
>> the incubator, not a bug.
>>
>> Articulating the project's "related work" is a useful exercise, which
>> is why it's a section in the proposal. -C
>>
>> > On Thu, May 1, 2014 at 2:22 PM, Henry Saputra wrote:
>> >
>> >> Unfortunately, similar projects entering Apache incubator are common
>> >> things =(
>> >>
>> >> Even though each project's original proposers can argue about
>> >> differences in one way or another, it will eventually be decided by
>> >> adoption and community growth, and in the end by the quality of the
>> >> project itself.
>> >>
>> >> Some other incoming projects have faced similar questions/concerns
>> >> regarding "competing" with existing ASF projects, e.g.: Twill vs
>> >> Slider, Samza vs Storm vs S4, and several others.
>> >>
>> >>
>> >> - Henry
>> >>
>> >> On Thu, May 1, 2014 at 12:14 AM, Ted Dunning 
>> >> wrote:
>> >> > I think that there is a huge difference between Metamodel and Optiq.
>> >> >
>> >> > In particular:
>> >> >
>> >> > - Optiq provides real SQL including nested queries, correlated
>> >> sub-q

Re: [PROPOSAL] Optiq

2014-05-02 Thread Henry Saputra
Ah sorry, I did not mean "asking to update", I meant "proposing to update".

Thanks,

- Henry

On Fri, May 2, 2014 at 11:20 AM, Henry Saputra  wrote:
> Hi Ashutosh,
>
> Since there was a question/ comment about relationship with Apache
> MetaModel, I am asking to update the proposal to include this
> discussion in either "Relationships with Other Apache Products" or
> "Alignment" section before going for a VOTE.
>
> Apache Slider did the same thing with relation to Apache Twill and
> Apache Helix projects.
>
> Thanks,
>
> - Henry
>
> On Wed, Apr 30, 2014 at 3:21 PM, Ashutosh Chauhan  
> wrote:
>> I would like to propose Optiq as an Apache Incubator project.  I have
>> posted the proposal to https://wiki.apache.org/incubator/OptiqProposal and
>> posted the text of the proposal below.
>>
>> Ashutosh.
>>
>> = Optiq =
>> == Abstract ==
>>
>> Optiq is a framework that allows efficient translation of queries involving
>> heterogeneous and federated data.
>>
>> == Proposal ==
>>
>> Optiq is a highly customizable engine for parsing and planning queries on
>> data in a wide variety of formats. It allows database-like access, and in
>> particular a SQL interface and advanced query optimization, for data not
>> residing in a traditional database.
>>
>> == Background ==
>>
>> Databases were traditionally engineered in a monolithic stack, providing a
>> data storage format, data processing algorithms, query parser, query
>> planner, built-in functions, metadata repository and connectivity layer.
>> They innovate in some areas but rarely in all.
>>
>> Modern data management systems are decomposing that stack into separate
>> components, separating data, processing engine, metadata, and query
>> language support. They are highly heterogeneous, with data in multiple
>> locations and formats, caching and redundant data, different workloads, and
>> processing occurring in different engines.
>>
>> Query planning (sometimes called query optimization) has always been a key
>> function of a DBMS, because it allows the implementors to introduce new
>> query-processing algorithms, and allows data administrators to re-organize
>> the data without affecting applications built on that data. In a
>> componentized system, the query planner integrates the components (data
>> formats, engines, algorithms) without introducing unnecessary coupling or
>> performance tradeoffs.
>>
>> But building a query planner is hard; many systems muddle along without a
>> planner, and indeed a SQL interface, until the demand from their customers
>> is overwhelming.
>>
>> There is an opportunity to make this process more efficient by creating a
>> re-usable framework.
>>
>> == Rationale ==
>>
>> Optiq allows database-like access, and in particular a SQL interface and
>> advanced query optimization, for data not residing in a traditional
>> database. It is complementary to many current Hadoop and NoSQL systems,
>> which have innovative and performant storage and runtime systems but lack a
>> SQL interface and intelligent query translation.
>>
>> Optiq is already in use by several projects, including Apache Drill, Apache
>> Hive and Cascading Lingual, and commercial products.
>>
>> Optiq's architecture consists of:
>>
>> An extensible relational algebra.
>> SPIs (service-provider interfaces) for metadata (schemas and tables),
>> planner rules, statistics, cost-estimates, user-defined functions.
>> Built-in sets of rules for logical transformations and common data-sources.
>> Two query planning engines driven by rules, statistics, etc. One engine is
>> cost-based, the other rule-based.
>> Optional SQL parser, validator and translator to relational algebra.
>> Optional JDBC driver.
>> == Initial Goals ==
>>
>> The initial goals are to move the existing codebase to Apache and
>> integrate with the Apache development process. Once this is accomplished,
>> we plan for incremental development and releases that follow the Apache
>> guidelines.
>>
>> As we move the code into the org.apache namespace, we will restructure
>> components as necessary to allow clients to use just the components of
>> Optiq that they need.
>>
>> A version 1.0 release, including pre-built binaries, will foster wider
>> adoption.
>>
>> == Current Status ==
>>
>> Optiq has had over a dozen minor releases over the last 18 months. Its core
>> SQL parser and validator, and its planning engine and core rules, are
>> mature and robust and are the basis for several production systems; but
>> other components and SPIs are still undergoing rapid evolution.
>>
>> === Meritocracy ===
>>
>> We plan to invest in supporting a meritocracy. We will discuss the
>> requirements in an open forum. We encourage the companies and projects
>> using Optiq to discuss their requirements in an open forum and to
>> participate in development. We will encourage and monitor community
>> participation so that privileges can be extended to those that contribute.
>>
>> Optiq's pluggable architecture encourages develope

Re: [PROPOSAL] Optiq

2014-05-02 Thread Henry Saputra
Hi Ashutosh,

Since there was a question/comment about the relationship with Apache
MetaModel, I am asking to update the proposal to include this
discussion in either the "Relationships with Other Apache Products" or
the "Alignment" section before going for a VOTE.

Apache Slider did the same thing with regard to the Apache Twill and
Apache Helix projects.

Thanks,

- Henry

On Wed, Apr 30, 2014 at 3:21 PM, Ashutosh Chauhan  wrote:
> I would like to propose Optiq as an Apache Incubator project.  I have
> posted the proposal to https://wiki.apache.org/incubator/OptiqProposal and
> posted the text of the proposal below.
>
> Ashutosh.
>
> = Optiq =
> == Abstract ==
>
> Optiq is a framework that allows efficient translation of queries involving
> heterogeneous and federated data.
>
> == Proposal ==
>
> Optiq is a highly customizable engine for parsing and planning queries on
> data in a wide variety of formats. It allows database-like access, and in
> particular a SQL interface and advanced query optimization, for data not
> residing in a traditional database.
>
> == Background ==
>
> Databases were traditionally engineered in a monolithic stack, providing a
> data storage format, data processing algorithms, query parser, query
> planner, built-in functions, metadata repository and connectivity layer.
> They innovate in some areas but rarely in all.
>
> Modern data management systems are decomposing that stack into separate
> components, separating data, processing engine, metadata, and query
> language support. They are highly heterogeneous, with data in multiple
> locations and formats, caching and redundant data, different workloads, and
> processing occurring in different engines.
>
> Query planning (sometimes called query optimization) has always been a key
> function of a DBMS, because it allows the implementors to introduce new
> query-processing algorithms, and allows data administrators to re-organize
> the data without affecting applications built on that data. In a
> componentized system, the query planner integrates the components (data
> formats, engines, algorithms) without introducing unnecessary coupling or
> performance tradeoffs.
>
> But building a query planner is hard; many systems muddle along without a
> planner, and indeed a SQL interface, until the demand from their customers
> is overwhelming.
>
> There is an opportunity to make this process more efficient by creating a
> re-usable framework.
>
> == Rationale ==
>
> Optiq allows database-like access, and in particular a SQL interface and
> advanced query optimization, for data not residing in a traditional
> database. It is complementary to many current Hadoop and NoSQL systems,
> which have innovative and performant storage and runtime systems but lack a
> SQL interface and intelligent query translation.
>
> Optiq is already in use by several projects, including Apache Drill, Apache
> Hive and Cascading Lingual, and commercial products.
>
> Optiq's architecture consists of:
>
> An extensible relational algebra.
> SPIs (service-provider interfaces) for metadata (schemas and tables),
> planner rules, statistics, cost-estimates, user-defined functions.
> Built-in sets of rules for logical transformations and common data-sources.
> Two query planning engines driven by rules, statistics, etc. One engine is
> cost-based, the other rule-based.
> Optional SQL parser, validator and translator to relational algebra.
> Optional JDBC driver.
> == Initial Goals ==
>
> The initial goals are to move the existing codebase to Apache and
> integrate with the Apache development process. Once this is accomplished,
> we plan for incremental development and releases that follow the Apache
> guidelines.
>
> As we move the code into the org.apache namespace, we will restructure
> components as necessary to allow clients to use just the components of
> Optiq that they need.
>
> A version 1.0 release, including pre-built binaries, will foster wider
> adoption.
>
> == Current Status ==
>
> Optiq has had over a dozen minor releases over the last 18 months. Its core
> SQL parser and validator, and its planning engine and core rules, are
> mature and robust and are the basis for several production systems; but
> other components and SPIs are still undergoing rapid evolution.
>
> === Meritocracy ===
>
> We plan to invest in supporting a meritocracy. We will discuss the
> requirements in an open forum. We encourage the companies and projects
> using Optiq to discuss their requirements in an open forum and to
> participate in development. We will encourage and monitor community
> participation so that privileges can be extended to those that contribute.
>
> Optiq's pluggable architecture encourages developers to contribute
> extensions such as adapters for data sources, new planning rules, and
> better statistics and cost-estimation functions. We look forward to
> fostering a rich ecosystem of extensions.
>
> === Community ===
>
> Building a data management system requires a hi

Re: [PROPOSAL] Optiq

2014-05-02 Thread Andrew Purtell
All that I suggest is that candidate Apache projects articulate how they
differ from related projects, and that we consider the strength of this
argument when evaluating the long term viability of the effort and
community. It would be good if proposals had a "related work" section done
with the same diligence and detail as a typical academic publication; I
haven't seen that, at least recently.

Differences in project direction leading to new projects (effectively,
sanctioned forks) are fine, although regrettable, since that would represent
an acknowledged failure of the Apache community process. "Creative
competition" between differing abstractions is fine. Etc. But if I come to
Apache to set up Apache Foo, with presumably the focus and care on
community development a motivating factor for that (otherwise why shouldn't
I just go to GitHub?), then if later the Incubator admits Apache FooBar
(incubating) and Apache FooBaz (incubating) that significantly overlap and
duplicate my efforts - overriding my concerns or objections - then I'd be
inclined to not view Apache as a particularly good steward of my community
development. The devil is in the details, which takes me back to the point
made in the above paragraph.


On Fri, May 2, 2014 at 10:52 AM, Chris Douglas  wrote:

> On Thu, May 1, 2014 at 2:46 PM, Andrew Purtell wrote:
> > If not part of the initial proposal, then
> > at least making a good case as a criteria for graduation, and writing up
> > related work and how the new project differentiates could be an initial
> > task done on JIRA after acceptance along the lines of the trademark
> > search.
>
> I see this differently. Project overlap (particularly in the
> incubator) is neither surprising nor regrettable. Recently we've seen
> several SQL, streaming, and security projects. While these are all
> mature domains, the "best practices" are still being explored. Each
> branch in architecture may accommodate a new project, and each path
> through those tradeoffs will define those communities. They'll also
> define each other; by way of illustration, a project that's a subset
> of another becomes the "lightweight" implementation. If the enthusiasm
> for a project wanes, that's not a tragedy the incubator can prevent by
> forcing alignment based on the goal of the project. Rejecting a
> community will not cause them to join an existing one; they'll just
> leave Apache.
>
> More than losing an opportunity to foster a community, a policy
> favoring consolidation would actively harm innovation and
> experimentation. A requirement for uniqueness would reward first
> movers and leave no outlet for legitimate differences in project
> direction. Granting existing projects authority over prospective
> communities _because_ they compete is not an optimization. As we saw
> with HCatalog, sometimes revolutions don't become distinct communities
> and the effort is reabsorbed. The incubator should continue to support
> that natural process.
>
> Finally, it's not surprising that the incubator will see projects with
> similar goals in waves. The need for new abstractions is experienced
> jointly and solutions are explored concurrently. That's a feature of
> the incubator, not a bug.
>
> Articulating the project's "related work" is a useful exercise, which
> is why it's a section in the proposal. -C
>
> > On Thu, May 1, 2014 at 2:22 PM, Henry Saputra wrote:
> >
> >> Unfortunately, similar projects entering Apache incubator are common
> >> things =(
> >>
> >> Even though each original project proposers can argue about
> >> differences in one way or another, it will eventually decided by
> >> adoption and community growth, and at the end the quality of the
> >> project itself.
> >>
> >> Some other incoming projects had been in similar questions/concerns
> >> regarding "competing" with existing ASF projects, e.g.: Twill vs
> >> Slider, Samza vs Storm vs S4, and several others.
> >>
> >>
> >> - Henry
> >>
> >> On Thu, May 1, 2014 at 12:14 AM, Ted Dunning 
> >> wrote:
> >> > I think that there is a huge difference between Metamodel and Optiq.
> >> >
> >> > In particular:
> >> >
> >> > - Optiq provides real SQL including nested queries, correlated
> >> > sub-queries and so on
> >> >
> >> > - Metamodel uses a fluent Java API ... SQL parsing and transformation
> >> > doesn't appear to be a goal
> >> >
> >> > - Optiq provides highly advanced query transformations including
> >> > decorrelations based on estimated execution costs.
> >> >
> >> > - Metamodel appears to provide no significant query transformations
> >> >
> >> > - Optiq only provides query execution as a by-product for testing
> >> >
> >> > - Metamodel has query execution as a central goal
> >> >
> >> > - Optiq provides a form of type inferencing for SQL queries.  This is
> >> > unique to Optiq as far as I know.
> >> >
> >> >
> >> >
> >> > On Thu, May 1, 2014 at 8:57 AM, Kasper Sørensen <
> >> > kasper.soren...@humaninference.com> wrote:
> >> >
> >> >> I see 

Re: [PROPOSAL] Optiq

2014-05-02 Thread Chris Douglas
On Thu, May 1, 2014 at 2:46 PM, Andrew Purtell  wrote:
> If not part of the initial proposal, then
> at least making a good case as a criteria for graduation, and writing up
> related work and how the new project differentiates could be an initial
> task done on JIRA after acceptance along the lines of the trademark search.

I see this differently. Project overlap (particularly in the
incubator) is neither surprising nor regrettable. Recently we've seen
several SQL, streaming, and security projects. While these are all
mature domains, the "best practices" are still being explored. Each
branch in architecture may accommodate a new project, and each path
through those tradeoffs will define those communities. They'll also
define each other; by way of illustration, a project that's a subset
of another becomes the "lightweight" implementation. If the enthusiasm
for a project wanes, that's not a tragedy the incubator can prevent by
forcing alignment based on the goal of the project. Rejecting a
community will not cause them to join an existing one; they'll just
leave Apache.

More than losing an opportunity to foster a community, a policy
favoring consolidation would actively harm innovation and
experimentation. A requirement for uniqueness would reward first
movers and leave no outlet for legitimate differences in project
direction. Granting existing projects authority over prospective
communities _because_ they compete is not an optimization. As we saw
with HCatalog, sometimes revolutions don't become distinct communities
and the effort is reabsorbed. The incubator should continue to support
that natural process.

Finally, it's not surprising that the incubator will see projects with
similar goals in waves. The need for new abstractions is experienced
jointly and solutions are explored concurrently. That's a feature of
the incubator, not a bug.

Articulating the project's "related work" is a useful exercise, which
is why it's a section in the proposal. -C

> On Thu, May 1, 2014 at 2:22 PM, Henry Saputra wrote:
>
>> Unfortunately, similar projects entering Apache incubator are common
>> things =(
>>
>> Even though each original project proposers can argue about
>> differences in one way or another, it will eventually decided by
>> adoption and community growth, and at the end the quality of the
>> project itself.
>>
>> Some other incoming projects had been in similar questions/concerns
>> regarding "competing" with existing ASF projects, e.g.: Twill vs
>> Slider, Samza vs Storm vs S4, and several others.
>>
>>
>> - Henry
>>
>> On Thu, May 1, 2014 at 12:14 AM, Ted Dunning 
>> wrote:
>> > I think that there is a huge difference between Metamodel and Optiq.
>> >
>> > In particular:
>> >
>> > - Optiq provides real SQL including nested queries, correlated
>> > sub-queries and so on
>> >
>> > - Metamodel uses a fluent Java API ... SQL parsing and transformation
>> > doesn't appear to be a goal
>> >
>> > - Optiq provides highly advanced query transformations including
>> > decorrelations based on estimated execution costs.
>> >
>> > - Metamodel appears to provide no significant query transformations
>> >
>> > - Optiq only provides query execution as a by-product for testing
>> >
>> > - Metamodel has query execution as a central goal
>> >
>> > - Optiq provides a form of type inferencing for SQL queries.  This is
>> > unique to Optiq as far as I know.
>> >
>> >
>> >
>> > On Thu, May 1, 2014 at 8:57 AM, Kasper Sørensen <
>> > kasper.soren...@humaninference.com> wrote:
>> >
>> >> I see a lot of conceptual similarity between Optiq and the Apache
>> >> MetaModel (incubator) project [1]. Maybe something can be done to align
>> >> the two projects, so that we avoid having two incubating projects that
>> >> do basically the same thing?
>> >>
>> >> Or maybe there's some glaring difference that I am missing? At least it
>> >> seems to me that both are projects that try to provide uniform querying
>> >> capabilities to a wide array of data backends. Both projects also favor a
>> >> type-safe Java querying API instead of a String/SQL oriented query API.
>> >>
>> >> Regards,
>> >> Kasper Sørensen
>> >>
>> >> [1] http://metamodel.incubator.apache.org/
>> >>
>> >> 
>> >> From: Ashutosh Chauhan [hashut...@apache.org]
>> >> Sent: 01 May 2014 00:21
>> >> To: general@incubator.apache.org
>> >> Subject: [PROPOSAL] Optiq
>> >>
>> >> I would like to propose Optiq as an Apache Incubator project.  I have
>> >> posted the proposal to https://wiki.apache.org/incubator/OptiqProposal and
>> >> posted the text of the proposal below.
>> >>
>> >> Ashutosh.
>> >>
>> >> = Optiq =
>> >> == Abstract ==
>> >>
>> >> Optiq is a framework that allows efficient translation of queries
>> >> involving heterogeneous and federated data.
>> >>
>> >> == Proposal ==
>> >>
>> >> Optiq is a highly customizable engine for parsing and planning queries on
>> >> data in a wide variety of formats. It allows database-like acce

RE: [PROPOSAL] Optiq

2014-05-02 Thread Kasper Sørensen
I feel the same way. And to clarify my position a bit - I am in no way against 
having Optiq in the incubator, it sounds like a very impressive library. I was 
merely probing if it would be possible to merge or standardize some of the 
aspects of the projects - work together where it's possible, and differentiate 
where it makes sense.

-Original Message-
From: shaposh...@gmail.com [mailto:shaposh...@gmail.com] On Behalf Of Roman 
Shaposhnik
Sent: 2. maj 2014 01:49
To: general@incubator.apache.org
Subject: Re: [PROPOSAL] Optiq

On Thu, May 1, 2014 at 2:46 PM, Andrew Purtell  wrote:
> One could imagine as part of the case for incubation and graduation 
> both an articulation of the project's place in the larger ecosystem, 
> similar to how academic papers customarily place their work and novel 
> findings within the larger field in 'Related Work'. If not part of the 
> initial proposal, then at least making a good case as a criteria for 
> graduation, and writing up related work and how the new project 
> differentiates could be an initial task done on JIRA after acceptance along 
> the lines of the trademark search.

I would be a strong +1 to modify our proposal template to include a section 
like that. It will be, if nothing else, a strong forcing function to spend some 
time considering similar projects.

Thanks,
Roman.

-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org



Re: [PROPOSAL] Optiq

2014-05-02 Thread Steven Noels
On Wed, Apr 30, 2014, at 03:21 PM, Ashutosh Chauhan wrote:

> I would like to propose Optiq as an Apache Incubator project.  I have
> posted the proposal to https://wiki.apache.org/incubator/OptiqProposal
> and
> posted the text of the proposal below.
> 
> Ashutosh.

Given the importance of Optiq for several larger projects, my belief is
that it is immensely relevant to see it transform into a community
project under the ASF stewardship. I very much second the torch-carrying
remark of Ted.

Steven.



Re: [PROPOSAL] Optiq

2014-05-01 Thread Roman Shaposhnik
On Thu, May 1, 2014 at 2:46 PM, Andrew Purtell  wrote:
> One could imagine as part of the case for incubation and graduation both an
> articulation of the project's place in the larger ecosystem, similar to how
> academic papers customarily place their work and novel findings within the
> larger field in 'Related Work'. If not part of the initial proposal, then
> at least making a good case as a criteria for graduation, and writing up
> related work and how the new project differentiates could be an initial
> task done on JIRA after acceptance along the lines of the trademark search.

I would be a strong +1 to modify our proposal template to include a section
like that. It will be, if nothing else, a strong forcing function to spend some
time considering similar projects.

Thanks,
Roman.



Re: [PROPOSAL] Optiq

2014-05-01 Thread Ted Dunning
On Thu, May 1, 2014 at 10:19 PM, Kasper Sørensen <
kasper.soren...@humaninference.com> wrote:

> - Can you explain or link to more information about the type inference you
> mention?
>

The type inferencing is used by Drill.

The problem is that strong typing is normally required to parse SQL
statements.  If you don't even yet know what columns exist, parsing a query
is difficult for normal SQL parsers. Generating good code for such
situations has to be delayed until type information is available.

See

http://tnachen.wordpress.com/2013/11/05/lifetime-of-a-query-in-drill-alpha-release/

https://docs.google.com/document/d/1QTL8warUYS2KjldQrGUse7zp8eA72VKtLOHwfXy6c7I/edit
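
As a rough illustration of the point above about delaying code generation
until type information is available, here is a minimal Python sketch. It is
not Optiq's or Drill's actual mechanism; the class name LateBoundSum and its
methods are hypothetical and exist only for this example. The operator is
created knowing just a column name, and picks its implementation when the
first value shows up.

```python
from typing import Any, Callable, Dict, Iterable, Optional

Row = Dict[str, Any]


class LateBoundSum:
    """Hypothetical aggregate over a column whose type is unknown up front."""

    def __init__(self, column: str) -> None:
        self.column = column
        # At "parse" time only the column name is known, not its type.
        self.column_type: Optional[type] = None
        self._combine: Optional[Callable[[Any, Any], Any]] = None
        self._total: Any = None

    def _bind(self, sample: Any) -> None:
        # Type information is available only now, so the choice of
        # implementation (a stand-in for code generation) happens here.
        self.column_type = type(sample)
        if self.column_type in (int, float):
            self._combine = lambda acc, value: acc + value  # numeric sum
            self._total = self.column_type(0)
        elif self.column_type is str:
            self._combine = lambda acc, value: acc + value  # concatenation
            self._total = ""
        else:
            raise TypeError(f"unsupported type for column {self.column!r}")

    def consume(self, rows: Iterable[Row]) -> None:
        for row in rows:
            value = row[self.column]
            if self._combine is None:
                self._bind(value)  # first value fixes the column type
            elif type(value) is not self.column_type:
                raise TypeError(f"column {self.column!r} changed type")
            self._total = self._combine(self._total, value)

    def result(self) -> Any:
        return self._total


# Usage: the same operator definition works for either column type.
amounts = LateBoundSum("amount")
amounts.consume([{"amount": 1}, {"amount": 2}])

names = LateBoundSum("name")
names.consume([{"name": "a"}, {"name": "b"}])
```

A real planner would bind types per schema or per batch rather than per
value; the sketch only shows why the binding step cannot happen at parse
time when the columns are not yet known.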


Re: [PROPOSAL] Optiq

2014-05-01 Thread Andrew Purtell
One could imagine, as part of the case for incubation and graduation, an
articulation of the project's place in the larger ecosystem, similar to how
academic papers customarily place their work and novel findings within the
larger field in 'Related Work'. If not part of the initial proposal, then
at least making a good case could be a criterion for graduation, and writing
up related work and how the new project differentiates itself could be an
initial task done on JIRA after acceptance, along the lines of the trademark
search.



On Thu, May 1, 2014 at 2:22 PM, Henry Saputra wrote:

> Unfortunately, similar projects entering Apache incubator are common
> things =(
>
> Even though each original project proposers can argue about
> differences in one way or another, it will eventually decided by
> adoption and community growth, and at the end the quality of the
> project itself.
>
> Some other incoming projects had been in similar questions/concerns
> regarding "competing" with existing ASF projects, e.g.: Twill vs
> Slider, Samza vs Storm vs S4, and several others.
>
>
> - Henry
>
> On Thu, May 1, 2014 at 12:14 AM, Ted Dunning 
> wrote:
> > I think that there is a huge difference between Metamodel and Optiq.
> >
> > In particular:
> >
> > - Optiq provides real SQL including nested queries, correlated
> > sub-queries and so on
> >
> > - Metamodel uses a fluent Java API ... SQL parsing and transformation
> > doesn't appear to be a goal
> >
> > - Optiq provides highly advanced query transformations including
> > decorrelations based on estimated execution costs.
> >
> > - Metamodel appears to provide no significant query transformations
> >
> > - Optiq only provides query execution as a by-product for testing
> >
> > - Metamodel has query execution as a central goal
> >
> > - Optiq provides a form of type inferencing for SQL queries.  This is
> > unique to Optiq as far as I know.
> >
> >
> >
> > On Thu, May 1, 2014 at 8:57 AM, Kasper Sørensen <
> > kasper.soren...@humaninference.com> wrote:
> >
> >> I see a lot of conceptual similarity between Optiq and the Apache
> >> MetaModel (incubator) project [1]. Maybe something can be done to align
> >> the two projects, so that we avoid having two incubating projects that
> >> do basically the same thing?
> >>
> >> Or maybe there's some glaring difference that I am missing? At least it
> >> seems to me that both are projects that try to provide uniform querying
> >> capabilities to a wide array of data backends. Both projects also favor a
> >> type-safe Java querying API instead of a String/SQL oriented query API.
> >>
> >> Regards,
> >> Kasper Sørensen
> >>
> >> [1] http://metamodel.incubator.apache.org/
> >>
> >> 
> >> From: Ashutosh Chauhan [hashut...@apache.org]
> >> Sent: 01 May 2014 00:21
> >> To: general@incubator.apache.org
> >> Subject: [PROPOSAL] Optiq
> >>
> >> I would like to propose Optiq as an Apache Incubator project.  I have
> >> posted the proposal to https://wiki.apache.org/incubator/OptiqProposal and
> >> posted the text of the proposal below.
> >>
> >> Ashutosh.
> >>
> >> = Optiq =
> >> == Abstract ==
> >>
> >> Optiq is a framework that allows efficient translation of queries
> >> involving heterogeneous and federated data.
> >>
> >> == Proposal ==
> >>
> >> Optiq is a highly customizable engine for parsing and planning queries
> >> on data in a wide variety of formats. It allows database-like access,
> >> and in particular a SQL interface and advanced query optimization, for
> >> data not residing in a traditional database.
> >>
> >> == Background ==
> >>
> >> Databases were traditionally engineered in a monolithic stack,
> >> providing a data storage format, data processing algorithms, query
> >> parser, query planner, built-in functions, metadata repository and
> >> connectivity layer. They innovate in some areas but rarely in all.
> >>
> >> Modern data management systems are decomposing that stack into separate
> >> components, separating data, processing engine, metadata, and query
> >> language support. They are highly heterogeneous, with data in multiple
> >> locations and formats, caching and redundant data, different workloads,
> >> and processing occurring in different engines.
> >>
> >> Query planning (sometimes called query optimization) has always been a
> >> key function of a DBMS, because it allows the implementors to introduce
> >> new query-processing algorithms, and allows data administrators to
> >> re-organize the data without affecting applications built on that data.
> >> In a componentized system, the query planner integrates the components
> >> (data formats, engines, algorithms) without introducing unnecessary
> >> coupling or performance tradeoffs.
> >>
> >> But building a query planner is hard; many systems muddle along without
> >> a planner, and indeed a SQL interface, until the demand from their
> >> customers is overwhelming.
> >>
> >> There is an opportunity to make this pr

Re: [PROPOSAL] Optiq

2014-05-01 Thread Henry Saputra
Unfortunately, similar projects entering the Apache Incubator are common =(

Even though each project's original proposers can argue about
differences one way or another, it will eventually be decided by
adoption and community growth, and in the end by the quality of the
project itself.

Some other incoming projects have faced similar questions/concerns
about "competing" with existing ASF projects, e.g. Twill vs
Slider, Samza vs Storm vs S4, and several others.


- Henry

On Thu, May 1, 2014 at 12:14 AM, Ted Dunning  wrote:
> I think that there is a huge difference between Metamodel and Optiq.
>
> In particular:
>
> - Optiq provides real SQL including nested queries, correlated sub-queries
> and so on
>
> - Metamodel uses a fluent Java API ... SQL parsing and transformation
> doesn't appear to be a goal
>
> - Optiq provides highly advanced query transformations including
> decorrelations based on estimated execution costs.
>
> - Metamodel appears to provide no significant query transformations
>
> - Optiq only provides query execution as a by-product for testing
>
> - Metamodel has query execution as a central goal
>
> - Optiq provides a form of type inferencing for SQL queries.  This is
> unique to Optiq as far as I know.
>
>
>
> On Thu, May 1, 2014 at 8:57 AM, Kasper Sørensen <
> kasper.soren...@humaninference.com> wrote:
>
>> I see a lot of conceptual similarity between Optiq and the Apache
>> MetaModel (incubator) project [1]. Maybe something can be done to align the
>> two projects, so that we avoid having two incubating projects that do
>> basically the same thing?
>>
>> Or maybe there's some glaring difference that I am missing? At least it
>> seems to me that both are projects that try to provide uniform querying
>> capabilities to a wide array of data backends. Both projects also favor a
>> type-safe Java querying API instead of a String/SQL oriented query API.
>>
>> Regards,
>> Kasper Sørensen
>>
>> [1] http://metamodel.incubator.apache.org/
>>
>> 
>> From: Ashutosh Chauhan [hashut...@apache.org]
>> Sent: 01 May 2014 00:21
>> To: general@incubator.apache.org
>> Subject: [PROPOSAL] Optiq
>>
>> I would like to propose Optiq as an Apache Incubator project.  I have
>> posted the proposal to https://wiki.apache.org/incubator/OptiqProposal and
>> posted the text of the proposal below.
>>
>> Ashutosh.
>>
>> = Optiq =
>> == Abstract ==
>>
>> Optiq is a framework that allows efficient translation of queries involving
>> heterogeneous and federated data.
>>
>> == Proposal ==
>>
>> Optiq is a highly customizable engine for parsing and planning queries on
>> data in a wide variety of formats. It allows database-like access, and in
>> particular a SQL interface and advanced query optimization, for data not
>> residing in a traditional database.
>>
>> == Background ==
>>
>> Databases were traditionally engineered in a monolithic stack, providing a
>> data storage format, data processing algorithms, query parser, query
>> planner, built-in functions, metadata repository and connectivity layer.
>> They innovate in some areas but rarely in all.
>>
>> Modern data management systems are decomposing that stack into separate
>> components, separating data, processing engine, metadata, and query
>> language support. They are highly heterogeneous, with data in multiple
>> locations and formats, caching and redundant data, different workloads, and
>> processing occurring in different engines.
>>
>> Query planning (sometimes called query optimization) has always been a key
>> function of a DBMS, because it allows the implementors to introduce new
>> query-processing algorithms, and allows data administrators to re-organize
>> the data without affecting applications built on that data. In a
>> componentized system, the query planner integrates the components (data
>> formats, engines, algorithms) without introducing unnecessary coupling or
>> performance tradeoffs.
>>
>> But building a query planner is hard; many systems muddle along without a
>> planner, and indeed a SQL interface, until the demand from their customers
>> is overwhelming.
>>
>> There is an opportunity to make this process more efficient by creating a
>> re-usable framework.
>>
>> == Rationale ==
>>
>> Optiq allows database-like access, and in particular a SQL interface and
>> advanced query optimization, for data not residing in a traditional
>> database. It is complementary to many current Hadoop and NoSQL systems,
>> which have innovative and performant storage and runtime systems but lack a
>> SQL interface and intelligent query translation.
>>
>> Optiq is already in use by several projects, including Apache Drill, Apache
>> Hive and Cascading Lingual, and commercial products.
>>
>> Optiq's architecture consists of:
>>
>> An extensible relational algebra.
>> SPIs (service-provider interfaces) for metadata (schemas and tables),
>> planner rules, statistics, cost-estimates, user-defined functions.
>> Built-in sets 

RE: [PROPOSAL] Optiq

2014-05-01 Thread Kasper Sørensen
I am certainly not questioning the power of Optiq. I am just noting that I
think the two projects have a lot of similarities, and obviously there are
differences. But even in your list I find a lot of the points to be close or
similar. I would hate to start a war over words in this thread, since I
actually only wanted to point to a related project. But to your points:

- MetaModel also provides SQL support, including nested/sub-queries. Not
correlated sub-queries though, which is a difference.

- SQL parsing is a goal. I am not exactly sure what you mean by
"transformations"; I think it's either User Defined Functions (UDFs) or
transformations of the query itself to fit a particular backend. UDFs are
currently not supported, no, but we do quite a lot of tricks to transform and
optimize the query plan based on the backing store.

- Sounds like Optiq is ahead of MM in terms of query transformations.

- What do you mean when you say query execution is not a central goal of Optiq?
What else would you need your query for?

- Can you explain or link to more information about the type inference you 
mention?

Best regards,
Kasper


From: Ted Dunning [ted.dunn...@gmail.com]
Sent: 01 May 2014 09:14
To: general@incubator.apache.org
Subject: Re: [PROPOSAL] Optiq

I think that there is a huge difference between MetaModel and Optiq.

In particular:

- Optiq provides real SQL including nested queries, correlated sub-queries
and so on

- MetaModel uses a fluent Java API ... SQL parsing and transformation
doesn't appear to be a goal

- Optiq provides highly advanced query transformations including
decorrelations based on estimated execution costs.

- MetaModel appears to provide no significant query transformations

- Optiq only provides query execution as a by-product for testing

- MetaModel has query execution as a central goal

- Optiq provides a form of type inferencing for SQL queries.  This is
unique to Optiq as far as I know.




Re: [PROPOSAL] Optiq

2014-05-01 Thread Alan Gates
Apache Hive has recently started work to integrate with Optiq as well.  Having 
it as an Apache project will be good for both Optiq and Apache.

Alan.

On Apr 30, 2014, at 3:21 PM, Ashutosh Chauhan  wrote:

> I would like to propose Optiq as an Apache Incubator project.  I have
> posted the proposal to https://wiki.apache.org/incubator/OptiqProposal and
> posted the text of the proposal below.
> 
> Ashutosh.
> 
> = Optiq =
> == Abstract ==
> 
> Optiq is a framework that allows efficient translation of queries involving
> heterogeneous and federated data.
> 
> == Proposal ==
> 
> Optiq is a highly customizable engine for parsing and planning queries on
> data in a wide variety of formats. It allows database-like access, and in
> particular a SQL interface and advanced query optimization, for data not
> residing in a traditional database.
> 
> == Background ==
> 
> Databases were traditionally engineered in a monolithic stack, providing a
> data storage format, data processing algorithms, query parser, query
> planner, built-in functions, metadata repository and connectivity layer.
> They innovate in some areas but rarely in all.
> 
> Modern data management systems are decomposing that stack into separate
> components, separating data, processing engine, metadata, and query
> language support. They are highly heterogeneous, with data in multiple
> locations and formats, caching and redundant data, different workloads, and
> processing occurring in different engines.
> 
> Query planning (sometimes called query optimization) has always been a key
> function of a DBMS, because it allows the implementors to introduce new
> query-processing algorithms, and allows data administrators to re-organize
> the data without affecting applications built on that data. In a
> componentized system, the query planner integrates the components (data
> formats, engines, algorithms) without introducing unnecessary coupling or
> performance tradeoffs.
> 
> But building a query planner is hard; many systems muddle along without a
> planner, and indeed a SQL interface, until the demand from their customers
> is overwhelming.
> 
> There is an opportunity to make this process more efficient by creating a
> re-usable framework.
> 
> == Rationale ==
> 
> Optiq allows database-like access, and in particular a SQL interface and
> advanced query optimization, for data not residing in a traditional
> database. It is complementary to many current Hadoop and NoSQL systems,
> which have innovative and performant storage and runtime systems but lack a
> SQL interface and intelligent query translation.
> 
> Optiq is already in use by several projects, including Apache Drill, Apache
> Hive and Cascading Lingual, and commercial products.
> 
> Optiq's architecture consists of:
> 
> - An extensible relational algebra.
> - SPIs (service-provider interfaces) for metadata (schemas and tables),
>   planner rules, statistics, cost-estimates, user-defined functions.
> - Built-in sets of rules for logical transformations and common data-sources.
> - Two query planning engines driven by rules, statistics, etc. One engine is
>   cost-based, the other rule-based.
> - Optional SQL parser, validator and translator to relational algebra.
> - Optional JDBC driver.
> 
> == Initial Goals ==
> 
> The initial goals are to move the existing codebase to Apache and
> integrate with the Apache development process. Once this is accomplished,
> we plan for incremental development and releases that follow the Apache
> guidelines.
> 
> As we move the code into the org.apache namespace, we will restructure
> components as necessary to allow clients to use just the components of
> Optiq that they need.
> 
> A version 1.0 release, including pre-built binaries, will foster wider
> adoption.
> 
> == Current Status ==
> 
> Optiq has had over a dozen minor releases over the last 18 months. Its core
> SQL parser and validator, and its planning engine and core rules, are
> mature and robust and are the basis for several production systems; but
> other components and SPIs are still undergoing rapid evolution.
> 
> === Meritocracy ===
> 
> We plan to invest in supporting a meritocracy. We will discuss the
> requirements in an open forum. We encourage the companies and projects
> using Optiq to discuss their requirements in an open forum and to
> participate in development. We will encourage and monitor community
> participation so that privileges can be extended to those that contribute.
> 
> Optiq's pluggable architecture encourages developers to contribute
> extensions such as adapters for data sources, new planning rules, and
> better statistics and cost-estimation functions. We look forward to
> fostering a rich ecosystem of extensions.
> 
> === Community ===
> 
> Building a data management system requires a high degree of technical
> skill, and correspondingly, the community of developers directly using
> Optiq is potentially fairly small, albeit highly technical and engaged. But
> we also expect

Re: [PROPOSAL] Optiq

2014-05-01 Thread Robert Metzger
I agree with Ted. Optiq is a full-fledged cost-based query optimization
framework for relational workloads.

I also want to highlight Optiq's JDBC infrastructure (and ODBC at a later
point as well). Rather than implementing the JDBC specification themselves,
Optiq users only have to implement a few interfaces to get JDBC support.


The Stratosphere project (which recently entered the Apache Incubator) has
also decided to use Optiq for its SQL interface. After some research, we
found that Optiq is a perfect fit for our requirements. And I can confirm
from a developer's perspective that Optiq is doing a great job.

I am confident that Optiq will become the standard query optimization
framework for SQL-on-"BigData". With Drill and Hive relying on Optiq, major
projects have already invested in it.

Robert




Re: [PROPOSAL] Optiq

2014-05-01 Thread Ted Dunning
I think that there is a huge difference between MetaModel and Optiq.

In particular:

- Optiq provides real SQL including nested queries, correlated sub-queries
and so on

- MetaModel uses a fluent Java API ... SQL parsing and transformation
doesn't appear to be a goal

- Optiq provides highly advanced query transformations including
decorrelations based on estimated execution costs.

- MetaModel appears to provide no significant query transformations

- Optiq only provides query execution as a by-product for testing

- MetaModel has query execution as a central goal

- Optiq provides a form of type inferencing for SQL queries.  This is
unique to Optiq as far as I know.
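
To make the "correlated sub-queries" and "decorrelation" points above concrete, here is a small sketch. It does not use Optiq at all; it uses Python's built-in sqlite3 module and a made-up `emp` table (both are assumptions for illustration only). It shows a correlated sub-query and a hand-written decorrelated form (a join against a grouped aggregate) that returns the same rows. A planner that decorrelates performs this kind of rewrite automatically; whether it picks the rewritten form would depend on its cost estimates.

```python
import sqlite3

# Hypothetical sample data, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (empno INTEGER PRIMARY KEY, deptno INTEGER, salary INTEGER);
    INSERT INTO emp VALUES
        (1, 10, 100), (2, 10, 300),
        (3, 20, 200), (4, 20, 400), (5, 20, 600);
""")

# Correlated form: the inner query refers to e.deptno from the outer query,
# so conceptually it is re-evaluated for every outer row.
CORRELATED = """
    SELECT e.empno
    FROM emp e
    WHERE e.salary > (SELECT AVG(e2.salary)
                      FROM emp e2
                      WHERE e2.deptno = e.deptno)
    ORDER BY e.empno
"""

# Decorrelated form: compute the per-department average once,
# then join it back to the employee rows.
DECORRELATED = """
    SELECT e.empno
    FROM emp e
    JOIN (SELECT deptno, AVG(salary) AS avg_salary
          FROM emp
          GROUP BY deptno) a
      ON a.deptno = e.deptno
    WHERE e.salary > a.avg_salary
    ORDER BY e.empno
"""

correlated_rows = conn.execute(CORRELATED).fetchall()
decorrelated_rows = conn.execute(DECORRELATED).fetchall()

# Both forms select the employees paid above their department's average.
print(correlated_rows, decorrelated_rows)
```

The two queries are equivalent for this schema; the second form gives an engine the option of computing the aggregate once instead of once per outer row.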




RE: [PROPOSAL] Optiq

2014-04-30 Thread Kasper Sørensen
I see a lot of conceptual similarity between Optiq and the Apache MetaModel
(incubator) project [1]. Maybe something can be done to align the two projects,
so that we avoid having two incubating projects that do basically the same
thing?

Or maybe there's some glaring difference that I am missing? At least it seems
to me that both projects try to provide uniform querying capabilities to
a wide array of data backends. Both projects also favor a type-safe Java
querying API instead of a String/SQL oriented query API.

Regards,
Kasper Sørensen

[1] http://metamodel.incubator.apache.org/


From: Ashutosh Chauhan [hashut...@apache.org]
Sent: 01 May 2014 00:21
To: general@incubator.apache.org
Subject: [PROPOSAL] Optiq

I would like to propose Optiq as an Apache Incubator project.  I have
posted the proposal to https://wiki.apache.org/incubator/OptiqProposal and
posted the text of the proposal below.

Ashutosh.

= Optiq =
== Abstract ==

Optiq is a framework that allows efficient translation of queries involving
heterogeneous and federated data.

== Proposal ==

Optiq is a highly customizable engine for parsing and planning queries on
data in a wide variety of formats. It allows database-like access, and in
particular a SQL interface and advanced query optimization, for data not
residing in a traditional database.

== Background ==

Databases were traditionally engineered in a monolithic stack, providing a
data storage format, data processing algorithms, query parser, query
planner, built-in functions, metadata repository and connectivity layer.
They innovate in some areas but rarely in all.

Modern data management systems are decomposing that stack into separate
components, separating data, processing engine, metadata, and query
language support. They are highly heterogeneous, with data in multiple
locations and formats, caching and redundant data, different workloads, and
processing occurring in different engines.

Query planning (sometimes called query optimization) has always been a key
function of a DBMS, because it allows the implementors to introduce new
query-processing algorithms, and allows data administrators to re-organize
the data without affecting applications built on that data. In a
componentized system, the query planner integrates the components (data
formats, engines, algorithms) without introducing unnecessary coupling or
performance tradeoffs.

But building a query planner is hard; many systems muddle along without a
planner, and indeed a SQL interface, until the demand from their customers
is overwhelming.

There is an opportunity to make this process more efficient by creating a
re-usable framework.

== Rationale ==

Optiq allows database-like access, and in particular a SQL interface and
advanced query optimization, for data not residing in a traditional
database. It is complementary to many current Hadoop and NoSQL systems,
which have innovative and performant storage and runtime systems but lack a
SQL interface and intelligent query translation.

Optiq is already in use by several projects, including Apache Drill, Apache
Hive and Cascading Lingual, and commercial products.

Optiq's architecture consists of:

- An extensible relational algebra.
- SPIs (service-provider interfaces) for metadata (schemas and tables),
  planner rules, statistics, cost-estimates, user-defined functions.
- Built-in sets of rules for logical transformations and common data-sources.
- Two query planning engines driven by rules, statistics, etc. One engine is
  cost-based, the other rule-based.
- Optional SQL parser, validator and translator to relational algebra.
- Optional JDBC driver.
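
As a rough illustration of what "relational algebra plus planner rules" means in the list above, the following toy sketch models three operators and one rewrite rule in Python. None of these class or function names come from Optiq; `Scan`, `Filter`, `Project`, `push_filter_below_project` and `rewrite` are hypothetical names chosen for this example, and the driver is a naive rule-based loop with no cost model.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Scan:
    """Leaf operator: read all rows of a table."""
    table: str


@dataclass(frozen=True)
class Filter:
    """Keep rows where `column` equals `value`."""
    column: str
    value: object
    input: object


@dataclass(frozen=True)
class Project:
    """Keep only the named columns."""
    columns: Tuple[str, ...]
    input: object


def push_filter_below_project(node):
    """Rule: Filter(Project(x)) -> Project(Filter(x)).

    Only valid when the filtered column survives the projection.
    Returns None when the rule does not apply.
    """
    if isinstance(node, Filter) and isinstance(node.input, Project):
        project = node.input
        if node.column in project.columns:
            pushed = Filter(node.column, node.value, project.input)
            return Project(project.columns, pushed)
    return None


def rewrite(node, rules):
    """Rewrite children first, then apply the first matching rule and repeat."""
    if isinstance(node, (Filter, Project)):
        node = replace(node, input=rewrite(node.input, rules))
    for rule in rules:
        result = rule(node)
        if result is not None:
            return rewrite(result, rules)
    return node


# The filter starts above the projection and ends up next to the scan.
plan = Filter("deptno", 10, Project(("empno", "deptno"), Scan("emp")))
optimized = rewrite(plan, [push_filter_below_project])
print(optimized)
```

A cost-based engine would differ mainly in the driver: instead of applying every matching rule, it would keep alternative plans and choose among them using statistics and cost estimates.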
== Initial Goals ==

The initial goals are to move the existing codebase to Apache and
integrate with the Apache development process. Once this is accomplished,
we plan for incremental development and releases that follow the Apache
guidelines.

As we move the code into the org.apache namespace, we will restructure
components as necessary to allow clients to use just the components of
Optiq that they need.

A version 1.0 release, including pre-built binaries, will foster wider
adoption.

== Current Status ==

Optiq has had over a dozen minor releases over the last 18 months. Its core
SQL parser and validator, and its planning engine and core rules, are
mature and robust and are the basis for several production systems; but
other components and SPIs are still undergoing rapid evolution.

=== Meritocracy ===

We plan to invest in supporting a meritocracy. We will discuss the
requirements in an open forum. We encourage the companies and projects
using Optiq to discuss their requirements in an open forum and to
participate in development. We will encourage and monitor community
participation so that privileges can be extended to those that contribute.

Optiq's pluggable architecture encourages developers to contribute
extensions such as adapters for data sources, new planning rules, and
better stat

Re: [PROPOSAL] Optiq

2014-04-30 Thread Ted Dunning
Optiq has been a key technology underpinning the progress of Drill.  It has
wide applicability for any project that needs SQL parsing and cost-based
optimization.

Julian has been carrying this torch for a long time, but I really think
that having a wider community would help.


