Re: [DISCUSS] Quickstep incubation proposal

2016-03-22 Thread Tom Barber
No, absolutely, my comment wasn't supposed to insinuate anything about
whether the project should get incubated or not from a proposal
perspective. It was just a roundabout way of saying: I like the proposal,
it's fresh, looks sane, and is something a bit different, so it gets a
+1 from me.

On Tue, Mar 22, 2016 at 7:09 PM, Konstantin Boudnik wrote:

> That's a fair statement. In general, however, it isn't a concern of the
> Incubator if a proposed podling has some sort of resemblance to some other
> software out there. IINM, no one was rejected because they wanted to develop
> yet another web-application server or something like this.
>
> Cos
>
> On Tue, Mar 22, 2016 at 06:44PM, Tom Barber wrote:
> > I actually have an opinion!
> >
> > I saw yet another database engine land and my heart sank.
> >
> > Then I did some digging into Quickstep and realised it was more of a
> > "traditional" database that might take on the likes of Exasol etc. rather
> > than plugging more SQL into NoSQL etc. (from what I gather), and I am happy
> > to see it pitched.
> >
> > Tom
> >
> > On Tue, Mar 22, 2016 at 6:41 PM, Konstantin Boudnik wrote:
> >
> > > It's been a week since this thread started and surprisingly there isn't any
> > > reaction so far. Is it safe to assume the silent consensus has been
> > > reached?
> > >
> > > Cos
> > >

Re: [DISCUSS] Quickstep incubation proposal

2016-03-22 Thread Konstantin Boudnik
That's a fair statement. In general, however, it isn't a concern of the
Incubator if a proposed podling has some sort of resemblance to some other
software out there. IINM, no one was rejected because they wanted to develop yet
another web-application server or something like this.

Cos

On Tue, Mar 22, 2016 at 06:44PM, Tom Barber wrote:
> I actually have an opinion!
> 
> I saw yet another database engine land and my heart sank.
>
> Then I did some digging into Quickstep and realised it was more of a
> "traditional" database that might take on the likes of Exasol etc. rather
> than plugging more SQL into NoSQL etc. (from what I gather), and I am happy to
> see it pitched.
> 
> Tom
> 
> On Tue, Mar 22, 2016 at 6:41 PM, Konstantin Boudnik wrote:
> 
> > It's been a week since this thread started and surprisingly there isn't any
> > reaction so far. Is it safe to assume the silent consensus has been
> > reached?
> >
> > Cos
> >

Re: [DISCUSS] Quickstep incubation proposal

2016-03-22 Thread Tom Barber
I actually have an opinion!

I saw yet another database engine land and my heart sank.

Then I did some digging into Quickstep and realised it was more of a
"traditional" database that might take on the likes of Exasol etc. rather
than plugging more SQL into NoSQL etc. (from what I gather), and I am happy to
see it pitched.

Tom

On Tue, Mar 22, 2016 at 6:41 PM, Konstantin Boudnik wrote:

> It's been a week since this thread started and surprisingly there isn't any
> reaction so far. Is it safe to assume the silent consensus has been
> reached?
>
> Cos
>

Re: [DISCUSS] Quickstep incubation proposal

2016-03-22 Thread Konstantin Boudnik
It's been a week since this thread started and surprisingly there isn't any
reaction so far. Is it safe to assume the silent consensus has been reached?

Cos

[DISCUSS] Quickstep incubation proposal

2016-03-15 Thread Roman Shaposhnik
Hi!

It is my pleasure to present the proposal to incubate the Quickstep project
at the Apache Software Foundation. Quickstep is a high-performance,
next-generation database engine available under the Apache License 2.0.

The text of the proposal is included below and is also available at
   https://wiki.apache.org/incubator/QuickstepProposal

Thanks,
Roman.

== Abstract ==

Quickstep is a high-performance database engine. It is designed to (1)
convert data to insights at bare-metal speed, (2) support multiple
query surfaces including SQL (the first (and current) version only
supports SQL), and (3) deliver bare-metal performance on any hardware
(including running on a laptop, running on a high-end (single node)
server, and running on a distributed cluster). Since its inception,
the project has been planned to deliver a high-performance single node
system first, followed by a distributed system.

Quickstep is composed of several different modules that handle
different concerns of a database system. The main modules are:
  * Utility - Reusable general-purpose code that is used by many other modules.
  * Threading - Provides a cross-platform abstraction for threads and
synchronization primitives that abstract the underlying OS threading
features.
  * Types - The core type system used across all of Quickstep. Handles
details of how SQL types are stored, parsed, serialized &
deserialized, and converted. Also includes basic containers for typed
values (tuples and column-vectors) and low-level operations that apply
to typed values (e.g. basic arithmetic and comparisons).
  * Catalog - Tracks database schema as well as physical storage
information for relations (e.g. which physical blocks store a
relation's data, and any physical partitioning and placement
information).
  * Storage - Physically stores relational data in self-contained,
self-describing blocks, both in-memory and on persistent storage (disk
or a distributed filesystem). Also includes some heavyweight run-time
data structures used in query processing (e.g. hash tables for join
and aggregation). Includes a buffer manager component for managing
memory use and a file manager component that handles data persistence.
  * Compression - Implements ordered dictionary compression. Several
storage formats in the Storage module are capable of storing
compressed column data and evaluating some expressions directly on
compressed data without decompressing. The common code supporting
compression is in this module.
  * Expressions - Builds on the simple operations provided by the
Types module to support arbitrarily complex expressions over data,
including scalar expressions, predicates, and aggregate functions with
and without grouping.
  * Relational Operators - This module provides the building blocks
for queries in Quickstep. A query is represented as a directed acyclic
graph of relational operators, each of which is responsible for
applying some relational-algebraic operation(s) to transform its
input. Operators generate individual self-contained "work orders" that
can be executed independently. Most operators are parallelism-friendly
and generate one work order per storage block of input.
  * Query Execution - Handles the actual scheduling and execution of
work from a query at runtime. The central class is the Foreman, an
independent thread with a global view of the query plan and progress.
The Foreman dispatches work orders to stateless Worker threads and
monitors their progress, and also coordinates streaming of partial
results between producers and consumers in a query plan DAG to
maximize parallelism. This module also includes the QueryContext
class, which holds global shared state for an individual query and is
designed to support easy serialization/deserialization for distributed
execution.
  * Parser - A simple SQL lexer and parser that parses SQL syntax into
an abstract syntax tree for consumption by the Query Optimizer.
  * Query Optimizer - Takes the abstract syntax tree generated by the
parser and transforms it into a runnable query-plan DAG for the Query
Execution module. The Query Optimizer is responsible for resolving
references to relations and attributes in the query, checking it for
semantic correctness, and applying optimizations (e.g. filter
pushdown, column pruning, join ordering) as part of the transformation
process.
  * Command-Line Interface - An interactive SQL shell interface to Quickstep.
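
To make the Relational Operators and Query Execution descriptions above a
bit more concrete, here is a minimal sketch of the work-order idea. It is
written in Python for brevity and is not Quickstep's actual C++ code: the
names WorkOrder, select_operator, worker, and foreman are hypothetical, and
a "storage block" is assumed to be just a list of integers. An operator
emits one self-contained work order per input block, and a foreman hands
those orders to stateless worker threads and gathers the partial results.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class WorkOrder:
    """Hypothetical self-contained unit of work over one storage block."""
    block_id: int
    rows: tuple
    fn: Callable[[tuple], list]


def select_operator(blocks: Sequence[Sequence[int]],
                    predicate: Callable[[int], bool]) -> List[WorkOrder]:
    """Toy selection operator: generates one work order per input block."""
    def run(rows: tuple) -> list:
        return [r for r in rows if predicate(r)]
    return [WorkOrder(i, tuple(b), run) for i, b in enumerate(blocks)]


def worker(orders: queue.Queue, results: queue.Queue) -> None:
    """Stateless worker: executes whatever work order it is handed."""
    while True:
        order = orders.get()
        if order is None:  # sentinel: no more work
            return
        results.put((order.block_id, order.fn(order.rows)))


def foreman(work_orders: List[WorkOrder], num_workers: int = 4) -> list:
    """Dispatches work orders to workers and collects partial results."""
    orders: queue.Queue = queue.Queue()
    results: queue.Queue = queue.Queue()
    threads = [threading.Thread(target=worker, args=(orders, results))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for wo in work_orders:
        orders.put(wo)
    for _ in threads:
        orders.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    # All workers have finished, so the results queue is complete.
    partial = {}
    while not results.empty():
        block_id, rows = results.get()
        partial[block_id] = rows
    # Reassemble in block order so the output is deterministic.
    return [row for bid in sorted(partial) for row in partial[bid]]


if __name__ == "__main__":
    blocks = [[1, 8, 3], [10, 2], [7, 9]]
    print(foreman(select_operator(blocks, lambda x: x > 5)))  # [8, 10, 7, 9]
```

Because each work order carries everything it needs, the workers share no
state, which is what makes one-order-per-block parallelism straightforward
in this sketch. The streaming of partial results between producers and
consumers in a query-plan DAG is deliberately left out.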

Quickstep is implemented in C++ and does not require many external
libraries to run. Quickstep is currently an open source project
licensed under the Apache License Version 2.0 and governed by a group
of engineers at Pivotal.

Quickstep began in 2011 as a research project in the Computer Sciences
Department at the University of Wisconsin
(https://quickstep.cs.wisc.edu/), and the copyrights underlying the
project were transferred to a company called Quickstep Technologies,
which was acquired by Pivotal in 2015.

== Proposal ==
The goal of this proposal is to