2017-01-10 12:53 GMT+03:00 Alexander Korotkov <[email protected]>:
>
> 1. What project ideas we have?
>
>
Hi!
We would like to propose a project on rewriting PostgreSQL executor from
traditional Volcano-style [1] to so-called push-based architecture as
implemented in
Hyper [2][3] and VitesseDB [4]. The idea is to reverse the direction of
data flow
control: instead of pulling up tuples one-by-one with ExecProcNode(), we
suggest
pushing them from below to top until blocking operator (e.g. Aggregation) is
encountered. There’s a good example and more detailed explanation for this
approach in [2].
The advantages of this approach:
* It allows to completely avoid the need of loading/storing the internal
state of the bottommost
(scanning) nodes, which will significantly reduce overhead. With current
pull-based model,
we call functions like heapgettup_pagemode() (and many others)
number-of-tuples-to-retrieve
times, while in push-based model we will call them only once. Currently,
we have
implemented a prototype for SeqScan node and achieved 2x speedup on query
“select * from lineitem”;
* The number of memory accesses is minimized; generally better code and
data locality,
cache is used more effectively;
* Switching to push model also makes a good base for building effective
JIT-compiler.
Currently we have working LLVM-based JIT compiler for expressions [5],
as well as whole query
JIT-compiler [6], which speeds up TPC-H queries up to 4-5 times, but the
latter took manually
re-implementing the executor logic with LLVM API using push model to get
this speedup. JIT-compiling
from original Postgres C code didn't give significant improvement
because of Volcano-style model
inherent inefficiency. After making a switch to push-model we expect to
achieve speedup comparable
to stand-alone JIT, but using the same code for both JIT and the
interpreter.
Also, while working on this project, we are likely be revealing and fixing
other
weak places of the current query executor. Volcano-style model is known to
have
inadequate performance characteristics [7][8], e.g. function call overhead,
and we should deal with it anyway. We also plan to make relatively small
patches,
which will optimize the redundant reload of the internal state in the
current pull-model.
Many DB systems with support of full query compilation (e.g. LegoBase [9],
Hekaton [10]) implement it in push-based manner.
Also we have seen in the mailing list that Kumar Rajeev had been
investigating this idea too, and he reported that the results were
impressive (unfortunately, without specifying more details):
https://www.postgresql.org/message-id/BF2827DCCE55594C8D7A8F7FFD3AB77159A9B904%40szxeml521-mbs.china.huawei.com
References
[1] Graefe G.. Volcano — an extensible and parallel query evaluation
system. IEEE Trans. Knowl. Data Eng.,6(1): 120–135, 1994.
[2] Efficiently Compiling Efficient Query Plans for Modern Hardware,
http://www.vldb.org/pvldb/vol4/p539-neumann.pdf
[3] Compiling Database Queries into Machine Code,
http://sites.computer.org/debull/A14mar/p3.pdf
[4]
https://docs.google.com/presentation/d/1R0po7_Wa9fym5U9Y5qHXGlUi77nSda2LlZXPuAxtd-M/pub?slide=id.g9b338944f_4_131
[5] PostgreSQL with JIT compiler for expressions,
https://github.com/ispras/postgres
[6] LLVM Cauldron, slides,
http://llvm.org/devmtg/2016-09/slides/Melnik-PostgreSQLLLVM.pdf
[7] MonetDB/X100: Hyper-Pipelining Query Execution
http://cidrdb.org/cidr2005/papers/P19.pdf
[8] Vectorization vs. Compilation in Query Execution,
https://pdfs.semanticscholar.org/dcee/b1e11d3b078b0157325872a581b51402ff66.pdf
[9] http://www.vldb.org/pvldb/vol7/p853-klonatos.pdf
[10]
https://www.microsoft.com/en-us/research/wp-content/uploads/2013/06/Hekaton-Sigmod2013-final.pdf
--
*Best Regards,**Ruben.* <[email protected]>
ISP RAS.
Project title: Implementing push-based query executor
Project Description
Currently, PostgreSQL uses traditional Volcano-style [1] query execution model.
While it is a simple and flexible model, it behaves poorly on modern superscalar
CPUs [2][3] due to lack of locality and frequent instruction mispredictions.
It becomes a major issue for complex OLAP queries with CPU-heavy workloads.
We propose to implement so-called push-based query executor model
as described in [4][5], which improves code and data locality and cache usage
itself; also push-based executor can serve as a platform
for efficient JIT query compilation.
See [6] for more details.
Skills needed
The ability to understand and modify PostgresSQL executor code;
The ability to run careful in-memory benchmarks to demonstrate the result;
The ability to profile Postgres in order to find slow code;
Understanding modern processors features (pipelining, superscalar CPUs,
branch prediction, etc) would be very helpful.
Difficulty Level
Moderate-level; however, microoptimizations might be hard.
Probably it will also be hard to keep the whole architecture as clean as it is
now.
Expected Outcomes
Patch with implemented push-based query executor; small patches,
which will optimize the current query executor; benchmarks showing the
performance of queries on plain Postgres and with this patch applied.
References
[1] Graefe G.. Volcano — an extensible and parallel query evaluation system.
IEEE Trans. Knowl. Data Eng.,6(1): 120–135, 1994.
[2] MonetDB/X100: Hyper-Pipelining Query Execution
http://cidrdb.org/cidr2005/papers/P19.pdf
[3] Vectorization vs. Compilation in Query Execution,
https://pdfs.semanticscholar.org/dcee/b1e11d3b078b0157325872a581b51402ff66.pdf
[4] Efficiently Compiling Efficient Query Plans for Modern Hardware,
http://www.vldb.org/pvldb/vol4/p539-neumann.pdf
[5] Compiling Database Queries into Machine Code,
http://sites.computer.org/debull/A14mar/p3.pdf
[6] [message link, not the whole thread]
https://www.postgresql.org/message-id/flat/CAJEAwVFnYMenEe2A9srVuNVemAoW%2BtT_uEs%3D2p427KfsegsJPw%40mail.gmail.com#CAJEAwVFnYMenEe2A9srVuNVemAoW+tT_uEs=2p427kfsegs...@mail.gmail.com
--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers