Hmm, the morsels paper looks really interesting at first sight. Shall we see whether we can get a PoC working in PostgreSQL? :-)
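To make the dispatcher idea quoted below a bit more concrete, here is a minimal sketch of morsel-driven scheduling. It assumes a single integer column and a `SELECT count(*), sum(v) ... WHERE v % 3 = 0` style query. It is not PostgreSQL code and not the paper's implementation: `MorselDispatcher`, `AggState`, `run_query` and the morsel size are hypothetical names and values chosen for illustration. Workers keep grabbing the next fixed-size row range from a shared cursor, so none of them sits blocked on a plan node. Each morsel is pushed through filter and aggregate into a per-worker partial result, and the partial results are merged at the end (the push model Robert mentions). Python threads are used only to show the structure. Because of the GIL they give no real parallel speedup.

```python
import threading
from dataclasses import dataclass

MORSEL_SIZE = 10_000  # rows handed out per job; hypothetical value


class MorselDispatcher:
    """Hypothetical dispatcher: hands out fixed-size row ranges ("morsels")."""

    def __init__(self, num_rows: int, morsel_size: int = MORSEL_SIZE):
        self.num_rows = num_rows
        self.morsel_size = morsel_size
        self._next = 0
        self._lock = threading.Lock()

    def next_morsel(self):
        # Returns (start, end), or None once the table is exhausted.
        with self._lock:
            start = self._next
            if start >= self.num_rows:
                return None
            self._next = min(start + self.morsel_size, self.num_rows)
            return start, self._next


@dataclass
class AggState:
    """Per-worker partial aggregate, so workers never contend while scanning."""
    total: int = 0
    count: int = 0

    def consume(self, value: int) -> None:
        self.total += value
        self.count += 1


def filter_then_agg(agg: AggState, value: int) -> None:
    # Push model: the scan pushes each tuple into the next operator,
    # here "WHERE value % 3 = 0" followed by SUM/COUNT.
    if value % 3 == 0:
        agg.consume(value)


def worker(table, dispatcher: MorselDispatcher, agg: AggState) -> None:
    # A worker never waits on another plan node; it just asks for more work.
    while (morsel := dispatcher.next_morsel()) is not None:
        start, end = morsel
        for i in range(start, end):
            filter_then_agg(agg, table[i])


def run_query(table, num_workers: int = 4) -> AggState:
    dispatcher = MorselDispatcher(len(table))
    partials = [AggState() for _ in range(num_workers)]
    threads = [threading.Thread(target=worker, args=(table, dispatcher, p))
               for p in partials]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Final merge of the per-worker partial aggregates.
    result = AggState()
    for p in partials:
        result.total += p.total
        result.count += p.count
    return result


if __name__ == "__main__":
    res = run_query(list(range(1_000_000)))
    print(res.count, res.total)
```

Running it as a script prints `333334 166666833333`. A real PoC inside PostgreSQL would presumably have to use background worker processes and dynamic shared memory rather than threads in one process.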
On Tue, May 10, 2016 at 9:42 PM, Konstantin Knizhnik <k.knizh...@postgrespro.ru> wrote:

> On 05/10/2016 08:26 PM, Robert Haas wrote:
>
>> On Tue, May 10, 2016 at 3:00 AM, konstantin knizhnik
>> <k.knizh...@postgrespro.ru> wrote:
>>
>>> What's wrong with a worker being blocked? You can just have more
>>> workers (more than CPU cores) so that the others can continue to do
>>> useful work.
>>>
>> Not really. The workers are all running the same plan, so they'll all
>> make the same decision about which node needs to be executed next. If
>> that node can't accommodate multiple processes trying to execute it at
>> the same time, it will have to block all of them but the first one.
>> Adding more processes just increases the number of processes sitting
>> around doing nothing.
>>
>
> Doesn't this actually mean that we need a normal job scheduler which is
> given a queue of jobs and, with a pool of threads, is able to organize
> efficient execution of queries? The optimizer can build a pipeline (graph)
> of tasks corresponding to execution plan nodes, e.g. SeqScan, Sort, ...
> Each task is split into several jobs which can be scheduled concurrently
> by a task dispatcher. So you will not have blocked workers waiting for
> something, and all system resources will be utilized. Such an approach
> with a dispatcher makes it possible to implement quotas, priorities, ...
> The dispatcher can also take care of NUMA and cache optimizations, which
> are especially critical on modern architectures. One more reference:
> http://db.in.tum.de/~leis/papers/morsels.pdf
>
> Sorry, maybe I am wrong, but I still think that async. ops. are
> "multitasking for the poor" :)
> Yes, maintaining threads and especially separate processes adds
> significant overhead. It leads to extra resource consumption, and context
> switches are quite expensive.
> And I know from my own experience that replacing several concurrent
> processes performing some I/O (for example with sockets) with just one
> process using multiplexing increases performance. But async. ops. are
> still just a way to make the programmer responsible for managing a state
> machine instead of relying on the OS to do context switches. Manual
> transmission is still more efficient than automatic transmission, but most
> drivers still prefer the latter ;)
>
> Seriously, I carefully read your response to Kochei, but I am still not
> convinced that async. ops. are what we need. Or maybe we just understand
> different things by this notion.
>
>>> But there is some research, for example:
>>>
>>> http://www.vldb.org/pvldb/vol4/p539-neumann.pdf
>>>
>>> showing that the same or even better effect can be achieved by generating
>>> native code for the query execution plan (which is not so difficult now,
>>> thanks to LLVM).
>>> It eliminates interpretation overhead and increases cache locality.
>>> I got similar results in my own experiments on accelerating SparkSQL.
>>> Instead of native code generation I converted query plans to C code and
>>> experimented with different data representations. A "horizontal model"
>>> with loading columns on demand shows better performance than a columnar
>>> store.
>>>
>> Yes, I think this approach should also be considered.
>>
>>> At this moment (February) they have implemented translation of only a
>>> few PostgreSQL operators used by ExecQuals and do not support
>>> aggregates. They get about a 2x speedup on synthetic queries and a 25%
>>> speedup on TPC-H Q1 (for Q1 the most critical part is generating native
>>> code for aggregates, because ExecQual itself takes only 6% of the time
>>> for this query).
>>> Actually, this 25% for Q1 was achieved not by dynamic code generation,
>>> but by switching the executor from a PULL to a PUSH model.
>>> It seems to be yet another interesting PostgreSQL executor
>>> transformation.
>>> As far as I know, they are going to publish the results of their work as
>>> open source...
>>>
>> Interesting. You may notice that in "asynchronous mode" my prototype
>> works using a push model of sorts. Maybe that should be taken
>> further.
>>
>
> --
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company

--
Bert Desmet
0477/305361