On Wed, Oct 29, 2014 at 4:48 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
> If you do wish to pursue || Seq Scan, then a working prototype would
> help. It allows us to see that there is an open source solution we are
> working to solve the problems for. People can benchmark it, understand
> the benefits and issues it raises and that would help focus attention
> on the problems you are trying to solve in infrastructure. People may
> have suggestions on how to solve or avoid those that you hadn't
> thought of.

I've mulled that over a bit and it might be worth pursuing further.
Of course there's always the trade-off: doing that means not doing
something else.

> As I mentioned previously when you started discussing shared memory
> segments, parallel sort does NOT require shared memory. The only thing
> you need to share are files. Split the problem into N pieces, sort
> them to produce N files and then merge the files using existing code.
> That only applies to large sorts, but then those are the ones you
> cared about doing in parallel anyway.

A simple implementation of this would work only for simple
pass-by-value types, like integers. Pass-by-reference types require
the comparator to de-TOAST, and some other types require catalog
lookups. I don't think that's very useful: Noah previously did some
analysis of this problem and concluded (with apologies if I'm
remembering the details incorrectly here) that the comparator for
strings was something like 1000x as expensive as the comparator for
integers, and that you basically couldn't get the latter to take
enough time to be worth parallelizing.

I care much more about getting the general infrastructure in place to
make parallel programming feasible in PostgreSQL than I do about
getting one particular case working. And more than feasible: I want
it to be relatively straightforward. That's not simple, but the
potential rewards are great.

Let's face it: there are people here who are much better than I am at
hacking on the planner and especially the executor. Why haven't any
of those people implemented parallel anything? I think it's because,
right now, it's just too darn hard. I'm trying to reduce that to
something approaching the difficulty of writing normal PostgreSQL
backend code, and I think I'm 6-12 patches away from that. This is
one of them and, yeah, it's not done, and, yeah, we might not get to
parallel anything this release and, yeah, things would be going faster
if I could work on parallelism full time.
But I think that the progress we are making is meaningful and the goal
is within sight.

I appreciate that you'd probably attack this problem from a different
direction than I'm attacking it from, but I still think that what I'm
trying to do is a legitimate direction of attack which, by the way,
does not preclude anybody else from attacking it from a different
direction and, indeed, such a development would be most welcome.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
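
For readers who want to see the shape of the file-based approach quoted
in the message (split into N pieces, sort each into its own file, merge
the files), here is a minimal sketch in Python. It is not PostgreSQL
code and not anyone's proposed patch. The function names are
hypothetical, it handles integers only (the pass-by-value case the
reply calls out), and the thread pool merely stands in for separate
worker backends to show the data flow, not to demonstrate a speedup.

```python
import heapq
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def sort_chunk_to_file(chunk, directory):
    """Sort one piece of the input and write it to its own file, one integer per line."""
    fd, path = tempfile.mkstemp(dir=directory, suffix=".run", text=True)
    with os.fdopen(fd, "w") as f:
        for value in sorted(chunk):
            f.write(f"{value}\n")
    return path


def read_run(path):
    """Stream the integers of one sorted run back from disk."""
    with open(path) as f:
        for line in f:
            yield int(line)


def file_based_sort(values, n_pieces):
    """Split values into at most n_pieces chunks, sort each into a file, merge the files."""
    values = list(values)
    size = max(1, -(-len(values) // n_pieces))  # ceiling division, at least 1
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with tempfile.TemporaryDirectory() as directory:
        # Assumption: threads stand in for independent workers; the only
        # thing the workers share with the merge step is the run files.
        with ThreadPoolExecutor(max_workers=n_pieces) as pool:
            paths = list(pool.map(lambda c: sort_chunk_to_file(c, directory), chunks))
        # heapq.merge is lazy, so materialise the result before the
        # temporary directory (and the run files) are removed.
        return list(heapq.merge(*(read_run(p) for p in paths)))
```

The sketch sidesteps exactly the parts the reply says are hard: the
comparator here is a plain integer comparison, with no de-TOASTing and
no catalog lookups.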