On 08/24/2014 09:40 AM, Haribabu Kommi wrote:
> Any suggestions?
Another point I didn't raise first time around, but that's IMO quite
significant, is that you haven't addressed why this approach to fully
parallel seqscans is useful and solves real problems in effective ways. It
might seem obvious - "of course they're useful". But I see two things
they'd address:

- CPU-limited sequential scans, where expensive predicates are filtering
  the scan; and
- I/O-limited sequential scans, where the predicates already execute fast
  enough on one CPU, so most time is spent waiting for more disk I/O.

The problem I see with your design is that it's not going to be useful for
a large class of CPU-limited scans where the predicate isn't composed
entirely of immutable functions and operators. Especially since
immutable-only predicates are the best candidates for expression indexes
anyway.

While it'd likely be useful for I/O-limited scans, it's going to increase
contention on shared_buffers locking and page management. More
importantly, is it the most efficient way to solve the problem with
I/O-limited scans? I would seriously suggest looking at first adding
support for asynchronous I/O across ranges of extents during a sequential
scan. You might not need multiple worker backends at all. I'm sure using
async I/O to implement effective_io_concurrency for seqscans has been
discussed and explored before, so again I think some time in the list
archives might make sense.

I don't know if it makes sense to do something as complex as parallel
multi-process seqscans without having a path forward for supporting
non-immutable functions - probably with fmgr API enhancements, additional
function labels ("PARALLEL"), etc, depending on what you find is needed.

Do you have specific workloads where you see this as useful, and where
doing async I/O and readahead within a single back-end wouldn't solve the
same problem?

>>> 3. In the executor Init phase, Try to copy the necessary data required
>>> by the workers and start the workers.
>>
>> Copy how?
>>
>> Back-ends can only communicate with each other over shared memory,
>> signals, and using sockets.
>
> Sorry for not being clear, copying those data structures into dynamic
> shared memory only.
> From there the workers can access.

That'll probably work with read-only data, but it's not viable for
read/write data unless you use a big lock to protect it, in which case you
lose the parallelism you want to achieve. You'd have to classify what may
be modified during scan execution carefully and determine if you need to
feed any of the resulting modifications back to the original backend - and
how to merge modifications by multiple workers, if it's even possible.
That's going to involve a detailed structure-by-structure analysis and
seems likely to be error-prone and buggy.

I think you should probably talk to Robert Haas about what he's been doing
over the last couple of years on parallel query.

>>> 4. In the executor run phase, just get the tuples which are sent by
>>> the workers and process them further in the plan node execution.
>>
>> Again, how do you propose to copy these back to the main bgworker?
>
> With the help of message queues that are created in the dynamic shared memory,
> the workers can send the data to the queue. On other side the main
> backend receives the tuples from the queue.

OK, so you plan to implement shmem queues. That'd be a useful starting
point, as it'd be worth having in its own right.
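Very roughly, something along these lines, sketched against the 9.4
dsm/shm_mq APIs. Error handling, worker registration, handle transfer and
tuple serialisation are all left out, and the function names are just
placeholders for illustration, not a proposed interface:

/*
 * Rough sketch only - 9.4 dsm/shm_mq APIs, no error handling or worker
 * registration; helper names are placeholders.
 */
#include "postgres.h"

#include "storage/dsm.h"
#include "storage/proc.h"
#include "storage/shm_mq.h"

#define TUPLE_QUEUE_SIZE	65536

/*
 * Leader side: create a DSM segment holding one queue and attach as the
 * receiver.  dsm_segment_handle(seg) is what you'd hand to the worker,
 * e.g. via bgw_main_arg or a small control region in the segment.
 */
static shm_mq_handle *
leader_create_tuple_queue(dsm_segment **segp)
{
	dsm_segment *seg = dsm_create(TUPLE_QUEUE_SIZE);
	shm_mq	   *mq = shm_mq_create(dsm_segment_address(seg), TUPLE_QUEUE_SIZE);

	shm_mq_set_receiver(mq, MyProc);
	*segp = seg;
	return shm_mq_attach(mq, seg, NULL);
}

/*
 * Worker side: map the segment, attach as sender, and push one already
 * serialized tuple.  A blocking send waits while the ring is full.
 */
static void
worker_send_tuple(dsm_handle handle, const void *tupdata, Size tuplen)
{
	dsm_segment *seg = dsm_attach(handle);
	shm_mq	   *mq = (shm_mq *) dsm_segment_address(seg);
	shm_mq_handle *mqh;

	shm_mq_set_sender(mq, MyProc);
	mqh = shm_mq_attach(mq, seg, NULL);

	if (shm_mq_send(mqh, tuplen, tupdata, false) != SHM_MQ_SUCCESS)
		elog(ERROR, "leader detached from tuple queue");

	dsm_detach(seg);
}

/*
 * Leader side: drain the queue until the worker detaches.
 */
static void
leader_drain_tuple_queue(shm_mq_handle *mqh)
{
	for (;;)
	{
		Size		nbytes;
		void	   *data;

		if (shm_mq_receive(mqh, &nbytes, &data, false) != SHM_MQ_SUCCESS)
			break;				/* SHM_MQ_DETACHED: worker finished or died */

		/* ... deform/forward the nbytes-long tuple image in "data" ... */
	}
}

One queue per worker, with the leader as the receiver on all of them,
keeps the locking simple - the leader just waits on each queue in turn -
but that's only one way to carve it up.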
You'd have to be able to handle individual values that're larger than the
ring buffer or whatever you're using for transfers, in case you're dealing
with already-detoasted tuples or in-memory tuples.

Again, chatting with Robert and others who've worked on dynamic shmem,
parallel query, etc. would be wise here.

> Yes you are correct. For that reason only I am thinking of Supporting
> of functions
> that only dependent on input variables and are not modifying any global data.

You'll want to be careful with that. Nothing stops an immutable function
referencing a cache in a C global that it initializes once and then treats
as read-only, for example. I suspect you'll need a per-function whitelist.
I'd love to be wrong.

--
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
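A minimal illustration of that last point (hypothetical extension code,
not part of the original mail): the function below is perfectly legal to
declare IMMUTABLE, yet it lazily builds a cache in a C static. This
particular one happens to be harmless in a worker - each backend would
just rebuild its own copy - but nothing in the catalog distinguishes it
from a function whose global state actually matters, which is why a
per-function whitelist or an explicit parallel-safety label looks
necessary.

#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

/*
 * Hypothetical SQL declaration (note the IMMUTABLE label):
 *   CREATE FUNCTION scramble(int) RETURNS int
 *     AS 'MODULE_PATHNAME' LANGUAGE C STRICT IMMUTABLE;
 */
PG_FUNCTION_INFO_V1(scramble);

static bool		cache_built = false;	/* backend-local, invisible to the planner */
static int32	cache[256];

Datum
scramble(PG_FUNCTION_ARGS)
{
	int32		arg = PG_GETARG_INT32(0);

	if (!cache_built)
	{
		int			i;

		/* one-off initialization the IMMUTABLE label says nothing about */
		for (i = 0; i < 256; i++)
			cache[i] = (i * 37) ^ 0x5A5A;
		cache_built = true;
	}

	PG_RETURN_INT32(cache[arg & 0xFF]);
}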