On Wed, Feb 8, 2017 at 8:07 AM, Robert Haas <robertmh...@gmail.com> wrote:
Thanks for the detailed review and your valuable feedback. > I think it would be useful to break the remaining patch into two > parts, one that enhances the tidbitmap.c API and another that uses > that to implement Parallel Bitmap Heap Scan. I have done that. > There are several cosmetic problems here. You may have noticed that > all function prototypes in PostgreSQL header files are explicitly > declared extern; yours should be, too. Also, there is extra > whitespace before some of the variable names here, like > "ParallelTBMInfo * tbminfo" instead of "ParallelTBMInfo *tbminfo". If > that is due to pgindent, the solution is to add the typedef names to > your local typedef list. Also, tbm_restore_shared_members doesn't > actually exist. Fixed > 1. Add an additional argument to tbm_create(), dsa_area *dsa. If it's > NULL, we allocate a backend-private memory for the hash entries as > normal. If it's true, we use the dsa_area to store the hash entries, > using the infrastructure added by your 0002 and revised in > c3c4f6e1740be256178cd7551d5b8a7677159b74. (You can use a flag in the > BitmapIndexScan and BitmapOr to decide whether to pass NULL or > es_query_dsa.) Done > > 2. Add a new function dsa_pointer tbm_prepare_shared_iterate(TIDBitmap > *tbm) which allocates shared iteration arrays using the DSA passed to > tbm_create() and returns a dsa_pointer to the result. Arrange this so > that if it's called more than once, each call returns a separate > iterator, so that you can call it once to get the main iterator and a > second time for the prefetch iterator, but have both of those point to > the same underlying iteration arrays. > > 3. Add a new function TBMSharedIterator > *tbm_attach_shared_iterate(dsa_area *dsa, dsa_pointer) which is called > once per backend and gets passed the dsa_pointer from the previous > step. > > 4. Add a new function TBMIterateResult > *tbm_shared_iterate(TBMSharedIterator *) to fetch the next result. I have tried to get these three API's as you explained with one change. In tbm_attach_shared_iterate I need to pass TBM also because each worker will create their own copy of TBM. Those workers need to get the TBM-related information from the shared location. Even though I try to access some of the information like chunk, npages from directly shared location, but some other information like base pointer to dsa element, RelptrPagetableEntry etc. should be local to each worker (after conversion from dsa pointer to local pointer). > > As compared with your proposal, this involves only 3 functions instead > of 7 (plus one new argument to another function), and I think it's a > lot clearer what those functions are actually doing. You don't need > tbm_estimate_parallel_tbminfo() any more because the data being passed > from one backend to another is precisely a dsa_pointer, and the bitmap > scan can just leave space for that in its own estimator function. You > don't need tbm_update_shared_members() separately from > tbm_begin_shared_iterate() separately from tbm_init_shared_iterator() > because tbm_prepare_shared_iterate() can do all that stuff. You don't > need tbm_set_parallel() because the additional argument to > tbm_create() takes care of that need. Right > > The way you've got ParallelTBMInfo set up right now doesn't respect > the separation of concerns between nodeBitmapHeapscan.c and > tidbitmap.c properly. tidbitmap.c shouldn't know that the caller > intends on creating two iterators, one of which is for prefetching. > The design above hopefully addresses that: each call to > tbm_prepare_shared_iterate returns a dsa_pointer to a separate chunk > of shared iterator state, but tidbitmap.c does not need to know how > many times that will be called. Done > > Apart from the above, this patch will need a rebase over today's > commits, Done and please make sure all functions you add have high-quality > header comments. I tried my best, please check the latest patch (0001). Apart from those, there are some more changes. 1. I have moved the dsa_pointer and dsa_area declaration from dsa.h to postgres .h, an independent patch is attached for the same. Because we need to declare function headers with dsa_pointer in tidbitmap.h, but tidbitmap.h also used in FRONTEND, therefore, this can not include dsa.h (as it contains atomics.h). 2. I noticed there was one defect in my code related to handling the TBM_ONE_PAGE, In initial version, there was no problem, but it got introduced somewhere in intermediate versions. I have fixed the same. There were two option to do that 1. Don’t switch to TBM_ONE_PAGE in case of parallel mode (ideally this can not happen only worst estimation can get us there) and directly got to TBM_HASH 2. In shared information keep space for sharing a PagetableEntry. I have implemented 2nd (in the initial versions I implemented with 1st). Note: Patch is also rebased on top of guc_parallel_index_scan_v1.patch from Parallel Index Scan thread[1] [1] https://www.postgresql.org/message-id/CAA4eK1%2BTnM4pXQbvn7OXqam%2Bk_HZqb0ROZUMxOiL6DWJYCyYow%40mail.gmail.com -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
move-dsa-to-postgres-h.patch
Description: Binary data
0001-tidbitmap-support-shared-v1.patch
Description: Binary data
0002-parallel-bitmap-heapscan-v1.patch
Description: Binary data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers