On Wed, Jul 29, 2015 at 11:48 PM, Andres Freund <and...@anarazel.de> wrote: > > On 2015-07-29 12:54:59 -0400, Robert Haas wrote: > > I would try to avoid changing lwlock.c. It's pretty easy when so > > doing to create mechanisms that work now but make further upgrades to > > the general lwlock mechanism difficult. I'd like to avoid that. > > I'm massively doubtful that re-implementing parts of lwlock.c is the > better outcome. Then you have two different infrastructures you need to > improve over time.
I agree and modified the patch to use 32-bit atomics based on idea suggested by Robert and didn't modify lwlock.c. > I understand. IMHO it will be a good idea though to ensure that the patch does not cause regression for other setups such as a less powerful > machine or while running with lower number of clients. > Okay, I have tried it on CentOS VM, but the data is totally screwed (at same client count across 2 runs the variation is quite high ranging from 300 to 3000 tps) to make any meaning out of it. I think if you want to collect data on less powerful m/c, then also atleast we should ensure that it is taken in a way that we are sure that it is not due to noise and there is actual regression, then only we can decide whether we should investigate that or not. Can you please try taking data with attached script (perf_pgbench_tpcb_write.sh), few things you need to change in script based on your setup (environment variables defined in beginning of script like PGDATA), other thing is that you need to ensure that name of binaries for HEAD and Patch should be changed in script if you are naming them differently. It will generate the performance data in test*.txt files. Also try to ensure that checkpoint should be configured such that it should not occur in-between tests, else it will be difficult to conclude the results. Some parameters you might want to consider for the same are checkpoint_timeout, checkpoint_completion_target, min_wal_size, max_wal_size. >> >> >> I was telling that fact even without my patch. Basically I have >> tried by commenting ProcArrayLock in ProcArrayEndTransaction. >> > I did not get that. You mean the TPS is same if you run with commenting out ProcArrayLock in ProcArrayEndTransaction? Yes, TPS is almost same as with Patch. > Is that safe to do? No, that is not safe. I have done that just to see what is the maximum gain we can get by reducing the contention around ProcArrayLock. >> >> >> >> No, autovacuum generates I/O due to which sometimes there >> is more variation in Write tests. > Sure, but on an average does it still show similar improvements? > Yes, number with and without autovacuum show similar trend, it's just that you can see somewhat better results (more performance improvement) with autovacuum=off and do manual vacuum at end of each test. > Or does the test becomes IO bound and hence the bottleneck shifts somewhere > else? Can you please post those numbers as well when you get chance? The numbers posted in initial mail or in this mail are with autovacuum =on. >> >> >> >> So among these only step 2 can be common among different >> algorithms, other's need some work specific to each optimization. >> > Right, but if we could encapsulate that work in a function that just needs to work on some shared memory, I think we can build such infrastructure. For now, I have encapsulated the code into 2 separate functions, rather than extending LWLock.c as that could easily lead to code which might not be used in future even though currently it sounds like that and in-future if we need to use same technique elsewhere then we can look into it. > > >> > >> > Regarding the patch, the compare-and-exchange function calls that you've used would work only for 64-bit machines, right? You would need to use equivalent 32-bit calls on a 32-bit machine. > > >> >> I thought that internal API will automatically take care of it, >> example for msvc it uses _InterlockedCompareExchange64 >> which if doesn't work on 32-bit systems or is not defined, then >> we have to use 32-bit version, but I am not certain about >> that fact. >> > Hmm. The pointer will be a 32-bit field on a 32-bit machine. I don't know how exchanging that with 64-bit integer be safe. > True, but that has to be in-general taken care by 64bit atomic API's, like in this case it should fallback to either 32-bit version of API or something that can work on 32-bit m/c. I think fallback support is missing as of now in 64-bit API's which we should have if we want to use those API's, but anyway for now I have modified the patch to use 32-bit atomics. Performance Data with modified patch. pgbench setup ------------------------ scale factor - 300 Data is on magnetic disk and WAL on ssd. pgbench -M prepared tpc-b Data is median of 3 - 30 min runs, the detailed data for all the 3 runs is in attached document (perf_write_procarraylock_data.ods) Head : commit c53f7387 Patch : group_xid_clearing_at_trans_end_rel_v2 Client_Count/Patch_ver (TPS) 1 8 16 32 64 128 HEAD 972 6004 11060 20074 23839 17798 Patch 1005 6260 11368 20318 30775 30215 With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
group_xid_clearing_at_trans_end_v2.patch
Description: Binary data
perf_pgbench_tpcb_write.sh
Description: Bourne shell script
perf_write_procarraylock_data.ods
Description: application/vnd.oasis.opendocument.spreadsheet
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers