Re: [HACKERS] LWLock optimization for multicore Power machines

2017-08-30 Thread Sokolov Yura
On 2017-08-30 16:24, Tom Lane wrote: Alexander Korotkov writes: It doesn't seem to make sense to consider this patch unless we get access to a suitable Power machine to reproduce the benefits. This is why I'm going to mark this patch "Returned with feedback". Once we

Re: [HACKERS] LWLock optimization for multicore Power machines

2017-08-30 Thread Tom Lane
Alexander Korotkov writes: > It doesn't seem to make sense to consider this patch unless we get access > to a suitable Power machine to reproduce the benefits. > This is why I'm going to mark this patch "Returned with feedback". > Once we get access to the appropriate

Re: [HACKERS] LWLock optimization for multicore Power machines

2017-08-30 Thread Alexander Korotkov
On Thu, Apr 6, 2017 at 5:38 PM, Alexander Korotkov < a.korot...@postgrespro.ru> wrote: > On Thu, Apr 6, 2017 at 5:37 PM, Alexander Korotkov < > a.korot...@postgrespro.ru> wrote: > >> On Thu, Apr 6, 2017 at 2:16 AM, Andres Freund wrote: >> >>> On 2017-04-03 11:56:13 -0700,

Re: [HACKERS] LWLock optimization for multicore Power machines

2017-04-06 Thread Alexander Korotkov
On Thu, Apr 6, 2017 at 5:37 PM, Alexander Korotkov < a.korot...@postgrespro.ru> wrote: > On Thu, Apr 6, 2017 at 2:16 AM, Andres Freund wrote: > >> On 2017-04-03 11:56:13 -0700, Andres Freund wrote: >> > Have you done x86 benchmarking? >> >> I think unless such benchmarking is

Re: [HACKERS] LWLock optimization for multicore Power machines

2017-04-06 Thread Alexander Korotkov
On Thu, Apr 6, 2017 at 2:16 AM, Andres Freund wrote: > On 2017-04-03 11:56:13 -0700, Andres Freund wrote: > > > > > +/* > > > + * Generic implementation of pg_atomic_fetch_mask_add_u32() via loop > > > + * of compare & exchange. > > > + */ > > > +static inline uint32 > > >

Re: [HACKERS] LWLock optimization for multicore Power machines

2017-04-05 Thread Tom Lane
Andres Freund writes: > On 2017-04-03 11:56:13 -0700, Andres Freund wrote: >> Have you done x86 benchmarking? > I think unless such benchmarking is done in the next 24h we need to move > this patch to the next CF... In theory, inlining the _impl function should lead to

Re: [HACKERS] LWLock optimization for multicore Power machines

2017-04-05 Thread Andres Freund
Hi, On 2017-04-03 11:56:13 -0700, Andres Freund wrote: > > > +/* > > + * Generic implementation of pg_atomic_fetch_mask_add_u32() via loop > > + * of compare & exchange. > > + */ > > +static inline uint32 > > +pg_atomic_fetch_mask_add_u32_impl(volatile pg_atomic_uint32 *ptr, > > +

Re: [HACKERS] LWLock optimization for multicore Power machines

2017-04-03 Thread Andres Freund
Hi, On 2017-03-31 13:38:31 +0300, Alexander Korotkov wrote: > > It seems that on this platform the definition of atomics should be provided by > > fallback.h. But it isn't, because I already defined > > PG_HAVE_ATOMIC_U32_SUPPORT > > in arch-ppc.h. I think in this case we shouldn't provide

Re: [HACKERS] LWLock optimization for multicore Power machines

2017-03-31 Thread Tom Lane
Alexander Korotkov writes: >> It seems that on this platform the definition of atomics should be provided by >> fallback.h. But it isn't, because I already defined >> PG_HAVE_ATOMIC_U32_SUPPORT >> in arch-ppc.h. I think in this case we shouldn't provide ppc-specific >>

Re: [HACKERS] LWLock optimization for multicore Power machines

2017-03-31 Thread Alexander Korotkov
On Sun, Mar 26, 2017 at 12:29 AM, Alexander Korotkov < a.korot...@postgrespro.ru> wrote: > On Sat, Mar 25, 2017 at 11:32 PM, Tom Lane wrote: > >> Alexander Korotkov writes: >> > I moved the PPC implementation of pg_atomic_fetch_mask_add_u32() into >> >

Re: [HACKERS] LWLock optimization for multicore Power machines

2017-03-25 Thread Alexander Korotkov
On Sat, Mar 25, 2017 at 11:32 PM, Tom Lane wrote: > Alexander Korotkov writes: > > I moved the PPC implementation of pg_atomic_fetch_mask_add_u32() into > > port/atomics/arch-ppc.h. I also had to declare pg_atomic_uint32 there to > > satisfy usage of

Re: [HACKERS] LWLock optimization for multicore Power machines

2017-03-25 Thread Tom Lane
Alexander Korotkov writes: > I moved the PPC implementation of pg_atomic_fetch_mask_add_u32() into > port/atomics/arch-ppc.h. I also had to declare pg_atomic_uint32 there to > satisfy usage of this type as an argument > of pg_atomic_fetch_mask_add_u32_impl(). Hm, you did

Re: [HACKERS] LWLock optimization for multicore Power machines

2017-03-25 Thread Alexander Korotkov
On Sat, Mar 25, 2017 at 8:44 PM, Tom Lane wrote: > Alexander Korotkov writes: > > [ lwlock-power-3.patch ] > > I experimented with this patch a bit. I can't offer help on testing it > on large PPC machines, but I can report that it breaks the

Re: [HACKERS] LWLock optimization for multicore Power machines

2017-03-25 Thread Tom Lane
Alexander Korotkov writes: > [ lwlock-power-3.patch ] I experimented with this patch a bit. I can't offer help on testing it on large PPC machines, but I can report that it breaks the build on Apple PPC machines, apparently because of nonstandard assembly syntax. I

Re: [HACKERS] LWLock optimization for multicore Power machines

2017-03-25 Thread Alexander Korotkov
On Thu, Mar 16, 2017 at 8:35 PM, David Steele wrote: > On 2/21/17 9:54 AM, Bernd Helmle wrote: > > On Tuesday, 14.02.2017 at 15:53 +0300, Alexander Korotkov wrote: > >> +1 > >> And you could try to use pg_wait_sampling > >>

Re: [HACKERS] LWLock optimization for multicore Power machines

2017-03-24 Thread David Steele
Hi Alexander, On 3/16/17 1:35 PM, David Steele wrote: On 2/21/17 9:54 AM, Bernd Helmle wrote: On Tuesday, 14.02.2017 at 15:53 +0300, Alexander Korotkov wrote: +1 And you could try to use pg_wait_sampling to sample wait events. I've

Re: [HACKERS] LWLock optimization for multicore Power machines

2017-03-16 Thread David Steele
On 2/21/17 9:54 AM, Bernd Helmle wrote: > On Tuesday, 14.02.2017 at 15:53 +0300, Alexander Korotkov wrote: >> +1 >> And you could try to use pg_wait_sampling >> to sample wait >> events. > > I've tried this with your example from your

Re: [HACKERS] LWLock optimization for multicore Power machines

2017-02-21 Thread Bernd Helmle
On Tuesday, 14.02.2017 at 15:53 +0300, Alexander Korotkov wrote: > +1 > And you could try to use pg_wait_sampling > to sample wait > events. I've tried this with your example from your blog post[1] and got this: (pgbench scale 1000)

Re: [HACKERS] LWLock optimization for multicore Power machines

2017-02-21 Thread Alexander Korotkov
On Tue, Feb 21, 2017 at 1:47 PM, Bernd Helmle wrote: > On Saturday, 11.02.2017 at 15:42 +0300, Alexander Korotkov wrote: > > I think it would make sense to run more kinds of tests. Could you > > try the set > > of tests provided by Tomas Vondra? > > Even if we didn't see

Re: [HACKERS] LWLock optimization for multicore Power machines

2017-02-16 Thread Bernd Helmle
On Tuesday, 14.02.2017 at 15:53 +0300, Alexander Korotkov wrote: > +1 > And you could try to use pg_wait_sampling > to sample wait > events. Okay, I'm going to try this. Currently Tomas' scripts are still running, I'll provide updates as

Re: [HACKERS] LWLock optimization for multicore Power machines

2017-02-14 Thread Alexander Korotkov
On Mon, Feb 13, 2017 at 10:17 PM, Tomas Vondra wrote: > On 02/13/2017 03:16 PM, Bernd Helmle wrote: > >> On Saturday, 11.02.2017 at 15:42 +0300, Alexander Korotkov wrote: >> >>> Thus, I see reasons why the absolute results in your tests are lower than >>> in my >>>

Re: [HACKERS] LWLock optimization for multicore Power machines

2017-02-13 Thread Tomas Vondra
On 02/13/2017 03:16 PM, Bernd Helmle wrote: On Saturday, 11.02.2017 at 15:42 +0300, Alexander Korotkov wrote: Thus, I see reasons why the absolute results in your tests are lower than in my previous tests. 1. You use 28 physical cores while I was using 32 physical cores. 2. You run tests in

Re: [HACKERS] LWLock optimization for multicore Power machines

2017-02-13 Thread Bernd Helmle
On Saturday, 11.02.2017 at 15:42 +0300, Alexander Korotkov wrote: > Thus, I see reasons why the absolute results in your tests are lower than > in my > previous tests. > 1. You use 28 physical cores while I was using 32 physical cores. > 2. You run tests in PowerVM while I was running tests on bare

Re: [HACKERS] LWLock optimization for multicore Power machines

2017-02-11 Thread Tomas Vondra
On 02/11/2017 01:42 PM, Alexander Korotkov wrote: I think it would make sense to run more kinds of tests. Could you try the set of tests provided by Tomas Vondra? Even if we didn't see a win in some of the tests, it would still be valuable to see that there is no regression there. FWIW it

Re: [HACKERS] LWLock optimization for multicore Power machines

2017-02-11 Thread Alexander Korotkov
On Wed, Feb 8, 2017 at 5:00 PM, Bernd Helmle wrote: > On Tuesday, 07.02.2017 at 16:48 +0300, Alexander Korotkov wrote: > > But the win isn't > > as high as I observed earlier. And I wonder why the absolute numbers are > > lower > > than in our earlier experiments. We used IBM

Re: [HACKERS] LWLock optimization for multicore Power machines

2017-02-08 Thread Bernd Helmle
On Tuesday, 07.02.2017 at 16:48 +0300, Alexander Korotkov wrote: > But the win isn't > as high as I observed earlier. And I wonder why the absolute numbers are > lower > than in our earlier experiments. We used IBM E880 which is actually > two Did you run your tests on bare metal or were they also

Re: [HACKERS] LWLock optimization for multicore Power machines

2017-02-07 Thread Alexander Korotkov
On Tue, Feb 7, 2017 at 3:16 PM, Bernd Helmle wrote: > On Monday, 06.02.2017 at 22:44 +0300, Alexander Korotkov wrote: > > 2. Also could you run each test longer: 3-5 mins, and run them with > > a variety of client counts? > > > So here are some other results. I've

Re: [HACKERS] LWLock optimization for multicore Power machines

2017-02-07 Thread Bernd Helmle
On Monday, 06.02.2017 at 22:44 +0300, Alexander Korotkov wrote: > 2. Also could you run each test longer: 3-5 mins, and run them > with > a variety of client counts? So here are some other results. I've changed max_connections to 300. The bench was prewarmed and run 300s each. I could

Re: [HACKERS] LWLock optimization for multicore Power machines

2017-02-06 Thread Bernd Helmle
On Mon, 2017-02-06 at 22:44 +0300, Alexander Korotkov wrote: > Results look strange to me. I wonder why there is a difference > between > lwlock-power-1.patch and lwlock-power-3.patch? From my intuition, it > shouldn't be there because there's not much difference between them. > Thus, I > have

Re: [HACKERS] LWLock optimization for multicore Power machines

2017-02-06 Thread Andres Freund
Hi, On 2017-02-03 20:01:03 +0300, Alexander Korotkov wrote: > Using assembly in lwlock.c looks rough. This is why I refactored it by > introducing a new atomic operation pg_atomic_fetch_mask_add_u32 (see > lwlock-power-2.patch). It checks that all masked bits are clear and then > adds to
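
[Editor's sketch] The operation described in this message (check that all masked bits are clear, then add) and the "generic implementation ... via loop of compare & exchange" quoted elsewhere in the thread can be illustrated roughly as below. This is not the patch's code: it uses C11 <stdatomic.h> rather than PostgreSQL's pg_atomic_* API, the name fetch_mask_add_u32 and its parameter names are hypothetical, and the return convention (the value observed before the attempt) is an assumption.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the operation discussed in the thread:
 * if no bit selected by `mask` is set in *ptr, atomically add `increment`.
 * Assumed convention: always return the value observed before the attempt,
 * so the caller can test (result & mask) to learn whether the add happened.
 */
static inline uint32_t
fetch_mask_add_u32(_Atomic uint32_t *ptr, uint32_t mask, uint32_t increment)
{
    uint32_t old = atomic_load(ptr);

    /* Keep trying only while the masked bits are clear. */
    while ((old & mask) == 0)
    {
        /* On failure the call refreshes `old` with the current value. */
        if (atomic_compare_exchange_weak(ptr, &old, old + increment))
            break;
    }
    return old;
}

/* Small self-check showing both outcomes. */
static void
fetch_mask_add_demo(void)
{
    _Atomic uint32_t state;

    atomic_init(&state, 4);

    /* Bits 0-1 are clear in 4, so the add is applied: 4 + 8 = 12. */
    assert(fetch_mask_add_u32(&state, 0x3, 8) == 4);
    assert(atomic_load(&state) == 12);

    /* Bit 2 is set in 12, so with mask 0x4 the add is skipped. */
    assert(fetch_mask_add_u32(&state, 0x4, 8) == 12);
    assert(atomic_load(&state) == 12);
}
```

Calling fetch_mask_add_demo() exercises both paths. The thread's point is that on Power the same check-and-add could instead be done inside a single load-reserve/store-conditional sequence; the loop above only shows the portable fallback shape.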

Re: [HACKERS] LWLock optimization for multicore Power machines

2017-02-06 Thread Alexander Korotkov
On Mon, Feb 6, 2017 at 8:28 PM, Bernd Helmle wrote: > On Monday, 06.02.2017 at 16:45 +0300, Alexander Korotkov wrote: > > I tried lwlock-power-2.patch on the multicore Power machine we have at > > PostgresPro. > > I realized that using labels in assembly isn't safe. Thus, I

Re: [HACKERS] LWLock optimization for multicore Power machines

2017-02-06 Thread Bernd Helmle
On Monday, 06.02.2017 at 16:45 +0300, Alexander Korotkov wrote: > I tried lwlock-power-2.patch on the multicore Power machine we have at > PostgresPro. > I realized that using labels in assembly isn't safe. Thus, I removed > labels and used relative jumps instead (lwlock-power-2.patch). >

Re: [HACKERS] LWLock optimization for multicore Power machines

2017-02-06 Thread Alexander Korotkov
On Fri, Feb 3, 2017 at 11:31 PM, Bernd Helmle wrote: > > UPD: It appears that Postgres Pro has access to a big Power machine > > now. > > So, I can do testing/benchmarking myself. > > We currently also have access to an LPAR on an E850 machine with 4 > sockets POWER8 running

Re: [HACKERS] LWLock optimization for multicore Power machines

2017-02-03 Thread Robert Haas
On Fri, Feb 3, 2017 at 12:01 PM, Alexander Korotkov wrote: > Hi everybody! > > During the FOSDEM/PGDay 2017 developer meeting I said that I have some special > assembly optimization for multicore Power machines. From the answers of > other hackers I realized the following. > >

Re: [HACKERS] LWLock optimization for multicore Power machines

2017-02-03 Thread Bernd Helmle
On Fri, 2017-02-03 at 20:11 +0300, Alexander Korotkov wrote: > On Fri, Feb 3, 2017 at 8:01 PM, Alexander Korotkov < > a.korot...@postgrespro.ru> wrote: > > > Unfortunately, I have no big enough Power machine at hand to > > reproduce > > those results. Actually, I have no Power machine at hand at

Re: [HACKERS] LWLock optimization for multicore Power machines

2017-02-03 Thread Alexander Korotkov
On Fri, Feb 3, 2017 at 8:01 PM, Alexander Korotkov < a.korot...@postgrespro.ru> wrote: > Unfortunately, I have no big enough Power machine at hand to reproduce > those results. Actually, I have no Power machine at hand at all. So, > lwlock-power-2.patch was written "blindly". I would very

[HACKERS] LWLock optimization for multicore Power machines

2017-02-03 Thread Alexander Korotkov
Hi everybody! During the FOSDEM/PGDay 2017 developer meeting I said that I have some special assembly optimization for multicore Power machines. From the answers of other hackers I realized the following. 1. There are some big Power machines with PostgreSQL in production use. Not as many as