[go-nuts] Re: Tweaking sync.Pool for mostly-empty pools
Just FYI, there is currently the following activity regarding sync.Pool optimizations that may land in Go 1.11:
- Increasing per-P pool capacity, which should reduce lock contention on shared items: https://go-review.googlesource.com/c/go/+/49110
- Avoiding pool resets during GC: https://github.com/golang/go/issues/22950

On Thursday, February 15, 2018 at 1:56:34 AM UTC+2, Carlo Alberto Ferraris wrote:
>
> In an attempt to reduce the time pprof says is spent in sync.Pool
> (internal workloads, sorry) I modified Get and getSlow to skip locking the
> per-P shared pools if the pools are likely to be empty. This yields
> promising results, but I'm not sure the approach is sound, since the check
> I do is inherently racy.
>
> As an (artificial and contrived) benchmark, I'm using this:
>
> func BenchmarkPoolUnderflow(b *testing.B) {
>     var p Pool
>     b.RunParallel(func(pb *testing.PB) {
>         for pb.Next() {
>             p.Put(1)
>             p.Get()
>             p.Get()
>         }
>     })
> }
>
> This is meant to simulate a pool in which more objects are Get() than
> are Put(). (It wouldn't make much sense to simulate a pool in which we
> only get, so to keep things simple I opted for a 2:1 ratio.)
>
> The change I applied to Get and getSlow is the following. Starting from
> the current pattern of:
>
> l := ... // per-P poolLocal
> l.Lock()
> last := len(l.shared) - 1
> if last >= 0 {
>     x = l.shared[last]
>     l.shared = l.shared[:last]
> }
> l.Unlock()
>
> I add a check (not protected by the mutex; that is the expensive op we're
> trying to skip if it's not necessary) to see if the pool is likely to be
> non-empty:
>
> l := ... // per-P poolLocal
> if len(l.shared) > 0 { // the racy check for non-emptiness
>     l.Lock()
>     last := len(l.shared) - 1
>     if last >= 0 {
>         x = l.shared[last]
>         l.shared = l.shared[:last]
>     }
>     l.Unlock()
> }
>
> I know I should not call this a benign race, but in this case I don't see
> how this can lead to problems. If the racy check gets it right, then it's
> almost a net win. If it gets it wrong, either we do what we do now (i.e.
> we lock, just to find an empty pool), or we skip an otherwise non-empty
> pool - thereby failing to immediately return an otherwise reusable object.
> (Note that 1. there is a per-P shared pool for each P, so I'd estimate the
> chances of this happening on all of them to be pretty low, and 2. the Pool
> documentation explicitly says that Get is allowed to treat the pool as
> empty.) Also note that the race detector is already explicitly disabled in
> all sync.Pool methods.
>
> The reason I'm asking is to understand whether my reasoning is sound and,
> regardless, whether anybody has suggestions about how to do this in a
> better way.
>
> The current results (of the approach above, plus some refactoring to
> recover some lost performance on the other benchmarks) on my laptop are
> the following:
>
> name             old time/op  new time/op  delta
> Pool-4           14.5ns ± 3%  14.2ns ± 2%   -1.64%  (p=0.023 n=9+10)
> PoolOverflow-4   1.99µs ±12%  1.78µs ± 1%  -10.62%  (p=0.000 n=10+8)
> PoolUnderflow-4   152ns ± 6%    30ns ± 1%  -80.00%  (p=0.000 n=10+8)
>
> (The first two benchmarks are already part of sync.Pool; the last one is
> the one I described above.)
>
> Any feedback is welcome. If this is deemed safe I'm going to submit a CL.
>
> Carlo

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: [go-nuts] Re: Tweaking sync.Pool for mostly-empty pools
On Mon, Feb 19, 2018 at 4:56 AM, Chris Hopkins wrote:
>
> I would have expected the compiler to be allowed to change:
>
> if len(l.shared) > 0 { // the racy check for non-emptiness
>     l.Lock()
>     last := len(l.shared) - 1
>
> to
>
> tmp := len(l.shared)
> if tmp > 0 { // the racy check for non-emptiness
>     l.Lock()
>     last := tmp - 1
>
> I'd be interested if this were a legal optimisation. Were it so, then you
> could get illegal behaviour.

I don't think this is allowed. I don't think the compiler can assume that a load before a lock and a load after a lock return the same value.

Ian

> [earlier messages in the thread quoted in full; trimmed]
[go-nuts] Re: Tweaking sync.Pool for mostly-empty pools
I would have expected the compiler to be allowed to change:

if len(l.shared) > 0 { // the racy check for non-emptiness
    l.Lock()
    last := len(l.shared) - 1

to

tmp := len(l.shared)
if tmp > 0 { // the racy check for non-emptiness
    l.Lock()
    last := tmp - 1

I'd be interested if this were a legal optimisation. Were it so, then you could get illegal behaviour. Maybe?

Chris

On Friday, 16 February 2018 14:57:59 UTC, Carlo Alberto Ferraris wrote:
> [earlier messages in the thread quoted in full; trimmed]
[go-nuts] Re: Tweaking sync.Pool for mostly-empty pools
Also, keep in mind that "there being nothing in the pool" is the common state of pools immediately after every GC.

On Friday, February 16, 2018 at 12:05:27 PM UTC+9, Kevin Malachowski wrote:
> [earlier messages in the thread quoted in full; trimmed]
[go-nuts] Re: Tweaking sync.Pool for mostly-empty pools
Kevin,

We're already using Pool only where it's not hurting performance (compared to a bare new). I could probably come up with ad-hoc pools for specific cases, but the approach I described in my previous mail seemed promising because it makes Pool applicable also in cases where you put in less than you get out (not necessarily with the crazy 2:1 ratio I'm using in the simplified benchmark) but having a pool still helps compared to having no pool at all. Note that it also helps (10% faster) in the case in which you put in as much as you get and you have multiple items (PoolOverflow).

Carlo

On Friday, February 16, 2018 at 12:05:27 PM UTC+9, Kevin Malachowski wrote:
> [earlier messages in the thread quoted in full; trimmed]
[go-nuts] Re: Tweaking sync.Pool for mostly-empty pools
If there is likely nothing in the Pool, then maybe it's better to not use one at all. Can you compare the internal workload with an implementation where all callers just call the `New` function directly? What's the purpose of using pooled memory if there's often nothing in the pool?

On Wednesday, February 14, 2018 at 3:56:34 PM UTC-8, Carlo Alberto Ferraris wrote:
> [original post quoted in full earlier in the thread; trimmed]