Re: Optimize memory allocation code
On Fri, Sep 25, 2020 at 07:37:07PM -0500, Merlin Moncure wrote:
> On Fri, Sep 25, 2020 at 7:32 PM Li Japin wrote:
>> On Sep 26, 2020, at 8:09 AM, Julien Rouhaud wrote:
>>> Hi,
>>>
>>> On Sat, Sep 26, 2020 at 12:14 AM Li Japin wrote:
>>>>
>>>> Hi, hackers!
>>>>
>>>> I find the palloc0() is similar to the palloc(), we can use palloc() inside palloc0()
>>>> to allocate space, thereby I think we can reduce duplication of code.
>>>
>>> The code is duplicated on purpose. There's a comment at the beginning
>>> that mentions it:
>>>
>>> /* duplicates MemoryContextAllocZero to avoid increased overhead */
>>>
>>> Same for MemoryContextAllocZero() itself.
>>
>> Thanks! How big is this overhead? Is there any way I can test it?
>
> Profiler. For example, oprofile. In hot areas of the code (memory
> allocation is very hot), profiling is the first step.

Maybe a micro-benchmark would be better, e.g. a function with a loop
doing many palloc/palloc0 calls, or something similar.

FWIW I wonder what kind of overhead this is meant to avoid; the comment
unfortunately does not go into any details. I suppose it's to avoid extra
function calls, but maybe there's something else going on. And maybe the
overhead is much lower on modern CPUs (although this seems to come from
8396447cdbd in 2013, so it's not that old).

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
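A rough idea of what such a micro-benchmark loop could look like, as a standalone sketch rather than anything from the patch. It assumes malloc() as a stand-in for the memory-context allocator, and every name in it (demo_alloc, demo_alloc0_dup, demo_alloc0_wrap, time_allocs) is made up for illustration. A real test would call palloc()/palloc0() from inside a backend, for example from a C extension function.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Keep the wrapped call a real call, as in the modified palloc0 listing. */
#if defined(__GNUC__)
#define DEMO_NOINLINE __attribute__((noinline))
#else
#define DEMO_NOINLINE
#endif

/* Hypothetical stand-in for palloc(): allocate or abort. */
static DEMO_NOINLINE void *
demo_alloc(size_t size)
{
    void *p = malloc(size);

    if (p == NULL)
        abort();
    return p;
}

/* Variant A: zeroing allocator that duplicates the allocation logic. */
static void *
demo_alloc0_dup(size_t size)
{
    void *p = malloc(size);

    if (p == NULL)
        abort();
    memset(p, 0, size);
    return p;
}

/* Variant B: zeroing allocator layered on top of demo_alloc(). */
static void *
demo_alloc0_wrap(size_t size)
{
    void *p = demo_alloc(size);

    memset(p, 0, size);
    return p;
}

typedef void *(*alloc_fn) (size_t);

/* Time `iters` allocate/free pairs of `size` bytes (size >= 1); returns seconds. */
static double
time_allocs(alloc_fn fn, size_t size, long iters)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++)
    {
        volatile unsigned char *p = fn(size);

        p[0] = 1;               /* volatile store keeps the allocation alive */
        free((void *) p);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double) (t1.tv_sec - t0.tv_sec) +
        (double) (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

/* Build with -DDEMO_STANDALONE to get a small driver program. */
#ifdef DEMO_STANDALONE
int
main(void)
{
    const long iters = 10L * 1000 * 1000;

    printf("dup:  %.3f s\n", time_allocs(demo_alloc0_dup, 64, iters));
    printf("wrap: %.3f s\n", time_allocs(demo_alloc0_wrap, 64, iters));
    return 0;
}
#endif
```

Numbers from a loop like this are only a first hint. malloc() behaves differently from a memory context, and the calls go through a function pointer here, so a profile of a real workload is still needed before drawing conclusions.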
Re: Optimize memory allocation code
On Sep 29, 2020, at 9:30 PM, Alvaro Herrera wrote:
> On 2020-Sep-26, Li Japin wrote:
>
>> Thanks! How big is this overhead? Is there any way I can test it?
>
> You could also have a look at the assembly code that your compiler
> generates -- particularly examine how it changes.

Thanks for your advice!

The original assembly code for palloc0 is:

00517690:
  517690:  55                     push   %rbp
  517691:  53                     push   %rbx
  517692:  48 89 fb               mov    %rdi,%rbx
  517695:  48 83 ec 08            sub    $0x8,%rsp
  517699:  48 81 ff ff ff ff 3f   cmp    $0x3fffffff,%rdi
  5176a0:  48 8b 2d d9 0c 48 00   mov    0x480cd9(%rip),%rbp        # 998380
  5176a7:  0f 87 d5 00 00 00      ja     517782
  5176ad:  48 8b 45 10            mov    0x10(%rbp),%rax
  5176b1:  48 89 fe               mov    %rdi,%rsi
  5176b4:  c6 45 04 00            movb   $0x0,0x4(%rbp)
  5176b8:  48 89 ef               mov    %rbp,%rdi
  5176bb:  ff 10                  callq  *(%rax)
  5176bd:  48 85 c0               test   %rax,%rax
  5176c0:  48 89 c1               mov    %rax,%rcx
  5176c3:  74 5b                  je     517720
  5176c5:  f6 c3 07               test   $0x7,%bl
  5176c8:  75 36                  jne    517700
  5176ca:  48 81 fb 00 04 00 00   cmp    $0x400,%rbx
  5176d1:  77 2d                  ja     517700
  5176d3:  48 01 c3               add    %rax,%rbx
  5176d6:  48 39 d8               cmp    %rbx,%rax
  5176d9:  73 35                  jae    517710
  5176db:  0f 1f 44 00 00         nopl   0x0(%rax,%rax,1)
  5176e0:  48 83 c0 08            add    $0x8,%rax
  5176e4:  48 c7 40 f8 00 00 00   movq   $0x0,-0x8(%rax)
  5176eb:  00
  5176ec:  48 39 c3               cmp    %rax,%rbx
  5176ef:  77 ef                  ja     5176e0
  5176f1:  48 83 c4 08            add    $0x8,%rsp
  5176f5:  48 89 c8               mov    %rcx,%rax
  5176f8:  5b                     pop    %rbx
  5176f9:  5d                     pop    %rbp
  5176fa:  c3                     retq
  5176fb:  0f 1f 44 00 00         nopl   0x0(%rax,%rax,1)
  517700:  48 89 cf               mov    %rcx,%rdi
  517703:  48 89 da               mov    %rbx,%rdx
  517706:  31 f6                  xor    %esi,%esi
  517708:  e8 e3 0e ba ff         callq  b85f0
  51770d:  48 89 c1               mov    %rax,%rcx
  517710:  48 83 c4 08            add    $0x8,%rsp
  517714:  48 89 c8               mov    %rcx,%rax
  517717:  5b                     pop    %rbx
  517718:  5d                     pop    %rbp
  517719:  c3                     retq
  51771a:  66 0f 1f 44 00 00      nopw   0x0(%rax,%rax,1)
  517720:  48 8b 3d 51 0c 48 00   mov    0x480c51(%rip),%rdi        # 998378
  517727:  be 64 00 00 00         mov    $0x64,%esi
  51772c:  e8 1f f9 ff ff         callq  517050
  517731:  31 f6                  xor    %esi,%esi
  517733:  bf 14 00 00 00         mov    $0x14,%edi
  517738:  e8 53 6d fd ff         callq  4ee490
  51773d:  bf c5 20 00 00         mov    $0x20c5,%edi
  517742:  e8 99 9b fd ff         callq  4f12e0
  517747:  48 8d 3d 07 54 03 00   lea    0x35407(%rip),%rdi         # 54cb55 <__func__.7554+0x45>
  51774e:  31 c0                  xor    %eax,%eax
  517750:  e8 ab 9d fd ff         callq  4f1500
  517755:  48 8b 55 38            mov    0x38(%rbp),%rdx
  517759:  48 8d 3d 80 11 16 00   lea    0x161180(%rip),%rdi        # 6788e0 <__func__.6248+0x150>
  517760:  48 89 de               mov    %rbx,%rsi
  517763:  31 c0                  xor    %eax,%eax
  517765:  e8 56 a2 fd ff         callq  4f19c0
  51776a:  48 8d 15 ff 11 16 00   lea    0x1611ff(%rip),%rdx        # 678970 <__func__.7326>
  517771:  48 8d 3d 20 11 16 00   lea    0x161120(%rip),%rdi        # 678898 <__func__.6248+0x108>
  517778:  be eb 03 00 00         mov    $0x3eb,%esi
  51777d:  e8 0e 95 fd ff         callq  4f0c90
  517782:  31 f6                  xor    %esi,%esi
  517784:  bf 14 00 00 00         mov    $0x14,%edi
  517789:  e8 02 6d fd ff         callq  4ee490
  51778e:  48 8d 3d db 10 16 00   lea    0x1610db(%rip),%rdi        # 678870 <__func__.6248+0xe0>
  517795:  48 89 de               mov    %rbx,%rsi
  517798:  31 c0                  xor    %eax,%eax
  51779a:  e8 91 98 fd ff         callq  4f1030
  51779f:  48 8d 15 ca 11 16 00   lea    0x1611ca(%rip),%rdx        # 678970 <__func__.7326>
  5177a6:  48 8d 3d eb 10 16 00   lea    0x1610eb(%rip),%rdi        # 678898 <__func__.6248+0x108>
  5177ad:  be df 03 00 00         mov    $0x3df,%esi
  5177b2:  e8 d9 94 fd ff         callq  4f0c90
  5177b7:  66 0f 1f 84 00 00 00   nopw   0x0(%rax,%rax,1)
  5177be:  00 00

After the modification, the palloc0 assembly code is:

00517690:
  517690:  53                     push   %rbx
  517691:  48 89 fb               mov    %rdi,%rbx
  517694:  e8 17 ff ff ff         callq  5175b0
  517699:  f6 c3 07               test   $0x7,%bl
  51769c:  48 89 c1               mov    %rax,%rcx
  51769f:  75 2f                  jne    5176d0
  5176a1:  48 81 fb 00 04 00 00   cmp    $0x400,%rbx
  5176a8:  77 26                  ja     5176d0
  5176aa:  48 01 c3               add    %rax,%rbx
  5176ad:  48 39 d8               cmp    %rbx,%rax
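For readers following the listings, here is a hedged C rendering of the zeroing path that appears in both versions: the `test $0x7` / `cmp $0x400` checks, the 8-byte store loop, and the fallback call with arguments (pointer, 0, size), which is presumably memset(). This is reconstructed from the disassembly only. The names demo_zero_aligned and DEMO_LOOP_LIMIT are invented for the sketch and are not taken from the PostgreSQL sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* 0x400 in the listing: larger requests go to memset() instead of the loop. */
#define DEMO_LOOP_LIMIT 1024

/*
 * Zero `len` bytes at `start`, which is assumed to be 8-byte aligned
 * (as memory returned by an allocator normally is).
 */
static void
demo_zero_aligned(void *start, size_t len)
{
    if ((len & (sizeof(uint64_t) - 1)) == 0 && len <= DEMO_LOOP_LIMIT)
    {
        /* Small, word-multiple size: store zeros one 8-byte word at a time. */
        uint64_t *p = start;
        uint64_t *stop = (uint64_t *) ((char *) start + len);

        while (p < stop)
            *p++ = 0;
    }
    else
    {
        /* Odd or large size: let the library routine handle it. */
        memset(start, 0, len);
    }
}

/* Build with -DDEMO_STANDALONE to get a small driver program. */
#ifdef DEMO_STANDALONE
int
main(void)
{
    uint64_t buf[8];

    memset(buf, 0xff, sizeof(buf));
    demo_zero_aligned(buf, sizeof(buf));
    printf("first word after zeroing: %llu\n", (unsigned long long) buf[0]);
    return 0;
}
#endif
```

In both listings this part is the same apart from addresses. What the change visibly alters is the code in front of it: the inlined size check and indirect call through the context are replaced by a single direct `callq` to another function.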
Re: Optimize memory allocation code
On 2020-Sep-26, Li Japin wrote:

> Thanks! How big is this overhead? Is there any way I can test it?

You could also have a look at the assembly code that your compiler
generates -- particularly examine how it changes.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Optimize memory allocation code
On Fri, Sep 25, 2020 at 7:32 PM Li Japin wrote:
>
> > On Sep 26, 2020, at 8:09 AM, Julien Rouhaud wrote:
> >
> > Hi,
> >
> > On Sat, Sep 26, 2020 at 12:14 AM Li Japin wrote:
> >>
> >> Hi, hackers!
> >>
> >> I find the palloc0() is similar to the palloc(), we can use palloc() inside palloc0()
> >> to allocate space, thereby I think we can reduce duplication of code.
> >
> > The code is duplicated on purpose. There's a comment at the beginning
> > that mentions it:
> >
> > /* duplicates MemoryContextAllocZero to avoid increased overhead */
> >
> > Same for MemoryContextAllocZero() itself.
>
> Thanks! How big is this overhead? Is there any way I can test it?

Profiler. For example, oprofile. In hot areas of the code (memory
allocation is very hot), profiling is the first step.

merlin
Re: Optimize memory allocation code
> On Sep 26, 2020, at 8:09 AM, Julien Rouhaud wrote:
>
> Hi,
>
> On Sat, Sep 26, 2020 at 12:14 AM Li Japin wrote:
>>
>> Hi, hackers!
>>
>> I find the palloc0() is similar to the palloc(), we can use palloc() inside palloc0()
>> to allocate space, thereby I think we can reduce duplication of code.
>
> The code is duplicated on purpose. There's a comment at the beginning
> that mentions it:
>
> /* duplicates MemoryContextAllocZero to avoid increased overhead */
>
> Same for MemoryContextAllocZero() itself.

Thanks! How big is this overhead? Is there any way I can test it?

Best regards!

--
Japin Li
Re: Optimize memory allocation code
Hi,

On Sat, Sep 26, 2020 at 12:14 AM Li Japin wrote:
>
> Hi, hackers!
>
> I find the palloc0() is similar to the palloc(), we can use palloc() inside palloc0()
> to allocate space, thereby I think we can reduce duplication of code.

The code is duplicated on purpose. There's a comment at the beginning
that mentions it:

/* duplicates MemoryContextAllocZero to avoid increased overhead */

Same for MemoryContextAllocZero() itself.
Optimize memory allocation code
Hi, hackers!

I find the palloc0() is similar to the palloc(), we can use palloc() inside palloc0()
to allocate space, thereby I think we can reduce duplication of code.

Best regards!

--
Japin Li

Attachment: 0001-Optimize-memory-allocation-code.patch