Bug#934080: [libc6] Significant degradation in the memory effectivity of the memory allocator

2019-08-16 Thread Roman Savochenko

Hello, Florian

14.08.19 15:07, Florian Weimer writes:

Is there a way to reproduce your results easily?  Upstream, we're
looking for workloads which are difficult to handle for glibc's malloc
and its default settings, so that we hopefully can improve things
eventually.

Using the ready-made builds of the application and the LiveDisks is the
simplest way for me, compared to writing a test application simulating
such a complex load, since you can simply install the application,
start it and observe.

I meant: Is there a reproduction recipe someone could use, without being
familiar with the application?


Sure, and I wrote such a recipe in the first email, without naming 
the program.


The program is OpenSCADA; you can get its packages at 
http://oscada.org/en/main/download/ for the Work version, for Debian 
versions 7 through 10.


For the installation you can read 
http://oscada.org/wiki/Special:MyLanguage/Documents/How_to/Install but, 
in short, you need to install the package openscada-model-aglks after 
connecting the Debian repository of this program.


$ wget http://ftp.oscada.org/Debian/10/openscada/openscada.list
$ sudo cp openscada.list /etc/apt/sources.list.d
$ wget -O - http://ftp.oscada.org/Misc/pkgSignKey | sudo apt-key add -
$ sudo apt-get update; sudo apt-get install openscada-model-aglks

The package openscada-model-aglks provides a ready-made configuration 
and data to start and work with, since it is itself a simulator.


Now let's go through the stages:
1. Start the program and set the initial state, fixing the memory 
allocation — measuring the initial memory consumption value
> Just start the program from the desktop menu entry "Simulator 
"AGLKS" on the OpenSCADA system", or with the command:

$ openscada_AGLKS

> Wait for about one minute for the memory consumption to settle
> Open the page shown at 
http://oscada.org/wiki/images/4/42/WebVision_wvis_cfg.png , where you 
can control the opening and closing of the WEB sessions, that is the 
allocation and freeing of the memory — these are the iterations.
> Set the "Life time of the sessions" on that page to 1 minute instead 
of 10, to decrease the waiting time
> In a Web browser open the page "http://localhost:10002/WebVision"; 
the value at this point is the initial memory consumption value.


2. Perform the allocation-freeing iteration
2.1. Open the first Web-interface page in a Web browser on the host system
> The first page is "http://localhost:10002/WebVision/prj_AGLKS"

2.2. Close the page in the Web browser
2.3. Wait for the session of the first Web-interface page to be closed 
and freed on the program side, 1 minute — measuring the iteration 
memory consumption value

3. Return to stage 2 and repeat, for 5 iterations

But I think the problem is related to the binding of arenas to threads: 
programs like OpenSCADA in the Web mode recreate threads, which are 
then rebound to different arenas, and that is why we get this sort of 
memory leak into the arenas.


And it seems to be a conceptual problem of the arenas in GLibC.

Regards, Roman



Bug#934080: [libc6] Significant degradation in the memory effectivity of the memory allocator

2019-08-14 Thread Florian Weimer
* Roman Savochenko:

>> Is there a way to reproduce your results easily?  Upstream, we're
>> looking for workloads which are difficult to handle for glibc's malloc
>> and its default settings, so that we hopefully can improve things
>> eventually.
>
> Using the ready-made builds of the application and the LiveDisks is the
> simplest way for me, compared to writing a test application simulating
> such a complex load, since you can simply install the application,
> start it and observe.

I meant: Is there a reproduction recipe someone could use, without being
familiar with the application?

Thanks,
Florian



Bug#934080: [libc6] Significant degradation in the memory effectivity of the memory allocator

2019-08-11 Thread Roman Savochenko

Hello, Aurelien and Florian

08.08.19 20:00, Aurelien Jarno wrote:
Thank you Florian, setting the environment variable MALLOC_ARENA_MAX=1 I 
have got the memory efficiency even somewhat better than in Debian 7!



Thanks for the feedback. I think we can therefore consider this bug as
solved. Closing it.

OK, if you think this sort of Debian behaviour is good:
- it is now impossible, or hard, to distinguish and detect where an 
application's own memory leak is, so the developers may always complain 
about GLibC and Debian. :)
- it is now impossible, by default, to use Debian in dynamic 
memory-limited applications which run for more than days, like many 
embedded systems, PLCs and so on.



09.08.19 14:53, Florian Weimer wrote:

Thank you Florian, setting the environment variable MALLOC_ARENA_MAX=1 I
have got the memory efficiency even somewhat better than in Debian 7!

Is there a way to reproduce your results easily?  Upstream, we're
looking for workloads which are difficult to handle for glibc's malloc
and its default settings, so that we hopefully can improve things
eventually.


Using the ready-made builds of the application and the LiveDisks is the 
simplest way for me, compared to writing a test application simulating 
such a complex load, since you can simply install the application, 
start it and observe.


As for my follow-up measures — I have set the environment variable 
MALLOC_ARENA_MAX=1 for all my builds of the live disks of the 
automation Linux distribution.


07.08.19 09:09, Roman Savochenko wrote:


I have a real-task environment on Debian 9 which initially consumes
   about 1.6 GB, and after a couple of days of such work it
   consumes about 6 GB!


To demonstrate how awful this problem can be in real tasks, I have 
recorded the memory consumption trend both for the default environment 
and under MALLOC_ARENA_MAX=1.


I have also recorded the CPU load, to demonstrate the influence of this 
environment variable on the performance.


So, the trend of the memory consumption of a real big application under 
the default conditions, over two days, is:





And the trend of the memory consumption of the same real big 
application under the environment variable MALLOC_ARENA_MAX=1, over 
two days, is:





So, the influence on the performance is slightly noticeable, counted at 
about 5% (15 → 20%), but the latter environment was also under a higher 
user load than the former one.


Regards, Roman



Bug#934080: [libc6] Significant degradation in the memory effectivity of the memory allocator

2019-08-09 Thread Florian Weimer
* Roman Savochenko:

> Thank you Florian, setting the environment variable MALLOC_ARENA_MAX=1 I
> have got the memory efficiency even somewhat better than in Debian 7!

Is there a way to reproduce your results easily?  Upstream, we're
looking for workloads which are difficult to handle for glibc's malloc
and its default settings, so that we hopefully can improve things
eventually.

Thanks,
Florian



Bug#934080: [libc6] Significant degradation in the memory effectivity of the memory allocator

2019-08-07 Thread Roman Savochenko

Hello, Carlos

07.08.19 16:54, Carlos O'Donell wrote:


On Wed, Aug 7, 2019 at 2:12 AM Roman Savochenko 
<romansavoche...@gmail.com> wrote:


So, we have got such a regression, and I have to think about
going back to Debian 7 for such dynamic environments and
forgetting all the newer ones. :(


The primary thing to determine is if this extra memory is due to 
application demand or not.


Surely not; I have verified that with *valgrind*, and this process of 
fragmentation really saturates after the iteration number indicated in 
the table.



To determine that I usually use a set of malloc tracing utilities:
https://pagure.io/glibc-malloc-trace-utils

These let you capture the direct API calls and graph the application 
demand, which you can compare to the real usage.


Then you can take your trace of malloc API calls, which represents 
your workload, and run it in the simulator with different tunable 
parameters to see if they make any difference or if the simulator 
reproduces your excess usage. If it does then you can use the workload 
and the simulator as your test case to provide to upstream glibc 
developers to look at the problem.


Thanks, but we have just resolved this problem as a disadvantage of the 
memory arenas: setting their number to 1 completely removes this 
extra consumption in such kinds of tasks.




Regards, Roman



Bug#934080: [libc6] Significant degradation in the memory effectivity of the memory allocator

2019-08-07 Thread Roman Savochenko

Hello, Florian

07.08.19 17:04, Florian Weimer wrote:

* Roman Savochenko:

The initial condition for reproducing the problem is a program from a
single source code, built on and for Debian 7, 8, 9, 10, with the result
in the Live disks.

I think glibc 2.13 as shipped by Debian was not built with
--enable-experimental-malloc, so it doesn't use arenas.  This can
substantially decrease RSS usage compared to later versions.  You can
get similar behavior by setting the MALLOC_ARENA_MAX environment
variable to 1 or 2.


Thank you Florian, setting the environment variable MALLOC_ARENA_MAX=1 I 
have got the memory efficiency even somewhat better than in Debian 7!





Debian 10 also adds a thread cache, which further increases RSS size.
See the manual

   https://www.gnu.org/software/libc/manual/html_node/Memory-Allocation-Tunables.html

for details on how to change the thread cache behavior.


Thanks, I have been reading this manual since the start of the problem, 
but not to the end. :)


The thread cache does not have a significant influence for now, but I 
will keep it in mind.


Regards, Roman



Bug#934080: [libc6] Significant degradation in the memory effectivity of the memory allocator

2019-08-07 Thread Florian Weimer
* Roman Savochenko:

> The initial condition for reproducing the problem is a program from a
> single source code, built on and for Debian 7, 8, 9, 10, with the result
> in the Live disks.

I think glibc 2.13 as shipped by Debian was not built with
--enable-experimental-malloc, so it doesn't use arenas.  This can
substantially decrease RSS usage compared to later versions.  You can
get similar behavior by setting the MALLOC_ARENA_MAX environment
variable to 1 or 2.

Debian 10 also adds a thread cache, which further increases RSS size.
See the manual

  https://www.gnu.org/software/libc/manual/html_node/Memory-Allocation-Tunables.html

for details on how to change the thread cache behavior.

Thanks,
Florian



Bug#934080: [libc6] Significant degradation in the memory effectivity of the memory allocator

2019-08-07 Thread Carlos O'Donell
On Wed, Aug 7, 2019 at 2:12 AM Roman Savochenko 
wrote:

> So, we have got such a regression, and I have to think about going back
> to Debian 7 for such dynamic environments and forgetting all the newer ones. :(
>

The primary thing to determine is if this extra memory is due to
application demand or not.

To determine that I usually use a set of malloc tracing utilities:
https://pagure.io/glibc-malloc-trace-utils

These let you capture the direct API calls and graph the application
demand, which you can compare to the real usage.

Then you can take your trace of malloc API calls, which represents your
workload, and run it in the simulator with different tunable parameters to
see if they make any difference or if the simulator reproduces your excess
usage. If it does then you can use the workload and the simulator as your
test case to provide to upstream glibc developers to look at the problem.

Cheers,
Carlos.


Bug#934080: [libc6] Significant degradation in the memory effectivity of the memory allocator

2019-08-07 Thread Roman Savochenko

Hello, Aurelien Jarno

06.08.19 23:57, Aurelien Jarno wrote:



The live disks were started under VirtualBox 5.2, where the obtained 
data was measured with *top*.

Can you detail more precisely how you measure the memory used? Do you
just take the line corresponding to the process you want to monitor?

Sure

Which column do you take?


"RES", sure




This indeed really shows an increase in memory consumption with the GNU
libc and the GCC versions. Have you tried to see if it comes mostly from
the GLIBC or from GCC?


No, but I initially thought about GCC, and its version is included in 
this table.


After familiarizing myself deeply with the problem, I saw that the 
different GCC versions build binaries of mostly equal size, while the 
memory allocator is a part of GLibC, as the functions malloc(), 
realloc(), free().



For example you can try to build your application
with GCC 7 on Debian 10.

I am going to try this.

  You can try to build your application on Debian
9 and run it on Debian 10 provided you do not have incompatible
libraries. Also do you use exactly the same versions of other libraries
in your tests?


I have used the libraries of the corresponding distribution, which does 
not influence the relative and final extra memory consumption, at least.


And that is not possible:


I know about "the Memory Allocation Tunables" 
(https://www.gnu.org/software/libc/manual/html_node/Memory-Allocation-Tunables.html)
and have tried them but:
- I have not got any effect from environment settings like
"GLIBC_TUNABLES=glibc.malloc.trim_threshold=128" on Debian 10

GLIBC_TUNABLES should work on Debian 10. Now depending on the workload
you might see more or less effects.
Sure, but the effect is near the error of the measuring method and far 
from the Debian 7 values.

- If the tunables really work, why are they not applied globally (on the system
level) to return the memory efficiency to the level of the Debian 7 (GLibC
2.13)?

Because every workload behaves differently, and also not everybody cares
about the same things. You seem to care about memory usage; some others 
care about performance. The idea is to get a balanced memory allocator 
which can be tuned.


This is only a representative example; in real life this problem is far 
worse.


I have a real-task environment on Debian 9 which initially consumes 
about 1.6 GB, and after a couple of days of such work it consumes about 6 GB!


On the other hand, I also have an old environment on Debian 7 which 
consumes very little extra memory, really frees the consumed memory, and 
works just as fast.



- If the new memory allocator (in GLibC 2.28) is so good, how can I return
its memory efficiency to the level of the Debian 7 (GLibC 2.13)?

I have no idea about that, maybe playing with the other tunables.


It is not possible; but if you show me some real working examples, I 
will try them.



It's
also not impossible that some of the increase is due to the security 
hardening that has been enabled in Debian over time.


So, we have got such a regression, and I have to think about going back 
to Debian 7 for such dynamic environments and forgetting all the newer 
ones. :(


Regards, Roman



Bug#934080: [libc6] Significant degradation in the memory effectivity of the memory allocator

2019-08-06 Thread Aurelien Jarno
Hi,

On 2019-08-06 23:01, Roman Savochenko wrote:
> Package: libc6
> Version: 2.19, 2.24, 2.28
> Severity: normal
> 
> --- Please enter the report below this line. ---
> The initial condition for reproducing the problem is a program from a single
> source code, built on and for Debian 7, 8, 9, 10, with the result in the Live
> disks.
> 
> The program presents a web interface of several pages, where only the first
> page is used here.
> 
> In building the first page a wide range of memory chunks is used: small
> objects of the C++ classes (~100 bytes), resources of the image files (~10
> kbytes), GD memory blocks (~1 kbytes), and so on.
> 
> The live disks were started under VirtualBox 5.2, where the obtained data was
> measured with *top*.

Can you detail more precisely how you measure the memory used? Do you
just take the line corresponding to the process you want to monitor?
Which column do you take?

> The data measuring under VirtualBox was performed in the following stages:
> 1. Start the program and set the initial state, fixing the memory allocation
> — *measuring the initial memory consumption value*
> 2. Perform the allocation-freeing iteration
> 2.1. Open the first Web-interface page in a Web browser on the host system
> 2.2. Close the page in the Web browser
> 2.3. Wait for the session of the first Web-interface page to be closed and
> freed on the program side, 1 minute — *measuring the iteration memory
> consumption value*
> 3. Return to stage 2 and repeat, for 5 iterations
> 
> At stage 2.3 the real freeing of all the allocated memory blocks was verified
> both by the object counters and by *valgrind*!
> 
> In the result we have the following data:
> 
> Environment: Initially; Iter. 1; Iter. 2; Iter. 3; Iter. 4; Iter. 5 (all in
> MB); Resume
> Debian 10 amd64, GLibC 2.28, GCC 8.3.0: 182; 191.5; 199; 206; 212; 212.
> Saturated on iteration 4, base consumption 9.5 MB, extra consumption 20 MB
> (200 %), liboscada.so 3.5 MB, ui_WebVision.so 0.74 MB
> Debian 9 amd64, GLibC 2.24, GCC 6.3.0: 160; 170; 178; 179; 183; 185.
> Saturated on iteration 5, base consumption 10 MB, extra consumption 15 MB
> (150 %), liboscada.so 3.5 MB, ui_WebVision.so 0.72 MB
> Debian 8 amd64, GLibC 2.19, GCC 4.9.2: 125.5; 133; 139; 139; 139; 139.
> Saturated on iteration 2, base consumption 7.5 MB, extra consumption 6 MB
> (80 %), liboscada.so 3.8 MB, ui_WebVision.so 0.79 MB
> Debian 7 amd64, GLibC 2.13, GCC 4.7.2: 101; 108; 111; 112; 112; 112.
> Saturated on iteration 2, base consumption 7 MB, extra consumption 4 MB
> (57 %), liboscada.so 3.4 MB, ui_WebVision.so 0.85 MB
> Debian 10 i386, GLibC 2.28, GCC 8.3.0: 151; 158; 162.5; 166; 166; 166.
> Saturated on iteration 3, base consumption 7 MB, extra consumption 8 MB
> (114 %), liboscada.so 3.7 MB, ui_WebVision.so 0.9 MB
> Debian 9 i386, GLibC 2.24, GCC 6.3.0: 125; 131; 132; 136; 136; 139.
> Saturated on iteration 5, base consumption 6 MB, extra consumption 8 MB
> (133 %), liboscada.so 3.7 MB, ui_WebVision.so 0.9 MB
> Debian 8 i386, GLibC 2.19, GCC 4.9.2: 92.5; 99; 101.5; 103; 103.5; 103.5.
> Saturated on iteration 2, base consumption 6.5 MB, extra consumption 4.5 MB
> (69 %), liboscada.so 3.6 MB, ui_WebVision.so 0.94 MB
> Debian 7 i386, GLibC 2.13, GCC 4.7.2: 70; 76; 76; 76; 77; 77.
> Saturated on iteration 2, base consumption 6 MB, extra consumption 1 MB
> (16 %), liboscada.so 3.6 MB, ui_WebVision.so 0.9 MB
> ALTLinux 6 i386, GLibC 2.11.3, GCC 4.5.4: 69; 74; 75; 75; 75; 75.
> Saturated on iteration 2, base consumption 5 MB, extra consumption 1 MB
> (20 %), liboscada.so 2.3 MB, ui_WebVision.so 0.9 MB
> 
> From the data we have the memory efficiency on the AMD64 and i386 platforms:
> 
> and the absolute initial size for both platforms:

This indeed really shows an increase in memory consumption with the GNU
libc and the GCC versions. Have you tried to see if it comes mostly from
the GLIBC or from GCC? For example you can try to build your application
with GCC 7 on Debian 10. You can try to build your application on Debian
9 and run it on Debian 10 provided you do not have incompatible
libraries. Also do you use exactly the same versions of other libraries
in your tests?

> I know about "the Memory Allocation Tunables"
> (https://www.gnu.org/software/libc/manual/html_node/Memory-Allocation-Tunables.html)
> and have tried them, but:
> - I have not got any effect from environment settings like
> "GLIBC_TUNABLES=glibc.malloc.trim_threshold=128" on Debian 10

GLIBC_TUNABLES should work on Debian 10. Now depending on the workload
you might see more or less effects.

> - If the tunables really work, why are they not applied globally (on the system
> level) to return the memory efficiency to the level of the Debian 7 (GLibC
> 2.13)?

Because every workload behaves differently, and also not everybody cares
about the same things. You seem to care about memory usage; some others
care about performance. The idea is to get a balanced memory allocator
which can be tuned.

Bug#934080: [libc6] Significant degradation in the memory effectivity of the memory allocator

2019-08-06 Thread Roman Savochenko

Package: libc6
Version: 2.19, 2.24, 2.28
Severity: normal

--- Please enter the report below this line. ---
The initial condition for reproducing the problem is a program from a 
single source code, built on and for Debian 7, 8, 9, 10, with the 
result in the Live disks.


The program presents a web interface of several pages, where only the 
first page is used here.


In building the first page a wide range of memory chunks is used: small 
objects of the C++ classes (~100 bytes), resources of the image files 
(~10 kbytes), GD memory blocks (~1 kbytes), and so on.


The live disks were started under VirtualBox 5.2, where the obtained 
data was measured with *top*.


The data measuring under VirtualBox was performed in the following stages:
1. Start the program and set the initial state, fixing the memory 
allocation — *measuring the initial memory consumption value*

2. Perform the allocation-freeing iteration
2.1. Open the first Web-interface page in a Web browser on the host system
2.2. Close the page on the Web-browser
2.3. Wait for the session of the first Web-interface page to be closed 
and freed on the program side, 1 minute — *measuring the iteration 
memory consumption value*

3. Return to stage 2 and repeat, for 5 iterations

At stage 2.3 the real freeing of all the allocated memory blocks was 
verified both by the object counters and by *valgrind*!


In the result we have the following data:

Environment: Initially; Iter. 1; Iter. 2; Iter. 3; Iter. 4; Iter. 5 (all 
in MB); Resume
Debian 10 amd64, GLibC 2.28, GCC 8.3.0: 182; 191.5; 199; 206; 212; 212. 
Saturated on iteration 4, base consumption 9.5 MB, extra consumption 
20 MB (200 %), liboscada.so 3.5 MB, ui_WebVision.so 0.74 MB
Debian 9 amd64, GLibC 2.24, GCC 6.3.0: 160; 170; 178; 179; 183; 185. 
Saturated on iteration 5, base consumption 10 MB, extra consumption 
15 MB (150 %), liboscada.so 3.5 MB, ui_WebVision.so 0.72 MB
Debian 8 amd64, GLibC 2.19, GCC 4.9.2: 125.5; 133; 139; 139; 139; 139. 
Saturated on iteration 2, base consumption 7.5 MB, extra consumption 
6 MB (80 %), liboscada.so 3.8 MB, ui_WebVision.so 0.79 MB
Debian 7 amd64, GLibC 2.13, GCC 4.7.2: 101; 108; 111; 112; 112; 112. 
Saturated on iteration 2, base consumption 7 MB, extra consumption 4 
MB (57 %), liboscada.so 3.4 MB, ui_WebVision.so 0.85 MB
Debian 10 i386, GLibC 2.28, GCC 8.3.0: 151; 158; 162.5; 166; 166; 166. 
Saturated on iteration 3, base consumption 7 MB, extra consumption 8 
MB (114 %), liboscada.so 3.7 MB, ui_WebVision.so 0.9 MB
Debian 9 i386, GLibC 2.24, GCC 6.3.0: 125; 131; 132; 136; 136; 139. 
Saturated on iteration 5, base consumption 6 MB, extra consumption 8 
MB (133 %), liboscada.so 3.7 MB, ui_WebVision.so 0.9 MB
Debian 8 i386, GLibC 2.19, GCC 4.9.2: 92.5; 99; 101.5; 103; 103.5; 103.5. 
Saturated on iteration 2, base consumption 6.5 MB, extra consumption 
4.5 MB (69 %), liboscada.so 3.6 MB, ui_WebVision.so 0.94 MB
Debian 7 i386, GLibC 2.13, GCC 4.7.2: 70; 76; 76; 76; 77; 77. Saturated 
on iteration 2, base consumption 6 MB, extra consumption 1 MB 
(16 %), liboscada.so 3.6 MB, ui_WebVision.so 0.9 MB
ALTLinux 6 i386, GLibC 2.11.3, GCC 4.5.4: 69; 74; 75; 75; 75; 75. 
Saturated on iteration 2, base consumption 5 MB, extra consumption 1 
MB (20 %), liboscada.so 2.3 MB, ui_WebVision.so 0.9 MB


From the data we have the memory efficiency on the AMD64 and i386 platforms:

and the absolute initial size for both platforms:



I know about "the Memory Allocation Tunables" 
(https://www.gnu.org/software/libc/manual/html_node/Memory-Allocation-Tunables.html) 
and have tried them, but:
- I have not got any effect from environment settings like 
"GLIBC_TUNABLES=glibc.malloc.trim_threshold=128" on Debian 10
- If the tunables really work, why are they not applied globally (on the 
system level) to return the memory efficiency to the level of the 
Debian 7 (GLibC 2.13)?
- If the new memory allocator (in GLibC 2.28) is so good, how can I 
return its memory efficiency to the level of the Debian 7 (GLibC 2.13)?


The tested program and the analysis page are provided at 
http://oscada.org/wiki/Modules/WebVision#Efficiency



--- System information. ---

Architecture:
Kernel: Any for i386, amd64

Debian Release: 8, 9, 10
500 stable-updates ftp.ua.debian.org
500 stable security.debian.org
500 stable ftp.ua.debian.org

--- Package information. ---
Package's Depends field is empty.

Package's Recommends field is empty.

Package's Suggests field is empty.