Bug#934080: [libc6] Significant degradation in the memory effectivity of the memory allocator
Hello, Florian

14.08.19 15:07, Florian Weimer wrote:
> Is there a way to reproduce your results easily? Upstream, we're looking for workloads which are difficult to handle for glibc's malloc and its default settings, so that we hopefully can improve things eventually.

Using the ready builds of the application and the LiveDisks is simpler for me than writing a test application simulating such a complex load, so you can just install the application, start it and observe.

> I meant: Is there a reproduction recipe someone could use, without being familiar with the application?

Sure, and I wrote such a recipe in the first email, without naming the program. The program is OpenSCADA, whose packages you may get here http://oscada.org/en/main/download/ for the Work version, for Debian versions 7 to 10. About installing, you may read http://oscada.org/wiki/Special:MyLanguage/Documents/How_to/Install but, shortly, you need to install the package openscada-model-aglks after connecting a Debian repository of this program:

$ wget http://ftp.oscada.org/Debian/10/openscada/openscada.list
$ cp openscada.list /etc/apt/sources.list.d
$ wget -O - http://ftp.oscada.org/Misc/pkgSignKey | sudo apt-key add -
$ apt-get update; apt-get install openscada-model-aglks

The package openscada-model-aglks is a ready configuration and data to start and work with, since it is a simulator itself. Next, let's go through the stages:

1. Start the program and set the initial state, fixing the memory allocation — measuring the initial memory consumption value
> Start the program from the desktop menu entry "Simulator "AGLKS" on the open SCADA system" or by the command: $ openscada_AGLKS
> Wait for about one minute to fix the memory consumption
> Open the page http://oscada.org/wiki/images/4/42/WebVision_wvis_cfg.png , where you can control the opening and closing of the Web sessions, that is the allocating and freeing of the memory — such are the iterations.
> Set the "Life time of the sessions" on that page to 1 minute instead of 10, to decrease the waiting time
> In a Web-browser open the page "http://localhost:10002/WebVision"; this is the initial memory consumption value.
2. Perform the allocation-freeing iteration
2.1. Open the first Web-interface page from a Web-browser of the host system
> The first page is "http://localhost:10002/WebVision/prj_AGLKS"
2.2. Close the page in the Web-browser
2.3. Wait for the close-freeing of the first Web-interface page's session on the program side, 1 minute — measuring the iteration memory consumption value
3. Return to stage 2, repeating for 5 iterations

But I think the problem is related to binding the arenas to the threads: programs like OpenSCADA in the Web mode recreate threads, which then rebind to different arenas, which is why we get this kind of memory leak into the arenas. It seems to be a conceptual problem of the arenas in glibc.

Regards, Roman
Bug#934080: [libc6] Significant degradation in the memory effectivity of the memory allocator
Hello, Aurelien and Florian

08.08.19 20:00, Aurelien Jarno wrote:
>> Thank you, Florian; setting the environment MALLOC_ARENA_MAX=1 I have got memory efficiency even somewhat better than in Debian 7! <http://oscada.org/wiki/File:WebVision_MemEffectAMD64.png>
> Thanks for the feedback. I think we can therefore consider this bug as solved. Closing it.

OK, if you think this sort of Debian behaviour is good:
- it is now impossible, or hard, to distinguish and detect where an application's own memory leak is, and the developers may always blame glibc and Debian :)
- it is now impossible, by default, to use Debian for dynamic memory-limited applications which run for more than days, like many embedded systems, PLCs and so on.

09.08.19 14:53, Florian Weimer wrote:
>> Thank you, Florian; setting the environment MALLOC_ARENA_MAX=1 I have got memory efficiency even somewhat better than in Debian 7!
> Is there a way to reproduce your results easily? Upstream, we're looking for workloads which are difficult to handle for glibc's malloc and its default settings, so that we hopefully can improve things eventually.

Using the ready builds of the application and the LiveDisks is simpler for me than writing a test application simulating such a complex load, so you can just install the application, start it and observe. As for my follow-up measures — I have set the environment variable MALLOC_ARENA_MAX=1 for all my builds of the live disks <http://oscada.org/wiki/Special:MyLanguage/Documents/How_to/Live_disk> of the automation Linux distribution <http://oscada.org/wiki/Special:MyLanguage/Sub-projects/Automation_Linux_distributive>.

07.08.19 09:09, Roman Savochenko wrote:
> I have a real-task environment on Debian 9 which consumes initially about 1.6 GB, and after a couple of days of such work it consumes about 6 GB!

To demonstrate how awful this problem can be in real tasks, I have recorded the memory consumption tendency for both the default environment and under MALLOC_ARENA_MAX=1. I have also recorded the CPU load, to demonstrate the influence of this environment variable on performance. So, the tendency of the memory consumption of a real big application under the default conditions during two days is: <http://oscada.org/wiki/File:WebVision_MemoryDef.png> And the tendency of the memory consumption of the same real big application with the environment variable MALLOC_ARENA_MAX=1 during two days is: <http://oscada.org/wiki/File:WebVision_MemoryArenas1.png> So, the influence on performance is slightly noticeable, about 5% (15% > 20%), but the latter environment was also under higher user load than the former.

Regards, Roman
Bug#934080: [libc6] Significant degradation in the memory effectivity of the memory allocator
Hello, Carlos

07.08.19 16:54, Carlos O'Donell wrote:
> On Wed, Aug 7, 2019 at 2:12 AM Roman Savochenko mailto:romansavoche...@gmail.com>> wrote:
>> So, we have got such a regression, and I have to think about going back to Debian 7 for this sort of dynamic environment and forgetting all the new ones. :(
> The primary thing to determine is if this extra memory is due to application demand or not.

Surely not, and I have verified that with *valgrind*; this fragmentation process really saturates after the iteration number shown in the table.

> To determine that I usually use a set of malloc tracing utilities: https://pagure.io/glibc-malloc-trace-utils These let you capture the direct API calls and graph the application demand, which you can compare to the real usage. Then you can take your trace of malloc API calls, which represents your workload, and run it in the simulator with different tunable parameters to see if they make any difference or if the simulator reproduces your excess usage. If it does then you can use the workload and the simulator as your test case to provide to upstream glibc developers to look at the problem.

Thanks, but we have just resolved this problem as a disadvantage of the memory arenas; limiting them to 1 completely removes this extra consumption on this kind of task. <http://oscada.org/wiki/File:WebVision_MemEffectAMD64.png>

Regards, Roman
Bug#934080: [libc6] Significant degradation in the memory effectivity of the memory allocator
Hello, Florian

07.08.19 17:04, Florian Weimer wrote:
> * Roman Savochenko:
>> Initial condition of the problem representation is a program with a single source code, built on and for Debian 7, 8, 9, 10, with the result in the Live disks.
> I think glibc 2.13 as shipped by Debian was not built with --enable-experimental-malloc, so it doesn't use arenas. This can substantially decrease RSS usage compared to later versions. You can get similar behavior by setting the MALLOC_ARENA_MAX environment variable to 1 or 2.

Thank you, Florian; setting the environment MALLOC_ARENA_MAX=1 I have got memory efficiency even somewhat better than in Debian 7! <http://oscada.org/wiki/File:WebVision_MemEffectAMD64.png>

> Debian 10 also adds a thread cache, which further increases RSS size. See the manual <https://www.gnu.org/software/libc/manual/html_node/Memory-Allocation-Tunables.html> for details how to change thread cache behavior.

Thanks, I have read this manual since the start of the problem, but not to the end. :) The thread cache does not have a significant influence for now, but I will keep it in mind.

Regards, Roman
Bug#934080: [libc6] Significant degradation in the memory effectivity of the memory allocator
Hello, Aurelien Jarno

06.08.19 23:57, Aurelien Jarno wrote:
>> The live disks were started under VirtualBox 5.2, where the obtained data was measured by *top*.
> Can you detail more precisely how you measure the memory used? Do you just take the line corresponding to the process you want to monitor?

Sure.

> Which column do you take?

"RES", sure.

> This indeed really shows an increase in memory consumption with the GNU libc and the GCC versions. Have you tried to see if it comes mostly from the GLIBC or from GCC?

No, but initially I thought about GCC, and its version is included in the table. After familiarizing myself more deeply with the problem, I saw that the different GCC versions build binaries of mostly equal size, and the memory allocator is part of glibc, as the functions malloc(), realloc(), free().

> For example you can try to build your application with GCC 7 on Debian 10.

This I am going to try.

> You can try to build your application on Debian 9 and run it on Debian 10 provided you do not have incompatible libraries. Also do you use exactly the same versions of other libraries in your tests?

I have used the libraries of the corresponding distribution, which at least does not influence the relative final extra memory consumption. And that is not possible:

>> I know about "the Memory Allocation Tunables" (https://www.gnu.org/software/libc/manual/html_node/Memory-Allocation-Tunables.html) and have tried them but:
>> - I have not got any effect from environments like "GLIBC_TUNABLES=glibc.malloc.trim_threshold=128" on Debian 10
> GLIBC_TUNABLES should work on Debian 10. Now depending on the workload you might see more or less effects.

Sure, but the effect is near the measuring-method error and far from the Debian 7 values.

>> - If the tunables really work, why are they not applied globally (at the system level) to return memory efficiency to the level of Debian 7 (glibc 2.13)?
> Because every workload behaves differently, and also not everybody cares about the same. You seem to care about memory usage, some others care about performance. The idea is to get a balanced memory allocator which can be tuned.

This is only a representative example, and in real life this problem is far worse. I have a real-task environment on Debian 9 which consumes initially about 1.6 GB, and after a couple of days of such work it consumes about 6 GB! On the other hand, I also have an old environment on Debian 7 which consumes very little extra memory, really frees the consumed memory, and works just as fast.

>> - If the new memory allocator (in glibc 2.28) is so good, how can I return its memory efficiency to the level of Debian 7 (glibc 2.13)?
> I have no idea about that, maybe playing with the other tunables.

It is not possible; or, if you show me some really working examples, I will try.

> It's also not impossible some of the increase is due to the security hardening that has been enabled in Debian over time.

So, we have got such a regression, and I have to think about going back to Debian 7 for this sort of dynamic environment and forgetting all the new ones. :(

Regards, Roman
Bug#934080: [libc6] Significant degradation in the memory effectivity of the memory allocator
Package: libc6
Version: 2.19, 2.24, 2.28
Severity: normal

--- Please enter the report below this line. ---

The initial condition for representing the problem is a program with a single source code, built on and for Debian 7, 8, 9, 10, with the result in the Live disks. The program provides a web interface of several pages, of which only the first page is used here. Building the first page uses a wide range of memory chunks: small objects of the C++ classes (~100 bytes), resources of the image files (~10 kbytes), GD memory blocks (~1 kbytes), and so on.

The live disks were started under VirtualBox 5.2, where the obtained data was measured by *top*. The data measurement under VirtualBox was performed in the following stages:
1. Start the program and set the initial state, fixing the memory allocation — *measuring the initial memory consumption value*
2. Perform the allocation-freeing iteration
2.1. Open the first Web-interface page from a Web-browser of the host system
2.2. Close the page in the Web-browser
2.3. Wait for the close-freeing of the first Web-interface page's session on the program side, 1 minute — *measuring the iteration memory consumption value*
3. Return to stage 2, repeating for 5 iterations

Stage 2.3 was verified to really free all the allocated memory blocks, both by the object counters and by *valgrind*!

As a result we have the following data (RES in MB: initially, then iterations 1-5):

Debian 10 amd64, GLibC 2.28, GCC 8.3.0: 182, 191.5, 199, 206, 212, 212
  Saturated at iteration *4*; base consumption 9.5 MB, extra consumption 20 MB (*200* %); liboscada.so 3.5 MB, ui_WebVision.so 0.74 MB
Debian 9 amd64, GLibC 2.24, GCC 6.3.0: 160, 170, 178, 179, 183, 185
  Saturated at iteration *5*; base consumption 10 MB, extra consumption 15 MB (*150* %); liboscada.so 3.5 MB, ui_WebVision.so 0.72 MB
Debian 8 amd64, GLibC 2.19, GCC 4.9.2: 125.5, 133, 139, 139, 139, 139
  Saturated at iteration *2*; base consumption 7.5 MB, extra consumption 6 MB (*80* %); liboscada.so 3.8 MB, ui_WebVision.so 0.79 MB
Debian 7 amd64, GLibC 2.13, GCC 4.7.2: 101, 108, 111, 112, 112, 112
  Saturated at iteration *2*; base consumption 7 MB, extra consumption 4 MB (*57* %); liboscada.so 3.4 MB, ui_WebVision.so 0.85 MB
Debian 10 i386, GLibC 2.28, GCC 8.3.0: 151, 158, 162.5, 166, 166, 166
  Saturated at iteration *3*; base consumption 7 MB, extra consumption 8 MB (*114* %); liboscada.so 3.7 MB, ui_WebVision.so 0.9 MB
Debian 9 i386, GLibC 2.24, GCC 6.3.0: 125, 131, 132, 136, 136, 139
  Saturated at iteration *5*; base consumption 6 MB, extra consumption 8 MB (*133* %); liboscada.so 3.7 MB, ui_WebVision.so 0.9 MB
Debian 8 i386, GLibC 2.19, GCC 4.9.2: 92.5, 99, 101.5, 103, 103.5, 103.5
  Saturated at iteration *2*; base consumption 6.5 MB, extra consumption 4.5 MB (*69* %); liboscada.so 3.6 MB, ui_WebVision.so 0.94 MB
Debian 7 i386, GLibC 2.13, GCC 4.7.2: 70, 76, 76, 76, 77, 77
  Saturated at iteration *2*; base consumption 6 MB, extra consumption 1 MB (*16* %); liboscada.so 3.6 MB, ui_WebVision.so 0.9 MB
ALTLinux 6 i386, GLibC 2.11.3, GCC 4.5.4: 69, 74, 75, 75, 75, 75
  Saturated at iteration *2*; base consumption 5 MB, extra consumption 1 MB (*20* %); liboscada.so 2.3 MB, ui_WebVision.so 0.9 MB

From the data we have the memory efficiency on the AMD64 and i386 platforms, and the absolute initial size for both platforms (charts on the page below).

I know about "the Memory Allocation Tunables" (https://www.gnu.org/software/libc/manual/html_node/Memory-Allocation-Tunables.html) and have tried them, but:
- I have not got any effect from environments like "GLIBC_TUNABLES=glibc.malloc.trim_threshold=128" on Debian 10
- If the tunables really work, why are they not applied globally (at the system level) to return memory efficiency to the level of Debian 7 (glibc 2.13)?
- If the new memory allocator (in glibc 2.28) is so good, how can I return its memory efficiency to the level of Debian 7 (glibc 2.13)?

The tested program and the analysis are provided on the page http://oscada.org/wiki/Modules/WebVision#Efficiency

--- System information. ---
Architecture:
Kernel: Any for i386, amd64
Debian Release: 8, 9, 10
500 stable-updates ftp.ua.debian.org
500 stable security.debian.org
500 stable ftp.ua.debian.org

--- Package information. ---
Package's Depends field is empty.
Package's Recommends field is empty.
Package's Suggests field is empty.
Bug#914999: [libc6] Locking problems into libc6
12.12.18 17:11, Roman Savochenko wrote:
>> There are thousands of packages in different versions between Debian 8 and Debian 9. You have found it's not related to the kernel, but I fail to see how that shows it's a libc6 issue. For example when you have tried the kernel from Debian 9 in Debian 8, have you also tried with the rtl8192 firmware from Debian 9?
> I will compare the firmware, thanks.

I have installed the same package firmware-realtek 20161130-4 on Debian 9 and this problem is still present. However, I have found a way of fixing it: https://github.com/Mange/rtl8192eu-linux-driver/issues/46

sudo nano /etc/NetworkManager/NetworkManager.conf

and add the 2 lines below:

[device]
wifi.scan-rand-mac-address=no

Regards, Roman
Bug#914999: [libc6] Locking problems into libc6
Hello, Aurelien

On 12/30/18 7:49 PM, Aurelien Jarno wrote:
> On 2018-12-12 17:11, Roman Savochenko wrote:
>> On 12/4/18 1:24 PM, Roman Savochenko wrote:
>>> On 11/29/18 9:13 PM, Aurelien Jarno wrote:
>>>> 1. For my program, I was needed to create extra locking about the function getaddrinfo(), but that resolved the problem only for my calls but for the ...
>> Vice versa, the first problem is an actual one for glibc, since:
>> * I have observed the difference twice, please see the included screenshot.
> I indeed see two different IPs circled in red. Now I don't get what they are, if they should be different or not and how that relates to glibc.

The lower IP is addr_ used as an argument of the function getaddrinfo():

    if(addr_[aOff] != '[') host = TSYS::strParse(addr_, 0, ":", &aOff);
    else { aOff++; host = TSYS::strParse(addr_, 0, "]:", &aOff); }	//Get IPv6
    port = TSYS::strParse(addr_, 0, ":", &aOff);
    string aErr;
    sockFd = -1;
    for(int off = 0; (host_=TSYS::strParse(host,0,",",&off)).size(); ) {
	struct addrinfo hints, *res;
	memset(&hints, 0, sizeof(hints));
	hints.ai_socktype = (type == SOCK_TCP) ? SOCK_STREAM : SOCK_DGRAM;
	int error;
	if(logLen()) pushLogMess(TSYS::strMess(_("Resolving for '%s'"),host_.c_str()));
	MtxAlloc aRes(*SYS->commonLock("getaddrinfo"), true);
	if((error=getaddrinfo(host_.c_str(),(port.size()?port.c_str():"10005"),&hints,&res)))
	    throw TError(nodePath().c_str(), _("Error the address '%s': '%s (%d)'"), addr_.c_str(), gai_strerror(error), error);
	vector<sockaddr_storage> addrs;
	for(struct addrinfo *iAddr = res; iAddr != NULL; iAddr = iAddr->ai_next) {
	    static struct sockaddr_storage ss;
	    if(iAddr->ai_addrlen > sizeof(ss)) { aErr = _("sockaddr to large."); continue; }
	    memcpy(&ss, iAddr->ai_addr, iAddr->ai_addrlen);
	    addrs.push_back(ss);
	}
	freeaddrinfo(res);
	aRes.unlock();

Where the top IP is the real one taken from the connection addrs[iA], the getaddrinfo() result:

	//Create socket
	if(type == SOCK_TCP) {
	    if((sockFd=socket((((sockaddr*)&addrs[iA])->sa_family==AF_INET6)?PF_INET6:PF_INET,SOCK_STREAM,0)) == -1)
		throw TError(nodePath().c_str(), _("Error creating the %s socket: '%s (%d)'!"), "TCP", strerror(errno), errno);
	    int vl = 1;
	    setsockopt(sockFd, SOL_SOCKET, SO_REUSEADDR, &vl, sizeof(int));
	    if(MSS()) { vl = MSS(); setsockopt(sockFd, IPPROTO_TCP, TCP_MAXSEG, &vl, sizeof(int)); }
	}
	else if(type == SOCK_UDP) {
	    if((sockFd=socket((((sockaddr*)&addrs[iA])->sa_family==AF_INET6)?PF_INET6:PF_INET,SOCK_DGRAM,0)) == -1)
		throw TError(nodePath().c_str(), _("Error creating the %s socket: '%s (%d)'!"), "UDP", strerror(errno), errno);
	}
	//Get the connected address
	if(((sockaddr*)&addrs[iA])->sa_family == AF_INET6) {
	    char aBuf[INET6_ADDRSTRLEN];
	    getnameinfo((sockaddr*)&addrs[iA], sizeof(addrs[iA]), aBuf, sizeof(aBuf), 0, 0, NI_NUMERICHOST);
	    connAddr = aBuf;
	} else connAddr = inet_ntoa(((sockaddr_in*)&addrs[iA])->sin_addr);

Then, without the lock "MtxAlloc aRes(*SYS->commonLock("getaddrinfo"), true);", I get such a replacement, and the difference comes from a different parallel threaded connection executing the same code. And since I have up to ten such parallel connections, that is why I observe this problem!
>> * Also I have seen once a very long lock inside the function getaddrinfo()->poll() for some VPN (FortiClient in this case); see the crash report, taken after terminating the program with SIGSEGV.
> poll() has nothing to do with locking, it just hangs there waiting for an answer to a DNS request sent by the functions called through getaddrinfo(). According to the trace, the timeout is set to about 5 seconds. The other threads waiting in poll() are called from libglib-2.0 and from libxcb.so.1.

Sometimes, on FortiVPN, the wait is forever, so I have enough time to catch this problem by sending the signal SIGSEGV; moreover, once I close the FortiVPN connection, the program also finishes successfully. I think the second poll() is a different case, since it is a generic function.

> As for the segmentation fault, it happens in pthread_cond_timedwait.S called directly from libQt5Core.so.5. Without more info, it's difficult to say if it's due to a bug in glibc or if the arguments passed to this function are corrupted, for example because the data pointed by QMutex* are corrupted. Do you have anoth
Bug#914999: [libc6] Locking problems into libc6
On 11/29/18 9:13 PM, Aurelien Jarno wrote:
>> 1. For my program, I needed to create extra locking around the function getaddrinfo(), but that resolved the problem only for my calls; for external libraries like MySQL and MariaDB I still have the crashes and it cannot be fixed at all.
> Can you give more details about the issue, the symptoms, possible crash backtrace, way to reproduce it. Without these details, there are very few chances to be able to fix the bug.

Yes, I had a crash there, but it later appeared to be a problem in libMariaDB (Bug#915515). Also, I had earlier observed differences between the address passed to getaddrinfo() and the one taken from the real connection, which I have not observed now. So I remove this item from the glibc problem causes for now.

>> 3. Impossible to connect to any WLan HotSpot (Ad-hoc), for me it is Nokia N9
> Without more details, I also fail to see the relation with glibc here.

But I do not know of other common libraries or build environments (maybe GCC for the kernel build) which could be related to such WLan problems.

>> All those issues work fine on two Debian 8 installations with libc6 2.19, one of which is on the same hardware as Debian 9. Another Debian 9 installation on a stationary PC also does not work for the second issue. Initially I thought these were kernel problems, but I have installed the same Linux kernel version on Debian 8 and everything works there.
> There are thousands of packages in different versions between Debian 8 and Debian 9. You have found it's not related to the kernel, but I fail to see how that shows it's a libc6 issue. For example when you have tried the kernel from Debian 9 in Debian 8, have you also tried with the rtl8192 firmware from Debian 9?

I will compare the firmware, thanks.

> Anyway if we want to know that the problem is related with glibc, please try to install glibc packages (libc*, possibly locales* and nscd if needed) from Debian 9 onto a working Debian 8 installation and see if the problem appears.

I am going to try that also, thanks.

> Without more information, there is no way for us to fix the bug, so we'll just have to close it.

I understand, thanks.

Regards, Roman
Bug#914999: [libc6] Locking problems into libc6
Package: libc6
Version: 2.24
Severity: critical

--- Please enter the report below this line. ---

I have already got several signs of a problem in locking access to functions like getaddrinfo() (via the macro __libc_lock_lock), reproduced in multithreaded environments!

1. For my program, I needed to create extra locking around the function getaddrinfo(), but that resolved the problem only for my calls; for external libraries like MySQL and MariaDB I still have the crashes and it cannot be fixed at all.
2. rtl8192eu, with the driver rtl8xxxu or the external one 8192eu.ko, does not connect to any network, with these messages in dmesg:
[ 137.936642] wlx000f0064f2d8: authenticate with 00:90:4c:08:00:0d
[ 137.940680] wlx000f0064f2d8: send auth to 00:90:4c:08:00:0d (try 1/3)
[ 138.145146] wlx000f0064f2d8: send auth to 00:90:4c:08:00:0d (try 2/3)
[ 138.353198] wlx000f0064f2d8: send auth to 00:90:4c:08:00:0d (try 3/3)
[ 138.557239] wlx000f0064f2d8: authentication with 00:90:4c:08:00:0d timed out
3. Impossible to connect to any WLan HotSpot (Ad-hoc), for me it is Nokia N9

All those issues work fine on two Debian 8 installations with libc6 2.19, one of which is on the same hardware as Debian 9. Another Debian 9 installation on a stationary PC also does not work for the second issue. Initially I thought these were kernel problems, but I have installed the same Linux kernel version on Debian 8 and everything works there.

--- System information. ---
Architecture:
Kernel: Linux 4.9.0-6-amd64
Debian Release: 9.5
500 stable-updates ftp.ua.debian.org
500 stable security.debian.org
500 stable linux.teamviewer.com
500 stable ftp.ua.debian.org
500 preview linux.teamviewer.com
100 stretch-backports ftp.ua.debian.org

--- Package information. ---
Package's Depends field is empty.
Package's Recommends field is empty.
Package's Suggests field is empty.