Re: Experiencing issue where CPU is pinned at 100% - more info
Had this been filed as a bug? From: Administrator da...@gamehouse.commailto:da...@gamehouse.com Date: Friday, November 22, 2013 at 2:29 PM To: users@subversion.apache.orgmailto:users@subversion.apache.org users@subversion.apache.orgmailto:users@subversion.apache.org Subject: Experiencing issue where CPU is pinned at 100% - more info Config: OS: Windows 2012 Server 64bit 1TB SSD 8GiB RAM Software version4.0.2-3732.117 Subversion version 1.8.3-3732.117 Running since 11/21/2013 14:06:28 PST Repo health Total repositories: 2 - Go to the Repositories tab to create or discover repositories. Throughput on primary interface 0.0 B/s IN (over about 5 minutes); 0.0 B/s OUT (over about 5 minutes) Disk Usage as of 11/22/2013 11:10:31 PST Used space on root volume 565.43 GB Used space by repositories 138.65 GB Free space on repository volume 328.81 GB Authenticating with LDAP AD ~45 active user We migrated Two days ago. I dumped the repo to a new physical machine and imported it through the Web Interface. Every day, at some point, the repository becomes inaccessible. I RDC to the box and Apache is using 100% of the CPU. The entire machine is performing ridiculously slow. A restart of the Service restores CPU to normal and everyone carries on until the next occurrence. I parsed the error log for yesterday counting occurrences of all AH errors: occurrences: DEBUG Code 545112 AH01626: 206385 AH01691: 206377 AH01697: 139630 AH01384: 24 AH00361: 4 AH01316: 2 AH01320: 2 AH01318: 2 AH00408: 2 AH00403: 2 AH00348: 2 AH00334: 1 AH01284: 1 AH00456: 1 AH00455: 1 AH00453: 1 AH00431: 1 AH00422: 1 AH00418: 1 AH00411: 1 AH00407: 1 AH00402: 1 AH00391: 1 AH00359: 1 AH00357: 1 AH00354: 1 AH00352: 1 AH00096: 1 AH00094: I looked at the error log unto the point of the restart. I could not see anything obvious. I also look in the event viewer and see nothing. Dave Kelsey
Re: Experiencing issue where CPU is pinned at 100% - more info
On 11/22/13 2:29 PM, Dave Kelsey wrote: Software version 4.0.2-3732.117 What is this a software version of? You didn't mention what version of Apache httpd you're using. Though I'm guessing 2.4.x given the log message codes you gave below. It's possible this is actually a httpd issue and not a svn issue. I parsed the error log for yesterday counting occurrences of all AH errors: occurrences: DEBUG Code 545112 AH01626: Unimportant, just logging the result of the authorization hook in mod_authz_core.c 206385 AH01691: Client didn't send a password log message (mod_authnz_ldap.c). Probably normal since client doesn't authenticate until it's asked for authentication. 206377 AH01697: access granted in mod_authnz_ldap.c 139630 AH01384: mod_deflate.c telling how much it compressed things. 24 AH00361: From mpm/winnt/child.c (no idea what this means) /* Shutdown the worker threads * Post worker threads blocked on the ThreadDispatch IOCompletion port Could be normal I'm not super knowledgeable about the winnt mpm. 4 AH01316: Some sort of message LDAP caching modules/ldap/util_ldap.c 2 AH01320: LDAP SSL support unavailable modules/ldap/until_ldap.c (this looks suspicious to me) 2 AH01318: logging the LDAP SDK used. modules/ldap/util_ldap.c 2 AH00408: Looks like a normal operation of winnt mpm with duplicating a socket. mpm/winnt/mpm_winnt.c 2 AH00403: Looks like a normal operation of winnt mpm, waiting for data on a socket. mpm/winnt/mpm_winnt.c 2 AH00348: Child: Accept thread exiting. Could be just part of a normal shutdown. mpm/winnt/child.c 2 AH00334: Child: Listening on an address with a filter. Normal operation mpm/winnt/child.c 1 AH01284: ldap timed out modules/ldap/util_ldap.c 1 AH00456: logging date and time the server was built. mpm/winnt/mpm_winnt.c 1 AH00455: startup finished (resuming normal operation message) mpm/winnt/mpm_winnt.c 1 AH00453: Child process startup, normal operation mpm/winnt/mpm_winnt.c 1 AH00431: Parent forcing child to terminate, part of shutdown. mpm/winnt/mpm_winnt.c 1 AH00422: shutdown signal received, part of shutdown mpm/winnt/mpm_winnt.c 1 AH00418: Created child process, normal operation mpm/winnt/mpm_winnt.c 1 AH00411: More normal operation of the socket duplication mpm/winnt/mpm_winnt.c 1 AH00407: Child side of the previous message, child got the duplicated sockets. mpm/winnt/mpm_winnt.c 1 AH00402: Sent scoreboard handle to child, normal operation mpm/winnt/mpm_winnt.c 1 AH00391: Child got the soreboard from the previous message mpm/winnt/mpm_winnt.c 1 AH00359: Child released start mutex, normal operation mpm/winnt/mpm_winnt.c 1 AH00357: Child got an exit signal, part of shutdown mpm/winnt/child.c 1 AH00354: Creation of worker threads in child, normal operation mpm/winnt/child.c 1 AH00352: Child got the start mutix, normal operation mpm/winnt/child.c 1 AH00096: removed pid file, part of shutdown log.c 1 AH00094: logging of startup command, normal operation log.c I looked at the error log unto the point of the restart. I could not see anything obvious. I also look in the event viewer and see nothing. Based on the above it looks to me like you're having some sort of issue with your LDAP connectivity. But it's almost impossible to diagnose from the information provided. I suggest attaching a debugger and seeing what it's doing when it's spinning at 100% CPU. WinDbg and the Logexts.dll will let you log system calls: http://msdn.microsoft.com/en-us/library/windows/hardware/ff560170%28v=vs.85%29.aspx On 11/26/13 3:35 PM, Dave Kelsey wrote: Had this been filed as a bug? No and at this point I don't see a clear cause. May not even be a Subversion problem. So I wouldn't open an issue yet.
Re: Experiencing issue where CPU is pinned at 100% - more info
Ben, Thanks for responding. I'll follow up with the debug route. I used a couple of tools to query our AD and had no trouble using the settings from the leap config. We only have 96 users. Thanks for your help. Dave Sent from my iPhone On Nov 26, 2013, at 5:52 PM, Ben Reser b...@reser.org wrote: On 11/22/13 2:29 PM, Dave Kelsey wrote: Software version4.0.2-3732.117 What is this a software version of? You didn't mention what version of Apache httpd you're using. Though I'm guessing 2.4.x given the log message codes you gave below. It's possible this is actually a httpd issue and not a svn issue. I parsed the error log for yesterday counting occurrences of all AH errors: occurrences: DEBUG Code 545112 AH01626: Unimportant, just logging the result of the authorization hook in mod_authz_core.c 206385 AH01691: Client didn't send a password log message (mod_authnz_ldap.c). Probably normal since client doesn't authenticate until it's asked for authentication. 206377 AH01697: access granted in mod_authnz_ldap.c 139630 AH01384: mod_deflate.c telling how much it compressed things. 24 AH00361: From mpm/winnt/child.c (no idea what this means) /* Shutdown the worker threads * Post worker threads blocked on the ThreadDispatch IOCompletion port Could be normal I'm not super knowledgeable about the winnt mpm. 4 AH01316: Some sort of message LDAP caching modules/ldap/util_ldap.c 2 AH01320: LDAP SSL support unavailable modules/ldap/until_ldap.c (this looks suspicious to me) 2 AH01318: logging the LDAP SDK used. modules/ldap/util_ldap.c 2 AH00408: Looks like a normal operation of winnt mpm with duplicating a socket. mpm/winnt/mpm_winnt.c 2 AH00403: Looks like a normal operation of winnt mpm, waiting for data on a socket. mpm/winnt/mpm_winnt.c 2 AH00348: Child: Accept thread exiting. Could be just part of a normal shutdown. mpm/winnt/child.c 2 AH00334: Child: Listening on an address with a filter. Normal operation mpm/winnt/child.c 1 AH01284: ldap timed out modules/ldap/util_ldap.c 1 AH00456: logging date and time the server was built. mpm/winnt/mpm_winnt.c 1 AH00455: startup finished (resuming normal operation message) mpm/winnt/mpm_winnt.c 1 AH00453: Child process startup, normal operation mpm/winnt/mpm_winnt.c 1 AH00431: Parent forcing child to terminate, part of shutdown. mpm/winnt/mpm_winnt.c 1 AH00422: shutdown signal received, part of shutdown mpm/winnt/mpm_winnt.c 1 AH00418: Created child process, normal operation mpm/winnt/mpm_winnt.c 1 AH00411: More normal operation of the socket duplication mpm/winnt/mpm_winnt.c 1 AH00407: Child side of the previous message, child got the duplicated sockets. mpm/winnt/mpm_winnt.c 1 AH00402: Sent scoreboard handle to child, normal operation mpm/winnt/mpm_winnt.c 1 AH00391: Child got the soreboard from the previous message mpm/winnt/mpm_winnt.c 1 AH00359: Child released start mutex, normal operation mpm/winnt/mpm_winnt.c 1 AH00357: Child got an exit signal, part of shutdown mpm/winnt/child.c 1 AH00354: Creation of worker threads in child, normal operation mpm/winnt/child.c 1 AH00352: Child got the start mutix, normal operation mpm/winnt/child.c 1 AH00096: removed pid file, part of shutdown log.c 1 AH00094: logging of startup command, normal operation log.c I looked at the error log unto the point of the restart. I could not see anything obvious. I also look in the event viewer and see nothing. Based on the above it looks to me like you're having some sort of issue with your LDAP connectivity. But it's almost impossible to diagnose from the information provided. I suggest attaching a debugger and seeing what it's doing when it's spinning at 100% CPU. WinDbg and the Logexts.dll will let you log system calls: http://msdn.microsoft.com/en-us/library/windows/hardware/ff560170%28v=vs.85%29.aspx On 11/26/13 3:35 PM, Dave Kelsey wrote: Had this been filed as a bug? No and at this point I don't see a clear cause. May not even be a Subversion problem. So I wouldn't open an issue yet.
Experiencing issue where CPU is pinned at 100% - more info
Config: OS: Windows 2012 Server 64bit 1TB SSD 8GiB RAM Software version4.0.2-3732.117 Subversion version 1.8.3-3732.117 Running since 11/21/2013 14:06:28 PST Repo health Total repositories: 2 - Go to the Repositories tab to create or discover repositories. Throughput on primary interface 0.0 B/s IN (over about 5 minutes); 0.0 B/s OUT (over about 5 minutes) Disk Usage as of 11/22/2013 11:10:31 PST Used space on root volume 565.43 GB Used space by repositories 138.65 GB Free space on repository volume 328.81 GB Authenticating with LDAP AD ~45 active user We migrated Two days ago. I dumped the repo to a new physical machine and imported it through the Web Interface. Every day, at some point, the repository becomes inaccessible. I RDC to the box and Apache is using 100% of the CPU. The entire machine is performing ridiculously slow. A restart of the Service restores CPU to normal and everyone carries on until the next occurrence. I parsed the error log for yesterday counting occurrences of all AH errors: occurrences: DEBUG Code 545112 AH01626: 206385 AH01691: 206377 AH01697: 139630 AH01384: 24 AH00361: 4 AH01316: 2 AH01320: 2 AH01318: 2 AH00408: 2 AH00403: 2 AH00348: 2 AH00334: 1 AH01284: 1 AH00456: 1 AH00455: 1 AH00453: 1 AH00431: 1 AH00422: 1 AH00418: 1 AH00411: 1 AH00407: 1 AH00402: 1 AH00391: 1 AH00359: 1 AH00357: 1 AH00354: 1 AH00352: 1 AH00096: 1 AH00094: I looked at the error log unto the point of the restart. I could not see anything obvious. I also look in the event viewer and see nothing. Dave Kelsey