Re: [CentOS] another bizarre thing...
On Thu, Aug 08, 2019 at 05:06:06PM +, Young, Gregory wrote:
> Is this on both EL6 and EL7? If only EL7, it could be control groups causing
> the issue. The idea of cgroups is to prevent zombie processes, but if you
> need your program to spawn another process then restart itself while the
> other process continues to run, you need to launch it in a different control
> group, or the shutdown of the parent process will also kill the child. In my
> case, we have an upgrade script which needs to get called, then shut down the
> calling process in order to upgrade it. For example:
>
>     # Clear any errors in the upgrade control group.
>     /bin/systemctl reset-failed upgrade-trigger
>
>     # Launch the upgrader in its own control group.
>     /bin/systemd-run --unit=upgrade-trigger --slice=upgrade-trigger \
>         /bin/bash /opt/myapp/Upgrade.sh "$1" "$2"
>
> If we don't do this, the upgrade fails as the upgrader gets terminated when
> the parent application is shut down.

Well, we aren't INTENTIONALLY using control groups. Do we get put into one
by the very act of launching a program which then creates threads, and they
then all coexist until they're told to stop?

I think it's not the scenario you describe. The main program launches from
an init script, does some sanity checks, loads some config files, then
spawns the number of threads defined by its configuration. Then all the
threads, including the main prog, hang around doing stuff until they're told
to stop, which happens all at once for all of them. On a good day, anyway.

What is happening now is they will all run fine for some time (an hour or
twelve), then they all receive a SIGKILL. According to a systemtap script I
found online, the program is killing itself, but as the guy who wrote the
program, I don't think so. The script can be seen below in the earlier mail.

As for whether it also fails on C6, I don't know. I've asked our support
team to see if they have a C6/EL6 customer who will let them install the
latest version for 6 and see what happens, but so far, no joy.

Fred

> Subject: Re: [CentOS] another bizarre thing...
>
> On Mon, Aug 05, 2019 at 08:57:45PM -0400, Fred Smith wrote:
> > Hi all!
> >
> > I'm stuck on something really bizarre that is happening to a product I
> > "own" at work. It's a C program, built on CentOS, runs on CentOS or
> > RHEL, has been in circulation since the early 00's, is in use at
> > hundreds of sites.
> >
> > recently, at multiple customer sites it has started just going away.
> > no core file (yes, ulimit is configured), nothing in any of its
> > (several) log files. it's just gone.
> >
> > running it under strace until it dies reveals that every thread has
> > been given a SIGKILL.
> >
> > How does one figure out who delivered a SIGKILL? For other, non-fatal,
> > signals it is possible to glean the PID of the sending process in a
> > signal handler, but obviously you can't do that for SIGKILL because
> > the app doesn't survive the signal.
> >
> > I'm grasping at straws here, and am open to almost any kind of
> > suggestion that can be followed-up (as compared to "beats me" which is
> > where I am now).
>
> OK, more information.
>
> Found a recipe to cause systemtap to emit a line of text identifying the
> sender of the SIGKILL.
>
>     probe signal.send {
>       if (sig_name == "SIGKILL")
>         printf("%s was sent to %s (pid:%d) by %s uid:%d\n",
>                sig_name, pid_name, sig_pid, execname(), uid())
>     }
>
> unfortunately, it says the program is killing itself:
>
>     SIGKILL was sent to myprog (pid:12269) by myprog uid:1000
>
> So,... now I'm wondering how one figures that out. nowhere in my source code
> does it explicitly raise any signal, much less SIGKILL.
> So there must be some underlying library or system call or something doing it.
>
> --
> Fred Smith -- fre...@fcshome.stoneham.ma.us -
> I can do all things through Christ who strengthens me.
> -- Philippians 4:13

--
Fred Smith -- fre...@fcshome.stoneham.ma.us -
"And he will be called Wonderful Counselor, Mighty God, Everlasting Father,
Prince of Peace. Of the increase of his government there will be no end. He
will reign on David's throne and over his kingdom, establishing and
upholding it with justice and righteousness from that time on and forever."
--- Isaiah 9:7 (niv)
___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos
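On the question of whether a process lands in a control group without asking for one: on Linux kernels built with cgroup support, /proc/<pid>/cgroup lists the groups a process belongs to, so reading that file is a quick way to see where a program started from an init script ended up. A minimal sketch, assuming a standard /proc layout; the function name show_cgroup and the PROC_ROOT override are hypothetical helpers made up for illustration, not part of any tool mentioned in this thread:

```shell
#!/bin/sh
# Print the control group membership of one process.
# PROC_ROOT is a hypothetical override (defaults to /proc) so the function
# can be pointed at a test directory instead of the live system.
show_cgroup() {
    pid="$1"
    cgfile="${PROC_ROOT:-/proc}/${pid}/cgroup"
    if [ ! -r "$cgfile" ]; then
        echo "show_cgroup: cannot read ${cgfile}" >&2
        return 1
    fi
    # Each line has the form hierarchy-ID:controller-list:cgroup-path.
    cat "$cgfile"
}
```

For example, `show_cgroup 12269` would print the groups for the PID seen in the systemtap output while that process is still alive. On a systemd host, `systemd-cgls` shows the same information as a tree for the whole system.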
Re: [CentOS] C7 Kernel module compilation
On Wed, Aug 7, 2019 at 11:00 PM Alessandro Baggi wrote:
>
> On 07/08/19 20:15, Akemi Yagi wrote:
> > On Wed, Aug 7, 2019 at 9:00 AM Alessandro Baggi wrote:
> >>
> >> On 07/08/19 01:02, Phil Perry wrote:
> >>> On 06/08/2019 14:45, Alessandro Baggi wrote:
> >
> >>> Please post the actual error message in dmesg or /var/log/messages.
> >>>
> >>> It's likely that the kernel is just grumbling that the module is not
> >>> signed (missing key), but it's just noise unless you're using
> >>> SecureBoot. Posting the actual message in full will help determine if
> >>> that is the case.
> >>>
> >>> Thanks
> >>>
> >>> Phil
> >>
> >> Hi, thank you for your reply,
> >> I solved it by setting the localversion option in "General Setup" to the
> >> value of the current kernel and adding Module.symvers from
> >> /boot/symvers-<version>.gz to the module directory.
> >>
> >> Now I get another problem compiling the third party module (i2c-nct6775):
> >>
> >> "CONFIG_RETPOLINE=y but not supported by the compiler. Compiler update
> >> recommended. Stop."
> >>
> >> I tried using scl gcc7 and 8 but get the same issue.
> >>
> >> I checked that retpoline is related to Spectre but checking on centos with:
> >>
> >> cat /sys/devices/system/cpu/vulnerabilities/spectre_v2
> >>
> >> I get:
> >>
> >> Mitigation: IBRS (kernel), IBPB
> >>
> >> and RETPOLINE seems disabled (am I wrong?).
> >>
> >> I read in a blog post that I can disable this check by commenting out
> >> some lines starting from line 166 of arch/Makefile but I don't think this
> >> is the best approach.
> >>
> >> At this point I can't understand what the previous error means and why I
> >> get this error when compiling i2c-nct6775.
> >>
> >> Can someone point me in the right direction?
> >
> > Please post the output from:
> >
> > rpm -qa kernel\* | sort
> >
> > and
> >
> > uname -r
> >
> > Akemi
>
> kernel-3.10.0-957.12.1.el7.x86_64
> kernel-3.10.0-957.12.2.el7.x86_64
> kernel-3.10.0-957.21.3.el7.x86_64
> kernel-3.10.0-957.27.2.el7.x86_64
> kernel-3.10.0-957.el7.x86_64
> kernel-devel-3.10.0-957.21.3.el7.x86_64
> kernel-devel-3.10.0-957.27.2.el7.x86_64
> kernel-headers-3.10.0-957.27.2.el7.x86_64
> kernel-ml-5.1.0-1.el7.elrepo.x86_64
> kernel-ml-devel-5.1.0-1.el7.elrepo.x86_64
> kernel-tools-3.10.0-957.27.2.el7.x86_64
> kernel-tools-libs-3.10.0-957.27.2.el7.x86_64
> kernel-tools-libs-devel-3.10.0-957.27.2.el7.x86_64
>
>
> 3.10.0-957.27.2.el7.x86_64

So, you tried to build the i2c-nct6775 module against
kernel-3.10.0-957.27.2.el7 under the running kernel 3.10.0-957.27.2 and you
have a matching version of kernel-devel. Then I don't quite understand why
you get the "not supported by the compiler" error...

Akemi
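The comparison made above can be scripted: take the running release from `uname -r` and look for a kernel-devel package with exactly that version in the rpm listing. A rough sketch, assuming the package list arrives on standard input in the `rpm -qa` format shown in this thread; has_matching_devel is a hypothetical helper name:

```shell
#!/bin/sh
# Succeed if the package list on stdin contains kernel-devel for the given
# kernel release (for example the output of `uname -r`).
has_matching_devel() {
    release="$1"
    # -x: match the whole line, so kernel-ml-devel does not count
    # -F: fixed string, so the dots in the version are not wildcards
    # -q: no output, only the exit status
    grep -qxF "kernel-devel-${release}"
}
```

Typical use would be `rpm -qa 'kernel*' | has_matching_devel "$(uname -r)" && echo "kernel-devel matches"`. With the listing posted above and 3.10.0-957.27.2.el7.x86_64 as the release, the check succeeds.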
Re: [CentOS] another bizarre thing...
Is this on both EL6 and EL7? If only EL7, it could be control groups causing
the issue. The idea of cgroups is to prevent zombie processes, but if you
need your program to spawn another process then restart itself while the
other process continues to run, you need to launch it in a different control
group, or the shutdown of the parent process will also kill the child. In my
case, we have an upgrade script which needs to get called, then shut down the
calling process in order to upgrade it. For example:

    # Clear any errors in the upgrade control group.
    /bin/systemctl reset-failed upgrade-trigger

    # Launch the upgrader in its own control group.
    /bin/systemd-run --unit=upgrade-trigger --slice=upgrade-trigger \
        /bin/bash /opt/myapp/Upgrade.sh "$1" "$2"

If we don't do this, the upgrade fails as the upgrader gets terminated when
the parent application is shut down.

Gregory Young

-----Original Message-----
From: CentOS On Behalf Of Fred Smith
Sent: August 7, 2019 1:39 PM
To: centos@centos.org
Subject: Re: [CentOS] another bizarre thing...

On Mon, Aug 05, 2019 at 08:57:45PM -0400, Fred Smith wrote:
> Hi all!
>
> I'm stuck on something really bizarre that is happening to a product I
> "own" at work. It's a C program, built on CentOS, runs on CentOS or
> RHEL, has been in circulation since the early 00's, is in use at
> hundreds of sites.
>
> recently, at multiple customer sites it has started just going away.
> no core file (yes, ulimit is configured), nothing in any of its
> (several) log files. it's just gone.
>
> running it under strace until it dies reveals that every thread has
> been given a SIGKILL.
>
> How does one figure out who delivered a SIGKILL? For other, non-fatal,
> signals it is possible to glean the PID of the sending process in a
> signal handler, but obviously you can't do that for SIGKILL because
> the app doesn't survive the signal.
>
> I'm grasping at straws here, and am open to almost any kind of
> suggestion that can be followed-up (as compared to "beats me" which is
> where I am now).

OK, more information.

Found a recipe to cause systemtap to emit a line of text identifying the
sender of the SIGKILL.

    probe signal.send {
      if (sig_name == "SIGKILL")
        printf("%s was sent to %s (pid:%d) by %s uid:%d\n",
               sig_name, pid_name, sig_pid, execname(), uid())
    }

unfortunately, it says the program is killing itself:

    SIGKILL was sent to myprog (pid:12269) by myprog uid:1000

So,... now I'm wondering how one figures that out. nowhere in my source code
does it explicitly raise any signal, much less SIGKILL.
So there must be some underlying library or system call or something doing it.

--
Fred Smith -- fre...@fcshome.stoneham.ma.us -
I can do all things through Christ who strengthens me.
-- Philippians 4:13
[CentOS] Two questions
You guys are awesome working on CentOS 8!

Where might I get a list of the packages in CentOS 8?

Will there be an "in-place" upgrade from 7 to 8?

Thanks,
Jerry
Re: [CentOS] C7 Kernel module compilation
On 07/08/19 20:15, Akemi Yagi wrote:
> On Wed, Aug 7, 2019 at 9:00 AM Alessandro Baggi wrote:
>>
>> On 07/08/19 01:02, Phil Perry wrote:
>>> On 06/08/2019 14:45, Alessandro Baggi wrote:
>
>>> Please post the actual error message in dmesg or /var/log/messages.
>>>
>>> It's likely that the kernel is just grumbling that the module is not
>>> signed (missing key), but it's just noise unless you're using
>>> SecureBoot. Posting the actual message in full will help determine if
>>> that is the case.
>>>
>>> Thanks
>>>
>>> Phil
>>
>> Hi, thank you for your reply,
>> I solved it by setting the localversion option in "General Setup" to the
>> value of the current kernel and adding Module.symvers from
>> /boot/symvers-<version>.gz to the module directory.
>>
>> Now I get another problem compiling the third party module (i2c-nct6775):
>>
>> "CONFIG_RETPOLINE=y but not supported by the compiler. Compiler update
>> recommended. Stop."
>>
>> I tried using scl gcc7 and 8 but get the same issue.
>>
>> I checked that retpoline is related to Spectre but checking on centos with:
>>
>> cat /sys/devices/system/cpu/vulnerabilities/spectre_v2
>>
>> I get:
>>
>> Mitigation: IBRS (kernel), IBPB
>>
>> and RETPOLINE seems disabled (am I wrong?).
>>
>> I read in a blog post that I can disable this check by commenting out
>> some lines starting from line 166 of arch/Makefile but I don't think this
>> is the best approach.
>>
>> At this point I can't understand what the previous error means and why I
>> get this error when compiling i2c-nct6775.
>>
>> Can someone point me in the right direction?
>
> Please post the output from:
>
> rpm -qa kernel\* | sort
>
> and
>
> uname -r
>
> Akemi

kernel-3.10.0-957.12.1.el7.x86_64
kernel-3.10.0-957.12.2.el7.x86_64
kernel-3.10.0-957.21.3.el7.x86_64
kernel-3.10.0-957.27.2.el7.x86_64
kernel-3.10.0-957.el7.x86_64
kernel-devel-3.10.0-957.21.3.el7.x86_64
kernel-devel-3.10.0-957.27.2.el7.x86_64
kernel-headers-3.10.0-957.27.2.el7.x86_64
kernel-ml-5.1.0-1.el7.elrepo.x86_64
kernel-ml-devel-5.1.0-1.el7.elrepo.x86_64
kernel-tools-3.10.0-957.27.2.el7.x86_64
kernel-tools-libs-3.10.0-957.27.2.el7.x86_64
kernel-tools-libs-devel-3.10.0-957.27.2.el7.x86_64

3.10.0-957.27.2.el7.x86_64
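One way to narrow down the "not supported by the compiler" message is to ask each compiler directly whether it accepts the retpoline options. A hedged sketch: the two GCC flags in the usage line are the ones upstream kernels probe for when CONFIG_RETPOLINE=y, and it is an assumption that the EL7 kernel build tests the same pair; cc_accepts is a hypothetical helper name:

```shell
#!/bin/sh
# Succeed if the given compiler accepts the given flags when checking an
# empty C translation unit. -fsyntax-only means no output file is written.
cc_accepts() {
    cc="$1"
    shift
    "$cc" "$@" -fsyntax-only -x c /dev/null >/dev/null 2>&1
}
```

Usage would look like `cc_accepts gcc -mindirect-branch=thunk-extern -mindirect-branch-register && echo "retpoline flags accepted"`. Running it once with the system gcc and once inside the scl environment would show whether the module build is really picking up the compiler you expect.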