Re: [arch-general] High CPU on one core, but unable to find process responsible

2018-03-11 Thread Carsten Mattner via arch-general
On 3/12/18, David Rosenstrauch  wrote:
> My server's been exhibiting some very strange behavior lately.  Every
> couple of days I run into a situation where one core (core #0) on the
> quad core CPU starts continuously using around 34% of CPU, but I'm not
> able to see (using htop) any process that's responsible for using all
> that CPU.  Even when I tell htop to show me kernel threads too, I still
> am not able to see the offending process.  Every process remains under
> 1% CPU usage (except for occasional, small, short-lived spikes up) yet
> the CPU usage on that core remains permanently hovering at around 34%.
> The problem goes away when I reboot, but then comes back within a day or
> so.

My gut feeling is that one of the kernel worker threads hangs.
So that would be 25% overall and 100% of the affected core.
But you say there's no load to be found in the kernel threads,
which is odd.

Or if the server is accessible from the Internet, is it possible
it's rooted and someone's running a hidden process? To confirm
this isn't the case, cut off Internet access and let it run for
two days.

I don't think there are any official hidden processes that do not
show up in htop or top since that would make them seem like rootkits.
That means if the guilty process is really invisible, then it's
definitely unusual.

It's scary to consider a rootkit, but if that's the case, then
it's best to be aware as soon as possible. I hope this is not
the case for you; I wouldn't wish it on my worst enemy.

Another idea: can you limit the system to one or two cores and see
if it becomes easier to pinpoint?

This might work in the booted system:
echo 0 > /sys/devices/system/cpu/cpu1/online
echo 0 > /sys/devices/system/cpu/cpu2/online
echo 0 > /sys/devices/system/cpu/cpu3/online

If that doesn't work, maxcpus=1 on the kernel command line should.
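
If no process shows up as the culprit, the time may be going to interrupt
handling rather than to any task. Here is a minimal sketch, assuming the
standard Linux /proc/stat layout (user, nice, system, idle, iowait, irq,
softirq, steal). It takes two samples of the cpu0 line and prints how the
elapsed ticks split across those categories. The function name cpu_breakdown
and the 5 second interval are illustrative choices, not something from this
thread.

```shell
#!/bin/sh
# cpu_breakdown BEFORE AFTER
# Each argument is one "cpuN ..." line taken from /proc/stat.
# Prints the percentage of elapsed ticks spent in each category.
cpu_breakdown() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        # First line: remember the earlier counters (fields 2..9).
        NR == 1 { for (i = 2; i <= 9; i++) before[i] = $i; next }
        {
            split("user nice system idle iowait irq softirq steal", name, " ")
            total = 0
            for (i = 2; i <= 9; i++) { delta[i] = $i - before[i]; total += delta[i] }
            if (total == 0) { print "no ticks elapsed"; exit }
            for (i = 2; i <= 9; i++)
                printf "%s=%.1f%%\n", name[i - 1], 100 * delta[i] / total
        }'
}

# Usage on a live system (the sampling interval is arbitrary):
#   a=$(grep '^cpu0 ' /proc/stat); sleep 5; b=$(grep '^cpu0 ' /proc/stat)
#   cpu_breakdown "$a" "$b"
```

A large irq or softirq share would point at a device or driver rather than a
process. Reading /proc/interrupts and /proc/softirqs twice would then show
which counter is climbing.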


[arch-general] High CPU on one core, but unable to find process responsible

2018-03-11 Thread David Rosenstrauch
My server's been exhibiting some very strange behavior lately.  Every 
couple of days I run into a situation where one core (core #0) on the 
quad core CPU starts continuously using around 34% of CPU, but I'm not 
able to see (using htop) any process that's responsible for using all 
that CPU.  Even when I tell htop to show me kernel threads too, I still 
am not able to see the offending process.  Every process remains under 
1% CPU usage (except for occasional, small, short-lived spikes up) yet 
the CPU usage on that core remains permanently hovering at around 34%.  
The problem goes away when I reboot, but then comes back within a day or 
so.


I'm rather stumped as to how to fix this.  The server is a bit old, 
running an up-to-date installation of Arch on an Intel Core 2 Quad Q6600 
CPU.  Any suggestions anyone might have as to either what might be going 
on here, or how to go about debugging it would be greatly appreciated.


Thanks!

DR


Re: [arch-general] Update to 4.15.8 on dual quad-core box locked on ( 3/16) Install DKMS modules, need help resurecting

2018-03-11 Thread Carsten Mattner via arch-general
On 3/12/18, Celti Burroughs via arch-general  wrote:
> On Mon, 12 Mar 2018 01:04:14 +
> Carsten Mattner via arch-general  wrote:
>
>> Or I actually did post it to the list by accident.
>>
>> Please don't flame me for mentioning ZFS boot environments as a technique
>> available for FOSS servers.
>>
>> On 3/12/18, Carsten Mattner  wrote:
>> > On 3/11/18, David C. Rankin  wrote:
>> >
>> >> This was a nightmare. It's not a CD problem, it's a problem with
>> >> the system
>> >> seeing the CD Label and/or creating the /dev/disk/by-label
>> >> directory in time for the link to be created.
>> >
>> > Hi David,
>> >
>> > so in the end you were able to boot off usb, right?
>> >
>> > Also, the nightmare you had to work through can be avoided on
>> > servers where you run illumos or FreeBSD by way of ZFS boot
>> > environments (BE). Basically, it's like Windows style snapshots of
>> > core files you can boot, in case stuff goes south.
>> >
>> > I didn't post this to the list, since it mentions ZFS, and that
>> > alone might get some people pissed off.
>
> I don't see why anyone should get pissed off. I mean, ArchZFS[1] is
> definitely a thing that works reasonably well, and the wiki page[2]
> specifically mentions boot environments and beadm.

I'm happy to hear that. My caution is based on past observations of
needlessly heated arguments; ZFS, whose license splits the Linux
community in half, appears to be perfect fuel for such a thread.

Thanks for the wiki links. I've never used ZFS on Linux because I avoid
out-of-tree kernel patches. Maybe I will give it a try on Linux as well.


Re: [arch-general] Update to 4.15.8 on dual quad-core box locked on ( 3/16) Install DKMS modules, need help resurecting

2018-03-11 Thread Celti Burroughs via arch-general
On Mon, 12 Mar 2018 01:04:14 +
Carsten Mattner via arch-general  wrote:

> Or I actually did post it to the list by accident.
> 
> Please don't flame me for mentioning ZFS boot environments as a technique
> available for FOSS servers.
> 
> On 3/12/18, Carsten Mattner  wrote:
> > On 3/11/18, David C. Rankin  wrote:
> >  
> >> This was a nightmare. It's not a CD problem, it's a problem with
> >> the system
> >> seeing the CD Label and/or creating the /dev/disk/by-label
> >> directory in time for the link to be created.  
> >
> > Hi David,
> >
> > so in the end you were able to boot off usb, right?
> >
> > Also, the nightmare you had to work through can be avoided on
> > servers where you run illumos or FreeBSD by way of ZFS boot
> > environments (BE). Basically, it's like Windows style snapshots of
> > core files you can boot, in case stuff goes south.
> >
> > I didn't post this to the list, since it mentions ZFS, and that
> > alone might get some people pissed off.

I don't see why anyone should get pissed off. I mean, ArchZFS[1] is
definitely a thing that works reasonably well, and the wiki page[2]
specifically mentions boot environments and beadm.

~Celti

[1]: https://github.com/archzfs/archzfs
[2]: https://wiki.archlinux.org/index.php/Installing_Arch_Linux_on_ZFS




Re: [arch-general] Update to 4.15.8 on dual quad-core box locked on ( 3/16) Install DKMS modules, need help resurecting

2018-03-11 Thread Carsten Mattner via arch-general
Or I actually did post it to the list by accident.

Please don't flame me for mentioning ZFS boot environments as a technique
available for FOSS servers.

On 3/12/18, Carsten Mattner  wrote:
> On 3/11/18, David C. Rankin  wrote:
>
>> This was a nightmare. It's not a CD problem, it's a problem with the
>> system
>> seeing the CD Label and/or creating the /dev/disk/by-label directory in
>> time for the link to be created.
>
> Hi David,
>
> so in the end you were able to boot off usb, right?
>
> Also, the nightmare you had to work through can be avoided on servers
> where you run illumos or FreeBSD by way of ZFS boot environments (BE).
> Basically, it's like Windows style snapshots of core files you can
> boot, in case stuff goes south.
>
> I didn't post this to the list, since it mentions ZFS, and that alone
> might get some people pissed off.
>


Re: [arch-general] Update to 4.15.8 on dual quad-core box locked on ( 3/16) Install DKMS modules, need help resurecting

2018-03-11 Thread Carsten Mattner via arch-general
On 3/11/18, David C. Rankin  wrote:

> This was a nightmare. It's not a CD problem, it's a problem with the system
> seeing the CD Label and/or creating the /dev/disk/by-label directory in
> time for the link to be created.

Hi David,

so in the end you were able to boot off usb, right?

Also, the nightmare you had to work through can be avoided on servers
where you run illumos or FreeBSD by way of ZFS boot environments (BE).
Basically, it's like Windows style snapshots of core files you can
boot, in case stuff goes south.
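
For anyone unfamiliar with boot environments, the workflow looks roughly like
this. It is a minimal sketch, assuming a FreeBSD or illumos host with beadm
available. The environment name pre-upgrade is only an example, and the helper
functions (step, be_checkpoint, be_rollback) are made up for illustration.
They just print the commands unless RUN=1 is set.

```shell
#!/bin/sh
# Print each command; run it only when RUN=1 (needs root and a ZFS root pool).
step() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Before upgrading: save the running system as a named boot environment.
be_checkpoint() {
    step beadm create "$1"
    step beadm list
}

# After a bad upgrade: make the saved environment the default for the next boot.
be_rollback() {
    step beadm activate "$1"
}

# Usage:
#   be_checkpoint pre-upgrade    # then run the upgrade
#   be_rollback pre-upgrade      # only if the upgrade went wrong, then reboot
```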

I didn't post this to the list, since it mentions ZFS, and that alone
might get some people pissed off.


Re: [arch-general] Update to 4.15.8 on dual quad-core box locked on ( 3/16) Install DKMS modules, need help resurecting

2018-03-11 Thread David C. Rankin
On 03/11/2018 04:08 PM, Guus Snijders via arch-general wrote:
> On Sun, 11 Mar 2018 at 21:29, David C. Rankin <
> drankina...@suddenlinkmail.com> wrote:
> 
>> All,
>>
>>   I experienced a hard lockup during kernel update to 4.15.8 on a
>> Supermicro
>> Dual Opteron Quad-core box.
>>
> [cd problem]
> 
> Just to make sure; can you run a memtest on this machine? It's a bit of a
> long shot,  but hard lockups are suspicious. Especially since the CD also
> acts strangely.
> Though an overheating CPU could also cause these symptoms.
> 
> 
> Kind regards, Guus Snijders
> 

This was a nightmare. It's not a CD problem, it's a problem with the system
seeing the CD Label and/or creating the /dev/disk/by-label directory in time
for the link to be created.

I burned 3 different CDs from the .iso (validating the sha1sum). I burned 2
of them from the Arch server next to this box running the 4.15.8 kernel whose
update went fine. I burned per:

https://wiki.archlinux.org/index.php/Optical_disc_drive#Burning_an_ISO_image_to_CD.2C_DVD.2C_or_BD

cdrecord -v -sao dev=/dev/sr0 archlinux-2018.03.01-x86_64.iso

and I burned from K3b as well. No change. Same failure.

So even though this box cannot boot from USB, I created USB install media
and plugged it into a USB port so that maybe its ARCH_201803 drive label would
be seen. (I think the problem is that the label lsblk reports for the .iso CD
isn't updated during boot for some reason.)

Lo and behold... it worked! I was able to boot to the Arch install prompt.
mdadm ran and assembled my arrays. I arch-chrooted to /mnt and then
reinstalled the kernel and kernel-lts, and then had to reinstall the other 57
packages.

I don't know what the hiccup was, but for this box it was a death sentence. No
linker modules updated, only 2 out of 16 post-install processes ran. That
really leaves you in a bad way...

Fixed now.

So to recap, the key to solving the 30-second "CD label not seen" bug was to
put USB install media in a USB port before boot so the drive would be
activated and the LABEL available when it got to the find disk/by-label part
of the installer boot. (I hope I recall this trick 2 years from now when
something like this happens again...)
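
For the archive, the repair steps after booting the install media can be
sketched as below. This is a rough outline under assumptions, not a record of
the exact commands that were run. /dev/md0 is a placeholder for whichever
array holds the root filesystem, a separate /boot array would need mounting
too, and the step and recover helpers are made up for illustration. Nothing is
executed unless RUN=1 is set.

```shell
#!/bin/sh
# Print each command; run it only when RUN=1 (needs root on the install media).
step() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

# recover [ROOT_DEV]
# ROOT_DEV defaults to /dev/md0, which is only a placeholder.
recover() {
    root_dev=${1:-/dev/md0}
    step mdadm --assemble --scan                      # assemble the RAID 1 arrays
    step mount "$root_dev" /mnt                       # mount the installed root
    step arch-chroot /mnt pacman -S linux linux-lts   # reinstall both kernels
    step arch-chroot /mnt mkinitcpio -P               # regenerate all initramfs presets
}

# Usage:
#   recover /dev/md0          # dry run, prints the commands
#   RUN=1 recover /dev/md0    # actually run them
```

The remaining packages from the interrupted update would then be reinstalled
with pacman from inside the chroot in the same way.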

-- 
David C. Rankin, J.D.,P.E.


Re: [arch-general] Update to 4.15.8 on dual quad-core box locked on ( 3/16) Install DKMS modules, need help resurecting

2018-03-11 Thread Guus Snijders via arch-general
On Sun, 11 Mar 2018 at 21:29, David C. Rankin <
drankina...@suddenlinkmail.com> wrote:

> All,
>
>   I experienced a hard lockup during kernel update to 4.15.8 on a
> Supermicro
> Dual Opteron Quad-core box.
>
[cd problem]

Just to make sure; can you run a memtest on this machine? It's a bit of a
long shot,  but hard lockups are suspicious. Especially since the CD also
acts strangely.
Though an overheating CPU could also cause these symptoms.


Kind regards, Guus Snijders


Re: [arch-general] godep has been deprecated in favor of dep

2018-03-11 Thread Morten Linderud via arch-general
Yo!

Just a concluding mail with some brief information!

dep has been added to community. I was initially unsure, as the plan was for
dep to be merged upstream and essentially become "go dep" at some point. The
roadmap stated that the next release, 1.11, would be the window for this. That
was apparently a lie, for reasons unclear.

Meanwhile, vgo was released on the 20th of February as a dependency management
tool; it has already been merged into go and will be included in the 1.11
release, for reasons unclear. However, after some more reading, dep will
continue to be the recommended tool for dependency management, as both godep
and glide have ceased to be developed. Thus I found it useful to have it in
community.

TL;DR: Go still has issues they should have solved 9 years ago.

https://research.swtch.com/vgo-intro
https://sdboyer.io/blog/vgo-and-dep/

-- 
Morten Linderud

PGP: 9C02FF419FECBE16




[arch-general] Update to 4.15.8 on dual quad-core box locked on ( 3/16) Install DKMS modules, need help resurecting

2018-03-11 Thread David C. Rankin
All,

  I experienced a hard lockup during kernel update to 4.15.8 on a Supermicro
Dual Opteron Quad-core box. I've updated this box 50 times without issue, but
something caused a hardlock. The filesystems are on mdadm Linux RAID 1 
partitions.

  The hardlock occurred after the packages were installed and it was in
postprocessing at:

( 3/16) Install DKMS modules

  Now on boot I receive:

Warning: /lib/modules/4.15.8-1-ARCH/modules.devname not found - ignoring
starting version 237
ERROR: device `UUID=c7492ac0-e805...` not found. Skipping fsck.
mount: /new_root: can't find UUID UUID=c7492ac0-e805...
You are being dropped into an emergency shell.
sh: can't access tty; job control turned off
[rootfs ]#

(and the box hardlocks)

  So I downloaded the 201803 iso to try and fix the box. I have to boot from
CD, since this box does not boot from USB. So I burn the .iso to CD (making
sure the CD Label is `ARCH_201803`) and boot the box again in an attempt to fix it:

  All goes well until...

:: Mounting '/dev/disk/by-label/ARCH_201803' to '/run/archiso/bootmnt'
Waiting 30 seconds for device /dev/disk/by-label/ARCH_201803 ...
ERROR: '/dev/disk/by-label/ARCH_201803' device did not show up after 30
seconds ...
  Falling back to interactive prompt
  You can try to fix the problem manually, log out when you are finished
sh: can't access tty; job control turned off
[rootfs ]#

(thankfully this prompt is not hardlocked)

This is bizarre, I've created the iso, sha1sums are correct, CD label is
'ARCH_201803', but the iso won't boot. I've researched, but these solutions
don't solve the problem:

https://bbs.archlinux.org/viewtopic.php?id=195671
https://superuser.com/questions/519784/error-installing-arch-linux
https://bugs.launchpad.net/bugs/1318400

  Checking /dev/disk from the recovery prompt, there is no "by-label" directory
under /dev/disk to begin with. Creating 'by-label' and softlinking
/dev/sr0 to /dev/disk/by-label/ARCH_201803 produces a series of additional
I/O errors, concluding with:

mount: /run/archiso/bootmnt: wrong fs type, bad option, bad superblock on
/dev/sr0, missing codepage or helper system 

So I'm snakebit and need help. I've never had the system lock during kernel
update before and it has left part of the system thinking it has 4.15.7 and
the rest thinking it is 4.15.8 (but the 4.15.8 update never finished)

(1) How do I go about recovering? 4.15.7 was A-OK. I'm not sure what part of
the install is still 4.15.7 and what's 4.15.8. 59 packages were updated,
including the kernel and lts-kernel, but the initramfs was never regenerated
due to the failure at the 'Install DKMS modules' phase. If I can get the
ARCH_201803 install media to boot properly -- what next?

(2) How do I get around the ERROR: '/dev/disk/by-label/ARCH_201803' device did
not show up after 30 seconds problem? The disk label is correct, it's just not
being seen and mounted by the installer to /run/archiso/bootmnt

Any help greatly appreciated.


-- 
David C. Rankin, J.D.,P.E.