Re: [gentoo-user] Root on NFS Suspend/Resume support

2018-12-14 Thread Tsukasa Mcp_Reznor
Daniel Frey wrote:
> Have you checked the power supply?
>
> I don't use a diskless setup but last year (nah, maybe many years ago) I
> had this strange resume problem after suspend. As in, I'd wake the
> machine and it'd sit there with a blinking text cursor in text mode,
> quite stuck. I am pretty sure I posted about it on the list here.
>
> It turned out that when my machine was running the power supply was
> fine. However, when I suspended it, the 5V rail would bleed voltage. So,
> I discovered if I resumed within, say, 5 minutes after suspending my
> machine it would wake normally. After that though, I'd get the blinking
> cursor and it would hang resuming.
>
> I confirmed that the 5V rail was bleeding voltage when in suspend with
> my voltmeter. It turned out to be bad capacitors in the power supply.
>
> Just a suggestion...
>
> Dan


I appreciate the tip. If I boot off a hard drive on my main desktop, it does
indeed sleep/resume just fine, and that install was the source of every file
that got sent to the network when I started converting to diskless. Maybe I'll
throw in a small livedvd install and check again.

If it helps: when I'm in LXDE with just a terminal open and top running, top
updates just ONCE after the screen comes back on and then freezes. I can move
the mouse cursor, num lock toggles, and I can drag the terminal window around,
but if I try to switch to vt2 or do anything else, like load a previously
uncached menu from the taskbar, it never loads or switches. So it's definitely
(to my eyes at least) a problem with the NFS connection. I don't believe the
NIC is powering down, since I turned on wake on lan, but I'll test and make
sure tonight. Aside from blacklisting kernel modules, I have yet to find a way
to tweak the resume/suspend functions, but I'm still looking for more
information when I have free time.
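
Since wake on lan came up, here is a minimal Python sketch of what a magic
packet looks like. It could be sent from the server to check whether a
suspended client's NIC is still listening. The MAC address in the usage line is
a placeholder, and the broadcast address and UDP port 9 are only the common
convention, not something taken from this setup. A NIC that reacts to a magic
packet shows that it kept standby power; it does not by itself show that the
NFS connection survives the resume.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Return a Wake-on-LAN magic packet for a MAC like '00:11:22:33:44:55'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected 12 hex digits, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)
    # Six 0xFF bytes followed by the target MAC repeated sixteen times.
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str,
                      broadcast: str = "255.255.255.255",
                      port: int = 9) -> None:
    """Broadcast the magic packet over UDP (address and port are assumptions)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(build_magic_packet(mac), (broadcast, port))
```

Calling send_magic_packet("00:11:22:33:44:55") from another machine on the same
LAN would be the typical use, with the real client MAC substituted.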



Re: [gentoo-user] Root on NFS Suspend/Resume support

2018-12-14 Thread Daniel Frey
On 12/10/18 7:03 PM, Tsukasa Mcp_Reznor wrote:
> Has anyone managed to get suspend/resume to work on diskless machines
> using NFS as the root?
>
> Suspend works like normal, but resume hard locks, can't seem to get
> any errors or anything as it's not sending to any log files naturally.

On 12/13/18 1:08 PM, Tsukasa Mcp_Reznor wrote:
> If I manually suspend for up to say 10 seconds, they resume just fine.


Have you checked the power supply?

I don't use a diskless setup but last year (nah, maybe many years ago) I 
had this strange resume problem after suspend. As in, I'd wake the 
machine and it'd sit there with a blinking text cursor in text mode, 
quite stuck. I am pretty sure I posted about it on the list here.


It turned out that when my machine was running the power supply was 
fine. However, when I suspended it, the 5V rail would bleed voltage. So, 
I discovered if I resumed within, say, 5 minutes after suspending my 
machine it would wake normally. After that though, I'd get the blinking 
cursor and it would hang resuming.


I confirmed that the 5V rail was bleeding voltage when in suspend with 
my voltmeter. It turned out to be bad capacitors in the power supply.


Just a suggestion...

Dan



Re: [gentoo-user] Root on NFS Suspend/Resume support

2018-12-13 Thread Tsukasa Mcp_Reznor

From: J. Roeleveld 
Sent: Thursday, December 13, 2018 4:03 PM
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] Root on NFS Suspend/Resume support

On December 11, 2018 10:59:47 PM UTC, Tsukasa Mcp_Reznor wrote:
>If you want to resume from NFS, you will need an initramfs that
>correctly passes the swap device for resuming.
>I would try the same method as resuming from encrypted swap.
>--
>Sent from my Android device with K-9 Mail. Please excuse my brevity.
>
>I appreciate the response, I'm not trying to use hibernate but rather
>suspend to ram.  I don't use swap over NFS, the machines that do have
>hard drives installed use them for local swap and cachefilesd (which is
>amazingly performant)
>
>In the past when I've tried to use an initramfs, it's led to boot
>hangs that I haven't quite figured out the root cause for,  I was
>trying to use genkernel to build them, maybe I'll give dracut a shot
>and see if that fixes the problem, you could very well be on to
>something.

I believe "suspend to ram" might switch off the network (and kill a NFS
connection in the process). This might be the cause of the issue.
Do the nodes have enough memory to load the filesystem into RAM and run from
there? (Like sysresccd can do)
If yes, that might allow this to work.

--
Joost
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Probably not enough RAM, the lowest machine has 4 gigs. As an update, I
installed and tried out dracut. That didn't make any difference, but each
system booted fine with an initrd, which is a change for sure. If I manually
suspend for up to say 10 seconds, they resume just fine. I like the idea of
S2ram killing the network as the cause. I thought enabling wake on lan would
keep it from being switched off. I'll see if I can research the
suspend/resume routines and blacklist or whatever to keep it running, thanks
for the tip :)



Re: [gentoo-user] Root on NFS Suspend/Resume support

2018-12-13 Thread J. Roeleveld
On December 11, 2018 10:59:47 PM UTC, Tsukasa Mcp_Reznor wrote:
>If you want to resume from NFS, you will need an initramfs that
>correctly passes the swap device for resuming.
>I would try the same method as resuming from encrypted swap.
>--
>Sent from my Android device with K-9 Mail. Please excuse my brevity.
>
>
>
>I appreciate the response, I'm not trying to use hibernate but rather
>suspend to ram.  I don't use swap over NFS, the machines that do have
>hard drives installed use them for local swap and cachefilesd (which is
>amazingly performant)
>
>In the past when I've tried to use an initramfs, it's led to boot
>hangs that I haven't quite figured out the root cause for,  I was
>trying to use genkernel to build them, maybe I'll give dracut a shot
>and see if that fixes the problem, you could very well be on to
>something.

I believe "suspend to ram" might switch off the network (and kill a NFS 
connection in the process). This might be the cause of the issue.
Do the nodes have enough memory to load the filesystem into RAM and run from 
there? (Like sysresccd can do)
If yes, that might allow this to work.

--
Joost
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
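
One way to test the "network is gone after resume" theory is to probe the NFS
server's TCP port from the client right after waking it. The Python sketch
below assumes NFSv4 over TCP on the standard port 2049; the function names are
made up for illustration and the server address is whatever the client mounts
from. It would have to be started before suspending, because a hung NFS root
cannot load a new interpreter or script.

```python
import socket
import time


def nfs_port_reachable(host: str, port: int = 2049,
                       timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


def watch(host: str, interval: float = 2.0, rounds: int = 0) -> None:
    """Print a timestamped reachability line every `interval` seconds.

    rounds=0 means keep going until interrupted.
    """
    count = 0
    while rounds == 0 or count < rounds:
        state = "reachable" if nfs_port_reachable(host) else "UNREACHABLE"
        print(time.strftime("%H:%M:%S"), host, state, flush=True)
        count += 1
        time.sleep(interval)
```

Leaving watch() running in an open terminal across a suspend/resume cycle would
show whether the server port ever becomes reachable again after the wake-up.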

Re: [gentoo-user] Root on NFS Suspend/Resume support

2018-12-11 Thread Grant Taylor

On 12/11/2018 03:53 PM, Tsukasa Mcp_Reznor wrote:
> Actually I haven't found the need for a menu at all, dnsmasq serves
> whatever kernel I have symbolically linked to the clients from their
> boot folder


Nice.

Aside:  I played with a PXELINUX (?) menu to boot a few different 
things.  It's been too long for me to remember details.  But I was quite 
happy with it.  I think I had installers for a couple of different Linux 
distros and a couple of DOS based utilities.


> I've had this same IP for months on this machine, I would assume
> the same for the others,


Fair enough.

> as far as log messages from the server, it answers the same to each
> request from the same mac as far back as I've scrolled through


Okay.  So there's no obvious conflict with UEFI and the OS DHCPing from 
the same MAC address.  Relatively clean transition.



> Yes all are booting the same kernel


Nice.

(Kernel parameters moved to individual lines so my brain can absorb them.)


> ip=dhcp


So, there are three things DHCPing.

1)  UEFI firmware
2)  Kernel itself
3)  OS init scripts

Intriguing.


> root=/dev/nfs
> rootfstype=nfs


I had no idea that there was an nfs device.  I am assuming that it's 
specific to the fact that the root file system type is NFS.  -  I must 
research this more.



> rw
> nfsroot=ServerIP:/diskless/root,nolock,fsc,tcp,proto=tcp,vers=4,nfsvers=4.2,rsize=1048576,wsize=1048576


I assume:

"ServerIP" is the NFS server's IP address.

"/diskless/root" is the NFS export.

"nolock,fsc,tcp,proto=tcp,vers=4,nfsvers=4.2,rsize=1048576,wsize=1048576"
are NFS mount options.



> raid=noautodetect


Why have a raid parameter?  Is there something in the kernel that you 
don't need and are disabling?  Or is this somehow influencing how file 
systems are mounted on boot?


Do OS init scripts try to remount root?  Or is there not an entry in 
/etc/fstab for the root, and just rely on the kernel's mount?


> all the same root, I actually just have a custom bash script in local.d
> (openrc) for handling specifics for each node (adding dvd/blu-ray whatever
> to fstab)


Hum.

How are you handling the hostnames?  Or is that dynamic?

What about user accounts?  Are all your client systems using the same 
password & group files?


What about SSH host keys?

> anything that conflicts like /var/log I just have as tmpfs on
> each machine


I can see that for logs.  But I don't think that an empty tmpfs is 
sufficient for things like passwd / group files or ssh host keys.


> https://wiki.gentoo.org/wiki/Diskless_nodes I got my start from
> reading that, well unless you count doing diskless with ubuntu in the
> way way past, my hard drive died then and I wasn't about to just use a
> livedvd unable to really install anything lol

$ReadingList++

Thank you for the link and kindling something that's been a latent 
interest of mine.




--
Grant. . . .
unix || die



Re: [gentoo-user] Root on NFS Suspend/Resume support

2018-12-11 Thread Tsukasa Mcp_Reznor
J. Roeleveld wrote:
> If you want to resume from NFS, you will need an initramfs that correctly
> passes the swap device for resuming.
> I would try the same method as resuming from encrypted swap.
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.



I appreciate the response. I'm not trying to use hibernate but rather suspend
to RAM. I don't use swap over NFS; the machines that do have hard drives
installed use them for local swap and cachefilesd (which is amazingly
performant).

In the past when I've tried to use an initramfs, it's led to boot hangs that I
haven't quite figured out the root cause for. I was trying to use genkernel to
build them; maybe I'll give dracut a shot and see if that fixes the problem.
You could very well be on to something.


Re: [gentoo-user] Root on NFS Suspend/Resume support

2018-12-11 Thread Tsukasa Mcp_Reznor

> Do you have reservations in the DHCP server?  Or are the addresses truly
> dynamic?

Dynamic. Any "servers" that would require forwarding I just run on my server.

> Are you relying on the client's UEFI implementation to provide the menu?
> Or are you using PXELINUX for the menu?  (I know it's a nuance, but it
> is a difference.)  The latter is much easier to centrally manage than
> the former.

Actually I haven't found the need for a menu at all. dnsmasq serves whatever
kernel I have symbolically linked to the clients from their boot folder.


> Does the UEFI stack get the same IP via DHCP that the OS gets via DHCP?
> Is there any sort of contention?  Does UEFI release the IP before
> bootstrapping the PXELINUX image?  Does the DHCP server view the
> multiple requests from the same client MAC as a form of a refresh?  Or
> does it just offer the same IP?

I've had this same IP for months on this machine, and I would assume the same
for the others. As far as log messages from the server go, it answers the same
to each request from the same MAC as far back as I've scrolled through.



> Are all clients booting the same kernel, thus using the same command line?
>
> This means that clients must use DHCP to retrieve their IP address.

Yes, all are booting the same kernel.

> I guess there is some opportunity to return different files (PXELINUX
> image / config and / or kernel file) to different clients to get
> different behavior.  But that might be more complexity than is necessary.
>
> Would you please share the kernel command line?  I'm quite curious what
> the syntax is for NFS root.

ip=dhcp root=/dev/nfs rootfstype=nfs rw nfsroot=ServerIP:/diskless/root,nolock,fsc,tcp,proto=tcp,vers=4,nfsvers=4.2,rsize=1048576,wsize=1048576 raid=noautodetect
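
For readers unfamiliar with the syntax, the kernel's documented form is
nfsroot=[<server-ip>:]<root-dir>[,<nfs-options>]. The following Python sketch
splits the command line above into those parts. The function names are made up
for illustration, and the parsing is deliberately naive (for example, it does
not handle IPv6 literals).

```python
from typing import Dict, Optional, Tuple, Union

OptionValue = Union[str, bool]


def parse_nfsroot(value: str) -> Tuple[Optional[str], str, Dict[str, OptionValue]]:
    """Split '[server:]path[,opt[,opt=val...]]' into its three parts."""
    # Everything before the first comma is the location, the rest are options.
    location, _, option_text = value.partition(",")
    server, sep, path = location.partition(":")
    if not sep:
        # No "server:" prefix was given, so the whole location is the path.
        server, path = None, location
    options: Dict[str, OptionValue] = {}
    for item in option_text.split(","):
        if not item:
            continue
        key, eq, val = item.partition("=")
        # Flags such as "nolock" carry no value; record them as True.
        options[key] = val if eq else True
    return server, path, options


def nfsroot_from_cmdline(cmdline: str):
    """Find the nfsroot= word on a kernel command line and parse it."""
    for word in cmdline.split():
        if word.startswith("nfsroot="):
            return parse_nfsroot(word[len("nfsroot="):])
    return None


# The command line from this message, with ServerIP left as the placeholder.
CMDLINE = ("ip=dhcp root=/dev/nfs rootfstype=nfs rw "
           "nfsroot=ServerIP:/diskless/root,nolock,fsc,tcp,proto=tcp,vers=4,"
           "nfsvers=4.2,rsize=1048576,wsize=1048576 raid=noautodetect")
```

Applied to CMDLINE, this gives the server ServerIP, the export path
/diskless/root, and the remaining comma-separated items as mount options. As
far as nfs(5) describes it, tcp is an alternative to proto=tcp and vers= is an
alternative spelling of nfsvers=, so some of those options look redundant.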



> That makes me think that you are using a separate NFS export for each
> machine's root.

All the same root. I actually just have a custom bash script in local.d
(OpenRC) for handling specifics for each node (adding DVD/Blu-ray or whatever
to fstab).

> I have wondered about trying to do something similar (likely start in a
> VM) that has (at least) one machine specific export for things like
> /etc, but would then try to use a common export for things like /usr,
> /lib, and maybe /var.

Anything that conflicts, like /var/log, I just have as tmpfs on each machine.


>  - What is common between the diskless clients and what is unique.
>     - PXELINUX image / config
>     - Kernel image
>     - NFS exports
>  - What do your exports look like.
>  - What sort of configuration you have in your DHCP server that's
>    specific to this.
>     - Any sticky reservations, possibly with machine specific parameters.
>  - Other things that I can't think of at the moment.
>
> Thank you again.  Very interesting stuff.

https://wiki.gentoo.org/wiki/Diskless_nodes I got my start from reading
that, well unless you count doing diskless with ubuntu in the way way past, my
hard drive died then and I wasn't about to just use a livedvd unable to really
install anything lol



Re: [gentoo-user] Root on NFS Suspend/Resume support

2018-12-11 Thread J. Roeleveld
On December 11, 2018 11:23:27 AM UTC, Tsukasa Mcp_Reznor wrote:
>
>From: Grant Taylor 
>Sent: Monday, December 10, 2018 10:14 PM
>To: gentoo-user@lists.gentoo.org
>Subject: Re: [gentoo-user] Root on NFS Suspend/Resume support
>
>On 12/10/18 8:03 PM, Tsukasa Mcp_Reznor wrote:
>> Has anyone managed to get suspend/resume to work on diskless machines
>> using NFS as the root?
>
>~blink~
>
>I haven't tried to suspend / resume diskless machines.  (I've not done
>much with diskless machines, but it's on my to do list.)
>
>But I don't think I would have thought about trying to suspend / resume
>a diskless machine.
>
>Are we talking about a wired Ethernet network connection with static
>IP(s)?  Or something more complex?
>
>Aside: I'm wondering why a diskless machine is using suspend / resume.
>If you're bored, I'd like to have my (apparently limited) world view
>expanded.
>
>> Suspend works like normal, but resume hard locks, can't seem to get any
>> errors or anything as it's not sending to any log files naturally.
>
>Have you tried using any network based logging?
>
>Can syslog log to a network block device?
>
>Doesn't the kernel have some network logging?  Or the ability to log
>debug info somewhere other than a file?
>
>> I have 3 machines currently running this setup, just trying to save
>> some power.  If it helps they are all using Realtek NICs.
>
>Okay.  I conceptually get saving power.
>
>How are you waking them up?  User interaction?  Clock?  Magic packet?
>
>> My google-fu hasn't turned up anything in the last 5 years.
>
>So, you've been working on it for a while.
>
>Are any of your problems related to stale file handles?  I.e. the
>diskless NFS client disagreeing with the NFS server about the state of
>the files?  Is the NFS server closing the files after a timeout?
>
>> Thanks
>
>You're welcome.  But I'm not sure I helped.  I would like to learn what
>you figure out.
>
>
>
>
>You're totally correct, more information would be beneficial, here
>goes.
>All machines are Wired 1Gbps connections.
>Uefi IP4 network stack sends dhcp request, gets boot file pxelinux.efi,
>the default entry sends the linux kernel (no initramfs needed, firmware
>added to kernel image).
>Another good note is the kernel contains the command line built-in for
>using root on NFS.
>Machine loads, mounts the required mount points through NFS4.2 (so much
>better than the old NFS 3 speeds).
>LightDM loads and users are free to work, in this case family members
>playing Steam/Diablo 3/etc.
>I switched to using Root on NFS for a lot of reasons.
>
>Maintaining 4 gentoo installs on machines of varying specs and
>remembering to update each with good updates added a fair amount of
>administration time. (4, because the server is included)
>
>Using chroots on the server as binary build hosts for each machine
>solves some problems, but increases space requirements quite a bit, and
>adds latency if you want to use it while it's emerging anything, plus
>compiling say Libreoffice or whatever 3+ times in a row is pretty slow.
>
>Side note, If anyone else runs diskless I have a patch for wine I can
>send out that returns the nfs mount as a fixed hard drive, there are a
>few apps/games that refuse to install/run on a network share, and a
>patch for steam that removes the file locking issues so updates run
>quick and smooth (neither will ever be upstreamable, people have tried
>in the past)
>
>
>
>Thanks for your response, I'd love to help if you have any more
>questions, it's been a fun experience for me for sure. Also,
>cachefilesd if there's a drive available, makes everything feel like
>it's not a networked machine at all here.

If you want to resume from NFS, you will need an initramfs that correctly 
passes the swap device for resuming.
I would try the same method as resuming from encrypted swap.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: [gentoo-user] Root on NFS Suspend/Resume support

2018-12-11 Thread Grant Taylor

On 12/11/2018 04:23 AM, Tsukasa Mcp_Reznor wrote:

> You're totally correct, more information would be beneficial, here goes.


:-)


> All machines are Wired 1Gbps connections.


ACK

That means that you don't have the complications (and performance 
issues) of wireless.


> Uefi IP4 network stack sends dhcp request, gets boot file pxelinux.efi,
> the default entry sends the linux kernel (no initramfs needed, firmware
> added to kernel image).


Interesting.

Do you have reservations in the DHCP server?  Or are the addresses truly 
dynamic?


Are you relying on the client's UEFI implementation to provide the menu? 
 Or are you using PXELINUX for the menu?  (I know it's a nuance, but it 
is a difference.)  The latter is much easier to centrally manage than 
the former.


Does the UEFI stack get the same IP via DHCP that the OS gets via DHCP? 
Is there any sort of contention?  Does UEFI release the IP before 
bootstrapping the PXELINUX image?  Does the DHCP server view the 
multiple requests from the same client MAC as a form of a refresh?  Or 
does it just offer the same IP?


> Another good note is the kernel contains the command line built-in for
> using root on NFS.


Okay.  ~pondering~

Are all clients booting the same kernel, thus using the same command line?

This means that clients must use DHCP to retrieve their IP address.

I guess there is some opportunity to return different files (PXELINUX 
image / config and / or kernel file) to different clients to get 
different behavior.  But that might be more complexity than is necessary.


Would you please share the kernel command line?  I'm quite curious what 
the syntax is for NFS root.


> Machine loads, mounts the required mount points through NFS4.2 (so much
> better than the old NFS 3 speeds).


Nice.

> LightDM loads and users are free to work, in this case family members
> playing Steam/Diablo 3/etc.


:-)


> I switched to using Root on NFS for a lot of reasons.


:-)

> Maintaining 4 gentoo installs on machines of varying specs and remembering
> to update each with good updates added a fair amount of administration
> time. (4, because the server is included)


*nod*

> Using chroots on the server as binary build hosts for each machine
> solves some problems, but increases space requirements quite a bit, and
> adds latency if you want to use it while it's emerging anything, plus
> compiling say Libreoffice or whatever 3+ times in a row is pretty slow.


That makes me think that you are using a separate NFS export for each 
machine's root.


I have wondered about trying to do something similar (likely start in a 
VM) that has (at least) one machine specific export for things like 
/etc, but would then try to use a common export for things like /usr, 
/lib, and maybe /var.


Maybe a common / export and a per machine /etc would accomplish what I'm 
thinking.


> Side note, If anyone else runs diskless I have a patch for wine I can
> send out that returns the nfs mount as a fixed hard drive, there are a
> few apps/games that refuse to install/run on a network share, and a patch
> for steam that removes the file locking issues so updates run quick and
> smooth


Nice.


> (neither will ever be upstreamable, people have tried in the past)


:-/

> Thanks for your response, I'd love to help if you have any more questions,
> it's been a fun experience for me for sure. Also, cachefilesd if there's a
> drive available, makes everything feel like it's not a networked machine
> at all here.

You're welcome.

Thank you for sharing.

I'd love to know more about how you're doing things.

 - What is common between the diskless clients and what is unique.
    - PXELINUX image / config
    - Kernel image
    - NFS exports
 - What do your exports look like.
 - What sort of configuration you have in your DHCP server that's
   specific to this.
    - Any sticky reservations, possibly with machine specific parameters.
 - Other things that I can't think of at the moment.

Thank you again.  Very interesting stuff.



--
Grant. . . .
unix || die



Re: [gentoo-user] Root on NFS Suspend/Resume support

2018-12-11 Thread Tsukasa Mcp_Reznor

From: Grant Taylor 
Sent: Monday, December 10, 2018 10:14 PM
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] Root on NFS Suspend/Resume support

On 12/10/18 8:03 PM, Tsukasa Mcp_Reznor wrote:
> Has anyone managed to get suspend/resume to work on diskless machines
> using NFS as the root?

~blink~

I haven't tried to suspend / resume diskless machines.  (I've not done
much with diskless machines, but it's on my to do list.)

But I don't think I would have thought about trying to suspend / resume
a diskless machine.

Are we talking about a wired Ethernet network connection with static
IP(s)?  Or something more complex?

Aside: I'm wondering why a diskless machine is using suspend / resume.
If you're bored, I'd like to have my (apparently limited) world view
expanded.

> Suspend works like normal, but resume hard locks, can't seem to get any
> errors or anything as it's not sending to any log files naturally.

Have you tried using any network based logging?

Can syslog log to a network block device?

Doesn't the kernel have some network logging?  Or the ability to log
debug info somewhere other than a file?

> I have 3 machines currently running this setup, just trying to save
> some power.  If it helps they are all using Realtek NICs.

Okay.  I conceptually get saving power.

How are you waking them up?  User interaction?  Clock?  Magic packet?

> My google-fu hasn't turned up anything in the last 5 years.

So, you've been working on it for a while.

Are any of your problems related to stale file handles?  I.e. the
diskless NFS client disagreeing with the NFS server about the state of
the files?  Is the NFS server closing the files after a timeout?

> Thanks

You're welcome.  But I'm not sure I helped.  I would like to learn what
you figure out.




You're totally correct, more information would be beneficial, so here goes.
All machines are on wired 1Gbps connections.
The UEFI IPv4 network stack sends a DHCP request and gets the boot file
pxelinux.efi; the default entry sends the Linux kernel (no initramfs needed,
firmware added to the kernel image).
Another good note is that the kernel contains the command line built in for
using root on NFS.
The machine loads and mounts the required mount points through NFS 4.2 (so much
better than the old NFS 3 speeds).
LightDM loads and users are free to work, in this case family members playing
Steam/Diablo 3/etc.
I switched to using root on NFS for a lot of reasons.

Maintaining 4 Gentoo installs on machines of varying specs, and remembering to
update each with good updates, added a fair amount of administration time. (4,
because the server is included.)

Using chroots on the server as binary build hosts for each machine solves some
problems, but increases space requirements quite a bit and adds latency if you
want to use it while it's emerging anything. Plus, compiling say LibreOffice or
whatever 3+ times in a row is pretty slow.

Side note: if anyone else runs diskless, I have a patch for Wine I can send out
that reports the NFS mount as a fixed hard drive, since there are a few
apps/games that refuse to install/run on a network share, and a patch for Steam
that removes the file locking issues so updates run quick and smooth. (Neither
will ever be upstreamable; people have tried in the past.)



Thanks for your response, I'd love to help if you have any more questions, it's 
been a fun experience for me for sure. Also, cachefilesd if there's a drive 
available, makes everything feel like it's not a networked machine at all here.


Re: [gentoo-user] Root on NFS Suspend/Resume support

2018-12-10 Thread Grant Taylor

On 12/10/18 8:03 PM, Tsukasa Mcp_Reznor wrote:
> Has anyone managed to get suspend/resume to work on diskless machines
> using NFS as the root?


~blink~

I haven't tried to suspend / resume diskless machines.  (I've not done 
much with diskless machines, but it's on my to do list.)


But I don't think I would have thought about trying to suspend / resume 
a diskless machine.


Are we talking about a wired Ethernet network connection with static 
IP(s)?  Or something more complex?


Aside: I'm wondering why a diskless machine is using suspend / resume. 
If you're bored, I'd like to have my (apparently limited) world view 
expanded.


> Suspend works like normal, but resume hard locks, can't seem to get any
> errors or anything as it's not sending to any log files naturally.


Have you tried using any network based logging?

Can syslog log to a network block device?

Doesn't the kernel have some network logging?  Or the ability to log 
debug info somewhere other than a file?
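
On the kernel side, netconsole can send kernel log messages as UDP datagrams to
another host, which fits a diskless client with no local log files. A minimal
Python receiver that could run on the server is sketched below. Port 6666 is
netconsole's documented default target port; the bind address, function names,
and output format are assumptions for illustration. Whether any messages get
out during a failed resume is a separate question, since the console is
normally suspended too (the no_console_suspend kernel parameter exists for
debugging that).

```python
import socket


def format_record(data: bytes, sender: str) -> str:
    """Turn one received datagram into a printable, sender-tagged line."""
    text = data.decode("utf-8", errors="replace").rstrip("\n")
    return f"[{sender}] {text}"


def listen(bind_addr: str = "0.0.0.0", port: int = 6666) -> None:
    """Print every UDP datagram arriving on the given port until interrupted."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((bind_addr, port))
        while True:
            data, (host, _port) = sock.recvfrom(65535)
            print(format_record(data, host), flush=True)
```

Running listen() on the server before suspending a client would capture
whatever the client's kernel manages to send around the suspend and resume.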


> I have 3 machines currently running this setup, just trying to save
> some power.  If it helps they are all using Realtek NICs.


Okay.  I conceptually get saving power.

How are you waking them up?  User interaction?  Clock?  Magic packet?


> My google-fu hasn't turned up anything in the last 5 years.


So, you've been working on it for a while.

Are any of your problems related to stale file handles?  I.e. the 
diskless NFS client disagreeing with the NFS server about the state of 
the files?  Is the NFS server closing the files after a timeout?



> Thanks


You're welcome.  But I'm not sure I helped.  I would like to learn what 
you figure out.