Re: Complete lock-up from using pkgsrc/net/darkstat

2022-05-29 Thread Lloyd Parkes




On 27/05/22 06:00, John Klos wrote:

> So here's an interesting problem:
>
> On NetBSD 8, 9, current, with both ipfilter and with npf, with different
> kinds of ethernet interfaces (re*, wm*), run pkgsrc/net/darkstat. Pass a
> lot of traffic (like a week's worth of Internet traffic). Stop darkstat.
> Machine locks.


I had a go at reproducing this on NetBSD 9.99.79 with no luck. I was 
just pumping several TB of data into nc running on the host, so no IP 
forwarding or anything.


I do recall seeing a message about UDP buffer problems from the DNS 
lookup child, and I've had named running on small systems fail with 
similar-sounding error messages.


Maybe the problem is related to the number of DNS lookups the child 
process is doing rather than the number of TB the parent process is 
counting?
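

One way to test that theory would be to run the same traffic with the 
DNS resolver child disabled (darkstat has a --no-dns option, if I 
remember correctly; check darkstat(8) for the pkgsrc build), e.g.

    darkstat -i wm0 --no-dns

(wm0 is just a placeholder interface name.) If the machine survives a 
week of traffic with DNS resolution turned off, that would point at the 
lookup child rather than the packet accounting.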


Cheers,
Lloyd


Re: regarding the changes to kernel entropy gathering

2021-04-04 Thread Lloyd Parkes
With some trepidation, I'm going to dip into this conversation even 
though I haven't read all of it. I don't have the mental fortitude for 
that. I have two suggestions, one short and one long.


Firstly, we could just have an rc.d script that checks whether the 
system has /var/db/entropy-file or an RNG device, and if it has 
neither, prints a warning and then generates some simplistic entropy 
with "ls -lR / ; dd if=/dev/urandom of=/dev/random bs=32 count=1 ; 
sysctl -w kern.entropy.consolidate=1". The system owner has been warned 
and the system proceeds to run.
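

A rough sketch of what that rc.d script could look like (the name, the 
rcorder keywords, and the check are just placeholders, not a finished 
patch):

    #!/bin/sh
    #
    # PROVIDE: entropywarn
    # BEFORE:  SERVERS

    . /etc/rc.subr

    name="entropywarn"
    start_cmd="${name}_start"
    stop_cmd=":"

    entropywarn_start()
    {
            # Nothing to do if a saved seed already exists.
            # (A check for a hardware RNG device would go here too.)
            if [ -s /var/db/entropy-file ]; then
                    return 0
            fi
            warn "no entropy seed found; generating simplistic entropy"
            ls -lR / > /dev/null 2>&1
            dd if=/dev/urandom of=/dev/random bs=32 count=1 2> /dev/null
            sysctl -w kern.entropy.consolidate=1
    }

    load_rc_config $name
    run_rc_command "$1"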


Secondly, we could fix what I see as the biggest problem with the new 
implementation right now, which is that it is unreasonably difficult 
for people to work out how to make their system go forwards once it has 
stopped. Note that making the system go forwards is easy; it's working 
out what to do that's hard. We can fix that.


The current implementation prints out a message whenever it blocks a 
process that wants randomness, which immediately makes this 
implementation superior to all others that I have ever seen. The number 
of times I've logged into systems that have stalled on boot and made 
them finish booting by running "ls -lR /" over the past 20 years is too 
many to count. I don't know if I just needed to wait longer for the boot 
to finish, or if generating entropy was the fix, and I will never know. 
This is nuts.


We can use the message to point the system administrator to a manual 
page that tells them what to do, and by "tells them what to do", I mean 
in plain simple language, right at the top of the page, without scaring 
them.


How about this:

"entropy: pid %d (%s) blocking due to lack of entropy, see entropy(4)"

and then in entropy(4) we can start with something like

"If you are reading this because you have read a kernel message telling 
you that a process is blocking due to a lack of entropy then it is 
almost certainly because your hardware doesn't have a reliable source of 
randomness. If you have no particular requirements for cryptographic 
security on your system, you can generate some entropy and then tell the 
kernel that this entropy is 'enough' with the commands

    ls -lR /
    dd if=/dev/urandom of=/dev/random bs=32 count=1
    sysctl -w kern.entropy.consolidate=1

If you have strong requirements for cryptographic security on your 
system, then you should run 'rndctl -S /root/seed' on a system with a 
hardware random number generator (most modern CPUs have one), copy the 
seed file over to this system as /var/db/entropy-file, and then run 
'rndctl -L /var/db/entropy-file'.


This only needs to be done once since scripts in rc.d will take care of 
saving and restoring system entropy in /var/db/entropy-file across reboots."
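

For concreteness, the seed transfer described above would look 
something like this (the host name is a placeholder):

    # on a machine with a hardware RNG
    rndctl -S /root/seed
    scp /root/seed target:/var/db/entropy-file

    # on the entropy-starved machine
    rndctl -L /var/db/entropy-file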


We could even do both of these things.



Re: 5.x filesystem performance regression

2011-06-04 Thread Lloyd Parkes
??? Why is this in tech-net?

On 5/06/2011, at 2:40 AM, Edgar Fuß wrote:

> Having fixed my performance-critical RAID configuration, I think there's some
> serious filesystem performance regression from 4.x to 5.x.
> 
> I've tested every possible combination of 4.0.1 vs. 5.1, softdep vs. WAPBL,
> parity maps enabled vs. disabled, bare disc vs. RAID 1 vs. RAID 5.

Excellent.

> The test case was extracting the 5.1 src.tgz set onto the filesystem under 
> test.
> The extraction was done twice (having deleted the extracted tree in between);

I always reboot between such tests to ensure that the buffer cache has been 
cleared out. If I ever get around to running RAID benchmarks again, I'll script 
it all in /etc/rc.d with reboots between each run so that I can get a number of 
runs without having to run anything by hand.
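
Roughly what I have in mind is something like the following sketch (the 
paths, the run count, and the tar invocation are all placeholders, and 
it would obviously need a guard so it never runs on a production box):

    #!/bin/sh
    #
    # PROVIDE: raidbench
    # REQUIRE: mountall

    . /etc/rc.subr

    name="raidbench"
    start_cmd="${name}_start"
    stop_cmd=":"

    raidbench_start()
    {
            runs=/var/benchmarks/runs-done
            count=$(cat $runs 2>/dev/null || echo 0)
            if [ "$count" -ge 10 ]; then
                    return 0        # all runs done, stop rebooting
            fi
            rm -rf /mnt/test/src
            # Time one extraction onto the freshly booted (cold cache)
            # filesystem, then reboot for the next run.
            (cd /mnt/test && /usr/bin/time tar -xzf /var/benchmarks/src.tgz) \
                >> /var/benchmarks/results 2>&1
            echo $((count + 1)) > $runs
            reboot
    }

    load_rc_config $name
    run_rc_command "$1"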

> in some cases the time for the first run is missing because I forgot to time
> the tar command.

That's a problem, because those first-run times are exactly what is 
needed to show the effect of the buffer cache.

> So, almost everywhere, 4.0.1 is three to fifteen times as fast as 5.1.

I'm afraid it isn't even close to "almost everywhere", because there are 
so many missing measurements. If we ignore all of the second runs 
because of the buffer cache issue, we only have two columns that contain 
enough data. The first is the plain disc column, and it shows things 
looking pretty good for 5.1. The second is RAID 5 32k, which doesn't 
look so good. For some reason, RAID 5 appears to be very slow and it 
needs looking at.

If we want to look at the second runs in order to work out why 5.1 looks 
so much worse there, we still only have enough data in the plain disc 
and RAID 5 32k columns. For the plain disc, 5.1 does perform better in 
the second run than in the first, just nowhere near as well as 4.0.1. My 
guess is that the VM parameters changed between 4.0.1 and 5.1 (they did 
change, I just can't remember when). Try comparing the output of "sysctl 
vm" on the two versions of NetBSD. My experience is that the VM settings 
need adjusting in order to get acceptable performance from any 
specialised workload, and I suspect that under 4.0.1 your file set fits 
in memory, but under 5.1 it doesn't fit in the allowed file memory. Once 
again, RAID 5 appears to be very slow and it needs looking at.
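
A quick way to do that comparison (file names are just examples):

    # on each machine
    sysctl vm > /tmp/vm-$(uname -r).txt

    # then, with both files copied to one place
    diff -u /tmp/vm-4.0.1.txt /tmp/vm-5.1.txt

The vm.file* and vm.anon* settings are the ones most likely to matter 
for a workload like this.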

Cheers,
Lloyd