Re: huge nanosleep variance on 11-stable
On Tue, Nov 1, 2016 at 10:16 PM, Jason Harmening wrote:
> I figured it out: r282678 (which was never MFCed to 10-stable) added
> support for the MWAIT instruction on the idle path for Intel CPUs that
> claim to support it.
>
> While my CPU (2009-era Xeon 5500) advertises support for it in its
> feature mask and ACPI C-state entries, the cores don't seem to respond
> very quickly to interrupts while idling in MWAIT. Disabling mwait in
> acpi_cpu.c and falling back to the old "sti; hlt" mechanism for C1
> completely fixes the responsiveness issues.
>
> So if your CPU is of a similar vintage, it may not be ULE's fault.

You are almost certainly correct. My system is circa 2011: an i5-2520M, Sandy Bridge. While it might have the same issue, I'd be surprised. It's possible, but probably completely different from what you are seeing. Reports of the problem I was seeing definitely pre-date 11, but 11 made things much worse, so it could be a combination of things. And I can't see how ULE could have anything to do with this issue. Congratulations on some really good sleuthing to find this.
--
Kevin Oberman, Part time kid herder and retired Network Engineer
E-mail: rkober...@gmail.com
PGP Fingerprint: D03FB98AFA78E3B78C1694B318AB39EF1B055683
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
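The fix described above (disabling mwait in acpi_cpu.c) requires rebuilding the kernel. As a lighter-weight experiment, FreeBSD's machdep.idle sysctl can select the idle method at runtime; whether it is present and whether it bypasses the ACPI C-state mwait path on a given version is an assumption that should be verified locally:

```
# List the idle methods this kernel supports (e.g. spin, mwait, hlt, acpi):
sysctl machdep.idle_available
# Force the plain "hlt" idle method instead of mwait/ACPI C-states:
sysctl machdep.idle=hlt
```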
Re: huge nanosleep variance on 11-stable
On 11/01/16 20:45, Kevin Oberman wrote:
> I eliminated the annoyance by changing the scheduler from ULE to 4BSD.
> That was it, and I have not seen the issue since. I'd be very interested
> in whether the scheduler is somehow impacting timing functions or it's a
> different issue.

I figured it out: r282678 (which was never MFCed to 10-stable) added support for the MWAIT instruction on the idle path for Intel CPUs that claim to support it.

While my CPU (2009-era Xeon 5500) advertises support for it in its feature mask and ACPI C-state entries, the cores don't seem to respond very quickly to interrupts while idling in MWAIT. Disabling mwait in acpi_cpu.c and falling back to the old "sti; hlt" mechanism for C1 completely fixes the responsiveness issues.

So if your CPU is of a similar vintage, it may not be ULE's fault.
Re: huge nanosleep variance on 11-stable
On Tue, Nov 1, 2016 at 2:36 PM, Jason Harmening wrote:
> Sorry, that should be ~*30ms* to get 30fps, though the variance is still
> up to 500ms for me either way.

This is likely off track, but this is a behavior I have noticed since moving to 11, though it might have started in 10.3-STABLE before moving to head before 11 went to beta. I can't explain any way nanosleep could be involved, but I saw annoying lock-ups similar to yours. I also no longer see them.

I eliminated the annoyance by changing the scheduler from ULE to 4BSD. That was it, and I have not seen the issue since. I'd be very interested in whether the scheduler is somehow impacting timing functions or it's a different issue. I've felt that there was something off in ULE for some time, but it was not until these annoying hiccups that I was convinced to try going back to 4BSD.

Tip o' the hat to Doug B. for his suggestions that ULE may have issues that impacted interactivity.
Re: Problems with jail inside ZFS dataset
On Tue, Nov 1, 2016 at 5:05 PM, eddyraz wrote:
> Starting jails: cannot start jail "haproxy":
> mount: .: Operation not supported by device
> jail: haproxy: /sbin/mount -t fdescfs . /local/jails/haproxy/dev/fd: failed

ZFS appears to be unrelated to your problem. It sounds like you're trying to mount fdescfs from within a jail. That's not allowed by default.

Is your VPS itself a jail? If so, you'll have to ask your hosting provider to set allow.mount.fdescfs=1 in your jail config. Or are you trying to use nested jails? If so, set that same parameter in the config for your outer jail.

-Alan
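For reference, the parameter Alan mentions goes in the jail's configuration. A minimal jail.conf(5) sketch, with the jail name and path taken from the error message in the thread (everything else here is an assumption about the local setup):

```
# /etc/jail.conf -- hypothetical minimal entry for the outer jail
haproxy {
    path = "/local/jails/haproxy";
    mount.devfs;
    allow.mount;            # mounts inside the jail must be enabled first
    allow.mount.fdescfs;    # permits "mount -t fdescfs" from within the jail
    enforce_statfs = 1;     # in-jail mounts generally require a value below 2
    exec.start = "/bin/sh /etc/rc";
    exec.stop  = "/bin/sh /etc/rc.shutdown";
}
```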
Problems with jail inside ZFS dataset
Good afternoon. I have a VPS with FreeBSD on a dedicated server with OVH hosting. Last week I installed FreeBSD 11 and ran buildworld; after that I created a VM within a jail in the /local/jails/ ZFS dataset.

While trying to start the VM, it gave an error that said it could not mount nullfs.

I added this in /boot/loader.conf:

nullfs_mount=1

Later the error changed to:

Starting jails: cannot start jail "haproxy":
mount: .: Operation not supported by device
jail: haproxy: /sbin/mount -t fdescfs . /local/jails/haproxy/dev/fd: failed

Following this post on the Internet:

https://lists.freebsd.org/pipermail/freebsd-stable/2014-August/079700.html

I applied the patch it advised, which I found at http://pastebin.com/5t9zEzkV

But applying the patch with

patch /sys/fs/fdescfs/fdesc_vfsops.c sys_fs_fdescfs_fdesc_vfsop

gives this error:

Hmm... Looks like a unified diff to me...
The text leading up to this was:
--
|diff --git a/sys/fs/fdescfs/fdesc_vfsops.c b/sys/fs/fdescfs/fdesc_vfsops.c
|index cb5e3c0..7193809 100644
|--- a/sys/fs/fdescfs/fdesc_vfsops.c
|+++ b/sys/fs/fdescfs/fdesc_vfsops.c
--
Patching file /sys/fs/fdescfs/fdesc_vfsops.c using Plan A...
Reversed (or previously applied) patch detected! Assume -R? [y] y
Hunk #1 succeeded at 51 (offset 1 line).
Hunk #2 failed at 79.
No such line 241 in input file, ignoring
Hunk #3 succeeded at 229 (offset -8 lines).
1 out of 3 hunks failed--saving rejects to /sys/fs/fdescfs/fdesc_vfsops.c.rej
Hmm... The next patch looks like a unified diff to me...
The text leading up to this was:
--
|diff --git a/sys/kern/kern_jail.c b/sys/kern/kern_jail.c
|index 2846eca..791723d 100644
|--- a/sys/kern/kern_jail.c
|+++ b/sys/kern/kern_jail.c
--
File to patch: /sys/kern/kern_jail.c
Patching file /sys/kern/kern_jail.c using Plan A...
Hunk #1 failed at 207.
Hunk #2 failed at 224.
Hunk #3 failed at 4247.
Hunk #4 failed at 4403.
4 out of 4 hunks failed--saving rejects to /sys/kern/kern_jail.c.rej
Hmm... The next patch looks like a unified diff to me...
The text leading up to this was:
--
|diff --git a/sys/sys/jail.h b/sys/sys/jail.h
|index a82a499..a01d665 100644
|--- a/sys/sys/jail.h
|+++ b/sys/sys/jail.h
--
File to patch: /sys/sys/jail.h
Patching file /sys/sys/jail.h using Plan A...
Hunk #1 failed at 228.
1 out of 1 hunks failed--saving rejects to /sys/sys/jail.h.rej
done
#

I asked on serverfault.com, but they pointed out that this was a patch for 10.x and I was applying it to 11. After that I made no further progress. Please, could anyone help me with this? Thanks in advance.
Re: huge nanosleep variance on 11-stable
Sorry, that should be ~*30ms* to get 30fps, though the variance is still up to 500ms for me either way.

On 11/01/16 14:29, Jason Harmening wrote:
> repro code is at http://pastebin.com/B68N4AFY if anyone's interested.
Re: huge nanosleep variance on 11-stable
repro code is at http://pastebin.com/B68N4AFY if anyone's interested.
huge nanosleep variance on 11-stable
Hi everyone,

I recently upgraded my main amd64 server from 10.3-stable (r302011) to 11.0-stable (r308099). It went smoothly except for one big issue: certain applications (but not the system as a whole) respond very sluggishly, and video playback of any kind is extremely choppy.

The system is under very light load, and I see no evidence of abnormal interrupt latency or interrupt load. More interestingly, if I place the system under full load (~0.0% idle) the problem *disappears* and playback/responsiveness are smooth and quick.

Running ktrace on some of the affected apps points me at the problem: huge variance in the amount of time spent in the nanosleep system call. A sleep of, say, 5ms might take anywhere from 5ms to ~500ms from entry to return of the syscall. OTOH, anything CPU-bound or that waits on condvars or I/O interrupts seems to work fine, so this doesn't seem to be an issue with overall system latency.

I can repro this with a simple program that just does a 3ms usleep in a tight loop (i.e. roughly the amount of time a video player would sleep between frames @ 30fps). At light load ktrace will show the huge nanosleep variance; under heavy load every nanosleep will complete in almost exactly 3ms.

FWIW, I don't see this on -current, although right now all my -current images are VMs on different HW so that might not mean anything. I'm not aware of any recent timer- or scheduler-specific changes, so I'm wondering if perhaps the recent IPI or taskqueue changes might be somehow to blame.

I'm not especially familiar w/ the relevant parts of the kernel, so any guidance on where I should focus my debugging efforts would be much appreciated.

Thanks,
Jason
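The pastebin repro isn't reproduced here, but the test described above (sleep ~3 ms in a tight loop and record how long each sleep actually took) can be sketched as follows. This is a rough Python analogue, not Jason's actual code; Python's time.sleep goes through the same kernel sleep path that usleep/nanosleep use:

```python
import time

def measure_sleep(interval_s=0.003, iterations=100):
    """Sleep `interval_s` repeatedly; return the observed duration of each sleep."""
    observed = []
    for _ in range(iterations):
        start = time.monotonic()
        time.sleep(interval_s)
        observed.append(time.monotonic() - start)
    return observed

if __name__ == "__main__":
    durations = measure_sleep()
    # On a healthy system min and max should both sit near 3 ms at any load;
    # the bug reported here shows max ballooning toward 500 ms on an idle box.
    print("min %.3f ms  max %.3f ms" % (min(durations) * 1e3, max(durations) * 1e3))
```

Watching the spread between min and max across idle and loaded runs is the whole experiment; ktrace adds per-syscall timestamps if more detail is needed.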
pax(1) needs to learn POSIX-pax format (by libarchive(3)?)
Dear hackers,

I frequently miss pax(1)'s ability to handle the pax (the POSIX pax) format. Backing up real-world file names and lengths doesn't work with the ustar format, which pax(1) uses, as does tar(1) by default.

I'd prefer using pax(1) because of its CLI usage – personal taste… But in practice, I'm forced to use tar(1), overriding tar's default format with the "--format pax" (or --posix) option, for almost any archive/backup job (where zfs send isn't feasible).

Since tar(1) does support the POSIX pax format, it's not a big issue, but it's weird using tar for pax and pax for tar ;-)

I'd love for pax(1) to be libarchive(3)ed. Has anyone ever thought about it? Unfortunately I'm lacking the skills and time :-(

Thanks,
-harry
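The practical limitation Harald is pointing at (plain ustar cannot store member names longer than its fixed header fields allow, while pax extended headers can) can be illustrated with Python's tarfile module, which implements both formats. This is only an illustration of the two formats, not of pax(1) or libarchive(3) themselves:

```python
import io
import tarfile

# A name whose final component exceeds ustar's 100-character name field
# and cannot be rescued by the 155-character prefix-splitting trick.
LONG_NAME = "d/" + "x" * 150

def archive_with(fmt):
    """Try to archive a long-named member using the given tar format.
    Returns the stored member name on success, or None if the format rejects it."""
    buf = io.BytesIO()
    try:
        with tarfile.open(fileobj=buf, mode="w", format=fmt) as tf:
            info = tarfile.TarInfo(LONG_NAME)
            data = b"hello"
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    except ValueError:
        return None  # plain ustar cannot represent the long name
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tf:
        return tf.getnames()[0]

if __name__ == "__main__":
    print("ustar:", archive_with(tarfile.USTAR_FORMAT))  # None: name rejected
    print("pax:  ", archive_with(tarfile.PAX_FORMAT))    # full name preserved
```

This mirrors why "tar --format pax" succeeds on backup jobs where a strict-ustar pax(1) fails.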
Re: Dying jail
01.11.2016 0:21, Baptiste Daroussin wrote:
> I can see the jail staying in dying mode for multiple minutes even after
> sockstat -j has been showing no TCP is left at all. No processes are left
> in the jail

Same here, but not for multiple minutes: multiple days. My dying jail, with no processes left, has been unable to die for 6 days already. It was restarted with another JID 6 days ago, and its new instance runs just fine, but the old one is still dying. This is definitely a regression since 9.3-STABLE, which ran "service jail restart" just fine even using a fixed JID.