Re: OOM killer and kernel cache reclamation rate limit in vm_pageout_scan()

2014-10-16 Thread Andriy Gapon
On 16/10/2014 08:56, Justin T. Gibbs wrote:
 avg pointed out the rate limiting code in vm_pageout_scan() during discussion
 about PR 187594.  While it certainly can contribute to the problems discussed
 in that PR, a bigger problem is that it can allow the OOM killer to be
 triggered even though there is plenty of reclaimable memory available in the
 system.  Any load that can consume enough pages within the polling interval
 to hit the v_free_min threshold (e.g. multiple 'dd if=/dev/zero
 of=/file/on/zfs') can make this happen.
 
 The product I’m working on does not have swap configured and treats any OOM
 trigger as fatal, so it is very obvious when this happens. :-)
 
 I’ve tried several things to mitigate the problem.  The first was to ignore
 rate limiting for pass 2.  However, even though ZFS is guaranteed to receive
 some feedback prior to OOM being declared, my testing showed that a trivial
 load (a couple dd operations) could still consume enough of the reclaimed
 space to leave the system below its target at the end of pass 2.  After
 removing the rate limiting entirely, I’ve so far been unable to kill the
 system via a ZFS induced load.
 
 I understand the motivation behind the rate limiting, but the current
 implementation seems too simplistic to be safe.  The documentation for the
 Solaris slab allocator provides good motivation for their approach of using a
 “sliding average” to rein in temporary bursts of usage without unduly
 harming efficient service for the recorded steady-state memory demand.
 Regardless of the approach taken, I believe that the OOM killer must be a
 last resort and shouldn’t be called when there are caches that can be
 culled.

FWIW, I have this toy branch:
https://github.com/avg-I/freebsd/compare/experiment/uma-cache-trimming

Not all commits are relevant to the problem and some things are unfinished.
Not sure if the changes would help your case either...

 One other thing I’ve noticed in my testing with ZFS is that it needs feedback
 and a little time to react to memory pressure.  Calling its lowmem handler
 just once isn’t enough for it to limit in-flight writes so it can avoid reuse
 of pages that it just freed up.  But, it doesn’t take too long to react (

I've been thinking about this and maybe we need to make arc_memory_throttle()
more aggressive on FreeBSD.  I can't say that I really follow the logic of that
code, though.

 < 1sec in the profiling I’ve done).  Is there a way in vm_pageout_scan() that
 we can better record that progress is being made (pages were freed in the
 pass, even if some/all of them were consumed again) and allow more passes
 before the OOM killer is invoked in this case?

-- 
Andriy Gapon


Re: Resizing a zpool as a VMware ESXi guest ...

2014-10-16 Thread Edward Tomasz Napierała
On 1010T1529, Matthew Grooms wrote:
 All,
 
 I am a long-time user and advocate of FreeBSD and manage several
 deployments of FreeBSD in a few data centers. Now that these
 environments are almost always virtual, it would make sense for FreeBSD
 to support basic features such as dynamic disk resizing. It looks like
 most of the parts are intended to work. Kudos to the FreeBSD Foundation
 for seeing the need and sponsoring dynamic growth of online UFS
 filesystems via growfs. Unfortunately, it would appear that there are
 still problems in this area, such as ...
 
 a) cam/geom recognizing when a drive's size has increased
 b) zpool recognizing when a gpt partition size has increased
 
 For example, if I do an install of FreeBSD 10 on VMware using ZFS, I see 
 the following ...
 
 root@zpool-test:~ # gpart show
 =>      34  16777149  da0  GPT  (8.0G)
         34      1024    1  freebsd-boot  (512K)
       1058   4194304    2  freebsd-swap  (2.0G)
    4195362  12581821    3  freebsd-zfs  (6.0G)
 
 If I increase the VM disk size using VMware to 16G and rescan using 
 camcontrol, this is what I see ...

camcontrol rescan does not force fetching the updated disk size.
AFAIK there is no way to do that.  However, this should happen
automatically, if the other side properly sends a Unit Attention
after resizing.  No idea why this doesn't happen with VMware.
Reboot obviously clears things up.

[..]

 Now I want to claim the additional 14 gigs of space for my zpool ...
 
 root@zpool-test:~ # zpool status
   pool: zroot
  state: ONLINE
   scan: none requested
 config:
 
         NAME                                          STATE     READ WRITE CKSUM
         zroot                                         ONLINE       0     0     0
           gptid/352086bd-50b5-11e4-95b8-0050569b2a04  ONLINE       0     0     0
 
 root@zpool-test:~ # zpool set autoexpand=on zroot
 root@zpool-test:~ # zpool online -e zroot gptid/352086bd-50b5-11e4-95b8-0050569b2a04
 root@zpool-test:~ # zpool list
 NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
 zroot  5.97G   876M  5.11G    14%  1.00x  ONLINE  -
 
 The zpool appears to still only have 5.11G free. Let's reboot and try
 again ...

Interesting.  This used to work; actually either of those (autoexpand or
online -e) should do the trick.

 root@zpool-test:~ # zpool set autoexpand=on zroot
 root@zpool-test:~ # zpool online -e zroot gptid/352086bd-50b5-11e4-95b8-0050569b2a04
 root@zpool-test:~ # zpool list
 NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
 zroot  14.0G   876M  13.1G     6%  1.00x  ONLINE  -
 
 Now I have 13.1G free. I can add this space to any of my zfs volumes and 
 it picks the change up immediately. So the question remains, why do I 
 need to reboot the OS twice to allocate new disk space to a volume? 
 FreeBSD is first and foremost a server operating system. Servers are 
 commonly deployed in data centers. Virtual environments are now 
 commonplace in data centers, not the exception to the rule. VMware still 
 has the vast majority of the private virtual environment market. I
 assume that most would expect things like this to work out of the box. 
 Did I miss a required step or is this fixed in CURRENT?

Looks like genuine bugs (or rather, one missing feature and one bug).
Filing PRs for those might be a good idea.
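
For reference, a possible no-reboot sequence on systems where camcontrol
supports the reprobe subcommand (newer than the 10.x releases discussed in
this thread); the device name da0, partition index 3 and gptid are taken from
the gpart/zpool output above, so treat this as an illustrative sketch rather
than a verified recipe:

    camcontrol reprobe da0    # ask CAM to re-read the grown disk capacity
    gpart recover da0         # move the backup GPT to the new end of the disk
    gpart resize -i 3 da0     # grow the freebsd-zfs partition into the new space
    zpool online -e zroot gptid/352086bd-50b5-11e4-95b8-0050569b2a04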



Re: Resizing a zpool as a VMware ESXi guest ...

2014-10-16 Thread Garrett Cooper

 On Oct 16, 2014, at 1:10, Edward Tomasz Napierała tr...@freebsd.org wrote:

 camcontrol rescan does not force fetching the updated disk size.
 AFAIK there is no way to do that.  However, this should happen
 automatically, if the other side properly sends a Unit Attention
 after resizing.  No idea why this doesn't happen with VMware.
 Reboot obviously clears things up.
 
 [..]

Is open-vm-tools installed?

I ask because if I don't have it installed and the kernel modules loaded, 
VMware doesn't notify the guest OS of disks being added/removed.

Also, what disk controller are you using?

Cheers.

Re: OOM killer and kernel cache reclamation rate limit in vm_pageout_scan()

2014-10-16 Thread Steven Hartland

Unfortunately ZFS doesn't prevent new in-flight writes until it
hits zfs_dirty_data_max, so while what you're suggesting will
help, if the writes come in quickly enough I would expect it to
still be able to outrun the pageout.
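
As a hedged illustration of the knob Steven mentions: vfs.zfs.dirty_data_max
can be capped from /boot/loader.conf, which bounds how much dirty data ZFS
accepts before it starts throttling writers.  The value below is only an
example, not a recommendation for any particular system:

    # /boot/loader.conf (illustrative only)
    # Cap in-flight (dirty) ZFS data at 1 GB so a write burst cannot get as
    # far ahead of the pageout daemon.
    vfs.zfs.dirty_data_max="1073741824"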

- Original Message - 
From: Justin T. Gibbs gi...@freebsd.org

To: freebsd-current@freebsd.org
Cc: a...@freebsd.org; Andriy Gapon a...@freebsd.org
Sent: Thursday, October 16, 2014 6:56 AM
Subject: OOM killer and kernel cache reclamation rate limit in vm_pageout_scan()


avg pointed out the rate limiting code in vm_pageout_scan() during discussion about PR 187594.  While it certainly can contribute to 
the problems discussed in that PR, a bigger problem is that it can allow the OOM killer to be triggered even though there is plenty 
of reclaimable memory available in the system.  Any load that can consume enough pages within the polling interval to hit the 
v_free_min threshold (e.g. multiple 'dd if=/dev/zero of=/file/on/zfs') can make this happen.


The product I’m working on does not have swap configured and treats any OOM trigger as fatal, so it is very obvious when this 
happens. :-)


I’ve tried several things to mitigate the problem.  The first was to ignore rate limiting for pass 2.  However, even though ZFS is 
guaranteed to receive some feedback prior to OOM being declared, my testing showed that a trivial load (a couple dd operations) 
could still consume enough of the reclaimed space to leave the system below its target at the end of pass 2.  After removing the 
rate limiting entirely, I’ve so far been unable to kill the system via a ZFS induced load.


I understand the motivation behind the rate limiting, but the current implementation seems too simplistic to be safe.  The 
documentation for the Solaris slab allocator provides good motivation for their approach of using a “sliding average” to rein in
temporary bursts of usage without unduly harming efficient service for the recorded steady-state memory demand.  Regardless of the 
approach taken, I believe that the OOM killer must be a last resort and shouldn’t be called when there are caches that can be 
culled.


One other thing I’ve noticed in my testing with ZFS is that it needs feedback and a little time to react to memory pressure. 
Calling its lowmem handler just once isn’t enough for it to limit in-flight writes so it can avoid reuse of pages that it just
freed up.  But, it doesn’t take too long to react (< 1sec in the profiling I’ve done).  Is there a way in vm_pageout_scan() that we
can better record that progress is being made (pages were freed in the pass, even if some/all of them were consumed again) and allow 
more passes before the OOM killer is invoked in this case?


—
Justin



Re: installincludes, bsd.incs.mk and param.h

2014-10-16 Thread Harald Schmalzbauer
 Regarding Ian Lepore's message of 14.10.2014 19:00 (localtime):

…
 The old code that used to work for you got the version via sysctl, so I
 was recommending that you keep doing that yourself, since it's no longer
 built in to bsd.ports.mk.  

 So just add export OSVERSION=`sysctl kern.osreldate` to your script
 that kicks off this update process, something like that.
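
A minimal sketch of what Ian describes, assuming a plain sh wrapper kicks off
the update (the -n flag makes sysctl print only the value; the port path is a
placeholder):

    #!/bin/sh
    # Export OSVERSION so the ports framework does not have to derive it
    # (e.g. from sys/param.h) inside the jail.
    OSVERSION=$(sysctl -n kern.osreldate)
    export OSVERSION
    cd /usr/ports/category/someport && make install clean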

Thank you for your support!

Like many others, I found that the former OSVERSION detection wasn't working
very well with jail environments (python broke, for example).  I had
therefore worked around it differently; nonetheless I'm not happy with
reverting to the old behaviour.

Since /usr/include gets populated regardless of whether WITHOUT_TOOLCHAIN=true
was set in src.conf, I think it's a good idea to have the one param.h
installed as well, regardless of the option.
Please see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=194401

Thanks,

-Harry





Re: Resizing a zpool as a VMware ESXi guest ...

2014-10-16 Thread Michael Jung

On 2014-10-16 04:17, Garrett Cooper wrote:
 On Oct 16, 2014, at 1:10, Edward Tomasz Napierała tr...@freebsd.org wrote:
 
 camcontrol rescan does not force fetching the updated disk size.
 AFAIK there is no way to do that.  However, this should happen
 automatically, if the other side properly sends a Unit Attention
 after resizing.  No idea why this doesn't happen with VMware.
 Reboot obviously clears things up.

[..]


Is open-vm-tools installed?

I ask because if I don't have it installed and the kernel modules
loaded, VMware doesn't notify the guest OS of disks being
added/removed.

Also, what disk controller are you using?

Cheers.


I duplicated this behavior.  According to gpart, the virtual disk does
not grow until the FreeBSD guest is rebooted.

FreeBSD freebsd10 10.0-RELEASE-p6 FreeBSD 10.0-RELEASE-p6 #0: Tue Jun 24 07:47:37 UTC 2014
    r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC

pkg info -- amd64 open-vm-tools-nox11-1280544_8,1 Open VMware tools for FreeBSD VMware guests

ESXi reported -- Running, version:2147483647 (3rd-party/Independent)

ESXi-5.5-1331820(A00) Guest Hardware version 10

789  -  S 0:00.54 /usr/local/bin/vmtoolsd -c 
/usr/local/share/vmware-tools/


Id Refs AddressSize Name

1   12 0x8020 15f03b0  kernel
21 0x81a12000 5209 fdescfs.ko
31 0x81a18000 2198 vmmemctl.ko
41 0x81a1b000 23d8 vmxnet.ko
51 0x81a1e000 2bf0 vmblock.ko
61 0x81a21000 81b4 vmhgfs.ko

--mikej

Re: zfs recv hangs in kmem arena

2014-10-16 Thread James R. Van Artsdalen
The zfs recv / kmem arena hang happens with -CURRENT as well as
10-STABLE, on two different systems, with 16GB or 32GB of RAM, from
memstick or normal multi-user environments.

Hangs usually seem to happen 1TB to 3TB in, but last night one run hung
after only 4.35MB.

On 9/26/2014 1:42 AM, James R. Van Artsdalen wrote:
 FreeBSD BLACKIE.housenet.jrv 10.1-BETA2 FreeBSD 10.1-BETA2 #2 r272070M:
 Wed Sep 24 17:36:56 CDT 2014
 ja...@blackie.housenet.jrv:/usr/obj/usr/src/sys/GENERIC  amd64

 With current STABLE10 I am unable to replicate a ZFS pool using zfs
 send/recv without zfs hanging in state kmem arena, within the first
 4TB or so (of a 23TB Pool).

 The most recent attempt used this command line

 SUPERTEX:/root# zfs send -R BIGTEX/UNIX@syssnap | ssh BLACKIE zfs recv
 -duvF BIGTOX

 though local replications fail in kmem arena too.

 The two machines I've been attempting this on have 16GB and 32GB of RAM
 each and are otherwise idle.

 Any suggestions on how to get around, or investigate, kmem arena?

 # top
 last pid:  3272;  load averages:  0.22,  0.22,  0.23  up
 0+08:25:02  01:32:07
 34 processes:  1 running, 33 sleeping
 CPU:  0.0% user,  0.0% nice,  0.1% system,  0.0% interrupt, 99.9% idle
 Mem: 21M Active, 82M Inact, 15G Wired, 28M Cache, 450M Free
 ARC: 12G Total, 24M MFU, 12G MRU, 23M Anon, 216M Header, 47M Other
 Swap: 16G Total, 16G Free

   PID USERNAMETHR PRI NICE   SIZERES STATE   C   TIMEWCPU
 COMMAND
  1173 root  1  520 86476K  7780K select  0 124:33   0.00% sshd
  1176 root  1  460 87276K 47732K kmem a  3  48:36   0.00% zfs
   968 root 32  200 12344K  1888K rpcsvc  0   0:13   0.00% nfsd
  1009 root  1  200 25452K  2864K select  3   0:01   0.00% ntpd
 ...



Re: CURRENT: EFI boot failure

2014-10-16 Thread Harald Schmalzbauer
 Regarding O. Hartmann's message of 04.10.2014 08:47 (localtime):

…
 Sorry, forget the suggestion, it doesn't work since it leads to CFLAG
 -march= and the same problem occurs.
 For my case this works:
 --- sys/boot/efi/Makefile.inc.orig  2014-09-23 16:22:46.0 +0200
 +++ sys/boot/efi/Makefile.inc   2014-09-23 16:46:30.0 +0200
 @@ -2,6 +2,10 @@
  
  BINDIR?=   /boot
  
 +.if ${CPUTYPE} == core-avx2
 +CPUTYPE=   core-avx-i
 +.endif
 +
  .if ${MACHINE_CPUARCH} == i386
  CFLAGS+=-march=i386
  .endif

 JFI

 -Harry

 Has this problem been seriously addressed at all? I run into this very often
 on several platforms with Haswell-based CPUs (other systems with IvyBridge or
 SandyBridge are still to be migrated to UEFI boot, so I do not have any older
 architectures at hand to prove whether this issue is still present or not on
 non-AVX2 systems).

 If there is no progress so far, would it be well-advised to open a PR?

Unfortunately I don't really have qualified knowledge about compiler
optimizations nor any EFI binary knowledge.
Opening a PR is really needed; this issue shouldn't be left unchecked.
But I'd prefer that it be opened by someone who understands what Matt Fleming
answered in
http://lists.freebsd.org/pipermail/freebsd-current/2014-September/052354.html

Anyone?

Thanks,

-Harry





Re: OOM killer and kernel cache reclamation rate limit in vm_pageout_scan()

2014-10-16 Thread Andriy Gapon
On 16/10/2014 12:08, Steven Hartland wrote:
 Unfortunately ZFS doesn't prevent new in-flight writes until it
 hits zfs_dirty_data_max, so while what you're suggesting will
 help, if the writes come in quickly enough I would expect it to
 still be able to outrun the pageout.

As I've mentioned, arc_memory_throttle() also plays a role in limiting the
dirty data.
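
A quick, read-only way to see whether arc_memory_throttle() is actually
engaging during such a run (counter names as exported by the FreeBSD ZFS
kstats; purely observational, no tuning involved):

    # Increments each time the ARC throttles allocations due to memory pressure.
    sysctl kstat.zfs.misc.arcstats.memory_throttle_count
    # The dirty-data ceiling discussed above.
    sysctl vfs.zfs.dirty_data_max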

-- 
Andriy Gapon


Re: zfs recv hangs in kmem arena

2014-10-16 Thread Xin Li
On 10/16/14 4:25 AM, James R. Van Artsdalen wrote:
 The zfs recv / kmem arena hang happens with -CURRENT as well as 
 10-STABLE, on two different systems, with 16GB or 32GB of RAM,
 from memstick or normal multi-user environments,
 
 Hangs usually seem to happen 1TB to 3TB in, but last night one run
 hung after only 4.35MB.
 
 On 9/26/2014 1:42 AM, James R. Van Artsdalen wrote:
 FreeBSD BLACKIE.housenet.jrv 10.1-BETA2 FreeBSD 10.1-BETA2 #2
 r272070M: Wed Sep 24 17:36:56 CDT 2014 
 ja...@blackie.housenet.jrv:/usr/obj/usr/src/sys/GENERIC  amd64
 
 With current STABLE10 I am unable to replicate a ZFS pool using
 zfs send/recv without zfs hanging in state kmem arena, within
 the first 4TB or so (of a 23TB Pool).
 
 The most recent attempt used this command line
 
 SUPERTEX:/root# zfs send -R BIGTEX/UNIX@syssnap | ssh BLACKIE zfs
 recv -duvF BIGTOX
 
 though local replications fail in kmem arena too.
 
 The two machines I've been attempting this on have 16GB and 32GB
 of RAM each and are otherwise idle.
 
 Any suggestions on how to get around, or investigate, kmem
 arena?
 
  # top
  last pid:  3272;  load averages:  0.22,  0.22,  0.23    up 0+08:25:02  01:32:07
  34 processes:  1 running, 33 sleeping
  CPU:  0.0% user,  0.0% nice,  0.1% system,  0.0% interrupt, 99.9% idle
  Mem: 21M Active, 82M Inact, 15G Wired, 28M Cache, 450M Free
  ARC: 12G Total, 24M MFU, 12G MRU, 23M Anon, 216M Header, 47M Other
  Swap: 16G Total, 16G Free
 
    PID USERNAME   THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
   1173 root         1  52    0 86476K  7780K select  0 124:33   0.00% sshd
   1176 root         1  46    0 87276K 47732K kmem a  3  48:36   0.00% zfs
    968 root        32  20    0 12344K  1888K rpcsvc  0   0:13   0.00% nfsd
   1009 root         1  20    0 25452K  2864K select  3   0:01   0.00% ntpd
  ...

What does procstat -kk 1176 (or the PID of your 'zfs' process that is
stuck in that state) say?

Cheers,



[CFT] multiple instance support in rc.d script

2014-10-16 Thread Hiroki Sato
[Please reply to freebsd-rc@]

Hi,

 I would like your feedback and testers of the attached patch.  This
 implements multiple instance support in rc.d scripts.  You can try it
 by replacing /etc/rc.subr with the attached one.

 More details are as follows.  Typically, an rc.d/foo script has the
 following structure and rc.conf variables:

   /etc/rc.d/foo:
   
   name=foo
   rcvar=foo_enable
   ...
   load_rc_command $name
   run_rc_command $*
   

   /etc/rc.conf:
   
   foo_enable=YES
   foo_flags=-f -l -a -g
   

 The above supports one instance for one script.  After replacing
 rc.subr, you can specify additional instances in rc.conf:

   /etc/rc.conf:
   
   foo_instances=one two

   foo_one_enable=YES
   foo_one_flags=-f -l -a -g

   foo_two_enable=YES
   foo_two_flags=-F -L -A -G
   

 $foo_instances defines instances by space-separated list of instance
 names, and rc.conf variables for them are something like
 ${name}_${instname}_enable.  The following command

  # service foo start

 starts foo_one and foo_two with the specified flags.  Instances can
 be specified in the following form:

  # service foo start:one

 or multiple instances in a particular order:

  # service foo start:two,one

 Basically, no change is required for the rc.d/foo script itself.
 However, there is a problem that default values of the instantiated
 variables are not defined.

 For example, if an rc.d/script uses $foo_mode, you need to define
 $foo_one_mode.  The default value of $foo_mode is usually defined in
 etc/defaults/rc.conf for rc.d scripts in the base system and :
 ${foo_mode:=value} idiom in scripts from Ports Collection.  So all
 of the variables should be defined for each instance, too.  As you
 noticed, this is not easy without editing the script itself.

 To alleviate this, set_rcvar() can be used:

   /etc/rc.d/foo:
   
   name=foo
   rcvar=foo_enable

   set_rcvar foo_enable YES Enable $name
   set_rcvar foo_program /tmp/test Command for $name
   ...
   load_rc_command $name
   run_rc_command $*
   

 The three arguments are varname, default value, and description.  If
 a variable is defined by set_rcvar(), default values of instantiated
 variables will be set automatically---foo_one_program is set by
 foo_program if it is not defined.

 This approach still has another problem.  set_rcvar() is not
 supported in all branches, so a script using it does not work in old
 supported branches.  One solution which can be used for scripts in
 Ports Collection is adding both definitions before and after
 load_rc_command() until EoL of old branches like this:

   /etc/rc.d/foo:
   
   name=foo
   rcvar=foo_enable

   if type set_rcvar >/dev/null 2>&1; then
 set_rcvar foo_enable YES Enable $name
set_rcvar foo_program   /tmp/test Command for $name
   fi
   ...
   load_rc_command $name

   # will be removed after all supported branches have set_rcvar().
   if ! type set_rcvar >/dev/null 2>&1; then
: ${foo_enable:=YES}
: ${foo_program:=/tmp/test}
for _i in $foo_instances; do
for _j in enable program; do
 eval : \${foo_${_i}_${_j}:=\$foo_$_j}
done
done
   fi

   run_rc_command $*
   

 This is a bit ugly but should work fine.

 I am using this patch to invoke multiple named (caching
 server/content server) and syslogd (local only / listening on INET/INET6
 sockets only) daemons.  While $foo_instances is designed as a
 user-defined knob, this can also be applied to software which needs to
 invoke multiple/different daemons that depend on each other in a
 script, too.
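
 As a concrete illustration of the dual-syslogd case just described, an
 rc.conf fragment using the proposed ${name}_instances knob might look as
 follows; the flags, pidfiles and network are examples only, not a tested
 configuration:

    syslogd_instances="local net"

    syslogd_local_enable="YES"
    syslogd_local_flags="-ss"                    # no network sockets at all
    syslogd_local_pidfile="/var/run/syslog-local.pid"

    syslogd_net_enable="YES"
    syslogd_net_flags="-a 192.168.0.0/24"        # accept messages from the LAN only
    syslogd_net_pidfile="/var/run/syslog-net.pid"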

 I feel this patch still needs more careful review from others.
 Any comments are welcome.  Thank you.

-- Hiroki
Index: etc/rc.subr
===
--- etc/rc.subr	(revision 272976)
+++ etc/rc.subr	(working copy)
@@ -698,7 +698,10 @@
 #		start stop restart rcvar status poll ${extra_commands}
 #	If there's a match, run ${argument}_cmd or the default method
 #	(see below).
+#	_run_rc_command0() is the main routine and run_rc_command() is
+#	a wrapper to handle multiple instances.
 #
+#
 #	If argument has a given prefix, then change the operation as follows:
 #		Prefix	Operation
 #		--	-
@@ -755,6 +758,9 @@
 #
 #	${name}_nice	n	Nice level to run ${command} at.
 #
+#	${name}_pidfile	n	This to be used in /etc/rc.conf to override
+#${pidfile}.
+#
 #	${name}_user	n	User to run ${command} as, using su(1) if not
 #using ${name}_chroot.
 #Requires /usr to be mounted.
@@ -863,6 +869,57 @@
 #
 run_rc_command()
 {
+	local _act _instances _name _desc _rcvar
+
+	_act=$1
+	shift
+	eval _instances=\$${name}_instances
+
+	# Check if instance is specified, e.g. start:instance,
+	case ${_act%:*} in
+	$_act)	;;			# no instance specified
+	*)
+		_instances=$(echo ${_act#*:} | tr ,  )
+		_act=${_act%:*}
+	;;
+	esac
+
+	# Use 

Re: [CFT] multiple instance support in rc.d script

2014-10-16 Thread Allan Jude
On 2014-10-16 21:22, Hiroki Sato wrote:
 [Please reply to freebsd-rc@]
 
 Hi,
 
  I would like your feedback and testers of the attached patch.  This
  implements multiple instance support in rc.d scripts.  You can try it
  by replacing /etc/rc.subr with the attached one.
 
  More details are as follows.  Typically, an rc.d/foo script has the
  following structure and rc.conf variables:
 
/etc/rc.d/foo:

name=foo
rcvar=foo_enable
...
load_rc_command $name
run_rc_command $*

 
/etc/rc.conf:

foo_enable=YES
foo_flags=-f -l -a -g

 
  The above supports one instance for one script.  After replacing
  rc.subr, you can specify additional instances in rc.conf:
 
/etc/rc.conf:

foo_instances=one two
 
foo_one_enable=YES
foo_one_flags=-f -l -a -g
 
foo_two_enable=YES
foo_two_flags=-F -L -A -G

 
  $foo_instances defines instances by space-separated list of instance
  names, and rc.conf variables for them are something like
  ${name}_${instname}_enable.  The following command
 
   # service foo start
 
  starts foo_one and foo_two with the specified flags.  Instances can
  be specified in the following form:
 
   # service foo start:one
 
  or multiple instances in a particular order:
 
   # service foo start:two,one
 
  Basically, no change is required for the rc.d/foo script itself.
  However, there is a problem that default values of the instantiated
  variables are not defined.
 
  For example, if an rc.d/script uses $foo_mode, you need to define
  $foo_one_mode.  The default value of $foo_mode is usually defined in
  etc/defaults/rc.conf for rc.d scripts in the base system and :
  ${foo_mode:=value} idiom in scripts from Ports Collection.  So all
  of the variables should be defined for each instance, too.  As you
  noticed, this is not easy without editing the script itself.
 
  To alleviate this, set_rcvar() can be used:
 
/etc/rc.d/foo:

name=foo
rcvar=foo_enable
 
set_rcvar foo_enable   YES Enable $name
set_rcvar foo_program  /tmp/test Command for $name
...
load_rc_command $name
run_rc_command $*

 
  The three arguments are varname, default value, and description.  If
  a variable is defined by set_rcvar(), default values of instantiated
  variables will be set automatically---foo_one_program is set by
  foo_program if it is not defined.
 
  This approach still has another problem.  set_rcvar() is not
  supported in all branches, so a script using it does not work in old
  supported branches.  One solution which can be used for scripts in
  Ports Collection is adding both definitions before and after
  load_rc_command() until EoL of old branches like this:
 
/etc/rc.d/foo:

name=foo
rcvar=foo_enable
 
 if type set_rcvar >/dev/null 2>&1; then
    set_rcvar foo_enable YES Enable $name
   set_rcvar foo_program   /tmp/test Command for $name
fi
...
load_rc_command $name
 
# will be removed after all supported branches have set_rcvar().
 if ! type set_rcvar >/dev/null 2>&1; then
   : ${foo_enable:=YES}
   : ${foo_program:=/tmp/test}
   for _i in $foo_instances; do
   for _j in enable program; do
    eval : \${foo_${_i}_${_j}:=\$foo_$_j}
   done
   done
fi
 
run_rc_command $*

 
  This is a bit ugly but should work fine.
 
  I am using this patch to invoke multiple named (caching
  server/contents server) and syslogd (local only/listens INET/INET6
  socket only) daemons.  While $foo_instances is designed as a
  user-defined knob, this can be applied to software which need to
  invoke multiple/different daemons which depend on each other in a
  script, too.
 
  I am feeling this patch still needs more careful review from others.
  Any comments are welcome.  Thank you.
 
 -- Hiroki
 

This feature is quite useful.  I've used the built-in version that the
rc.d script in memcached has, and it is very helpful to be able to run
multiple named instances.

I wonder if sysrc could be improved to support an 'append', so you can have:

foo_instances=one two

and do:
sysrc --append foo_instances=three

to get:
foo_instances=one two three

instead of having to do:

sysrc foo_instances="`sysrc -n foo_instances` three"

or something more convoluted
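
For what it's worth, recent versions of sysrc(8) appear to support an append
operator already, so (assuming your sysrc has it) the above could be written
as:

    sysrc foo_instances+=" three"    # append to the existing list

Whether the separating space is inserted automatically may depend on the
sysrc version, hence the explicit leading space here.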

-- 
Allan Jude





Re: zfs recv hangs in kmem arena

2014-10-16 Thread Xin Li
On 10/16/14 8:43 PM, James R. Van Artsdalen wrote:
 On 10/16/2014 11:12 AM, Xin Li wrote:
 On 9/26/2014 1:42 AM, James R. Van Artsdalen wrote:
 FreeBSD BLACKIE.housenet.jrv 10.1-BETA2 FreeBSD 10.1-BETA2
 #2 r272070M: Wed Sep 24 17:36:56 CDT 2014 
 ja...@blackie.housenet.jrv:/usr/obj/usr/src/sys/GENERIC
 amd64
 
 With current STABLE10 I am unable to replicate a ZFS pool
 using zfs send/recv without zfs hanging in state kmem
 arena, within the first 4TB or so (of a 23TB Pool).
 
 What does procstat -kk 1176 (or the PID of your 'zfs' process
 that stuck in that state) say?
 
 Cheers,
 
 SUPERTEX:/root# ps -lp 866
 UID PID PPID CPU PRI NI   VSZ   RSS MWCHAN   STAT TT      TIME COMMAND
   0 866  863   0  52  0 66800 29716 kmem are D+    1  57:40.82 zfs recv -duvF BIGTOX
 SUPERTEX:/root# procstat -kk 866
   PID    TID COMM             TDNAME           KSTACK
   866 101573 zfs              -                mi_switch+0xe1
 sleepq_wait+0x3a _cv_wait+0x16d vmem_xalloc+0x568 vmem_alloc+0x3d
 kmem_malloc+0x33 keg_alloc_slab+0xcd keg_fetch_slab+0x151
 zone_fetch_slab+0x7e zone_import+0x40 uma_zalloc_arg+0x34e
 arc_get_data_buf+0x31a arc_buf_alloc+0xaa dmu_buf_will_fill+0x169
 dmu_write+0xfc dmu_recv_stream+0xd40 zfs_ioc_recv+0x94e
 zfsdev_ioctl+0x5ca

Do you have any special tuning in your /boot/loader.conf?

Cheers,
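
For readers hitting the same hang: the loader.conf(5) tunables most often
involved in kmem arena exhaustion reports are the kmem and ARC sizing knobs.
A purely illustrative example of the kind of settings worth checking (the
values are placeholders, not recommendations):

    # /boot/loader.conf (illustrative only)
    vfs.zfs.arc_max="8G"        # upper bound on the ARC
    vm.kmem_size="24G"          # size of the kernel kmem arena
    vm.kmem_size_max="24G"

The corresponding read-only state can be inspected at run time with
sysctl vm.kmem_map_size vm.kmem_map_free kstat.zfs.misc.arcstats.size.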
