Re: freebsd 9.0-release + zfs + mysqld(percona) = kernel: swap zone exhausted, increase kern.maxswzone
On 3/30/12 5:48 PM, Philip M. Gollucci wrote:
> After reading several sparse articles/posts, I've come to the conclusion
> that FreeBSD doesn't do well with swap larger than 32GB; however, it does
> allow it. As such I decided to drop the swap to 8GB*2=16GB.
>
> Sadly that didn't help either, even after dropping kern.maxswzone back to
> 2*the default, which is apparently at or very near the maximum you can
> raise it to and still get more actual SWAPMETA space, because of the limit
> based on the total number of system pages.
>
> I'm still quite perplexed here.
>
> Please also see the recent thread on -stable where someone has the same
> problem with ZFS/NFS.
> subject: 9-STABLE, ZFS, NFS, ggatec - suspected memory leak

That didn't help either. We will compare NAMEI next, in addition to trying to tune the ZFS ARC/meta.

-- 
1024D/DB9B8C1C B90B FBC3 A3A1 C71A 8E70 3F8C 75B8 8FFB DB9B 8C1C
Philip M. Gollucci (pgollu...@p6m7g8.com) c: 703.336.9354
Member, Apache Software Foundation
Committer, FreeBSD Foundation
Consultant, P6M7G8 Inc.
Director Operations, Ridecharge Inc.

Work like you don't need the money,
love like you'll never get hurt,
and dance like nobody's watching.
Re: freebsd 9.0-release + zfs + mysqld(percona) = kernel: swap zone exhausted, increase kern.maxswzone
On 03/28/12 03:09, Philip M. Gollucci wrote:
> It works out to roughly 7.7GB from 32MB, okay, fine.
> If I double it, that should give me 15.4GB from 64MB (still not enough).
> If I 16x it, that should give me 246GB from 512MB. That's more my
> physical RAM + swap. Oh well.

After reading several sparse articles/posts, I've come to the conclusion that FreeBSD doesn't do well with swap larger than 32GB; however, it does allow it. As such I decided to drop the swap to 8GB*2=16GB.

Sadly that didn't help either, even after dropping kern.maxswzone back to 2*the default, which is apparently at or very near the maximum you can raise it to and still get more actual SWAPMETA space, because of the limit based on the total number of system pages.

I'm still quite perplexed here.

Please also see the recent thread on -stable where someone has the same problem with ZFS/NFS.
subject: 9-STABLE, ZFS, NFS, ggatec - suspected memory leak

-- 
Philip M. Gollucci
Re: freebsd 9.0-release + zfs + mysqld(percona) = kernel: swap zone exhausted, increase kern.maxswzone
On 03/27/12 02:32, Philip M. Gollucci wrote:
> Some other tuning updates:
>
> $ zfs set zfs:zfs_nocacheflush = 1
> $ sysctl vfs.zfs.prefetch_disable=1
> $ cat /etc/my.cnf
> skip-innodb-doublewrite
> innodb_flush_log_at_trx_commit=2
> $ zfs set primarycache=metadata zmysqlD
> $ zfs set atime=off zmysqlD
> $ zfs set recordsize=16k zmysqlD
> but not on zmysqlL
>
> My next plan is to turn off tmpfs and use ZVOL swaps, then to simply use
> zroot/tmp as a normal dir. After that I'll drastically increase maxswzone.
>
> Still hoping someone has already done this.

None of that made a difference; however, I haven't tried the ZVOL swaps yet because they're quite new and this is, after all, production eventually.

So I've been reading up on maxswzone. It seems to me that nobody really understands it. Fortunately it isn't used very much.

It works out to roughly 7.7GB from 32MB, okay, fine.
If I double it, that should give me 15.4GB from 64MB (still not enough).
If I 16x it, that should give me 246GB from 512MB. That's more my physical RAM + swap. Oh well.

I've seen John Baldwin write on lists:
o) you have another problem if the default isn't enough
o) when it panics I pick up the crash dump swap info and do
   #blocks in use*totalswblocks/maxswzone
o) setting it higher claims wired memory which can't be reused.

tuning(7) is from the 4.x days and is useless here.

Something that's really confusing me is whether the output from
$ vmstat -z | grep solaris
is relevant, or the size of my swap itself, or whether by upping maxswzone I'm taking away too much from ZFS in the long run.
So, tracing this below:

kern.maxswzone=536870912  # = 16*(32*1024*1024)
vm.stats.vm.v_page_count: 24411488
n=12205744  ### n = cnt.v_page_count / 2;

    if (maxswzone && n > maxswzone / sizeof(struct swblock))
        n = maxswzone / sizeof(struct swblock);

    struct swblock {
        struct swblock  *swb_hnext;
        vm_object_t     swb_object;
        vm_pindex_t     swb_index;
        int             swb_count;
        daddr_t         swb_pages[SWAP_META_PAGES];
    };

If this is more than 43.98 bytes then the conditional is true; however it's not, because the printf() message isn't written out below:

    if (n2 != n)
        printf("Swap zone entries reduced from %d to %d.\n",

which means the initial allocation succeeds with n=12205744 and not maxswzone.

ITEM        SIZE    LIMIT   USED  FREE  REQ  FAIL  SLEEP
SWAPMETA:    288, 1864135,     0,    0,    0,    0,     0

So I'm more than a little perplexed by these sizes/limits, and that none of it is used on a system that's running out of it.

subr_param.c:
-------------
    long    maxswzone;      /* max swmeta KVA storage */
    SYSCTL_LONG(_kern, OID_AUTO, maxswzone, CTLFLAG_RDTUN, &maxswzone, 0,
        "Maximum memory for swap metadata");

    #ifdef VM_SWZONE_SIZE_MAX
        maxswzone = VM_SWZONE_SIZE_MAX;
    #endif
        TUNABLE_LONG_FETCH("kern.maxswzone", &maxswzone);

param.h:
--------
    /*
     * Ceiling on amount of swblock kva space, can be changed via
     * the kern.maxswzone /boot/loader.conf variable.
     */
    #ifndef VM_SWZONE_SIZE_MAX
    #define VM_SWZONE_SIZE_MAX  (32 * 1024 * 1024)
    #endif

swap_pager.c:
-------------
    void
    swap_pager_swap_init(void)
    {
        int n, n2;

        //comments skipped
        nsw_cluster_max = min((MAXPHYS/PAGE_SIZE), MAX_PAGEOUT_CLUSTER);

        mtx_lock(&pbuf_mtx);
        nsw_rcount = (nswbuf + 1) / 2;
        nsw_wcount_sync = (nswbuf + 3) / 4;
        nsw_wcount_async = 4;
        nsw_wcount_async_max = nsw_wcount_async;
        mtx_unlock(&pbuf_mtx);

        /*
         * Initialize our zone.  Right now I'm just guessing on the number
         * we need based on the number of pages in the system.  Each swblock
         * can hold 16 pages, so this is probably overkill.  This reservation
         * is typically limited to around 32MB by default.
         */
        n = cnt.v_page_count / 2;
        if (maxswzone && n > maxswzone / sizeof(struct swblock))
            n = maxswzone / sizeof(struct swblock);
        n2 = n;
        swap_zone = uma_zcreate("SWAPMETA", sizeof(struct swblock), NULL, NULL,
            NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_NOFREE | UMA_ZONE_VM);
        if (swap_zone == NULL)
            panic("failed to create swap_zone.");
        do {
            if (uma_zone_set_obj(swap_zone, &swap_zone_obj, n))
                break;
            /*
             * if the allocation failed, try a zone two thirds the
             * size of the previous attempt.
             */
            n -= ((n + 2) / 3);
        } while (n > 0);
        if (n2 != n)
            printf("Swap zone entries reduced from %d to %d.\n", n2, n);
        n2 = n;

        /*
         * Initialize our meta-data hash table.  The swapper does not need to
         * be quite as efficient as the VM system, so we do not use an
         * oversized hash table.
         *
         *  n:  size of hash table, must be power of 2
         *
freebsd 9.0-release + zfs + mysqld(percona) = kernel: swap zone exhausted, increase kern.maxswzone
/var/log/messages
Mar 23 22:21:50 sabertooth kernel: swap zone exhausted, increase kern.maxswzone
Mar 23 22:21:50 sabertooth kernel: pid 86697 (mysqld), uid 88, was killed: out of swap space

how to repeat:
$ mysql -ux < file.sql (~150GB worth)

Basically, it slows down continually until it dies. If you suspend the process in time it recovers some, but eventually you have to suspend it every 1s for ~3 minutes. The load is ~10 at this point.

I've looked at top, ps, iostat, zpool iostat, vmstat -z, vmstat -m and I don't see anything wonky. I can provide more info on request.

system description:
$ df
zmysqlD  801G  658G  142G  82%  /var/db/mysql/data
zmysqlL  133G   26G  107G  20%  /var/db/mysql/log

It's a 600GB InnoDB space; mysql has innodb_buffer_pool_size = 80GB.
About 1GB of data is MyISAM, the rest is InnoDB.
The machine has 96GB of RAM.

$ cat /etc/fstab
/dev/gpt/swap0  none  swap   sw  0  0
/dev/gpt/swap1  none  swap   sw  0  0
tmpfs           /tmp  tmpfs  rw  2  0

swapinfo -h will show 6% and 6% usage on the swap devices.
/tmp remains 5% used.

$ grep maxswzone /boot/loader.conf
kern.maxswzone=67108864  ## double the default

$ gpart show
=>        34  286749421  da3  GPT  (136G)
          34        128    1  freebsd-boot  (64k)
         162  201326592    2  freebsd-swap  (96G)
   201326754   85422701    3  freebsd-zfs  (40G)

=>        34  286749421  da4  GPT  (136G)
          34        128    1  freebsd-boot  (64k)
         162  201326592    2  freebsd-swap  (96G)
   201326754   85422701    3  freebsd-zfs  (40G)

da[012] are SSDs, the rest are 15krpm.

$ zpool status
  pool: zmysqlD
 state: ONLINE
  scan: none requested
config:

    NAME        STATE   READ WRITE CKSUM
    zmysqlD     ONLINE     0     0     0
      raidz2-0  ONLINE     0     0     0
        da7     ONLINE     0     0     0
        da8     ONLINE     0     0     0
        da9     ONLINE     0     0     0
        da10    ONLINE     0     0     0
        da11    ONLINE     0     0     0
        da12    ONLINE     0     0     0
        da13    ONLINE     0     0     0
        da14    ONLINE     0     0     0
    logs
      da0       ONLINE     0     0     0
    cache
      da2       ONLINE     0     0     0

errors: No known data errors

  pool: zmysqlL
 state: ONLINE
  scan: none requested
config:

    NAME        STATE   READ WRITE CKSUM
    zmysqlL     ONLINE     0     0     0
      mirror-0  ONLINE     0     0     0
        da5     ONLINE     0     0     0
        da6     ONLINE     0     0     0
    cache
      da1       ONLINE     0     0     0

errors: No known data errors

  pool: zroot
 state: ONLINE
  scan: none requested
config:

    NAME        STATE   READ WRITE CKSUM
    zroot       ONLINE     0     0     0
      mirror-0  ONLINE     0     0     0
        da3p3   ONLINE     0     0     0
        da4p3   ONLINE     0     0     0

-- 
Philip M. Gollucci
Re: freebsd 9.0-release + zfs + mysqld(percona) = kernel: swap zone exhausted, increase kern.maxswzone
Some other tuning updates:

$ zfs set zfs:zfs_nocacheflush = 1
$ sysctl vfs.zfs.prefetch_disable=1
$ cat /etc/my.cnf
skip-innodb-doublewrite
innodb_flush_log_at_trx_commit=2
$ zfs set primarycache=metadata zmysqlD
$ zfs set atime=off zmysqlD
$ zfs set recordsize=16k zmysqlD
but not on zmysqlL

My next plan is to turn off tmpfs and use ZVOL swaps, then to simply use zroot/tmp as a normal dir. After that I'll drastically increase maxswzone.

Still hoping someone has already done this.

On 03/26/12 14:50, Philip M. Gollucci wrote:
> /var/log/messages
> Mar 23 22:21:50 sabertooth kernel: swap zone exhausted, increase kern.maxswzone
> Mar 23 22:21:50 sabertooth kernel: pid 86697 (mysqld), uid 88, was killed: out of swap space
>
> how to repeat:
> $ mysql -ux < file.sql (~150GB worth)
>
> Basically, it slows down continually until it dies. If you suspend the
> process in time it recovers some, but eventually you have to suspend it
> every 1s for ~3 minutes. The load is ~10 at this point.
>
> I've looked at top, ps, iostat, zpool iostat, vmstat -z, vmstat -m and I
> don't see anything wonky. I can provide more info on request.