Re: freebsd 9.0-release + zfs + mysqld(percona) = kernel: swap zone exhausted, increase kern.maxswzone

2012-04-01 Thread Philip M. Gollucci

On 3/30/12 5:48 PM, Philip M. Gollucci wrote:

> After reading several sparse articles/posts, I've come to the conclusion
> that FreeBSD doesn't do well with swap > 32GB; however, it does allow it.
> As such I decided to drop the swap to 8GB*2=16GB.  Sadly that didn't
> help either, even after dropping kern.maxswzone back to 2*the default, which
> is apparently at or very near the most you can raise it and still get more
> actual SWAPMETA space, because of the limit based on the total number of
> system pages.
>
> I'm still quite perplexed here.  Please also see the recent thread on
> -stable where someone has the same problem with ZFS/NFS.
>
> subject: 9-STABLE, ZFS, NFS, ggatec - suspected memory leak


That didn't help either.  We will compare NAMEI next, in addition to
trying to tune the ZFS ARC/meta settings.




--

1024D/DB9B8C1C B90B FBC3 A3A1 C71A 8E70  3F8C 75B8 8FFB DB9B 8C1C
Philip M. Gollucci (pgollu...@p6m7g8.com) c: 703.336.9354
Member,   Apache Software Foundation
Committer,FreeBSD Foundation
Consultant,   P6M7G8 Inc.
Director Operations,  Ridecharge Inc.

Work like you don't need the money,
love like you'll never get hurt,
and dance like nobody's watching.
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org


Re: freebsd 9.0-release + zfs + mysqld(percona) = kernel: swap zone exhausted, increase kern.maxswzone

2012-03-30 Thread Philip M. Gollucci
On 03/28/12 03:09, Philip M. Gollucci wrote:

> It works out to roughly 7.7GB from 32MB; okay, fine.
> If I double it, that should give me 15.4GB from 64MB (still not enough).
> If I 16x it, that should give me 246GB from 512MB.  That's more than my
> physical RAM + swap.  Oh well.

After reading several sparse articles/posts, I've come to the conclusion
that FreeBSD doesn't do well with swap > 32GB; however, it does allow it.
As such I decided to drop the swap to 8GB*2=16GB.  Sadly that didn't
help either, even after dropping kern.maxswzone back to 2*the default, which
is apparently at or very near the most you can raise it and still get more
actual SWAPMETA space, because of the limit based on the total number of
system pages.

I'm still quite perplexed here.  Please also see the recent thread on
-stable where someone has the same problem with ZFS/NFS.

subject: 9-STABLE, ZFS, NFS, ggatec - suspected memory leak





Re: freebsd 9.0-release + zfs + mysqld(percona) = kernel: swap zone exhausted, increase kern.maxswzone

2012-03-27 Thread Philip M. Gollucci
On 03/27/12 02:32, Philip M. Gollucci wrote:
> Some other tuning updates
>
> $ zfs set zfs:zfs_nocacheflush = 1
> $ sysctl vfs.zfs.prefetch_disable=1
>
> $ cat /etc/my.cnf
> skip-innodb-doublewrite
> innodb_flush_log_at_trx_commit=2
>
> $ zfs set primarycache=metadata zmysqlD
> $ zfs set atime=off zmysqlD
> $ zfs set recordsize=16k zmysqlD
>
> but not on zmysqlL
>
> My next plan is to turn off tmpfs and use ZVOL swaps, then simply use
> zroot/tmp as a normal directory.
>
> After that I'll drastically increase maxswzone.
>
> Still hoping someone has already done this.

None of that made a difference; however, I haven't tried the ZVOL swaps
yet because they're quite new and this is, after all, going to be production eventually.

So I've been reading up on maxswzone.  It seems to me that nobody
really understands it.

Fortunately, it isn't used very much.

It works out to roughly 7.7GB from 32MB; okay, fine.
If I double it, that should give me 15.4GB from 64MB (still not enough).
If I 16x it, that should give me 246GB from 512MB.  That's more than my
physical RAM + swap.  Oh well.


I've seen John Baldwin write on the lists:
o) you have another problem if the default isn't enough
o) when it panics I pick up the crash dump swap info and do
   #blocks in use*totalswblocks/maxswzone
o) setting it higher claims wired memory which can't be reused.

tuning(7) is from the 4.x days and is useless here.

Something that's really confusing me is whether the output from
 $ vmstat -z | grep solaris
is relevant, or the size of my swap itself,
or whether by upping maxswzone I'm taking away too much from ZFS in the long run.

So, tracing this through:
kern.maxswzone=536870912 # = 16*(32*1024*1024)
vm.stats.vm.v_page_count: 24411488

n=12205744  ###n = cnt.v_page_count / 2;

if (maxswzone && n > maxswzone / sizeof(struct swblock))
	n = maxswzone / sizeof(struct swblock);

struct swblock {
	struct swblock	*swb_hnext;
	vm_object_t	swb_object;
	vm_pindex_t	swb_index;
	int		swb_count;
	daddr_t		swb_pages[SWAP_META_PAGES];
};
If sizeof(struct swblock) is > 43.98 bytes (536870912 / 12205744), then the
conditional is true; however, it's not, because the printf() message below
isn't written out:

if (n2 != n)
	printf("Swap zone entries reduced from %d to %d.\n", n2, n);

which means the initial allocation succeeds with n=12205744 and not
maxswzone.

ITEM   SIZE  LIMIT USED FREE  REQ FAIL SLEEP
SWAPMETA:   288, 1864135,   0,   0,   0,   0,   0

So I'm more than a little perplexed by these sizes/limits, and by the fact that
none of it is used on a system that's running out of it.








subr_param.c:
---
long	maxswzone;	/* max swmeta KVA storage */
SYSCTL_LONG(_kern, OID_AUTO, maxswzone, CTLFLAG_RDTUN, &maxswzone, 0,
    "Maximum memory for swap metadata");
#ifdef VM_SWZONE_SIZE_MAX
	maxswzone = VM_SWZONE_SIZE_MAX;
#endif
	TUNABLE_LONG_FETCH("kern.maxswzone", &maxswzone);

param.h:

/*
 * Ceiling on amount of swblock kva space, can be changed via
 * the kern.maxswzone /boot/loader.conf variable.
 */
#ifndef VM_SWZONE_SIZE_MAX
#define VM_SWZONE_SIZE_MAX  (32 * 1024 * 1024)
#endif

swap_pager.c:
--
void
swap_pager_swap_init(void)
{
	int n, n2;
	/* comments skipped */
	nsw_cluster_max = min((MAXPHYS/PAGE_SIZE), MAX_PAGEOUT_CLUSTER);

	mtx_lock(&pbuf_mtx);
	nsw_rcount = (nswbuf + 1) / 2;
	nsw_wcount_sync = (nswbuf + 3) / 4;
	nsw_wcount_async = 4;
	nsw_wcount_async_max = nsw_wcount_async;
	mtx_unlock(&pbuf_mtx);
	/*
	 * Initialize our zone.  Right now I'm just guessing on the number
	 * we need based on the number of pages in the system.  Each swblock
	 * can hold 16 pages, so this is probably overkill.  This reservation
	 * is typically limited to around 32MB by default.
	 */
	n = cnt.v_page_count / 2;
	if (maxswzone && n > maxswzone / sizeof(struct swblock))
		n = maxswzone / sizeof(struct swblock);
	n2 = n;
	swap_zone = uma_zcreate("SWAPMETA", sizeof(struct swblock), NULL, NULL,
	    NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_NOFREE | UMA_ZONE_VM);
	if (swap_zone == NULL)
		panic("failed to create swap_zone.");
	do {
		if (uma_zone_set_obj(swap_zone, &swap_zone_obj, n))
			break;
		/*
		 * if the allocation failed, try a zone two thirds the
		 * size of the previous attempt.
		 */
		n -= ((n + 2) / 3);
	} while (n > 0);
	if (n2 != n)
		printf("Swap zone entries reduced from %d to %d.\n", n2, n);
	n2 = n;

	/*
	 * Initialize our meta-data hash table.  The swapper does not need to
	 * be quite as efficient as the VM system, so we do not use an
	 * oversized hash table.
	 *
	 *  n:  size of hash table, must be power of 2
	 *

freebsd 9.0-release + zfs + mysqld(percona) = kernel: swap zone exhausted, increase kern.maxswzone

2012-03-26 Thread Philip M. Gollucci
/var/log/messages
Mar 23 22:21:50 sabertooth kernel: swap zone exhausted, increase
kern.maxswzone
Mar 23 22:21:50 sabertooth kernel: pid 86697 (mysqld), uid 88, was
killed: out of swap space

How to repeat:
$ mysql -ux < file.sql    (~150GB worth)

Basically, it slows down continually until it dies.  If you suspend
the process in time, it recovers somewhat, but eventually you have to suspend
it every 1s for ~3 minutes.  The load is ~10 at this point.

I've looked at top, ps, iostat, zpool iostat, vmstat -z, vmstat -m
and I don't see anything wonky.  I can provide more info on request.

system description:

$ df
zmysqlD    801G   658G   142G    82%    /var/db/mysql/data
zmysqlL    133G    26G   107G    20%    /var/db/mysql/log

It's a 600GB InnoDB space; MySQL has
innodb_buffer_pool_size = 80GB
About 1GB of data is MyISAM; the rest is InnoDB.

The machine has 96GB of RAM

$ cat /etc/fstab
/dev/gpt/swap0  none  swap   sw  0  0
/dev/gpt/swap1  none  swap   sw  0  0

tmpfs           /tmp  tmpfs  rw  2  0

swapinfo -h shows 6% and 6% usage on the swap devices;
/tmp remains < 5% used.

$ grep maxswzone /boot/loader.conf
kern.maxswzone=67108864  ## double the default

$ gpart show
=>        34  286749421  da3  GPT  (136G)
          34        128    1  freebsd-boot  (64k)
         162  201326592    2  freebsd-swap  (96G)
   201326754   85422701    3  freebsd-zfs  (40G)

=>        34  286749421  da4  GPT  (136G)
          34        128    1  freebsd-boot  (64k)
         162  201326592    2  freebsd-swap  (96G)
   201326754   85422701    3  freebsd-zfs  (40G)

da[012] are SSDs; the rest are 15k RPM disks.

$ zpool status
  pool: zmysqlD
 state: ONLINE
 scan: none requested
config:

NAME        STATE     READ WRITE CKSUM
zmysqlD     ONLINE       0     0     0
  raidz2-0  ONLINE       0     0     0
    da7     ONLINE       0     0     0
    da8     ONLINE       0     0     0
    da9     ONLINE       0     0     0
    da10    ONLINE       0     0     0
    da11    ONLINE       0     0     0
    da12    ONLINE       0     0     0
    da13    ONLINE       0     0     0
    da14    ONLINE       0     0     0
logs
  da0       ONLINE       0     0     0
cache
  da2       ONLINE       0     0     0

errors: No known data errors

  pool: zmysqlL
 state: ONLINE
 scan: none requested
config:

NAME        STATE     READ WRITE CKSUM
zmysqlL     ONLINE       0     0     0
  mirror-0  ONLINE       0     0     0
    da5     ONLINE       0     0     0
    da6     ONLINE       0     0     0
cache
  da1       ONLINE       0     0     0

errors: No known data errors

  pool: zroot
 state: ONLINE
 scan: none requested
config:

NAME        STATE     READ WRITE CKSUM
zroot       ONLINE       0     0     0
  mirror-0  ONLINE       0     0     0
    da3p3   ONLINE       0     0     0
    da4p3   ONLINE       0     0     0





Re: freebsd 9.0-release + zfs + mysqld(percona) = kernel: swap zone exhausted, increase kern.maxswzone

2012-03-26 Thread Philip M. Gollucci
Some other tuning updates

$ zfs set zfs:zfs_nocacheflush = 1
$ sysctl vfs.zfs.prefetch_disable=1

$ cat /etc/my.cnf
skip-innodb-doublewrite
innodb_flush_log_at_trx_commit=2


$ zfs set primarycache=metadata zmysqlD
$ zfs set atime=off zmysqlD
$ zfs set recordsize=16k zmysqlD

but not on zmysqlL

My next plan is to turn off tmpfs and use ZVOL swaps, then simply use
zroot/tmp as a normal directory.

After that I'll drastically increase maxswzone.

Still hoping someone has already done this.


On 03/26/12 14:50, Philip M. Gollucci wrote:

