Hi,

I'm migrating a web server (Apache + PHP) from RHEL to Solaris. During a
stress-testing comparison with the same number of client sessions, CPU
utilization is ~70% on RHEL while the CPU is saturated on Solaris.

After some investigation, the ZFS root lock emerged as the prime suspect.

First, there are a large number of mutex-spin events.
============================================
# lockstat -D 10 sleep 2

Adaptive mutex spin: 1862701 events in 3.678 seconds (506499 events/sec)

Count indv cuml rcnt     nsec Lock                   Caller
-------------------------------------------------------------------------------
829064  45%  45% 0.00    33280 0xffffff117624a5d0     rrw_enter_read+0x1b
705001  38%  82% 0.00    30983 0xffffff117624a5d0     rrw_exit+0x1d
140678   8%  90% 0.00    10546 0xffffff117624a6e0     zfs_zget+0x46
37208   2%  92% 0.00     5403 0xffffff114b136840     vn_rele+0x1e
33926   2%  94% 0.00     5417 0xffffff114b136840     lookuppnatcred+0xc5
27188   1%  95% 0.00     1155 vn_vfslocks_buckets+0xd980 vn_vfslocks_getlock+0x3b
11073   1%  96% 0.00     1639 vn_vfslocks_buckets+0x4600 vn_vfslocks_getlock+0x3b
 9321   1%  96% 0.00     1961 0xffffff114b82a680     dnlc_lookup+0x83
 6929   0%  97% 0.00     1590 0xffffff11573b8f28     zfs_fastaccesschk_execute+0x6a
 5746   0%  97% 0.00     5935 0xffffff114b136840     lookuppnvp+0x566
-------------------------------------------------------------------------------
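As a sanity check on these numbers, the top two callers (rrw_enter_read and rrw_exit, both on lock 0xffffff117624a5d0) can be summed against the event total to confirm they dominate; a quick awk sketch with the counts hard-coded from the output above:

```shell
# Combined share of the two rrw_* callers in all adaptive-mutex-spin events.
# Counts are copied verbatim from the lockstat output above.
total=1862701
rrw=$((829064 + 705001))
awk -v r="$rrw" -v t="$total" 'BEGIN { printf "%.1f%%\n", 100 * r / t }'
```

This prints 82.4%, matching the cumulative percentage lockstat reports for the two callers.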

If we look at the lock callers' stacks, the following two are the hotspots.
==============================================================
# lockstat -D 10 -l 0xffffff117624a5d0 -s 15 sleep 2

Adaptive mutex spin: 865609 events in 3.237 seconds (267444 events/sec)
-------------------------------------------------------------------------------
Count indv cuml rcnt     nsec Lock                   Caller
51215   6%   6% 0.00   100023 0xffffff117624a5d0     rrw_exit+0x1d

nsec ------ Time Distribution ------ count     Stack
  256 |                               61        zfs_root+0x78
  512 |@                              1746      fsop_root+0x2d
 1024 |@@@                            5210      traverse+0x65
 2048 |@@@@@@@                        13464     lookuppnvp+0x446
 4096 |@@@@@@@                        13553     lookuppnatcred+0x119
 8192 |@@@                            5355      lookupnameatcred+0x97
 16384 |@                              2403      lookupnameat+0x6b
 32768 |                               1331      cstatat_getvp+0x11e
 65536 |                               673       cstatat64_32+0x5d
131072 |                               961       fstatat64_32+0x4c
262144 |                               1259      _sys_sysenter_post_swapgs+0x149
-------------------------------------------------------------------------------
Count indv cuml rcnt     nsec Lock                   Caller
48494   6%  17% 0.00   145263 0xffffff117624a5d0     rrw_enter_read+0x1b

 nsec ------ Time Distribution ------ count     Stack
   256 |                               17        rrw_enter+0x2c
   512 |                               1120      zfs_root+0x3b
  1024 |@                              1718      fsop_root+0x2d
  2048 |@@                             4834      traverse+0x65
  4096 |@@@@@@@@@@@                    18569     lookuppnvp+0x446
  8192 |@@@@                           6620      lookuppnatcred+0x119
 16384 |@                              2929      lookupnameatcred+0x97
 32768 |@                              1635      lookupnameat+0x6b
 65536 |                               894       cstatat_getvp+0x11e
131072 |                               1249      cstatat64_32+0x5d
262144 |@                              1620      fstatat64_32+0x4c
524288 |@                              2474      _sys_sysenter_post_swapgs+0x149
============================================================
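Both stacks come from path lookup (fstatat → lookuppnvp → traverse → zfs_root), so the contention should be reproducible outside Apache. Here is a minimal userland sketch, with the directory and worker/iteration counts as placeholders, that drives the same stat-heavy lookup path while lockstat runs in another terminal:

```shell
# Drive the lookup path shown in the stacks above: several concurrent
# loops of stat(2)-heavy activity against a directory on the ZFS dataset.
# DIR is a placeholder for the Apache document root.
DIR=/tmp
NPROC=4
for p in $(seq 1 $NPROC); do
  ( i=0
    while [ $i -lt 1000 ]; do
      ls -l "$DIR" > /dev/null   # each listing issues stat() per entry
      i=$((i + 1))
    done ) &
done
wait
echo "done: $NPROC workers x 1000 lookups"
```

If the rrw_enter_read/rrw_exit counts climb with this load alone, the contention is in the lookup path itself rather than anything Apache-specific.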

The Apache document root lives on a ZFS dataset, and it looks like every
path lookup runs into contention on the ZFS root mutex.
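To confirm the rate at which lookups cross into zfs_root(), a DTrace one-liner along these lines should work (assuming the fbt probe is available on this build; probe names can vary by release):

```shell
# Count zfs_root() entries per second while the web load is running.
dtrace -n 'fbt::zfs_root:entry { @calls = count(); }
           profile:::tick-1sec { printa(@calls); clear(@calls); }'
```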

Is this expected behavior? If so, is there a ZFS tunable to address it?

Thanks,
-Aubrey
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
