Hi

I have a FreeBSD 9 system with ZFS root.  It is actually a VM under Xen on a 
beefy piece of HW (4-core Sandy Bridge 3 GHz Xeon, 32 GB total HW memory -- the 
VM has 4 vCPUs and 6 GB RAM).  The pool is a mirror of gpart partitions.  I am 
after data integrity more than performance, as long as performance is 
reasonable (which it has more than been for the last 3 months).

The other "servers" on the same HW, the other VMs on the same, don't have this 
problem but are set up the same way.  There are 4 other FreeBSD VMs, one 
running email for a one man company and a few of his friends, as well as some 
static web pages and stuff for him, one runs a few low use web apps for various 
customers, and one runs about 30 websites with apache and nginx, mostly just 
static sites.  None are heavily used.  There is also one VM with linux running 
a couple low use FrontBase databases.   Not high use database -- low use ones.

The troublesome VM had been running fine for over 3 months since I installed 
it, with a pretty much constant level of use.  The server runs 4 jails, each 
dedicated to a different bit of email processing for a small number of users: 
one is a secondary DNS, one runs clamav and spamassassin, one runs exim for 
incoming and outgoing mail, and one runs dovecot for imap and pop.  There is 
no web server, database, or anything else running.

Total number of mail users on the system is approximately 50.  Total mail 
traffic is very low compared to "real" mail servers.

Earlier this week things started "freezing up": processes become unresponsive.  
An episode can last a few minutes or half an hour or more.  It eventually 
resolves itself and things are good for another 10 minutes to 3 hours until it 
happens again.  When it happens, lots of processes are listed in "top" in 
states such as

zfs
zio->i
tx->tx
db->db

These processes only get listed in these states when there are problems.  
What are these states indicative of?

Eventually things get going again, these states drop off and the system hums 
along.
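
Next time it happens I can grab more detail on the stuck processes if that 
would help.  I'm assuming something like the following would show the wait 
channels and kernel stacks (12345 standing in for the pid of a stuck process):

top -SH
procstat -kk 12345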

Based on something I found via Google (from someone who had a different but 
somewhat similar problem) I tried setting

zfs set primarycache=metadata zroot

and

zfs set primarycache=none zroot

but the problem still happened with approximately the same severity and 
frequency.  (I wanted to see whether the system was "churning" on cache 
upkeep.)
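
For reference, the current setting can be checked, and the default restored, 
with:

zfs get primarycache zroot
zfs set primarycache=all zroot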


What is strange is that this server ran fine for 3 months straight, without 
interruption, under the same workload.

Thanks for any hints or clues
Chad



Some data points below:

---

# uname -a
FreeBSD newbagend 9.0-STABLE FreeBSD 9.0-STABLE #1: Wed Mar 21 15:22:14 MDT 
2012     chad@underhill:/usr/obj/usr/src/sys/UNDERHILL-XEN  amd64
# 

---

# zpool status
  pool: zroot
 state: ONLINE
  scan: scrub repaired 0 in 6h13m with 0 errors on Fri Aug 10 19:33:23 2012
config:

        NAME                                            STATE     READ WRITE CKSUM
        zroot                                           ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/f0da8263-8a52-11e1-b3ae-aa00003efccd  ONLINE       0     0     0
            gptid/0f24ab58-8a53-11e1-b3ae-aa00003efccd  ONLINE       0     0     0

errors: No known data errors
#

---

Representative data from running zfs-stats during a trouble period:

zfs-stats -a


------------------------------------------------------------------------
ZFS Subsystem Report                            Sat Aug 11 13:40:07 2012
------------------------------------------------------------------------

System Information:

        Kernel Version:                         900505 (osreldate)
        Hardware Platform:                      amd64
        Processor Architecture:                 amd64

        ZFS Storage pool Version:               28
        ZFS Filesystem Version:                 5

FreeBSD 9.0-STABLE #1: Wed Mar 21 15:22:14 MDT 2012 chad
1:40PM  up  2:54, 3 users, load averages: 0.23, 0.19, 0.14

------------------------------------------------------------------------

System Memory:

        11.49%  681.92  MiB Active,     4.03%   238.97  MiB Inact
        33.37%  1.93    GiB Wired,      0.05%   3.04    MiB Cache
        51.04%  2.96    GiB Free,       0.01%   808.00  KiB Gap

        Real Installed:                         6.00    GiB
        Real Available:                 99.65%  5.98    GiB
        Real Managed:                   96.93%  5.80    GiB

        Logical Total:                          6.00    GiB
        Logical Used:                   46.76%  2.81    GiB
        Logical Free:                   53.24%  3.19    GiB

Kernel Memory:                                  1.25    GiB
        Data:                           98.38%  1.23    GiB
        Text:                           1.62%   20.75   MiB

Kernel Memory Map:                              5.68    GiB
        Size:                           17.27%  1003.75 MiB
        Free:                           82.73%  4.70    GiB

------------------------------------------------------------------------

ARC Summary: (HEALTHY)
        Memory Throttle Count:                  0

ARC Misc:
        Deleted:                                9
        Recycle Misses:                         64.30k
        Mutex Misses:                           10
        Evict Skips:                            58.80k

ARC Size:                               39.98%  1.20    GiB
        Target Size: (Adaptive)         100.00% 3.00    GiB
        Min Size (Hard Limit):          12.50%  384.00  MiB
        Max Size (High Water):          8:1     3.00    GiB

ARC Size Breakdown:
        Recently Used Cache Size:       25.56%  785.15  MiB
        Frequently Used Cache Size:     74.44%  2.23    GiB

ARC Hash Breakdown:
        Elements Max:                           223.30k
        Elements Current:               99.93%  223.15k
        Collisions:                             418.23k
        Chain Max:                              9
        Chains:                                 66.67k

------------------------------------------------------------------------

ARC Efficiency:                                 3.17m
        Cache Hit Ratio:                89.07%  2.82m
        Cache Miss Ratio:               10.93%  346.27k
        Actual Hit Ratio:               86.49%  2.74m

        Data Demand Efficiency:         99.50%  1.09m
        Data Prefetch Efficiency:       60.54%  1.78k

        CACHE HITS BY CACHE LIST:
          Most Recently Used:           23.72%  669.34k
          Most Frequently Used:         73.38%  2.07m
          Most Recently Used Ghost:     1.92%   54.33k
          Most Frequently Used Ghost:   3.30%   93.02k

        CACHE HITS BY DATA TYPE:
          Demand Data:                  38.35%  1.08m
          Prefetch Data:                0.04%   1.08k
          Demand Metadata:              58.75%  1.66m
          Prefetch Metadata:            2.87%   80.97k

        CACHE MISSES BY DATA TYPE:
          Demand Data:                  1.56%   5.39k
          Prefetch Data:                0.20%   704
          Demand Metadata:              55.46%  192.02k
          Prefetch Metadata:            42.78%  148.15k

------------------------------------------------------------------------

L2ARC is disabled

------------------------------------------------------------------------

File-Level Prefetch: (HEALTHY)

DMU Efficiency:                                 6.05m
        Hit Ratio:                      66.59%  4.03m
        Miss Ratio:                     33.41%  2.02m

        Colinear:                               2.02m
          Hit Ratio:                    0.04%   725
          Miss Ratio:                   99.96%  2.02m

        Stride:                                 3.90m
          Hit Ratio:                    99.98%  3.90m
          Miss Ratio:                   0.02%   826

DMU Misc:
        Reclaim:                                2.02m
          Successes:                    2.02%   40.86k
          Failures:                     97.98%  1.98m

        Streams:                                125.81k
          +Resets:                      0.36%   453
          -Resets:                      99.64%  125.36k
          Bogus:                                0

------------------------------------------------------------------------

VDEV Cache Summary:                             530.68k
        Hit Ratio:                      15.30%  81.21k
        Miss Ratio:                     70.40%  373.57k
        Delegations:                    14.30%  75.89k

------------------------------------------------------------------------

ZFS Tunables (sysctl):
        kern.maxusers                           512
        vm.kmem_size                            6222712832
        vm.kmem_size_scale                      1
        vm.kmem_size_min                        0
        vm.kmem_size_max                        329853485875
        vfs.zfs.l2c_only_size                   0
        vfs.zfs.mfu_ghost_data_lsize            91367424
        vfs.zfs.mfu_ghost_metadata_lsize        128350208
        vfs.zfs.mfu_ghost_size                  219717632
        vfs.zfs.mfu_data_lsize                  132299264
        vfs.zfs.mfu_metadata_lsize              20034048
        vfs.zfs.mfu_size                        160949760
        vfs.zfs.mru_ghost_data_lsize            45155328
        vfs.zfs.mru_ghost_metadata_lsize        642998784
        vfs.zfs.mru_ghost_size                  688154112
        vfs.zfs.mru_data_lsize                  347115520
        vfs.zfs.mru_metadata_lsize              10907136
        vfs.zfs.mru_size                        794174976
        vfs.zfs.anon_data_lsize                 0
        vfs.zfs.anon_metadata_lsize             0
        vfs.zfs.anon_size                       29469696
        vfs.zfs.l2arc_norw                      1
        vfs.zfs.l2arc_feed_again                1
        vfs.zfs.l2arc_noprefetch                1
        vfs.zfs.l2arc_feed_min_ms               200
        vfs.zfs.l2arc_feed_secs                 1
        vfs.zfs.l2arc_headroom                  2
        vfs.zfs.l2arc_write_boost               8388608
        vfs.zfs.l2arc_write_max                 8388608
        vfs.zfs.arc_meta_limit                  805306368
        vfs.zfs.arc_meta_used                   805310296
        vfs.zfs.arc_min                         402653184
        vfs.zfs.arc_max                         3221225472
        vfs.zfs.dedup.prefetch                  1
        vfs.zfs.mdcomp_disable                  0
        vfs.zfs.write_limit_override            0
        vfs.zfs.write_limit_inflated            19260174336
        vfs.zfs.write_limit_max                 802507264
        vfs.zfs.write_limit_min                 33554432
        vfs.zfs.write_limit_shift               3
        vfs.zfs.no_write_throttle               0
        vfs.zfs.zfetch.array_rd_sz              1048576
        vfs.zfs.zfetch.block_cap                256
        vfs.zfs.zfetch.min_sec_reap             2
        vfs.zfs.zfetch.max_streams              8
        vfs.zfs.prefetch_disable                0
        vfs.zfs.mg_alloc_failures               8
        vfs.zfs.check_hostid                    1
        vfs.zfs.recover                         0
        vfs.zfs.txg.synctime_ms                 1000
        vfs.zfs.txg.timeout                     5
        vfs.zfs.scrub_limit                     10
        vfs.zfs.vdev.cache.bshift               16
        vfs.zfs.vdev.cache.size                 10485760
        vfs.zfs.vdev.cache.max                  16384
        vfs.zfs.vdev.write_gap_limit            4096
        vfs.zfs.vdev.read_gap_limit             32768
        vfs.zfs.vdev.aggregation_limit          131072
        vfs.zfs.vdev.ramp_rate                  2
        vfs.zfs.vdev.time_shift                 6
        vfs.zfs.vdev.min_pending                4
        vfs.zfs.vdev.max_pending                10
        vfs.zfs.vdev.bio_flush_disable          0
        vfs.zfs.cache_flush_disable             0
        vfs.zfs.zil_replay_disable              0
        vfs.zfs.zio.use_uma                     0
        vfs.zfs.snapshot_list_prefetch          0
        vfs.zfs.version.zpl                     5
        vfs.zfs.version.spa                     28
        vfs.zfs.version.acl                     1
        vfs.zfs.debug                           0
        vfs.zfs.super_owner                     0
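
If tuning any of these turns out to be the suggestion, I assume boot-time 
overrides go in /boot/loader.conf, something like the following (the values 
here are made-up placeholders, not settings I am running):

# /boot/loader.conf
vfs.zfs.arc_max="2G"
vfs.zfs.arc_meta_limit="1G"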

---


Representative zpool iostat from during a trouble period -- as you can see, 
not much is going on (low load), and the iostat output during a calm period 
looks about the same:

zpool iostat zroot 1


              capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----

zroot        107G  41.9G      7    261  23.8K  1.52M
zroot        107G  41.9G     10    140  7.42K   272K
zroot        107G  41.9G      8    176  14.4K   547K
zroot        107G  41.9G      0     59      0   188K
zroot        107G  41.9G      5    171  6.44K  1.73M
zroot        107G  41.9G      4    284  8.42K  1006K
zroot        107G  41.9G      5    118  2.97K   260K
zroot        107G  41.9G     25    194  27.7K   623K
zroot        107G  41.9G      0    132      0   764K
zroot        107G  41.9G      1     95  6.44K  1.16M
zroot        107G  41.9G      8    272  16.3K   829K
zroot        107G  41.9G     56    212   103K   213K
zroot        107G  41.9G     22    221  27.7K   204K
zroot        107G  41.9G      2    455  1.48K   509K
zroot        107G  41.9G     14    198  7.42K   132K
zroot        107G  41.9G     14    270  7.42K   306K
zroot        107G  41.9G      6    273  3.46K   670K
zroot        107G  41.9G     21    175  10.9K   570K
zroot        107G  41.9G     17    179  8.91K   591K
zroot        107G  41.9G     11    289  17.3K   902K
zroot        107G  41.9G     13    121  6.93K   230K
zroot        107G  41.9G     18    238  9.41K   734K
zroot        107G  41.9G     99     61  50.5K   188K
zroot        107G  41.9G      0    222      0   862K
zroot        107G  41.9G     11    149  13.4K  1.12M
zroot        107G  41.9G     15    319  10.9K  1.05M
zroot        107G  41.9G      0    127      0   392K
zroot        107G  41.9G      0    159      0  1.70M
zroot        107G  41.9G     68    196   212K   601K
zroot        107G  41.9G     17    144  18.8K   295K
zroot        107G  41.9G     12    187  17.3K   588K
zroot        107G  41.9G      0    136      0  1.23M
zroot        107G  41.9G      6    209  23.8K   564K
zroot        107G  41.9G     11    199  12.4K   422K
zroot        107G  41.9G     12    178  9.41K   553K
zroot        107G  41.9G      0    140  1.48K  1.17M
zroot        107G  41.9G     48    200   128K   411K
zroot        107G  41.9G      8    191  16.8K   121K
zroot        107G  41.9G      1    397   1013   375K
zroot        107G  41.9G      0    263      0   132K
zroot        107G  41.9G     14    228  13.4K   235K
zroot        107G  41.9G      7     21  4.46K  10.9K
zroot        107G  41.9G      2    161  1.48K   156K
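
If it would help, I can also watch the underlying disks during a freeze.  I'm 
assuming something like

gstat -I 1s

(or iostat -x 1) would show whether the Xen-provided block devices themselves 
are backing up.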

