Hi,

Since upgrading my OpenSolaris box to 2009.06, I've started to see huge
delays at boot (though not on the first couple of boots after the
upgrade). Thanks to a pointer from someone in #opensolaris, I've
discovered that it's the zpool import of my storage pool that is taking ages.


I re-installed again, and below I've captured everything I know how to
as I import the pool.

Initial ZFS status after install
--------------------------------
Last login: Sun Jul 19 05:01:35 2009
Sun Microsystems Inc.   SunOS 5.11      snv_111b        November 2008
dunc@prefect:~$ pfexec bash
dunc@prefect:~# zpool list
NAME    SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
rpool    37G  3.90G  33.1G    10%  ONLINE  -
dunc@prefect:~# zfs list
NAME                     USED  AVAIL  REFER  MOUNTPOINT
rpool                   4.74G  31.7G  77.5K  /rpool
rpool/ROOT              2.86G  31.7G    19K  legacy
rpool/ROOT/opensolaris  2.86G  31.7G  2.86G  /
rpool/dump               959M  31.7G   959M  -
rpool/export             890K  31.7G    21K  /export
rpool/export/home        869K  31.7G    21K  /export/home
rpool/export/home/dunc   848K  31.7G   848K  /export/home/dunc
rpool/swap               959M  32.5G   101M  -

Importing the pool
------------------
dunc@prefect:~# zpool import
  pool: store
    id: 6301690472664648097
 state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and
        the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        store       ONLINE
          mirror    ONLINE
            c10d0   ONLINE
            c9d0    ONLINE
          mirror    ONLINE
            c11d0   ONLINE
            c12d0   ONLINE
dunc@prefect:~#

dunc@prefect:~# date && zpool import -f store && date
Sunday, 19 July 2009 05:07:19 BST
Sunday, 19 July 2009 06:05:53 BST
dunc@prefect:~#



As you can see, the import took almost an hour (05:07 to 06:05). I'm
sure this can't be right, but I don't know why it's happening.
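If more exact timing would help, I could run the import under ptime on
the next boot to get real/user/sys figures (ptime being my guess at the
right tool for this):

dunc@prefect:~# ptime zpool import -f store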

While this was going on, devfsadm was busy the whole time; below is
output captured from top at intervals.


Just after I ran the import:-

load averages:  0.66,  0.45,  0.22;               up 0+00:04:36        05:07:59
54 processes: 52 sleeping, 1 running, 1 on cpu
CPU states: 16.0% idle, 78.2% user, 21.8% kernel,  0.0% iowait,  0.0% swap
Kernel: 1050 ctxsw, 319 trap, 498 intr, 8951 syscall, 235 flt
Memory: 1920M phys mem, 1078M free mem, 959M total swap, 959M free swap

   PID USERNAME NLWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
    63 root        7  31    0   13M 5944K run      0:22 76.16% devfsadm
   472 root       19  59    0   35M   29M sleep    0:07 18.62% fmd
   630 root        1  59    0 7364K 3044K sleep    0:00  1.05% zpool
   123 root       17  59    0 7268K 2716K sleep    0:00  0.17% syseventd
   641 dunc        1  59    0 3988K 2756K cpu      0:00  0.10% top


By this point, though, the pool seems to have imported OK, as I can do this:-
dunc@prefect:~$ zpool list
NAME    SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
rpool    37G  3.90G  33.1G    10%  ONLINE  -
store  1.13T   691G   469G    59%  ONLINE  -
dunc@prefect:~$
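
One thing I haven't tried yet: if devfsadm is spending all that time
creating device links, I'd guess the number of entries under /dev should
keep growing while it runs. Next boot I could check with something like
this, repeated at intervals (assuming the pool has any zvols; I believe
their links would show up under /dev/zvol/dsk/store):

dunc@prefect:~$ ls /dev/dsk | wc -l
dunc@prefect:~$ find /dev/zvol/dsk/store 2>/dev/null | wc -l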


More top output...

load averages:  1.08,  1.29,  1.08;               up 0+00:28:31        05:31:54
57 processes: 55 sleeping, 1 running, 1 on cpu
CPU states:  5.8% idle, 82.0% user, 12.2% kernel,  0.0% iowait,  0.0% swap
Kernel: 352 ctxsw, 265 trap, 491 intr, 4838 syscall, 178 flt
Memory: 1920M phys mem, 427M free mem, 959M total swap, 959M free swap

   PID USERNAME NLWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
    63 root        7  32    0   14M 7184K run     20:37 87.04% devfsadm
   472 root       19  59    0  226M  220M sleep    1:36  5.37% fmd
   630 root        1  59    0 7364K 3044K sleep    0:04  0.25% zpool
   650 dunc        1  59    0 4596K 3220K cpu      0:01  0.15% top



load averages:  1.09,  1.15,  1.10;               up 0+00:39:48        05:43:11
57 processes: 55 sleeping, 1 running, 1 on cpu
CPU states:  1.4% idle, 89.0% user,  9.6% kernel,  0.0% iowait,  0.0% swap
Kernel: 233 ctxsw, 132 trap, 426 intr, 4379 syscall, 67 flt
Memory: 1920M phys mem, 334M free mem, 959M total swap, 959M free swap

   PID USERNAME NLWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
    63 root        7  11    0   14M 7280K run     30:13 94.17% devfsadm
   472 root       19  59    0  283M  276M sleep    2:00  3.16% fmd
   650 dunc        1  59    0 4644K 3268K cpu      0:02  0.16% top
   630 root        1  59    0 7364K 3044K sleep    0:05  0.09% zpool


load averages:  1.03,  1.09,  1.12;               up 0+00:56:19        05:59:42
57 processes: 55 sleeping, 1 running, 1 on cpu
CPU states:  6.4% idle, 85.6% user,  8.0% kernel,  0.0% iowait,  0.0% swap
Kernel: 170 ctxsw, 116 trap, 401 intr, 4089 syscall, 65 flt
Memory: 1920M phys mem, 233M free mem, 959M total swap, 959M free swap

   PID USERNAME NLWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
    63 root        7  21    0   15M 8200K run     45:00 90.86% devfsadm
   472 root       19  59    0  363M  357M sleep    2:32  1.90% fmd
   650 dunc        1  59    0 4700K 3332K cpu      0:04  0.17% top
   630 root        1  59    0 7364K 3044K sleep    0:07  0.10% zpool





Throughout all of this, the only other change I can see is that fmd uses
more and more memory.
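
In case fmd is chewing through a flood of error events, I could also
capture its view while this is happening; fmstat and fmdump are my best
guess at the right tools:

dunc@prefect:~# fmstat            # per-module stats, including memory size
dunc@prefect:~# fmdump -e | tail  # most recent entries in the error log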


Eventually it completes, as the timestamps at the beginning show, and I
can then use the pool with no problems. From now on, though, I have to
wait an hour at every reboot.
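
If it would be useful, on the next boot I can also try to catch what
devfsadm itself is up to. My guess at the right commands (corrections
welcome):

dunc@prefect:~# pstack `pgrep devfsadm`      # snapshot of its thread stacks
dunc@prefect:~# truss -c -p `pgrep devfsadm` # syscall counts; Ctrl-C for the summary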



Sorry for the long email; I wanted to capture all the information.

Can anybody shed some light on this, or tell me how to start debugging
the problem?

Thanks in advance,

Dunc
