Hi,

Since upgrading my OpenSolaris box to 2009.06, I've started to see huge
delays at boot (though not on the first couple of boots after the
upgrade). Thanks to a pointer from someone in #opensolaris, I've
discovered that it's the zpool import of my storage pool that is taking ages.


I re-installed again, and below I've captured everything I know how to
as I import the pool.

Initial ZFS status after install
--------------------------------
Last login: Sun Jul 19 05:01:35 2009
Sun Microsystems Inc.   SunOS 5.11      snv_111b        November 2008
dunc@prefect:~$ pfexec bash
dunc@prefect:~# zpool list
NAME    SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
rpool    37G  3.90G  33.1G    10%  ONLINE  -
dunc@prefect:~# zfs list
NAME                     USED  AVAIL  REFER  MOUNTPOINT
rpool                   4.74G  31.7G  77.5K  /rpool
rpool/ROOT              2.86G  31.7G    19K  legacy
rpool/ROOT/opensolaris  2.86G  31.7G  2.86G  /
rpool/dump               959M  31.7G   959M  -
rpool/export             890K  31.7G    21K  /export
rpool/export/home        869K  31.7G    21K  /export/home
rpool/export/home/dunc   848K  31.7G   848K  /export/home/dunc
rpool/swap               959M  32.5G   101M  -

Importing the pool
------------------
dunc@prefect:~# zpool import
  pool: store
    id: 6301690472664648097
 state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and
        the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        store       ONLINE
          mirror    ONLINE
            c10d0   ONLINE
            c9d0    ONLINE
          mirror    ONLINE
            c11d0   ONLINE
            c12d0   ONLINE
dunc@prefect:~#

dunc@prefect:~# date && zpool import -f store && date
Sunday, 19 July 2009 05:07:19 BST
Sunday, 19 July 2009 06:05:53 BST
dunc@prefect:~#



As you can see, the import took almost an hour (05:07 to 06:05). I'm
sure this can't be right, but I don't know why it's happening.
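If more exact timing would help, I could run the import under ptime on
the next boot to get real/user/sys figures (ptime being my guess at the
right tool for this):

dunc@prefect:~# ptime zpool import -f store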

While this was going on, devfsadm was busy the whole time; below is
output captured from top at intervals.


Just after I ran the import:-

load averages:  0.66,  0.45,  0.22;               up 0+00:04:36        05:07:59
54 processes: 52 sleeping, 1 running, 1 on cpu
CPU states: 16.0% idle, 78.2% user, 21.8% kernel,  0.0% iowait,  0.0% swap
Kernel: 1050 ctxsw, 319 trap, 498 intr, 8951 syscall, 235 flt
Memory: 1920M phys mem, 1078M free mem, 959M total swap, 959M free swap

   PID USERNAME NLWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
    63 root        7  31    0   13M 5944K run      0:22 76.16% devfsadm
   472 root       19  59    0   35M   29M sleep    0:07 18.62% fmd
   630 root        1  59    0 7364K 3044K sleep    0:00  1.05% zpool
   123 root       17  59    0 7268K 2716K sleep    0:00  0.17% syseventd
   641 dunc        1  59    0 3988K 2756K cpu      0:00  0.10% top


By this point, though, the pool seems to have imported OK, as I can do this:-
dunc@prefect:~$ zpool list
NAME    SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
rpool    37G  3.90G  33.1G    10%  ONLINE  -
store  1.13T   691G   469G    59%  ONLINE  -
dunc@prefect:~$
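
One thing I haven't tried yet: if devfsadm is spending all that time
creating device links, I'd guess the number of entries under /dev should
keep growing while it runs. Next boot I could check with something like
this, repeated at intervals (assuming the pool has any zvols; I believe
their links would show up under /dev/zvol/dsk/store):

dunc@prefect:~$ ls /dev/dsk | wc -l
dunc@prefect:~$ find /dev/zvol/dsk/store 2>/dev/null | wc -l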


More top output...

load averages:  1.08,  1.29,  1.08;               up 0+00:28:31        05:31:54
57 processes: 55 sleeping, 1 running, 1 on cpu
CPU states:  5.8% idle, 82.0% user, 12.2% kernel,  0.0% iowait,  0.0% swap
Kernel: 352 ctxsw, 265 trap, 491 intr, 4838 syscall, 178 flt
Memory: 1920M phys mem, 427M free mem, 959M total swap, 959M free swap

   PID USERNAME NLWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
    63 root        7  32    0   14M 7184K run     20:37 87.04% devfsadm
   472 root       19  59    0  226M  220M sleep    1:36  5.37% fmd
   630 root        1  59    0 7364K 3044K sleep    0:04  0.25% zpool
   650 dunc        1  59    0 4596K 3220K cpu      0:01  0.15% top



load averages:  1.09,  1.15,  1.10;               up 0+00:39:48        05:43:11
57 processes: 55 sleeping, 1 running, 1 on cpu
CPU states:  1.4% idle, 89.0% user,  9.6% kernel,  0.0% iowait,  0.0% swap
Kernel: 233 ctxsw, 132 trap, 426 intr, 4379 syscall, 67 flt
Memory: 1920M phys mem, 334M free mem, 959M total swap, 959M free swap

   PID USERNAME NLWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
    63 root        7  11    0   14M 7280K run     30:13 94.17% devfsadm
   472 root       19  59    0  283M  276M sleep    2:00  3.16% fmd
   650 dunc        1  59    0 4644K 3268K cpu      0:02  0.16% top
   630 root        1  59    0 7364K 3044K sleep    0:05  0.09% zpool


load averages:  1.03,  1.09,  1.12;               up 0+00:56:19        05:59:42
57 processes: 55 sleeping, 1 running, 1 on cpu
CPU states:  6.4% idle, 85.6% user,  8.0% kernel,  0.0% iowait,  0.0% swap
Kernel: 170 ctxsw, 116 trap, 401 intr, 4089 syscall, 65 flt
Memory: 1920M phys mem, 233M free mem, 959M total swap, 959M free swap

   PID USERNAME NLWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
    63 root        7  21    0   15M 8200K run     45:00 90.86% devfsadm
   472 root       19  59    0  363M  357M sleep    2:32  1.90% fmd
   650 dunc        1  59    0 4700K 3332K cpu      0:04  0.17% top
   630 root        1  59    0 7364K 3044K sleep    0:07  0.10% zpool





Throughout all of this, the only other change I can see is that fmd uses
more and more memory.
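
In case fmd is chewing through a flood of error events, I could also
capture its view while this is happening; fmstat and fmdump are my best
guess at the right tools:

dunc@prefect:~# fmstat            # per-module stats, including memory size
dunc@prefect:~# fmdump -e | tail  # most recent entries in the error log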


Eventually it completes, as the timestamps at the beginning show, and I
can then use the pool with no problems. From now on, though, I have to
wait an hour at every reboot.
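
If it would be useful, on the next boot I can also try to catch what
devfsadm itself is up to. My guess at the right commands (corrections
welcome):

dunc@prefect:~# pstack `pgrep devfsadm`      # snapshot of its thread stacks
dunc@prefect:~# truss -c -p `pgrep devfsadm` # syscall counts; Ctrl-C for the summary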



Sorry for the long email; I wanted to capture all the information.

Can anybody shed some light on this, or tell me how to start debugging
the problem?

Thanks in advance,

Dunc
