Re: [zfs-discuss] zpool import starves machine of memory

2011-08-24 Thread Paul Kraus
UPDATE (for those following along at home)...

    After patching to the latest Solaris 10 kernel update and bringing
the firmware on both OS drives (72 GB SAS) and on the server itself up
to date, Oracle has now officially declared this a bug (CR#7082249).
No word yet on when I'll hear back about the status of this new bug
(which looks like an old bug, except that the old bug was fixed in the
patches I'm now running).

On Wed, Aug 3, 2011 at 9:19 AM, Paul Kraus p...@kraus-haus.org wrote:
    I am having a very odd problem, and so far the folks at Oracle
 Support have not provided a working solution, so I am asking the crowd
 here while still pursuing it via Oracle Support.

    The system is a T2000 running 10U9 with CPU-2010-01 and two J4400
 loaded with 1 TB SATA drives. There is one zpool on the J4400 (3 x
 15-disk vdevs + 3 hot spares). This system is the target for zfs send /
 recv replication from our production server. The OS is UFS on local
 disk.

snip

-- 
{1-2-3-4-5-6-7-}
Paul Kraus
- Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
- Sound Designer: Frankenstein, A New Musical
(http://www.facebook.com/event.php?eid=123170297765140)
- Sound Coordinator, Schenectady Light Opera Company (
http://www.sloctheater.org/ )
- Technical Advisor, RPI Players


Re: [zfs-discuss] zpool import starves machine of memory

2011-08-05 Thread Paul Kraus
Another update:

    The configuration of the zpool is 45 x 1 TB drives in three
vdevs of 15 drives each. We should have a net capacity of between 30
and 36 TB (which agrees with my memory of the pool). I ran zdb -e -d
against the pool (not imported), totaled the sizes of the datasets,
and came up with just about 11 TB. This also agrees with my memory
(about 18 TB of data at roughly a 1.5:1 compression ratio). If the
failed snapshot / zfs recv is 3 TB (as I think it should be) or almost
8 TB (as Oracle is telling me based on some mdb -k examination of the
dataset delete thread), I should still have almost 10 TB free.

    I am making an assumption here: that the size listed for a
dataset by zdb -d includes all snapshots of that dataset (much like
the USED column of zfs list). If that is NOT the case, then I need to
come up with a different way to estimate how full this zpool is.
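
    For reference, the totaling was roughly this (a sketch rather than
the exact commands; the pool name is the anonymized one used elsewhere
in this thread, and the awk field that holds the size is an assumption
about the zdb -e -d output layout, so check it against real output
before trusting the number):

sudo zdb -e -d xxx-yy-01 | nawk '
    /^Dataset / {
        sz = $(NF - 2); sub(/,$/, "", sz)            # e.g. "12.3G,"
        n = sz + 0; suf = substr(sz, length(sz), 1)
        if (suf == "K") n *= 1024
        else if (suf == "M") n *= 1024 ^ 2
        else if (suf == "G") n *= 1024 ^ 3
        else if (suf == "T") n *= 1024 ^ 4
        total += n
    }
    END { printf("approx. total: %.2f TB\n", total / 1024 ^ 4) }'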

On Thu, Aug 4, 2011 at 1:25 PM, Paul Kraus p...@kraus-haus.org wrote:
 Updates to my problem:

 1. The destroy operation appears to be restarting from the same point
 after the system hangs and has to be rebooted. Oracle gave me the
 following to track progress:

 echo '::pgrep zpool$ | ::walk thread | ::findstack -v' | mdb -k | grep dsl_dataset_destroy
 then take the first argument of dsl_dataset_destroy (ARG below) and
 echo 'ARG::print dsl_dataset_t ds_phys->ds_used_bytes' | mdb -k

 I am logging these values every minute. Yesterday when I started
 tracking this I got a value of 0x75d97516b62, my last data point
 before the system hung was 0x4ee1098bdfd. My first data point
 today after rebooting, restarting the logging scripts, and restarting
 the zpool import is 0x7a0b0634a1b. So it looks like I've made no real
 progress.

 2. It looks like the root cause of the original system crash that left
 the incomplete zfs recv snapshot is that a zfs recv filled the
 zpool (there are two parallel zfs recv's running, one for the old
 configuration with many datasets and one for the new configuration
 with one large dataset). My replication script checks for free space
 before starting the replication, but we had a huge data load and its
 replication (3 TB) running; there was room for it when it started, but
 other (much smaller) data loads and replications may have consumed the
 space since. This system has no other activity on it; it is just a
 repository for this replicated data.

 So ... it looks like I have:
 - a full zpool
 - an incomplete (corrupt ?) snapshot from a zfs recv
 ... and every time I try to import this zpool I hang the system due to
 lack of memory (the box has 32 GB of RAM).

 Any suggestions how to delete / destroy this incomplete snapshot
 without running the system out of RAM ?

 On Wed, Aug 3, 2011 at 9:56 AM, Paul Kraus p...@kraus-haus.org wrote:
 An additional data point: when I try to do a zdb -e -d and find the
 incomplete zfs recv snapshot I get an error as follows:

 # sudo zdb -e -d xxx-yy-01 | grep %
 Could not open xxx-yy-01/aaa-bb-01/aaa-bb-01-01/%1309906801, error 16
 #

 Anyone know what error 16 means from zdb and how this might impact
 importing this zpool ?

 On Wed, Aug 3, 2011 at 9:19 AM, Paul Kraus p...@kraus-haus.org wrote:
    I am having a very odd problem, and so far the folks at Oracle
 Support have not provided a working solution, so I am asking the crowd
 here while still pursuing it via Oracle Support.

    The system is a T2000 running 10U9 with CPU-2010-01 and two J4400
 loaded with 1 TB SATA drives. There is one zpool on the J4400 (3 x
 15-disk vdevs + 3 hot spares). This system is the target for zfs send /
 recv replication from our production server. The OS is UFS on local
 disk.

     While I was on vacation this T2000 hung with out of resource
 errors. Other staff tried rebooting, which hung the box. Then they
 rebooted off of an old BE (10U9 without CPU-2010-01). Oracle Support
 had them apply a couple patches and an IDR to address zfs stability
 and reliability problems as well as set the following in /etc/system

 set zfs:zfs_arc_max = 0x700000000 (which is 28 GB)
 set zfs:arc_meta_limit = 0x700000000 (which is 28 GB)

    The system has 32 GB RAM and 32 (virtual) CPUs. They then tried
 importing the zpool and the system hung (after many hours) with the
 same out of resource error. At this point they left the problem for
 me :-(

    I removed the zpool.cache from the 10U9 + CPU 2010-10 BE and booted
 from that. I then applied the IDR (IDR146118-12) and the zfs patch it
 depended on (145788-03). I did not include the zfs arc and arc meta
 limit settings as I did not think they were relevant. A zpool import shows the
 pool is OK and a sampling with zdb -l of the drives shows good labels.
 I started importing the zpool and after many hours it hung the system
 with out of resource errors. I had a number of tools running to see
 what was going on. The only thing this system is doing is importing
 the zpool.

 ARC had climbed to about 8 GB and then declined to 3 GB by the time
 the system hung. This tells me that there is something else consuming
 RAM and the ARC is releasing it.

Re: [zfs-discuss] zpool import starves machine of memory

2011-08-04 Thread Paul Kraus
Updates to my problem:

1. The destroy operation appears to be restarting from the same point
after the system hangs and has to be rebooted. Oracle gave me the
following to track progress:

echo '::pgrep zpool$ | ::walk thread | ::findstack -v' | mdb -k | grep dsl_dataset_destroy
then take the first argument of dsl_dataset_destroy (ARG below) and
echo 'ARG::print dsl_dataset_t ds_phys->ds_used_bytes' | mdb -k

I am logging these values every minute. Yesterday when I started
tracking this I got a value of 0x75d97516b62, my last data point
before the system hung was 0x4ee1098bdfd. My first data point
today after rebooting, restarting the logging scripts, and restarting
the zpool import is 0x7a0b0634a1b. So it looks like I've made no real
progress.
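
The logging itself is trivial, roughly the following (a sketch; ARG has
to be replaced by hand with the dsl_dataset_t address taken from the
::findstack output above, and the log file name is arbitrary):

#!/bin/sh
ARG=REPLACE_WITH_DATASET_ADDRESS        # first arg of dsl_dataset_destroy
while :
do
    val=`echo "${ARG}::print dsl_dataset_t ds_phys->ds_used_bytes" | mdb -k`
    echo "`date '+%Y-%m-%d %H:%M:%S'` $val" >> /var/tmp/destroy-progress.log
    sleep 60
done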

2. It looks like the root cause of the original system crash that left
the incomplete zfs recv snapshot is that a zfs recv filled the
zpool (there are two parallel zfs recv's running, one for the old
configuration with many datasets and one for the new configuration
with one large dataset). My replication script checks for free space
before starting the replication, but we had a huge data load and its
replication (3 TB) running; there was room for it when it started, but
other (much smaller) data loads and replications may have consumed the
space since. This system has no other activity on it; it is just a
repository for this replicated data.
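
For what it is worth, the pre-flight check is nothing more than
comparing the estimated stream size against the receiving pool's
available space, along these lines (illustrative names and no safety
margin here, not the production script):

#!/bin/sh
DST_POOL=xxx-yy-01                  # receiving pool (anonymized name)
SRC_DS=tank/data                    # dataset whose snapshot will be sent

avail=`zfs get -Hp -o value available $DST_POOL`
need=`zfs get -Hp -o value used $SRC_DS`     # rough upper bound on stream size

if [ "$avail" -gt "$need" ]; then
    echo "room for an estimated $need bytes, starting replication"
else
    echo "$DST_POOL too full for an estimated $need bytes, skipping" >&2
    exit 1
fi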

So ... it looks like I have:
- a full zpool
- an incomplete (corrupt ?) snapshot from a zfs recv
... and every time I try to import this zpool I hang the system due to
lack of memory (the box has 32 GB of RAM).

Any suggestions how to delete / destroy this incomplete snapshot
without running the system out of RAM ?

On Wed, Aug 3, 2011 at 9:56 AM, Paul Kraus p...@kraus-haus.org wrote:
 An additional data point: when I try to do a zdb -e -d and find the
 incomplete zfs recv snapshot I get an error as follows:

 # sudo zdb -e -d xxx-yy-01 | grep %
 Could not open xxx-yy-01/aaa-bb-01/aaa-bb-01-01/%1309906801, error 16
 #

 Anyone know what error 16 means from zdb and how this might impact
 importing this zpool ?

 On Wed, Aug 3, 2011 at 9:19 AM, Paul Kraus p...@kraus-haus.org wrote:
    I am having a very odd problem, and so far the folks at Oracle
 Support have not provided a working solution, so I am asking the crowd
 here while still pursuing it via Oracle Support.

    The system is a T2000 running 10U9 with CPU-2010-01 and two J4400
 loaded with 1 TB SATA drives. There is one zpool on the J4400 (3 x
 15-disk vdevs + 3 hot spares). This system is the target for zfs send /
 recv replication from our production server. The OS is UFS on local
 disk.

     While I was on vacation this T2000 hung with out of resource
 errors. Other staff tried rebooting, which hung the box. Then they
 rebooted off of an old BE (10U9 without CPU-2010-01). Oracle Support
 had them apply a couple patches and an IDR to address zfs stability
 and reliability problems as well as set the following in /etc/system

 set zfs:zfs_arc_max = 0x700000000 (which is 28 GB)
 set zfs:arc_meta_limit = 0x700000000 (which is 28 GB)

    The system has 32 GB RAM and 32 (virtual) CPUs. They then tried
 importing the zpool and the system hung (after many hours) with the
 same out of resource error. At this point they left the problem for
 me :-(

    I removed the zpool.cache from the 10U9 + CPU 2010-10 BE and booted
 from that. I then applied the IDR (IDR146118-12) and the zfs patch it
 depended on (145788-03). I did not include the zfs arc and arc meta
 limit settings as I did not think they were relevant. A zpool import shows the
 pool is OK and a sampling with zdb -l of the drives shows good labels.
 I started importing the zpool and after many hours it hung the system
 with out of resource errors. I had a number of tools running to see
 what was going on. The only thing this system is doing is importing
 the zpool.

 ARC had climbed to about 8 GB and then declined to 3 GB by the time
 the system hung. This tells me that there is something else consuming
 RAM and the ARC is releasing it.

 The hung TOP screen showed the largest user process only had 148 MB
 allocated (and much less resident).

 VMSTAT showed a scan rate of over 900,000 (NOT a typo) and almost 8 GB
 of free swap (so whatever is using memory cannot be paged out).

    So my guess is that there is a kernel module that is consuming all
 (and more) of the RAM in the box. I am looking for a way to query how
 much RAM each kernel module is using and script that in a loop (which
 will hang when the box runs out of RAM next). I am very open to
 suggestions here.

   Since this is the recv end of replication, I assume there was a zfs
 recv going on at the time the system initially hung. I know there was
 a 3+ TB snapshot replicating (via a 100 Mbps WAN link) when I left for
 vacation, that may have still been running. I also assume that any
 partial snapshots (% instead of @) are being removed when the pool is
 imported. But what could be causing a partial snapshot removal, even
 of a very large snapshot, to run the system out of RAM ?

[zfs-discuss] zpool import starves machine of memory

2011-08-03 Thread Paul Kraus
I am having a very odd problem, and so far the folks at Oracle
Support have not provided a working solution, so I am asking the crowd
here while still pursuing it via Oracle Support.

The system is a T2000 running 10U9 with CPU-2010-01 and two J4400
loaded with 1 TB SATA drives. There is one zpool on the J4400 (3 x
15-disk vdevs + 3 hot spares). This system is the target for zfs send /
recv replication from our production server. The OS is UFS on local
disk.

 While I was on vacation this T2000 hung with out of resource
errors. Other staff tried rebooting, which hung the box. Then they
rebooted off of an old BE (10U9 without CPU-2010-01). Oracle Support
had them apply a couple patches and an IDR to address zfs stability
and reliability problems as well as set the following in /etc/system

set zfs:zfs_arc_max = 0x700000000 (which is 28 GB)
set zfs:arc_meta_limit = 0x700000000 (which is 28 GB)

The system has 32 GB RAM and 32 (virtual) CPUs. They then tried
importing the zpool and the system hung (after many hours) with the
same out of resource error. At this point they left the problem for
me :-(

I removed the zpool.cache from the 10U9 + CPU 2010-10 BE and booted
from that. I then applied the IDR (IDR146118-12) and the zfs patch it
depended on (145788-03). I did not include the zfs arc and arc meta
limit settings as I did not think they were relevant. A zpool import shows the
pool is OK and a sampling with zdb -l of the drives shows good labels.
I started importing the zpool and after many hours it hung the system
with out of resource errors. I had a number of tools running to see
what was going on. The only thing this system is doing is importing
the zpool.

ARC had climbed to about 8 GB and then declined to 3 GB by the time
the system hung. This tells me that there is something else consuming
RAM and the ARC is releasing it.

The hung TOP screen showed the largest user process only had 148 MB
allocated (and much less resident).

VMSTAT showed a scan rate of over 900,000 (NOT a typo) and almost 8 GB
of free swap (so whatever is using memory cannot be paged out).

So my guess is that there is a kernel module that is consuming all
(and more) of the RAM in the box. I am looking for a way to query how
much RAM each kernel module is using and script that in a loop (which
will hang when the box runs out of RAM next). I am very open to
suggestions here.
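
   Until I find a real per-module view, what I have started with is
just sampling overall kernel memory and the ARC size in a loop, roughly
(a sketch; ::memstat only breaks memory down into broad categories, and
::kmastat would be the next step for a per-kmem-cache view):

#!/bin/sh
LOG=/var/tmp/mem-watch.log
while :
do
    echo "==== `date`" >> $LOG
    echo '::memstat' | mdb -k >> $LOG 2>&1      # kernel / anon / cache / free pages
    kstat -p zfs:0:arcstats:size >> $LOG 2>&1   # current ARC size in bytes
    sleep 60
done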

   Since this is the recv end of replication, I assume there was a zfs
recv going on at the time the system initially hung. I know there was
a 3+ TB snapshot replicating (via a 100 Mbps WAN link) when I left for
vacation, that may have still been running. I also assume that any
partial snapshots (% instead of @) are being removed when the pool is
imported. But what could be causing a partial snapshot removal, even
of a very large snapshot, to run the system out of RAM ? What caused
the initial hang of the system (I assume due to out of RAM) ? I did
not think there was a limit to the size of either a snapshot or a zfs
recv.

Hung TOP screen:

load averages: 91.43, 33.48, 18.989 xxx-xxx1   18:45:34
84 processes:  69 sleeping, 12 running, 1 zombie, 2 on cpu
CPU states: 95.2% idle,  0.5% user,  4.4% kernel,  0.0% iowait,  0.0% swap
Memory: 31.9G real, 199M free, 267M swap in use, 7.7G swap free

   PID USERNAME THR PR NCE  SIZE   RES STATE   TIME FLTS    CPU COMMAND
   533 root      51 59   0  148M 30.6M run   520:21    0  9.77% java
  1210 yy         1  0   0 5248K 1048K cpu25   2:08    0  2.23% xload
 14720 yy         1 59   0 3248K 1256K cpu24   1:56    0  0.03% top
   154 root        1 59   0 4024K 1328K sleep   1:17    0  0.02% vmstat
  1268 yy         1 59   0 4248K 1568K sleep   1:26    0  0.01% iostat
...

VMSTAT:

kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr m0 m1 m2 m3   in   sy   cs us sy id
 0 0 112 8117096 211888 55 46 0 0 425 0 912684 0 0 0 0  976  166  836  0  2 98
 0 0 112 8117096 211936 53 51 6 0 394 0 926702 0 0 0 0  976  167  833  0  2 98

ARC size (B): 4065882656

-- 
{1-2-3-4-5-6-7-}
Paul Kraus
- Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
- Sound Designer: Frankenstein, A New Musical
(http://www.facebook.com/event.php?eid=123170297765140)
- Sound Coordinator, Schenectady Light Opera Company (
http://www.sloctheater.org/ )
- Technical Advisor, RPI Players


Re: [zfs-discuss] zpool import starves machine of memory

2011-08-03 Thread Paul Kraus
An additional data point: when I try to do a zdb -e -d and find the
incomplete zfs recv snapshot I get an error as follows:

# sudo zdb -e -d xxx-yy-01 | grep %
Could not open xxx-yy-01/aaa-bb-01/aaa-bb-01-01/%1309906801, error 16
#

Anyone know what error 16 means from zdb and how this might impact
importing this zpool ?
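
(Partial answer to my own question: assuming the number zdb prints is
a plain errno -- I have not confirmed that against the zdb source -- it
can be looked up in the system headers:)

nawk '$3 == 16' /usr/include/sys/errno.h
# errno 16 is EBUSY, which would at least be consistent with a dataset
# still tied up by an interrupted receive.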

On Wed, Aug 3, 2011 at 9:19 AM, Paul Kraus p...@kraus-haus.org wrote:
    I am having a very odd problem, and so far the folks at Oracle
 Support have not provided a working solution, so I am asking the crowd
 here while still pursuing it via Oracle Support.

    The system is a T2000 running 10U9 with CPU-2010-01 and two J4400
 loaded with 1 TB SATA drives. There is one zpool on the J4400 (3 x
 15-disk vdevs + 3 hot spares). This system is the target for zfs send /
 recv replication from our production server. The OS is UFS on local
 disk.

     While I was on vacation this T2000 hung with out of resource
 errors. Other staff tried rebooting, which hung the box. Then they
 rebooted off of an old BE (10U9 without CPU-2010-01). Oracle Support
 had them apply a couple patches and an IDR to address zfs stability
 and reliability problems as well as set the following in /etc/system

 set zfs:zfs_arc_max = 0x700000000 (which is 28 GB)
 set zfs:arc_meta_limit = 0x700000000 (which is 28 GB)

    The system has 32 GB RAM and 32 (virtual) CPUs. They then tried
 importing the zpool and the system hung (after many hours) with the
 same out of resource error. At this point they left the problem for
 me :-(

    I removed the zpool.cache from the 10U9 + CPU 2010-10 BE and booted
 from that. I then applied the IDR (IDR146118-12) and the zfs patch it
 depended on (145788-03). I did not include the zfs arc and arc meta
 limit settings as I did not think they were relevant. A zpool import shows the
 pool is OK and a sampling with zdb -l of the drives shows good labels.
 I started importing the zpool and after many hours it hung the system
 with out of resource errors. I had a number of tools running to see
 what was going on. The only thing this system is doing is importing
 the zpool.

 ARC had climbed to about 8 GB and then declined to 3 GB by the time
 the system hung. This tells me that there is something else consuming
 RAM and the ARC is releasing it.

 The hung TOP screen showed the largest user process only had 148 MB
 allocated (and much less resident).

 VMSTAT showed a scan rate of over 900,000 (NOT a typo) and almost 8 GB
 of free swap (so whatever is using memory cannot be paged out).

    So my guess is that there is a kernel module that is consuming all
 (and more) of the RAM in the box. I am looking for a way to query how
 much RAM each kernel module is using and script that in a loop (which
 will hang when the box runs out of RAM next). I am very open to
 suggestions here.

   Since this is the recv end of replication, I assume there was a zfs
 recv going on at the time the system initially hung. I know there was
 a 3+ TB snapshot replicating (via a 100 Mbps WAN link) when I left for
 vacation, that may have still been running. I also assume that any
 partial snapshots (% instead of @) are being removed when the pool is
 imported. But what could be causing a partial snapshot removal, even
 of a very large snapshot, to run the system out of RAM ? What caused
 the initial hang of the system (I assume due to out of RAM) ? I did
 not think there was a limit to the size of either a snapshot or a zfs
 recv.

 Hung TOP screen:

 load averages: 91.43, 33.48, 18.989             xxx-xxx1              18:45:34
 84 processes:  69 sleeping, 12 running, 1 zombie, 2 on cpu
 CPU states: 95.2% idle,  0.5% user,  4.4% kernel,  0.0% iowait,  0.0% swap
 Memory: 31.9G real, 199M free, 267M swap in use, 7.7G swap free

   PID USERNAME THR PR NCE  SIZE   RES STATE   TIME FLTS    CPU COMMAND
   533 root      51 59   0  148M 30.6M run   520:21    0  9.77% java
  1210 yy     1  0   0 5248K 1048K cpu25   2:08    0  2.23% xload
  14720 yy     1 59   0 3248K 1256K cpu24   1:56    0  0.03% top
   154 root       1 59   0 4024K 1328K sleep   1:17    0  0.02% vmstat
  1268 yy     1 59   0 4248K 1568K sleep   1:26    0  0.01% iostat
 ...

 VMSTAT:

 kthr      memory            page            disk          faults      cpu
  r b w   swap  free  re  mf pi po fr de sr m0 m1 m2 m3   in   sy   cs us sy id
  0 0 112 8117096 211888 55 46 0 0 425 0 912684 0 0 0 0  976  166  836  0  2 98
  0 0 112 8117096 211936 53 51 6 0 394 0 926702 0 0 0 0  976  167  833  0  2 98

 ARC size (B): 4065882656

 --
 {1-2-3-4-5-6-7-}
 Paul Kraus
 - Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
 - Sound Designer: Frankenstein, A New Musical
 (http://www.facebook.com/event.php?eid=123170297765140)
 - Sound Coordinator, Schenectady Light Opera Company (
 http://www.sloctheater.org/ )
 - Technical Advisor, RPI Players



