Re: [zfs-discuss] test for holes in a file?

2012-03-27 Thread Bob Friesenhahn

On Mon, 26 Mar 2012, Mike Gerdts wrote:


If file space usage is less than file directory size then it must contain a
hole.  Even for compressed files, I am pretty sure that Solaris reports the
uncompressed space usage.


That's not the case.


You are right.  I should have tested this prior to posting. :-(
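
For anyone who wants to try it, here is a minimal sketch (not from the original
posts) of the heuristic in question, i.e. comparing st_blocks against st_size;
as this exchange shows, compression breaks the assumption, so it is only a
rough test:

#include <stdio.h>
#include <sys/stat.h>

/*
 * Heuristic hole check: if the allocated blocks cover less than
 * st_size, assume the file is sparse.  Unreliable on compressed
 * (or deduped) datasets, where allocation can be smaller than
 * st_size without any holes being present.
 */
static int
maybe_sparse(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return (-1);		/* error */
	/* st_blocks is counted in 512-byte units */
	return ((off_t)st.st_blocks * 512 < st.st_size);
}

int
main(int argc, char **argv)
{
	int i;

	for (i = 1; i < argc; i++)
		printf("%s: %s\n", argv[i],
		    maybe_sparse(argv[i]) == 1 ? "probably sparse" : "no hole inferred");
	return (0);
}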

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/


Re: [zfs-discuss] webserver zfs root lock contention under heavy load

2012-03-27 Thread Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D.

hi
you did not answer the question: how much RAM does the server have? how many
sockets and cores, etc.?

what is the ZFS block size?
what is the cache RAM of your SAN array?
what is the block size/stripe size of the RAID in your SAN array? RAID 5 or
something else?

what is your test program, and how is it driven (from what kind of client)?
regards



On 3/26/2012 11:13 PM, Aubrey Li wrote:

On Tue, Mar 27, 2012 at 1:15 AM, Jim Klimov  wrote:

Well, as a further attempt down this road, is it possible for you to rule out
ZFS from swapping - i.e., if RAM amounts permit, disable swap entirely
(swap -d /dev/zvol/dsk/rpool/swap) or relocate it to dedicated slices on the
same or, better yet, separate disks?


Thanks Jim for your suggestion!



If you do have lots of swapping activity going on in a zvol (visible in the
si/so columns of "vmstat 1"), you're likely to get a lot of fragmentation
in the pool, and searching for contiguous stretches of space can become
tricky (and time-consuming), or larger writes can get broken down into
many smaller random writes and/or "gang blocks", which is also slower.
At least such waiting on disks could explain the overall large kernel times.

I took swapping activity into account: even when the CPU% is 100%, "si"
(swap-ins) and "so" (swap-outs) are always zero.
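
For cross-checking vmstat, a rough libkstat sketch (assuming Solaris/illumos,
compiled with -lkstat) that dumps the kernel's cumulative per-CPU swap-in and
swap-out counters, which should likewise stay flat if no swapping is going on:

#include <stdio.h>
#include <string.h>
#include <kstat.h>
#include <sys/sysinfo.h>

/*
 * Sum the cumulative swapin/swapout counters from the per-CPU
 * "cpu_stat" kstats.  If these numbers do not grow between runs,
 * the box is not swapping.
 */
int
main(void)
{
	kstat_ctl_t *kc;
	kstat_t *ksp;
	cpu_stat_t *cs;
	unsigned long long si = 0, so = 0;

	if ((kc = kstat_open()) == NULL) {
		perror("kstat_open");
		return (1);
	}
	for (ksp = kc->kc_chain; ksp != NULL; ksp = ksp->ks_next) {
		if (strcmp(ksp->ks_module, "cpu_stat") != 0)
			continue;
		if (kstat_read(kc, ksp, NULL) == -1)
			continue;
		cs = ksp->ks_data;
		si += cs->cpu_vminfo.swapin;
		so += cs->cpu_vminfo.swapout;
	}
	(void) kstat_close(kc);
	(void) printf("swapins=%llu swapouts=%llu\n", si, so);
	return (0);
}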


You can also see the disk wait-time ratio in the "%w" column of "iostat -xzn 1"
and the disk busy-time ratio in "%b" (second and third from the right).
I don't remember you posting that.

If these are running in the tens of percent, or close or equal to 100%, then
your disks are the actual bottleneck. Speeding up that subsystem,
including adding cache (ARC RAM, L2ARC SSD, maybe a ZIL
SSD/DDRDrive) and combating fragmentation by moving swap and
other scratch spaces to dedicated pools or raw slices, might help.

My storage system is not very busy, and there are only read operations.
=
# iostat -xnz 3
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  112.4    0.0 1691.5    0.0  0.0  0.5    0.0    4.8   0  41 c11t0d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  118.7    0.0 1867.0    0.0  0.0  0.5    0.0    4.5   0  42 c11t0d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  127.7    0.0 2121.6    0.0  0.0  0.6    0.0    4.7   0  44 c11t0d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  141.3    0.0 2158.5    0.0  0.0  0.7    0.0    4.6   0  48 c11t0d0
==

Thanks,
-Aubrey


--
Hung-Sheng Tsao, Ph.D.
Founder & Principal
HopBit GridComputing LLC
cell: 9734950840

http://laotsao.blogspot.com/
http://laotsao.wordpress.com/
http://blogs.oracle.com/hstsao/



Re: [zfs-discuss] kernel panic during zfs import

2012-03-27 Thread Paul Kraus
On Tue, Mar 27, 2012 at 3:14 AM, Carsten John  wrote:
> Hello everybody,
>
> I have a Solaris 11 box here (Sun X4270) that crashes with a kernel panic
> during the import of a zpool (some 30 TB) containing ~500 ZFS filesystems
> after reboot. This caused a reboot loop until I booted single-user and removed
> /etc/zfs/zpool.cache.
>
>
> From /var/adm/messages:
>
> savecore: [ID 570001 auth.error] reboot after panic: BAD TRAP: type=e (#pf 
> Page fault) rp=ff002f9cec50 addr=20 occurred in module "zfs" due to a 
> NULL pointer dereference
> savecore: [ID 882351 auth.error] Saving compressed system crash dump in 
> /var/crash/vmdump.2
>

I ran into a very similar problem with Solaris 10U9 and the
replica (zfs send | zfs recv destination) of a zpool of about 25 TB of
data. The problem was an incomplete snapshot (the zfs send | zfs recv
had been interrupted). On boot, the system was trying to import the
zpool and, as part of that, trying to destroy the offending
(incomplete) snapshot. This was zpool version 22, where destruction of
a snapshot is handled in a single TXG. The problem was that the
operation was running the system out of RAM (32 GB worth). There is a
fix for this in zpool version 26 (or newer), but any snapshots
created while the zpool is at a version prior to 26 will still have the
problem on disk. We have a support contract with Oracle and were able to get a
loaner system with 128 GB RAM to clean up the zpool (it took about 75
GB of RAM to do so).

If you are at zpool version 26 or later, this is not your problem. If you
are at zpool < 26, then test for an incomplete snapshot by importing
the pool read-only and running `zdb -d <pool> | grep '%'`; an incomplete
snapshot will have a '%' instead of a '@' as the dataset/snapshot
separator. You can also run zdb against the _un_imported_ zpool
using the -e option to zdb.

See the following Oracle Bugs for more information.

CR# 6876953
CR# 6910767
CR# 7082249

CR# 7082249 has been marked as a duplicate of CR# 6948890.

P.S. I suspect that the incomplete snapshot was also corrupt in
some strange way, but I could never make a solid determination of that.
We think what caused the zfs send | zfs recv to be interrupted was
an e1000g Ethernet device driver bug.

-- 
{1-2-3-4-5-6-7-}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company (
http://www.sloctheater.org/ )
-> Technical Advisor, Troy Civic Theatre Company
-> Technical Advisor, RPI Players


Re: [zfs-discuss] kernel panic during zfs import

2012-03-27 Thread Jim Klimov

2012-03-27 11:14, Carsten John wrote:

I saw a similar effect some time ago on an OpenSolaris box (build 111b). That
time my final solution was to copy the read-only-mounted data over to a newly
created pool. As this is the second time this failure has occurred (on different
machines), I'm really concerned about overall reliability.



Any suggestions?


A couple of months ago I reported a similar issue (though with
a different stack trace and code path). I tracked it down to code in
the freeing of deduped blocks, where a valid code path could return
a NULL pointer, but further routines used the pointer as if it
were always valid - thus a NULL dereference when the pool was
imported read-write and tried to release blocks marked for deletion.

Adding a check for non-NULLness in my private rebuild of oi_151a
fixed the issue. I wouldn't be surprised to see similar
sloppiness in other parts of the code. Not checking input
values in routines seems like an arrogant mistake waiting to
fire (and it did for us).
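
Schematically, the class of fix being described is just a
lookup-may-return-NULL guard like the toy sketch below (placeholder names
only, not the actual illumos routines and not my actual patch):

#include <stdio.h>
#include <stdlib.h>

/* Toy stand-ins; NOT the illumos dedup (DDT) structures. */
typedef struct entry {
	int refcnt;
} entry_t;

/*
 * A lookup that can legitimately return NULL on a valid code path,
 * which is the situation described above.
 */
static entry_t *
table_lookup(int key)
{
	if (key % 2)
		return (NULL);
	return (calloc(1, sizeof (entry_t)));
}

static void
release_block(int key)
{
	entry_t *e = table_lookup(key);

	/* The guard that was missing: without it, e->refcnt dereferences NULL. */
	if (e == NULL)
		return;
	e->refcnt--;
	free(e);
}

int
main(void)
{
	int key;

	for (key = 0; key < 4; key++)
		release_block(key);
	(void) printf("all blocks released without touching a NULL pointer\n");
	return (0);
}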

I am not sure how to make a webrev and ultimately a signed-off
contribution upstream, but I posted my patch and research on
the list and in the illumos bug tracker.

I am not sure how you can fix an S11 system, though.
If the pool is at zpool v28 or older, you can try to import it into
an OpenIndiana installation, perhaps rebuilt with similarly
patched code that checks for NULLs, and fix your pool there
(and then reuse it in S11 if you must). The source is there
on http://src.illumos.org and your stack trace should tell you
in which functions you should start looking...

Good luck,
//Jim


Re: [zfs-discuss] webserver zfs root lock contention under heavy load

2012-03-27 Thread Phil Harman
One of the glories of Solaris is that it is so very observable. Then
there are the many excellent blog posts, wiki entries, and books - some
of which are authored by contributors to this very thread - explaining
how Solaris works. But these virtues are also a snare to some, and it is
not uncommon for even experienced practitioners to become fixated on
this or that.


Aubrey, a 70:30 user-to-system ratio is pretty respectable. Running at
100% is not so pretty (e.g. I would be surprised if you DIDN'T see many
involuntary context switches (icsw) in such a scenario). Esteemed
experts have assured you that ZFS lock contention is not your main
concern. I would run with that.


You said it was a stress test. That raises many questions for me. I am 
much more interested in how systems perform in the real world. In my 
experience, many of the issues we find in production are not the ones we 
found in the lab. Indeed, I would argue that too many systems are tuned 
to run simplistic benchmarks instead of real workloads.


However, I don't think it's helpful to continue this discussion here. 
There are some esteemed experienced practitioners "out there" who would 
be only too happy to provide holistic systems performance tuning and 
health check services to you on a commercial basis (I hear that some may 
even accept PayPal).


Phil
(p...@harmanholistix.com)


[zfs-discuss] kernel panic during zfs import

2012-03-27 Thread Carsten John
Hello everybody,

I have a Solaris 11 box here (Sun X4270) that crashes with a kernel panic
during the import of a zpool (some 30 TB) containing ~500 ZFS filesystems after
reboot. This caused a reboot loop until I booted single-user and removed
/etc/zfs/zpool.cache.


From /var/adm/messages:

savecore: [ID 570001 auth.error] reboot after panic: BAD TRAP: type=e (#pf Page 
fault) rp=ff002f9cec50 addr=20 occurred in module "zfs" due to a NULL 
pointer dereference
savecore: [ID 882351 auth.error] Saving compressed system crash dump in 
/var/crash/vmdump.2

This is what mdb tells me:

mdb unix.2 vmcore.2
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc pcplusmp 
scsi_vhci zfs mpt sd ip hook neti arp usba uhci sockfs qlc fctl s1394 kssl lofs 
random fcp idm sata fcip cpc crypto ufs logindmux ptm sppp ]
$c
zap_leaf_lookup_closest+0x45(ff0700ca2a98, 0, 0, ff002f9cedb0)
fzap_cursor_retrieve+0xcd(ff0700ca2a98, ff002f9ceed0, ff002f9cef10)
zap_cursor_retrieve+0x195(ff002f9ceed0, ff002f9cef10)
zfs_purgedir+0x4d(ff0721d32c20)
zfs_rmnode+0x57(ff0721d32c20)
zfs_zinactive+0xb4(ff0721d32c20)
zfs_inactive+0x1a3(ff0721d3a700, ff07149dc1a0, 0)
fop_inactive+0xb1(ff0721d3a700, ff07149dc1a0, 0)
vn_rele+0x58(ff0721d3a700)
zfs_unlinked_drain+0xa7(ff07022dab40)
zfsvfs_setup+0xf1(ff07022dab40, 1)
zfs_domount+0x152(ff07223e3c70, ff0717830080)
zfs_mount+0x4e3(ff07223e3c70, ff07223e5900, ff002f9cfe20, ff07149dc1a0)
fsop_mount+0x22(ff07223e3c70, ff07223e5900, ff002f9cfe20, ff07149dc1a0)
domount+0xd2f(0, ff002f9cfe20, ff07223e5900, ff07149dc1a0, ff002f9cfe18)
mount+0xc0(ff0713612c78, ff002f9cfe98)
syscall_ap+0x92()
_sys_sysenter_post_swapgs+0x149()


I can import the pool readonly.

The server is a mirror for our primary file server and is synced via zfs 
send/receive.

I saw a similar effect some time ago on an OpenSolaris box (build 111b). That
time my final solution was to copy the read-only-mounted data over to a newly
created pool. As this is the second time this failure has occurred (on different
machines), I'm really concerned about overall reliability.



Any suggestions?


thx

Carsten


Re: [zfs-discuss] test for holes in a file?

2012-03-27 Thread Casper . Dik

>On Mon, 26 Mar 2012, Andrew Gabriel wrote:
>
>> I just played and knocked this up (note the stunning lack of comments, 
>> missing optarg processing, etc)...
>> Give it a list of files to check...
>
>This is a cool program, but programmers were asking (and answering) 
>this same question 20+ years ago before there was anything like 
>SEEK_HOLE.
>
>If file space usage is less than file directory size then it must 
>contain a hole.  Even for compressed files, I am pretty sure that 
>Solaris reports the uncompressed space usage.
>

Unfortunately, that is not true with filesystems that compress data.

Casper
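
For reference, a minimal SEEK_HOLE-based check in the spirit of Andrew's
program (a sketch, not his actual code; it assumes a libc and filesystem with
SEEK_HOLE support, as ZFS on Solaris has). Unlike the st_blocks heuristic,
it is not fooled by compression:

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>

/*
 * Returns 1 if the file contains at least one hole, 0 if not, -1 on
 * error.  lseek(SEEK_HOLE) returns the offset of the first hole at or
 * after the given offset; since every file has an implicit hole at
 * EOF, a result smaller than st_size means a real hole.
 */
static int
has_hole(const char *path)
{
	struct stat st;
	off_t first_hole;
	int fd;

	if ((fd = open(path, O_RDONLY)) < 0)
		return (-1);
	if (fstat(fd, &st) != 0) {
		(void) close(fd);
		return (-1);
	}
	first_hole = lseek(fd, 0, SEEK_HOLE);
	(void) close(fd);
	if (first_hole < 0)
		return (-1);	/* e.g. SEEK_HOLE not supported here */
	return (first_hole < st.st_size);
}

int
main(int argc, char **argv)
{
	int i;

	for (i = 1; i < argc; i++)
		printf("%s: %s\n", argv[i],
		    has_hole(argv[i]) == 1 ? "has hole(s)" : "no holes");
	return (0);
}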
