Re: [zfs-discuss] Question about ZFS Incremental Send/Receive

2009-04-29 Thread Darren J Moffat

Mattias Pantzare wrote:

I feel like I understand what tar is doing, but I'm curious about what it is
that ZFS looks at that makes a successful incremental send? That
is, not sending the entire file again. Does it have to do with how the
application (tar in this example) does a file open, fopen(), and what mode
is used, i.e. open for read, open for write, open for append? Or is it
looking at a file system header, or a checksum? I'm just trying to explain
some observed behavior we're seeing during our testing.

My proof of concept is to remotely replicate these container files, which
are created by a third-party application.


ZFS knows which blocks were written since the first snapshot was taken.

Filenames and the type of open are not important.

If you open a file and rewrite all blocks in that file with the same
content, all of those blocks will be sent. If you rewrite 5 blocks, only 5
blocks are sent (plus the metadata that was updated).


Provided the application doing this does exactly what you said, and not 
what a lot of apps (particularly editors) do, which is write to a tmp 
file, unlink the original and rename.
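
As a rough illustration (pool/dataset names tank/data and the file name are
hypothetical), the difference shows up in the incremental stream sizes:

  # baseline snapshot
  zfs snapshot tank/data@base

  # case 1: rewrite a few blocks in place -- only those blocks
  # (plus updated metadata) end up in the incremental stream
  dd if=/dev/urandom of=/tank/data/file bs=128k count=5 conv=notrunc
  zfs snapshot tank/data@inplace
  zfs send -i @base tank/data@inplace | wc -c

  # case 2: write a tmp file and rename over the original -- every block
  # of the file is newly written, so the whole file is in the stream
  cp /tank/data/file /tank/data/file.tmp
  mv /tank/data/file.tmp /tank/data/file
  zfs snapshot tank/data@rename
  zfs send -i @inplace tank/data@rename | wc -c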



The way it works is that all blocks have a time stamp. Blocks with a time
stamp newer than the first snapshot will be sent.


Not really a time stamp but a transaction group number.
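
For the curious, that birth transaction group is visible with zdb on a test
pool (the dataset name and object number below are placeholders; the exact
output format varies between builds, but each block pointer carries a birth
txg):

  # list the objects in the dataset, then dump one object's block pointers
  zdb -dd tank/data
  zdb -ddddd tank/data 8     # 8 is a placeholder object number

An incremental send streams every block whose birth txg is newer than the
txg of the source snapshot.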

--
Darren J Moffat


Re: [zfs-discuss] Raidz vdev size... again.

2009-04-29 Thread Bob Friesenhahn

On Tue, 28 Apr 2009, Richard Elling wrote:


I suppose if you could freeze the media to 0 K, then it would not decay.
But that isn't the world I live in :-).  There is a whole Journal devoted
to things magnetic, with lots of studies of interesting compounds.  But
from a practical perspective, it is worth noting that some magnetic tapes
have a rated shelf life of 8-10 years while enterprise-class backup tapes
are only rated at 30 years.  Most disks have an expected operational life
of 5 years or so.  As Tim notes, it is a good idea to plan for migrating
important data to newer devices over time.


I am definitely a fan of migrating data.  As far as media degradation 
goes, perhaps much of the concern is the stability of the base stock 
(e.g. plastic) or the disk drive mechanism and heads, and not the ability 
of the magnetic stuff to maintain its magnetism.


However, even the planet Earth has an average shelf-life of 10,000 
years, after which the poles may suddenly be reversed (the compass points 
in the opposite direction).


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/


[zfs-discuss] zpool import crash, import degraded mirror?

2009-04-29 Thread Rob Logan

When I type `zpool import` to see what pools are out there, it gets to

/1: open(/dev/dsk/c5t2d0s0, O_RDONLY) = 6
/1: stat64(/usr/local/apache2/lib/libdevid.so.1, 0x08042758) Err#2 ENOENT
/1: stat64(/usr/lib/libdevid.so.1, 0x08042758)= 0
/1: d=0x02D90002 i=241208 m=0100755 l=1  u=0 g=2 sz=61756
/1: at = Apr 29 23:41:17 EDT 2009  [ 1241062877 ]
/1: mt = Apr 27 01:45:19 EDT 2009  [ 124089 ]
/1: ct = Apr 27 01:45:19 EDT 2009  [ 124089 ]
/1: bsz=61952 blks=122   fs=zfs
/1: resolvepath(/usr/lib/libdevid.so.1, /lib/libdevid.so.1, 1023) = 18
/1: open(/usr/lib/libdevid.so.1, O_RDONLY)= 7
/1: mmapobj(7, 0x0002, 0xFEC70640, 0x080427C4, 0x) = 0
/1: close(7)= 0
/1: memcntl(0xFEC5, 4048, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
/1: fxstat(2, 6, 0x080430C0)= 0
/1: d=0x04A0 i=5015 m=0060400 l=1  u=0 g=0 
rdev=0x01800340
/1: at = Nov 19 21:19:26 EST 2008  [ 1227147566 ]
/1: mt = Nov 19 21:19:26 EST 2008  [ 1227147566 ]
/1: ct = Apr 29 23:23:11 EDT 2009  [ 1241061791 ]
/1: bsz=8192  blks=1 fs=devfs
/1: modctl(MODSIZEOF_DEVID, 0x01800340, 0x080430BC, 0xFEC51239, 0xFE8E92C0) 
= 0
/1: modctl(MODGETDEVID, 0x01800340, 0x0038, 0x080D5A48, 0xFE8E92C0) = 0
/1: fxstat(2, 6, 0x080430C0)= 0
/1: d=0x04A0 i=5015 m=0060400 l=1  u=0 g=0 
rdev=0x01800340
/1: at = Nov 19 21:19:26 EST 2008  [ 1227147566 ]
/1: mt = Nov 19 21:19:26 EST 2008  [ 1227147566 ]
/1: ct = Apr 29 23:23:11 EDT 2009  [ 1241061791 ]
/1: bsz=8192  blks=1 fs=devfs
/1: modctl(MODSIZEOF_MINORNAME, 0x01800340, 0x6000, 0x080430BC, 
0xFE8E92C0) = 0
/1: modctl(MODGETMINORNAME, 0x01800340, 0x6000, 0x0002, 0x0808FFC8) 
= 0
/1: close(6)= 0
/1: ioctl(3, ZFS_IOC_POOL_STATS, 0x08042220)= 0

and then the machine dies consistently with:

panic[cpu1]/thread=ff01d045a3a0:
BAD TRAP: type=e (#pf Page fault) rp=ff000857f4f0 addr=260 occurred in module 
unix due to a NULL pointer dereference

zpool:
#pf Page fault
Bad kernel fault at addr=0x260
pid=576, pc=0xfb854e8b, sp=0xff000857f5e8, eflags=0x10246
cr0: 8005003bpg,wp,ne,et,ts,mp,pe cr4: 6f8xmme,fxsr,pge,mce,pae,pse,de
cr2: 260
cr3: 12b69
cr8: c

rdi:  260 rsi:4 rdx: ff01d045a3a0
rcx:0  r8:   40  r9:21ead
rax:0 rbx:0 rbp: ff000857f640
r10:  bf88840 r11: ff01d041e000 r12:0
r13:  260 r14:4 r15: ff01ce12ca28
fsb:0 gsb: ff01ce985ac0  ds:   4b
 es:   4b  fs:0  gs:  1c3
trp:e err:2 rip: fb854e8b
 cs:   30 rfl:10246 rsp: ff000857f5e8
 ss:   38

ff000857f3d0 unix:die+dd ()
ff000857f4e0 unix:trap+1752 ()
ff000857f4f0 unix:cmntrap+e9 ()
ff000857f640 unix:mutex_enter+b ()
ff000857f660 zfs:zio_buf_alloc+2c ()
ff000857f6a0 zfs:arc_get_data_buf+173 ()
ff000857f6f0 zfs:arc_buf_alloc+a2 ()
ff000857f770 zfs:dbuf_read_impl+1b0 ()
ff000857f7d0 zfs:dbuf_read+fe ()
ff000857f850 zfs:dnode_hold_impl+d9 ()
ff000857f880 zfs:dnode_hold+2b ()
ff000857f8f0 zfs:dmu_buf_hold+43 ()
ff000857f990 zfs:zap_lockdir+67 ()
ff000857fa20 zfs:zap_lookup_norm+55 ()
ff000857fa80 zfs:zap_lookup+2d ()
ff000857faf0 zfs:dsl_pool_open+91 ()
ff000857fbb0 zfs:spa_load+696 ()
ff000857fc00 zfs:spa_tryimport+95 ()
ff000857fc40 zfs:zfs_ioc_pool_tryimport+3e ()
ff000857fcc0 zfs:zfsdev_ioctl+10b ()
ff000857fd00 genunix:cdev_ioctl+45 ()
ff000857fd40 specfs:spec_ioctl+83 ()
ff000857fdc0 genunix:fop_ioctl+7b ()
ff000857fec0 genunix:ioctl+18e ()
ff000857ff10 unix:brand_sys_sysenter+1e6 ()

The offending disk, c5t2d0s0, is part of a mirror; if I remove it, I can
see the results (from the other mirror half) and the machine does not crash.
All 8 labels diff perfectly.
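
For reference, a minimal sketch of how the labels can be dumped for
comparison (device path as in the trace above; zdb -l prints all four label
copies stored on a device, and the other half of the mirror can be dumped
the same way):

  zdb -l /dev/dsk/c5t2d0s0

The label contents look like this: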

version=13
name='r'
state=0
txg=2110897
pool_guid=10861732602511278403
hostid=13384243
hostname='nas'
top_guid=6092190056527819247
guid=16682108003687674581
vdev_tree
type='mirror'
id=0
guid=6092190056527819247
whole_disk=0
metaslab_array=23
metaslab_shift=31
ashift=9
asize=320032473088
is_log=0
children[0]
type='disk'
id=0
guid=16682108003687674581
path='/dev/dsk/c5t2d0s0'