Re: [zfs-discuss] ZFS doesn't notice errors in mirrored log device?

2010-11-23 Thread Victor Latushkin

On Nov 13, 2010, at 7:33 AM, Edward Ned Harvey wrote:

 Log devices are generally write-only.  They are only read during boot, after 
 an ungraceful crash.  It is extremely difficult to get a significant number 
 of GB used on the log device, because they are flushed out to primary storage 
 so frequently.  They are not read when you do a scrub.

This is wrong. We do read ZIL blocks when scrubbing, because, for example, you can
have a filesystem with a non-empty ZIL that is not mounted.
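
For anyone who wants to watch this themselves, a rough sketch (the pool name "tank" and the presence of a dedicated log vdev are assumptions here, not taken from the thread): start a scrub and watch per-vdev I/O; if the pool has anything in its ZIL, the log devices show read activity.

zpool scrub tank
zpool iostat -v tank 5     # log devices appear in their own group; watch their read column
zpool status -v tank       # scrub progress and any errors found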

regards
victor
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ashift and vdevs

2010-11-23 Thread taemun
zdb -C shows an ashift value on each vdev in my pool; I was just wondering whether
it is vdev-specific or pool-wide. Google didn't seem to know.

I'm considering a mixed pool with some advanced-format (4KB sector)
drives and some normal 512B-sector drives, and was wondering if the ashift
can be set per vdev, or only per pool. Theoretically, this would save me
some metadata space on the 512B-sector drives.
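
(For readers following along, the value in question can be pulled out like this; "tank" is a hypothetical pool name. Each top-level vdev entry in the cached config carries its own ashift field.)

zdb -C tank | grep ashift      # e.g. "ashift: 9" means 2^9 = 512-byte alignment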

Cheers,
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] possible zfs recv bug?

2010-11-23 Thread James Van Artsdalen
I am seeing a zfs recv bug on FreeBSD and am wondering if someone could test 
this in the Solaris code.  If it fails there then I guess a bug report into 
Solaris is needed.

This is a perverse case of filesystem renaming between snapshots.

kraken:/root# cat zt

zpool create rec1 da3
zpool create rec2 da4

zfs create rec1/a
zfs create rec1/a/b

zfs snapshot -r r...@s1
zfs send -R r...@s1 | zfs recv -dvuF rec2

zfs rename rec1/a/b rec1/c
zfs destroy -r rec1/a
zfs create rec1/a
zfs rename rec1/c rec1/a/b # if the rename target is anything other than rec1/a/b the zfs recv result is right

zfs snapshot -r r...@s2
zfs send -R -I @s1 r...@s2 | zfs recv -dvuF rec2
kraken:/root# sh -x zt
+ zpool create rec1 da3
+ zpool create rec2 da4
+ zfs create rec1/a
+ zfs create rec1/a/b
+ zfs snapshot -r r...@s1
+ zfs send -R r...@s1
+ zfs recv -dvuF rec2
receiving full stream of r...@s1 into r...@s1
received 47.4KB stream in 2 seconds (23.7KB/sec)
receiving full stream of rec1/a...@s1 into rec2/a...@s1
received 47.9KB stream in 1 seconds (47.9KB/sec)
receiving full stream of rec1/a/b...@s1 into rec2/a/b...@s1
received 46.3KB stream in 1 seconds (46.3KB/sec)
+ zfs rename rec1/a/b rec1/c
+ zfs destroy -r rec1/a
+ zfs create rec1/a
+ zfs rename rec1/c rec1/a/b
+ zfs snapshot -r r...@s2
+ zfs send -R -I @s1 r...@s2
+ zfs recv -dvuF rec2
attempting destroy rec2/a...@s1
success
attempting destroy rec2/a
failed - trying rename rec2/a to rec2/recv-2176-1
local fs rec2/a/b new parent not found
cannot open 'rec2/a/b': dataset does not exist
another pass:
attempting destroy rec2/recv-2176-1
failed (0)
receiving incremental stream of r...@s2 into r...@s2
received 10.8KB stream in 2 seconds (5.41KB/sec)
receiving full stream of rec1/a...@s2 into rec2/a...@s2
received 47.9KB stream in 1 seconds (47.9KB/sec)
receiving incremental stream of rec1/a/b...@s2 into rec2/recv-2176-1/b...@s2
received 312B stream in 2 seconds (156B/sec)
local fs rec2/a does not have fromsnap (s1 in stream); must have been deleted 
locally; ignoring
attempting destroy rec2/recv-2176-1
failed (0)
kraken:/root# zfs list | grep rec1
rec1                 238K  1.78T    32K  /rec1
rec1/a                63K  1.78T    32K  /rec1/a
rec1/a/b              31K  1.78T    31K  /rec1/a/b
kraken:/root# zfs list | grep rec2
rec2                 293K  1.78T    32K  /rec2
rec2/a                32K  1.78T    32K  /rec2/a
rec2/recv-2176-1      64K  1.78T    32K  /rec2/recv-2176-1
rec2/recv-2176-1/b    32K  1.78T    31K  /rec2/recv-2176-1/b
kraken:/root#
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ashift and vdevs

2010-11-23 Thread David Magda
On Tue, November 23, 2010 08:53, taemun wrote:
 zdb -C shows an ashift value on each vdev in my pool; I was just wondering
 whether it is vdev-specific or pool-wide. Google didn't seem to know.

 I'm considering a mixed pool with some advanced format (4KB sector)
 drives, and some normal 512B sector drives, and was wondering if the
 ashift can be set per vdev, or only per pool. Theoretically, this would
 save me some size on metadata on the 512B sector drives.

It's a per-pool property, and currently hard-coded to a value of nine
(i.e., 2^9 = 512). Sun/Oracle are aware of the new, upcoming sector sizes,
and some changes have been made in the code:

a. PSARC/2008/769: Multiple disk sector size support
http://arc.opensolaris.org/caselog/PSARC/2008/769/
b. PSARC/2010/296: Add tunable to control RMW for Flash Devices
http://arc.opensolaris.org/caselog/PSARC/2010/296/

(a) appears to have been fixed in snv_118 or so:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6710930

However, at this time, there is no publicly available code that
dynamically determines the physical sector size and then adjusts ZFS pools
automatically. Even if there were, most disks don't support the necessary
ATA/SCSI command extensions to report a difference between physical and
logical sector sizes. AFAIK, they all simply report 512 when asked.

If all of your disks will be 4K, you can hack together a solution to take
advantage of that fact:

http://tinyurl.com/25gmy7o
http://www.solarismen.de/archives/5-Solaris-and-the-new-4K-Sector-Disks-e.g.-WDxxEARS-Part-2.html
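
A hedged sketch of what those links describe, assuming the patched binary is saved as ./zpool-12 and otherwise behaves like the stock zpool command (device names below are made up):

./zpool-12 create bigpool raidz c0t1d0 c0t2d0 c0t3d0   # forces 4K alignment at creation
zdb -C bigpool | grep ashift                           # expect "ashift: 12" on the new vdev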


Hopefully it'll make it into at least Solaris 11, as during the lifetime
of that product there will be even more disks with that property. There's
also the fact that many LUNs from SANs also have alignment issues, though
they tend to be at 64K. (At least that's what VMware and NetApp best
practices state.)


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ashift and vdevs

2010-11-23 Thread Krunal Desai
Interesting, I didn't realize that Soracle was working on, or already had,
a solution somewhat in place for 4K drives. I wonder which will happen
first for me: Hitachi 7K2000s hitting a reasonable price, or
4K/variable-sector-size support arriving so I can use Samsung F4s or
Barracuda LPs.

On Tue, Nov 23, 2010 at 9:40 AM, David Magda dma...@ee.ryerson.ca wrote:
 On Tue, November 23, 2010 08:53, taemun wrote:
 zdb -C shows an ashift value on each vdev in my pool; I was just wondering
 whether it is vdev-specific or pool-wide. Google didn't seem to know.

 I'm considering a mixed pool with some advanced format (4KB sector)
 drives, and some normal 512B sector drives, and was wondering if the
 ashift can be set per vdev, or only per pool. Theoretically, this would
 save me some size on metadata on the 512B sector drives.

 It's a per-pool property, and currently hard coded to a value of nine
 (i.e., 2^9 = 512). Sun/Oracle are aware of the new, upcoming sector size/s
 and some changes have been made in the code:

 a. PSARC/2008/769: Multiple disk sector size support
        http://arc.opensolaris.org/caselog/PSARC/2008/769/
 b. PSARC/2010/296: Add tunable to control RMW for Flash Devices
        http://arc.opensolaris.org/caselog/PSARC/2010/296/

 (a) appears to have been fixed in snv_118 or so:

        http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6710930

 However, at this time, there is no publicly available code that
 dynamically determines physical sector size and then adjusts ZFS pools
 automatically. Even if there was, most disks don't support the necessary
 ATA/SCSI command extensions to report on physical and logical sizes
 differences. AFAIK, they all simply report 512 when asked.

 If all of your disks will be 4K, you can hack together a solution to take
 advantage of that fact:

 http://tinyurl.com/25gmy7o
 http://www.solarismen.de/archives/5-Solaris-and-the-new-4K-Sector-Disks-e.g.-WDxxEARS-Part-2.html


 Hopefully it'll make it into at least Solaris 11, as during the lifetime
 of that product there will be even more disks with that property. There's
 also the fact that many LUNs from SANs also have alignment issues, though
 they tend to be at 64K. (At least that's what VMware and NetApp best
 practices state.)


 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




-- 
--khd
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ashift and vdevs

2010-11-23 Thread taemun
Cheers for the links, David, but you'll note that I've commented on the blog
you linked (i.e., I was aware of it). The zpool-12 binary linked from
http://digitaldj.net/2010/11/03/zfs-zpool-v28-openindiana-b147-4k-drives-and-you/
worked perfectly on my SX11 installation. (It threw some error on b134, so it
relies on some external code, to some extent.)

I'd note, for those who are going to try it, that the binary produces a pool of
as high a version as the system supports. I was surprised that it was higher
than the code for which it was compiled (i.e., b147 = zpool v28).

I'm currently populating a pool with a 9-wide raidz vdev of Samsung HD204UI
2TB (5400 rpm, 4KB sector) and a 9-wide raidz vdev of Seagate LP ST32000542AS
2TB (5900 rpm, 4KB sector), both created with that binary, and I haven't
seen any of the performance issues I've had in the past with WD EARS drives.

It would be lovely if Oracle could see fit to implement correct detection
of these drives! Or, at the very least, to add an -o ashift=12 option to
zpool create.
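
Purely for illustration, the requested knob would presumably look something like the line below; note this option did not exist in zpool at the time, it is the syntax being wished for (device names made up):

zpool create -o ashift=12 tank raidz c0t1d0 c0t2d0 c0t3d0
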
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ashift and vdevs

2010-11-23 Thread Krunal Desai
On Tue, Nov 23, 2010 at 9:59 AM, taemun tae...@gmail.com wrote:
 I'm currently populating a pool with a 9-wide raidz vdev of Samsung HD204UI
 2TB (5400rpm, 4KB sector) and a 9-wide raidz vdev of Seagate LP ST32000542AS
 2TB (5900 rpm, 4KB sector) which was created with that binary, and haven't
 seen any of the performance issues I've had in the past with WD EARS drives.
 It would be lovely if Oracle could see fit to implementing correct detection
 of these drives! Or, at the very least, an -o ashift=12 parameter in the
 zpool create function.

What is the upgrade path like from this? For example, I currently have
b134 OpenSolaris with 8x 1.5TB drives in a raidz2 storage pool. I would
like to go to OpenIndiana and move that data to a new pool built of three
6-drive raidz2 vdevs (using 2TB drives). I am going to stagger my drive
purchases to give my wallet a breather, so I would likely start with two
6-drive raidz2 vdevs. If I were to use that binary/hack to force the ashift
for 4K drives, would I be able to upgrade down the road to a zpool version
that properly understands 4K drives?

I know the safest route would be to just go with 512-byte-sector
7K2000s, but their prices do not drop nearly as often as the LPs or
F4s do.
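
For what it's worth, the kind of migration being described would presumably boil down to something like this hedged sketch (pool and snapshot names are made up): build the new pool with the forced-ashift binary, replicate everything, and upgrade the pool version later once a 4K-aware release ships.

zfs snapshot -r oldpool@migrate
zfs send -R oldpool@migrate | zfs recv -duF newpool   # newpool created with the ashift=12 hack
zpool upgrade newpool                                  # later, on a release that understands 4K drives
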
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] drive replaced from spare

2010-11-23 Thread Tony Schreiner
I have an x4540 with a single pool made from a bunch of raidz1 vdevs with
2 spares (Solaris 10 u7). It's been running great for over a year, but
I've had my first event.

A day ago the system activated one of the spares, c4t7d0, but given the
status below, I'm not sure what to do next.


# zpool status
  pool: pool1
 state: ONLINE
 scrub: resilver completed after 2h25m with 0 errors on Mon Nov 22 14:25:12 2010
config:

NAME  STATE READ WRITE CKSUM
pool1 ONLINE   0 0 0
  raidz1  ONLINE   0 0 0
c0t3d0ONLINE   0 0 0
spare ONLINE   0 0 0
  c1t3d0  ONLINE   0 0 0
  c4t7d0  ONLINE   0 0 0
c2t3d0ONLINE   0 0 0
c3t3d0ONLINE   0 0 0
c4t3d0ONLINE   0 0 0
  raidz1  ONLINE   0 0 0
c5t3d0ONLINE   0 0 0



c3t1d0ONLINE   0 0 0
c5t6d0ONLINE   0 0 0
spares
  c4t7d0  INUSE currently in use
  c5t7d0  AVAIL

am I supposed to do something with c1t3d0 now?

Thanks,
Tony Schreiner
Boston College
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] drive replaced from spare

2010-11-23 Thread Bryan Horstmann-Allen
+--
| On 2010-11-23 13:28:38, Tony Schreiner wrote:
| 
 am I supposed to do something with c1t3d0 now?

Presumably you want to replace the dead drive with one that works?

zpool offline the dead drive if it isn't already, pull it, plug in the new
one, run devfsadm -C and cfgadm -al, watch dmesg to see whether the ctd changed,
then use zpool replace deaddisk newdisk to get the pool healthy again.
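
A rough sketch of that sequence, using the device names from Tony's zpool status output, and assuming the replacement comes up under the same c#t#d# name (otherwise pass the new name as a second argument to zpool replace):

zpool offline pool1 c1t3d0    # take the failed disk out of service
# ...physically swap the drive...
devfsadm -C                   # clean up stale /dev links
cfgadm -al                    # confirm the new disk is connected/configured
zpool replace pool1 c1t3d0    # resilver onto the replacement
zpool status pool1            # when done, the spare c4t7d0 should drop back to AVAIL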

Spares are .. spares. They're there for events, not for running in production.

The above process is documented more usefully in the ZFS Administration Guide.

Cheers.
-- 
bdha
cyberpunk is dead. long live cyberpunk.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] possible zfs recv bug?

2010-11-23 Thread Matthew Ahrens
I verified that this bug exists in OpenSolaris as well.  The problem is that
we can't destroy the old filesystem "a" (which has been renamed to
rec2/recv-2176-1 in this case).  We can't destroy it because it has a
child, "b".  We need to rename "b" to be under the new "a".  However, we are
not renaming it, which is the root cause of the problem.

This code in recv_incremental_replication() should detect that we should
rename "b":

if ((stream_parent_fromsnap_guid != 0 &&
    parent_fromsnap_guid != 0 &&
    stream_parent_fromsnap_guid != parent_fromsnap_guid) ||
...

But this will not trigger because we have already destroyed the snapshots of
"b"'s parent (the old "a", now rec2/recv-2176-1), so parent_fromsnap_guid
will be 0.  I believe that the fix for bug 6921421 introduced this code in
build 135; it used to read:

if ((stream_parent_fromsnap_guid != 0 &&
    stream_parent_fromsnap_guid != parent_fromsnap_guid) ||
...

So we will have to investigate and see why the parent_fromsnap_guid != 0
check is now needed.
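
Untested, but based on the analysis above, a manual cleanup after the failed receive might look like the sketch below (rename "b" back under the new "a", then remove the leftover placeholder); whether subsequent incremental receives then apply cleanly has not been verified.

zfs rename rec2/recv-2176-1/b rec2/a/b
zfs destroy -r rec2/recv-2176-1
zfs list -r rec2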

--matt

On Tue, Nov 23, 2010 at 6:16 AM, James Van Artsdalen 
james-opensola...@jrv.org wrote:

 I am seeing a zfs recv bug on FreeBSD and am wondering if someone could
 test this in the Solaris code.  If it fails there then I guess a bug report
 into Solaris is needed.

 This is a perverse case of filesystem renaming between snapshots.

 kraken:/root# cat zt

 zpool create rec1 da3
 zpool create rec2 da4

 zfs create rec1/a
 zfs create rec1/a/b

 zfs snapshot -r r...@s1
 zfs send -R r...@s1 | zfs recv -dvuF rec2

 zfs rename rec1/a/b rec1/c
 zfs destroy -r rec1/a
 zfs create rec1/a
 zfs rename rec1/c rec1/a/b # if the rename target is anything other than rec1/a/b the zfs recv result is right

 zfs snapshot -r r...@s2
 zfs send -R -I @s1 r...@s2 | zfs recv -dvuF rec2
 kraken:/root# sh -x zt
 + zpool create rec1 da3
 + zpool create rec2 da4
 + zfs create rec1/a
 + zfs create rec1/a/b
 + zfs snapshot -r r...@s1
 + zfs send -R r...@s1
 + zfs recv -dvuF rec2
 receiving full stream of r...@s1 into r...@s1
 received 47.4KB stream in 2 seconds (23.7KB/sec)
 receiving full stream of rec1/a...@s1 into rec2/a...@s1
 received 47.9KB stream in 1 seconds (47.9KB/sec)
 receiving full stream of rec1/a/b...@s1 into rec2/a/b...@s1
 received 46.3KB stream in 1 seconds (46.3KB/sec)
 + zfs rename rec1/a/b rec1/c
 + zfs destroy -r rec1/a
 + zfs create rec1/a
 + zfs rename rec1/c rec1/a/b
 + zfs snapshot -r r...@s2
 + zfs send -R -I @s1 r...@s2
 + zfs recv -dvuF rec2
 attempting destroy rec2/a...@s1
 success
 attempting destroy rec2/a
 failed - trying rename rec2/a to rec2/recv-2176-1
 local fs rec2/a/b new parent not found
 cannot open 'rec2/a/b': dataset does not exist
 another pass:
 attempting destroy rec2/recv-2176-1
 failed (0)
 receiving incremental stream of r...@s2 into r...@s2
 received 10.8KB stream in 2 seconds (5.41KB/sec)
 receiving full stream of rec1/a...@s2 into rec2/a...@s2
 received 47.9KB stream in 1 seconds (47.9KB/sec)
 receiving incremental stream of rec1/a/b...@s2 into rec2/recv-2176-1/b...@s2
 received 312B stream in 2 seconds (156B/sec)
 local fs rec2/a does not have fromsnap (s1 in stream); must have been
 deleted locally; ignoring
 attempting destroy rec2/recv-2176-1
 failed (0)
 kraken:/root# zfs list | grep rec1
 rec1                 238K  1.78T    32K  /rec1
 rec1/a                63K  1.78T    32K  /rec1/a
 rec1/a/b              31K  1.78T    31K  /rec1/a/b
 kraken:/root# zfs list | grep rec2
 rec2                 293K  1.78T    32K  /rec2
 rec2/a                32K  1.78T    32K  /rec2/a
 rec2/recv-2176-1      64K  1.78T    32K  /rec2/recv-2176-1
 rec2/recv-2176-1/b    32K  1.78T    31K  /rec2/recv-2176-1/b
 kraken:/root#
 --
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RAID-Z/mirror hybrid allocator

2010-11-23 Thread StorageConcepts
Hi, 

I did a quick test (because I'm curious too). The hardware was a raidz1 of
3 SATA disks.

What I did: 

1) Created a pool with NexentaStor 3.0.4 (pool version 26, raidz1 with 3 disks)
2) Disabled all caching (primarycache=none, secondarycache=none) to force media
access
3) Copied and extracted a recent Linux kernel to generate a metadata-intensive
workload (lots of small files)
4) Copied the Linux kernel tree 10 times

Then I booted into SOL11 and did: 

5) Ran time du -sh . on the dataset three times and averaged the results
6) Upgraded the pool to version 31
7) Rewrote the data (repeated steps 3 and 4)
8) Measured the time again (three-run average, as in step 5)
I did see a ~13% improvement. 
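
Roughly, the measurement loop described above amounts to the following (pool name and path as in the output below; caches disabled first so du has to hit the media):

zfs set primarycache=none mypool
zfs set secondarycache=none mypool
cd /volumes/mypool
for i in 1 2 3; do time du -sh . ; done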

Here are the numbers: 

Pool Version 26: 
---

r...@solaris11:/volumes/mypool# time du -sh .
3.3G.

real1m51.509s
user0m1.178s
sys 0m27.115s
r...@solaris11:/volumes/mypool# time du -sh .
3.3G.

real1m55.953s
user0m1.128s
sys 0m25.510s
r...@solaris11:/volumes/mypool# time du -sh .
3.3G.

real1m48.442s
user0m1.096s
sys 0m24.405s

= 111 Sec

Pool Version 31:
---

r...@solaris11:/volumes/mypool# time du -sh .
3.3G.

real1m30.376s
user0m1.049s
sys 0m21.775s

r...@solaris11:/volumes/mypool# time du -sh .
3.3G.

real1m45.745s
user0m1.105s
sys 0m24.739s

r...@solaris11:/volumes/mypool# time du -sh .
3.3G.

real1m38.199s
user0m1.093s
sys 0m24.096s

= 97 Sec

That is 14 seconds faster, which is roughly 13% of the original 111 seconds.

I expect even more exciting results for wider raidz and raidz2 arrays.

Regards, 
Robert
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Crypto in Oracle Solaris 11 Express

2010-11-23 Thread StorageConcepts
I just tested crypto a little and have some send/receive-specific questions
about it. It would be great if someone could clarify.

Currently ZFS has no background rewriter. However, because ZFS applies most
properties and tunables (like dedup or compression) at write time for all newly
written data, send/receive is a good (offline) workaround to compress,
decompress, dedup or un-dedup all your data by just doing a local
send/receive.

I wanted to test the same thing for crypto. Crypto is a create-only property, so
I cannot change it to encrypt or decrypt an existing dataset.

It seems that I can send an encrypted dataset to an unencrypted target. If I send
with -p (to keep the properties), this also works, and ZFS asks me for a
passphrase.
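
For the record, the working direction looks roughly like this (dataset names are made up): without -p the received copy ends up unencrypted; with -p the encryption property travels along and a passphrase is requested.

zfs send mypool/secret@test | zfs receive mypool/plain_copy        # received data is not encrypted
zfs send -p mypool/secret@test | zfs receive mypool/secret_copy    # keeps encryption, prompts for passphrase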

When testing the other way around, encrypting existing datasets, I would like
to send an (unencrypted) dataset to an encrypted target. It seems, however,
that this is not possible.

---
r...@solaris11:~# zfs list mypool/secret_received
cannot open 'mypool/secret_received': dataset does not exist
r...@solaris11:~# zfs send mypool/plaint...@test | zfs receive -o encryption=on 
mypool/secret_received
cannot receive: cannot override received encryption
---

Is there an implementation/technical reason for not allowing this?

Given that decryption by send/receive works, I would assume that encryption by
send/receive should also be technically possible?

It also seems that I cannot decrypt by send/receive if I want to keep all the
other properties:

---
r...@solaris11:~# zfs send -p mypool/sec...@test | zfs receive -x encryption 
mypool/publicneu
cannot receive: cannot override received encryption
---

This looks like an implementation limitation to me, because it is effectively
the same as the simple, working, non-property send/receive above.

Some clarification on the impact of encryption on send/receive would help.

Regards, 
Robert
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Crypto in Oracle Solaris 11 Express

2010-11-23 Thread Darren J Moffat

On 23/11/2010 21:01, StorageConcepts wrote:

r...@solaris11:~# zfs list mypool/secret_received
cannot open 'mypool/secret_received': dataset does not exist
r...@solaris11:~# zfs send mypool/plaint...@test | zfs receive -o encryption=on 
mypool/secret_received
cannot receive: cannot override received encryption
---

Is there an implementation/technical reason for not allowing this?


Yes, there is. It is because of how the ZPL metadata is written to disk: it is
slightly different between the encrypted and unencrypted cases, and
unfortunately that difference shows up even in the ZFS send stream.


It is a known (and documented in the Admin guide) restriction.

If we allowed the receive to proceed, some ZPL metadata (including filenames)
for some files could end up on disk in the clear. There are various cases where
this could happen; it is most likely when the filesystem is being used by
Windows clients, because of the combination of operations involved, but it can
equally well happen with only local ZPL usage, particularly if large ACLs are
in use.


In the meantime, the best workaround I can offer is to use
tar/cpio/rsync, but obviously you lose your snapshot history that way.
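
In other words, something along these lines (dataset names and mountpoints below are made up): create the encrypted dataset up front and copy the file data into it; snapshot history does not carry over.

zfs create -o encryption=on mypool/secret      # prompts for a passphrase by default
rsync -a /mypool/plain/ /mypool/secret/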


--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] possible zfs recv bug?

2010-11-23 Thread Tom Erickson
Thanks, James, for reporting this, and thanks, Matt, for the analysis. I filed 
7002362 to track this.


Tom

On 11/23/10 10:43 AM, Matthew Ahrens wrote:

I verified that this bug exists in OpenSolaris as well.  The problem is that we
can't destroy the old filesystem "a" (which has been renamed to
rec2/recv-2176-1 in this case).  We can't destroy it because it has a child,
"b".  We need to rename "b" to be under the new "a".  However, we are not
renaming it, which is the root cause of the problem.

This code in recv_incremental_replication() should detect that we should rename
"b":

 if ((stream_parent_fromsnap_guid != 0 &&
     parent_fromsnap_guid != 0 &&
     stream_parent_fromsnap_guid != parent_fromsnap_guid) || ...

But this will not trigger because we have already destroyed the snapshots of "b"'s
parent (the old "a", now rec2/recv-2176-1), so parent_fromsnap_guid will be 0.
I believe that the fix for bug 6921421 introduced this code in build 135; it
used to read:

 if ((stream_parent_fromsnap_guid != 0 &&
     stream_parent_fromsnap_guid != parent_fromsnap_guid) || ...

So we will have to investigate and see why the parent_fromsnap_guid != 0 check is
now needed.

--matt

On Tue, Nov 23, 2010 at 6:16 AM, James Van Artsdalen james-opensola...@jrv.org
mailto:james-opensola...@jrv.org wrote:

I am seeing a zfs recv bug on FreeBSD and am wondering if someone could test
this in the Solaris code.  If it fails there then I guess a bug report into
Solaris is needed.

This is a perverse case of filesystem renaming between snapshots.

kraken:/root# cat zt

zpool create rec1 da3
zpool create rec2 da4

zfs create rec1/a
zfs create rec1/a/b

zfs snapshot -r r...@s1
zfs send -R r...@s1 | zfs recv -dvuF rec2

zfs rename rec1/a/b rec1/c
zfs destroy -r rec1/a
zfs create rec1/a
zfs rename rec1/c rec1/a/b # if the rename target is anything other than rec1/a/b the zfs recv result is right

zfs snapshot -r r...@s2
zfs send -R -I @s1 r...@s2 | zfs recv -dvuF rec2
kraken:/root# sh -x zt
+ zpool create rec1 da3
+ zpool create rec2 da4
+ zfs create rec1/a
+ zfs create rec1/a/b
+ zfs snapshot -r r...@s1
+ zfs send -R r...@s1
+ zfs recv -dvuF rec2
receiving full stream of r...@s1 into r...@s1
received 47.4KB stream in 2 seconds (23.7KB/sec)
receiving full stream of rec1/a...@s1 into rec2/a...@s1
received 47.9KB stream in 1 seconds (47.9KB/sec)
receiving full stream of rec1/a/b...@s1 into rec2/a/b...@s1
received 46.3KB stream in 1 seconds (46.3KB/sec)
+ zfs rename rec1/a/b rec1/c
+ zfs destroy -r rec1/a
+ zfs create rec1/a
+ zfs rename rec1/c rec1/a/b
+ zfs snapshot -r r...@s2
+ zfs send -R -I @s1 r...@s2
+ zfs recv -dvuF rec2
attempting destroy rec2/a...@s1
success
attempting destroy rec2/a
failed - trying rename rec2/a to rec2/recv-2176-1
local fs rec2/a/b new parent not found
cannot open 'rec2/a/b': dataset does not exist
another pass:
attempting destroy rec2/recv-2176-1
failed (0)
receiving incremental stream of r...@s2 into r...@s2
received 10.8KB stream in 2 seconds (5.41KB/sec)
receiving full stream of rec1/a...@s2 into rec2/a...@s2
received 47.9KB stream in 1 seconds (47.9KB/sec)
receiving incremental stream of rec1/a/b...@s2 into rec2/recv-2176-1/b...@s2
received 312B stream in 2 seconds (156B/sec)
local fs rec2/a does not have fromsnap (s1 in stream); must have been
deleted locally; ignoring
attempting destroy rec2/recv-2176-1
failed (0)
kraken:/root# zfs list | grep rec1
rec1                 238K  1.78T    32K  /rec1
rec1/a                63K  1.78T    32K  /rec1/a
rec1/a/b              31K  1.78T    31K  /rec1/a/b
kraken:/root# zfs list | grep rec2
rec2                 293K  1.78T    32K  /rec2
rec2/a                32K  1.78T    32K  /rec2/a
rec2/recv-2176-1      64K  1.78T    32K  /rec2/recv-2176-1
rec2/recv-2176-1/b    32K  1.78T    31K  /rec2/recv-2176-1/b
kraken:/root#
--
This message posted from opensolaris.org http://opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org mailto:zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss