Re: [zfs-discuss] Repairing Faulted ZFS pool when zdb doesn't recognize the pool as existing

2012-07-09 Thread Kwang Whee Lee
Hi Chris,

I noticed your message below; would you mind sharing the steps that made the 
recovery work for you? I have a similar issue.



Quick update:

George has been very helpful, and there is progress with my zpool. I've got 
partial read access at this point, and some data is being copied off.



It was _way_ beyond my skillset to do anything.



Once things are more fully resolved, I'll post more details (with a lot of 
help from George, I'd say).

-

Thanks in advance for your input!

Regards
Kwang Whee Lee



Re: [zfs-discuss] Repairing Faulted ZFS pool and missing disks

2012-07-09 Thread Kwang Whee Lee
Hello all,

I have been struggling with ZFS and my data on OpenSolaris 2009.06 and 
Solaris 11. Last month, my ZFS pool tank (configured as raidz1) became 
unavailable, and 4 out of 6 SCSI disks were no longer recognized by the 
OpenSolaris format command.


1)  The four missing Seagate disks (1000.20GB) are c7t0d0, c7t1d0, c7t3d0, 
and c7t4d0.

root@MEL-SUN-X2270:~# format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
   0. c7t2d0 
  /pci@0,0/pci8086,340e@7/pci1000,3150@0/sd@2,0
   1. c7t5d0 
  /pci@0,0/pci8086,340e@7/pci1000,3150@0/sd@5,0
   2. c9d0 
  /pci@0,0/pci-ide@1f,2/ide@0/cmdk@0,0
   3. c10d0 
  /pci@0,0/pci-ide@1f,2/ide@1/cmdk@0,0
   4. c10d1 
  /pci@0,0/pci-ide@1f,2/ide@1/cmdk@1,0



root@MEL-SUN-X2270:~# iostat -E
cmdk0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Model: HITACHI HUA7250 Revision:  Serial No: GTF402P6GUUS3F  Size: 500.10GB 
<500101152768 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0
cmdk1 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Model: HITACHI HUA7250 Revision:  Serial No: GTF402P6GUUGEF  Size: 500.10GB 
<500101152768 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0
cmdk2 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Model: SSDSA2SH032G1SB Revision:  Serial No: CVEM02830008032 Size: 32.00GB 
<31999500288 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0
sd1   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: ST31000528AS Revision: CC35 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
sd2   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: ST31000528AS Revision: CC35 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
sd3   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: ST31000528AS Revision: CC38 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
sd4   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: ST31000528AS Revision: CC37 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
sd5   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: ST31000528AS Revision: CC35 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
sd6   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: ST31000528AS Revision: CC37 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0



root@MEL-SUN-X2270:~# zpool status -v
  pool: rpool
state: ONLINE
scrub: none requested
config:

NAME         STATE     READ WRITE CKSUM
rpool        ONLINE       0     0     0
  mirror     ONLINE       0     0     0
    c9d0s0   ONLINE       0     0     0
    c10d0s0  ONLINE       0     0     0

errors: No known data errors

  pool: tank
state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
scrub: none requested
config:

NAME        STATE     READ WRITE CKSUM
tank        UNAVAIL      0     0     0  insufficient replicas
  raidz1    UNAVAIL      0     0     0  insufficient replicas
    c7t0d0  UNAVAIL      0     0     0  cannot open
    c7t1d0  UNAVAIL      0     0     0  cannot open
    c7t2d0  ONLINE       0     0     0
    c7t3d0  UNAVAIL      0     0     0  cannot open
    c7t4d0  UNAVAIL      0     0     0  cannot open
    c7t5d0  ONLINE       0     0     0

  pool: temppool1
state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
scrub: none requested
config:

NAME        STATE     READ WRITE CKSUM
temppool1   UNAVAIL      0     0     0  insufficient replicas
  c13t0d0   UNAVAIL      0     0     0  cannot open


2)  I did try power-cycling my SunFire X2270 server and the J4200 array. The 
tank pool remained faulted and the 
[zfs-discuss] Cannot reset ZFS reservation and refreservation on volume

2012-07-09 Thread Dan Vatca
When creating a new zfs volume, the calculated refreservation is greater than 
volsize, to account for the number of copies and metadata:

root@test:~# zfs create -V 1G rpool/test
root@test:~# zfs get -Hp volsize,volblocksize,copies,refreservation rpool/test
rpool/test  volsize 1073741824  local
rpool/test  volblocksize8192-
rpool/test  copies  1   default
rpool/test  refreservation  1107820544  local
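
(Working the numbers above: 1107820544 - 1073741824 = 34078720 bytes, i.e. 
32.5 MiB, so roughly 3.2% of volsize is added for metadata here with copies=1 
and an 8K volblocksize.)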

After I set refreservation to none, I am no longer able to set refreservation 
back to its required value, because a check in libzfs prevents it:

root@danstore2:/lib# zfs set refreservation=none rpool/test
root@danstore2:/lib# zfs get -Hp volsize,volblocksize,copies,refreservation 
rpool/test
rpool/test  volsize 1073741824  local
rpool/test  volblocksize8192-
rpool/test  copies  1   default
rpool/test  refreservation  0   local
root@danstore2:/lib# zfs set refreservation=1107820544 rpool/test
cannot set property for 'rpool/test': 'refreservation' is greater than current 
volume size

Is this intended behavior or a bug?

The same is true for reservation. Setting reservation on a volume is also 
limited to volsize, but reading the documentation 
(http://docs.oracle.com/cd/E19253-01/819-5461/gazvb/index.html) I understand 
reservation may be as large as the user wants it to be. I think this is so 
because:
1. "The quota and reservation properties are convenient for managing disk space 
consumed by datasets and their descendents"
2. " … descendents, such as snapshots and clones"
If I understand correctly, the reservation on a volume accounts for all space 
consumed by the volume, its metadata and copies, and its descendant snapshots 
and clones, so it does not make sense to limit it to volsize. For example, a 
1G volume whose snapshots hold another 1G of overwritten blocks can consume 
more than 2G, so a reservation larger than volsize is perfectly reasonable.

Digging into the libzfs code, I found that zfs_valid_proplist (in 
libzfs_dataset.c) specifically checks for, and prevents, setting reservation 
or refreservation to more than volsize. I think the check should be removed 
for ZFS_PROP_RESERVATION, and relaxed to zvol_volsize_to_reservation(volsize, 
nvl) for ZFS_PROP_REFRESERVATION (when type == ZFS_TYPE_VOLUME).
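
To make the proposal concrete, here is a minimal C sketch of the two policies 
-- a paraphrase of the behavior described above, not the actual 
zfs_valid_proplist() source; full_resv stands in for whatever 
zvol_volsize_to_reservation() would compute:

#include <stdbool.h>
#include <stdint.h>

typedef enum { PROP_RESERVATION, PROP_REFRESERVATION } resv_prop_t;

/* Current (reported) behavior: both properties are capped at volsize. */
static bool
resv_valid_current(resv_prop_t prop, uint64_t value, uint64_t volsize)
{
	(void) prop;
	return (value <= volsize);
}

/*
 * Proposed behavior: no cap for reservation (snapshots and clones can
 * legitimately push usage past volsize); refreservation capped at the
 * full computed reservation (volsize plus copies and metadata overhead).
 */
static bool
resv_valid_proposed(resv_prop_t prop, uint64_t value, uint64_t full_resv)
{
	if (prop == PROP_RESERVATION)
		return (true);
	return (value <= full_resv);
}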

Dan Vatca

On  6 Jul 2012, at 0:00, Stefan Ring wrote:

>> Actually, a write to memory for a memory mapped file is more similar to
>> write(2).  If two programs have the same file mapped then the effect on the
>> memory they share is instantaneous because it is the same physical memory.
>> A mmapped file becomes shared memory as soon as it is mapped at least twice.
> 
> True, for some interpretation of "instantaneous". It does not
> establish a happens-before relationship though, as
> store-munmap/mmap-load does.
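
A minimal sketch of the shared-mapping point, assuming POSIX mmap and a 
pre-existing file of at least one page (the path is made up):

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Two processes that MAP_SHARED the same file share the same physical
 * pages, so a store by one is visible to the other "instantly" -- but
 * nothing below orders the accesses; only an explicit synchronization
 * point (e.g. store-munmap in the writer before mmap-load in the
 * reader) establishes happens-before.
 */
int
main(void)
{
	int fd = open("/tmp/shared.dat", O_RDWR);	/* hypothetical file */
	char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
	    MAP_SHARED, fd, 0);

	p[0] = 1;		/* immediately visible to other mappers */
	munmap(p, 4096);	/* the unmap is what provides ordering */
	close(fd);
	return (0);
}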



Re: [zfs-discuss] Scenario sanity check

2012-07-09 Thread Ian Collins

On 07/10/12 05:26 AM, Brian Wilson wrote:

Yep, thanks, and to answer Ian with more detail on what TruCopy does.
TruCopy mirrors between the two storage arrays, with software running on
the arrays, and keeps a list of dirty/changed 'tracks' while the mirror
is split. I think they call it something other than 'tracks' for HDS,
but, whatever.  When it resyncs the mirrors it sets the target luns
read-only (which is why I export the zpools first), and the source array
reads the changed tracks and writes them across dedicated mirror ports
and fibre links to the target array's dedicated mirror ports, which
brings the target luns up to synchronized. So, yes, like Richard says,
there is IO, but it's isolated to the arrays, and it's scheduled at
lower priority on the source array than production traffic. For example,
it can take an hour or more to re-synchronize a particularly busy 250 GB
lun (though you can do more than one at a time without it taking longer
or impacting production any more, unless you choke the mirror links,
which we do our best not to do). That lower priority, the dedicated
ports on the arrays, etc., all make the impact on the production storage
luns, as seen from the production server, as unnoticeable as I can make
it in my environment.
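
As a toy illustration of the dirty-track idea described above (purely 
illustrative C, nothing to do with the actual HDS implementation):

#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define	TRACKS		1024
#define	TRACK_SIZE	512

/*
 * While the mirror is split, writes mark tracks dirty; on reconnect
 * only the dirty tracks are copied to the target array.
 */
static unsigned char source[TRACKS][TRACK_SIZE];
static unsigned char target[TRACKS][TRACK_SIZE];
static bool dirty[TRACKS];

static void
write_while_split(size_t track, const unsigned char *buf)
{
	memcpy(source[track], buf, TRACK_SIZE);
	dirty[track] = true;		/* record the changed track */
}

static void
resync(void)
{
	for (size_t t = 0; t < TRACKS; t++) {
		if (dirty[t]) {		/* copy only what changed */
			memcpy(target[t], source[t], TRACK_SIZE);
			dirty[t] = false;
		}
	}
}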


Thank you for the background on TruCopy.  Reading the above, it looks 
like you can have a pretty long time without a true copy!  I guess my view 
on replication is that you are always going to have X number of I/O 
operations, and how dense they are depends on how up to date you want 
your copy to be.

What I still don't understand is why a service interruption is 
preferable to a wee bit more I/O.


--
Ian.



Re: [zfs-discuss] Scenario sanity check

2012-07-09 Thread Brian Wilson



On 07/06/12, Richard Elling wrote:

First things first, the panic is a bug. Please file one with your OS 
supplier. More below...

Thanks! It helps that it recurred a second night in a row.

On Jul 6, 2012, at 4:55 PM, Ian Collins wrote:

> On 07/ 7/12 11:29 AM, Brian Wilson wrote:
> > On 07/ 6/12 04:17 PM, Ian Collins wrote:
> > > On 07/ 7/12 08:34 AM, Brian Wilson wrote:
> > > > Hello,
> > > >
> > > > I'd like a sanity check from people more knowledgeable than myself.
> > > > I'm managing backups on a production system. Previously I was using
> > > > another volume manager and filesystem on Solaris, and I've just switched
> > > > to using ZFS.
> > > >
> > > > My model is -
> > > > Production Server A
> > > > Test Server B
> > > > Mirrored storage arrays (HDS TruCopy if it matters)
> > > > Backup software (TSM)
> > > >
> > > > Production server A sees the live volumes.
> > > > Test Server B sees the TruCopy mirrors of the live volumes. (it sees
> > > > the second storage array, the production server sees the primary array)
> > > >
> > > > Production server A shuts down zone C, and exports the zpools for
> > > > zone C.
> > > > Production server A splits the mirror to secondary storage array,
> > > > leaving the mirror writable.
> > > > Production server A re-imports the pools for zone C, and boots zone C.
> > > > Test Server B imports the ZFS pool using -R /backup.
> > > > Backup software backs up the mounted mirror volumes on Test Server B.
> > > >
> > > > Later in the day after the backups finish, a script exports the ZFS
> > > > pools on test server B, and re-establishes the TruCopy mirror between
> > > > the storage arrays.
> > >
> > > That looks awfully complicated. Why don't you just clone a snapshot
> > > and back up the clone?
> >
> > Taking a snapshot and cloning incurs IO. Backing up the clone incurs a
> > lot more IO reading off the disks and going over the network. These
> > aren't acceptable costs in my situation.


Yet it is acceptable to shut down the zones and export the pools?
I'm interested to understand how a service outage is preferred over I/O?


> So splitting a mirror and reconnecting it doesn't incur I/O?


It does.


> > The solution is complicated if you're starting from scratch. I'm
> > working in an environment that already had all the pieces in place
> > (offsite synchronous mirroring, a test server to mount stuff up on,
> > scripts that automated the storage array mirror management, etc). It
> > was setup that way specifically to accomplish short downtime outages for
> > cold backups with minimal or no IO hit to production. So while it's
> > complicated, when it was put together it was also the most obvious thing
> > to do to drop my backup window to almost nothing, and keep all the IO
> > from the backup from impacting production. And like I said, with a
> > different volume manager, it's been rock solid for years.


... where data corruption is blissfully ignored? I'm not sure what volume
manager you were using, but SVM has absolutely zero data integrity
checking :-( And no, we do not miss using SVM :-)

I was trying to avoid sounding like a brand snob ('my old volume manager 
did X, why doesn't ZFS?'), because that's truly not my attitude; I 
prefer ZFS. I was using VxVM and VxFS - still no integrity checking, I 
agree :-)

> > So, to ask the sanity check more specifically -
> > Is it reasonable to expect ZFS pools to be exported, have their luns
> > change underneath, then later import the same pool on those changed
> > drives again?


Yes, we do this quite frequently. And it is tested ad nauseam. Methinks it is
simply a bug, perhaps one that is already fixed.


Excellent, that's exactly what I was hoping to hear. Thank you!



> If you were splitting ZFS mirrors to read data from one half all would be
> sweet (and you wouldn't have to export the pool). I guess the question here
> is what does TruCopy do under the hood when you re-connect the mirror?