Re: [zfs-discuss] unable to import the zpool

2012-08-01 Thread Suresh Kumar
Hi Hung-sheng,

Thanks for your response.

I tried to import the zpool using *zpool import -nF tXstpool*
Please see the output below.

*bash-3.2#  zpool import -nF tXstpool
bash-3.2#
bash-3.2# zpool status tXstpool
cannot open 'tXstpool': no such pool
*
I got these messages when I ran the command under *truss*:

* truss -aefo /zpool.txt zpool import -F tXstpool*

  742  14582:  ioctl(3, ZFS_IOC_POOL_STATS, 0x08041F40)Err#2 ENOENT
  743  14582:  ioctl(3, ZFS_IOC_POOL_TRYIMPORT, 0x08041F90)= 0
  744  14582:  sysinfo(SI_HW_SERIAL, "75706560", 11)   = 9
  745  14582:  ioctl(3, ZFS_IOC_POOL_IMPORT, 0x08041C40)   Err#6 ENXIO
  746  14582:  fstat64(2, 0x08040C70)  = 0
  747  14582:  write(2, " c a n n o t   i m p o r".., 24)  = 24
  748  14582:  write(2, " :  ", 2) = 2
  749  14582:  write(2, " o n e   o r   m o r e  ".., 44)  = 44
  750  14582:  write(2, "\n", 1)   = 1
*Thanks & Regards*
*Suresh*
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Can the ZFS "copies" attribute substitute HW disk redundancy?

2012-08-01 Thread Peter Jeremy
On 2012-Aug-01 21:00:46 +0530, Nigel W  wrote:
>I think a fantastic idea for dealing with the DDT (and all other
>metadata for that matter) would be an option to put (a copy of)
>metadata exclusively on a SSD.

This is on my wishlist as well.  I believe ZEVO supports it so possibly
it'll be available in ZFS in the near future.

-- 
Peter Jeremy


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] online increase of zfs after LUN increase ?

2012-08-01 Thread Habony, Zsolt
Hello,

I have run "zpool online -e" only.
I have not set autoexpand property, (as it is not set by default.)

(My understanding was that the controlled way of expansion is "zpool online -e",
where you decide when the expansion actually happens,
and the "non-controlled", fully automatic way is setting autoexpand on.)
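
Concretely, the two paths look like this (pool and device names are only
examples):

  zpool set autoexpand=on tank    # fully automatic: the pool grows when the LUN grows
  zpool online -e tank c0t5d0     # controlled: expand this device, and the pool, right now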

I have no detailed description of the bug, as I have no access to the internal bug
database, but it looked like the LUN size change was visible to Solaris (format
indeed showed a bigger size for me), while the VTOC and partition sizes remained at
the old, smaller values, and I would have had to resize the partitions manually.

Zsolt


From: Cindy Swearingen [cindy.swearin...@oracle.com]
Sent: Wednesday, August 01, 2012 8:00 PM
To: Habony, Zsolt
Cc: Hung-Sheng Tsao (LaoTsao) Ph.D; zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] online increase of zfs after LUN increase ?

Hi--

If the S10 patch is installed on this system...

Can you remind us whether you ran the zpool online -e command after the
LUN was expanded and the autoexpand property was set?

I hear that some storage doesn't generate the correct codes
in response to a LUN expansion, so you might need to run this command
even if autoexpand is set.

Thanks,

Cindy
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Can the ZFS "copies" attribute substitute HW disk redundancy?

2012-08-01 Thread Jim Klimov

2012-08-01 23:34, opensolarisisdeadlongliveopensolaris wrote:

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
boun...@opensolaris.org] On Behalf Of Jim Klimov

Well, there is at least a couple of failure scenarios where
copies>1 are good:

1) A single-disk pool, as in a laptop. Noise on the bus,
 media degradation, or any other reason to misread or
 miswrite a block can result in a failed pool.


How does mac/win/lin handle this situation?  (Not counting btrfs.)

Such noise might result in a temporarily faulted pool (blue screen of death) 
that is fully recovered after reboot.



In some of my cases I was "lucky" enough to get a corrupted /sbin/init
or something like that once, and the box had no other BE's yet, so the
OS could not do anything reasonable after boot. It is different from a
"corrupted zpool", but ended in a useless OS image due to one broken
sector nonetheless.


> Meanwhile you're always paying for it in terms of performance, and
> it's all solvable via pool redundancy.


For a single-disk box, "copies" IS the redundancy. ;)

The discussion did stray off from my original question, though ;)

//Jim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Can the ZFS "copies" attribute substitute HW disk redundancy?

2012-08-01 Thread Jim Klimov

2012-08-01 23:40, opensolarisisdeadlongliveopensolaris wrote:


Agreed, ARC/L2ARC help in finding the DDT, but whenever you've got a snapshot 
destroy (happens every 15 minutes) you've got a lot of entries you need to 
write.  Those are all scattered about the pool...  Even if you can find them 
fast, it's still a bear.


No, these entries you need to update are scattered around your
SSD (be it ARC or a hypothetical SSD-based copy of metadata
which I also "campaigned" for some time ago). We agreed (or
assumed) that with SSDs in place you can find the DDT entries
to update relatively fast now. The values are changed in RAM
and flushed to disk as part of an upcoming TXG commit, likely
in a limited number of disk head strokes (lots to coalesce),
and the way I see it - the updated copy remains in the ARC
instead of the obsolete DDT entry, and can make it into L2ARC
sometime in the future, as well.

//Jim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Can the ZFS "copies" attribute substitute HW disk redundancy?

2012-08-01 Thread opensolarisisdeadlongliveopensolaris
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Jim Klimov
> 
> 2012-08-01 22:07, opensolarisisdeadlongliveopensolaris wrote:
> > L2ARC is a read cache.  Hence the "R" and "C" in "L2ARC."
> 
> "R" is replacement, but what the hell ;)
> 
> > This means two major things:
> > #1  Writes don't benefit,
> > and
> > #2  There's no way to load the whole DDT into the cache anyway.  So you're
> guaranteed to have performance degradation with the dedup.
> 
> If the whole DDT does make it into the cache, or onto an SSD
> storing an extra copy of all pool metadata, then searching
> for a particular entry in DDT would be faster. When you write
> (or delete) and need to update the counters in DDT, or even
> ultimately remove an unreferenced entry, then you benefit on
> writes as well - you don't take as long to find DDT entries
> (or determine lack thereof) for the blocks you add or remove.
> 
> Or did I get your answer wrong? ;)

Agreed, ARC/L2ARC help in finding the DDT, but whenever you've got a snapshot 
destroy (happens every 15 minutes) you've got a lot of entries you need to 
write.  Those are all scattered about the pool...  Even if you can find them 
fast, it's still a bear.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Can the ZFS "copies" attribute substitute HW disk redundancy?

2012-08-01 Thread opensolarisisdeadlongliveopensolaris
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Jim Klimov
> 
> Well, there is at least a couple of failure scenarios where
> copies>1 are good:
> 
> 1) A single-disk pool, as in a laptop. Noise on the bus,
> media degradation, or any other reason to misread or
> miswrite a block can result in a failed pool. 

How does mac/win/lin handle this situation?  (Not counting btrfs.)

Such noise might result in a temporarily faulted pool (blue screen of death) 
that is fully recovered after reboot.  Meanwhile you're always paying for it in 
terms of performance, and it's all solvable via pool redundancy.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Can the ZFS "copies" attribute substitute HW disk redundancy?

2012-08-01 Thread Tomas Forsman
On 01 August, 2012 - opensolarisisdeadlongliveopensolaris sent me these 1,8K 
bytes:

> > From: Sašo Kiselkov [mailto:skiselkov...@gmail.com]
> > Sent: Wednesday, August 01, 2012 9:56 AM
> > 
> > On 08/01/2012 03:35 PM, opensolarisisdeadlongliveopensolaris wrote:
> > >> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> > >> boun...@opensolaris.org] On Behalf Of Jim Klimov
> > >>
> > >> Availability of the DDT is IMHO crucial to a deduped pool, so
> > >> I won't be surprised to see it forced to triple copies.
> > >
> > > IMHO, the more important thing for dedup moving forward is to create an
> > option to dedicate a fast device (SSD or whatever) to the DDT.  So all those
> > little random IO operations never hit the rusty side of the pool.
> > 
> > That's something you can already do with an L2ARC. In the future I plan
> > on investigating implementing a set of more fine-grained ARC and L2ARC
> > policy tuning parameters that would give more control into the hands of
> > admins over how the ARC/L2ARC cache is used.
> 
> L2ARC is a read cache.  Hence the "R" and "C" in "L2ARC."

"Adaptive Replacement Cache", right.

> This means two major things:
> #1  Writes don't benefit, 
> and
> #2  There's no way to load the whole DDT into the cache anyway.  So you're 
> guaranteed to have performance degradation with the dedup.
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Can the ZFS "copies" attribute substitute HW disk redundancy?

2012-08-01 Thread Jim Klimov

2012-08-01 22:07, opensolarisisdeadlongliveopensolaris wrote:

L2ARC is a read cache.  Hence the "R" and "C" in "L2ARC."


"R" is replacement, but what the hell ;)


This means two major things:
#1  Writes don't benefit,
and
#2  There's no way to load the whole DDT into the cache anyway.  So you're 
guaranteed to have performance degradation with the dedup.


If the whole DDT does make it into the cache, or onto an SSD
storing an extra copy of all pool metadata, then searching
for a particular entry in DDT would be faster. When you write
(or delete) and need to update the counters in DDT, or even
ultimately remove an unreferenced entry, then you benefit on
writes as well - you don't take as long to find DDT entries
(or determine lack thereof) for the blocks you add or remove.

Or did I get your answer wrong? ;)

//Jim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Can the ZFS "copies" attribute substitute HW disk redundancy?

2012-08-01 Thread opensolarisisdeadlongliveopensolaris
> From: opensolarisisdeadlongliveopensolaris
> Sent: Wednesday, August 01, 2012 2:08 PM
>  
> L2ARC is a read cache.  Hence the "R" and "C" in "L2ARC."
> This means two major things:
> #1  Writes don't benefit,
> and
> #2  There's no way to load the whole DDT into the cache anyway.  So you're
> guaranteed to have performance degradation with the dedup.

In other words, the DDT is always written on rust (written in the main pool).  You
gain some performance by adding ARC/L2ARC/log devices, but that only reduces
the problem; it doesn't solve it.

The problem would be solved if you could choose to dedicate an SSD mirror for 
DDT, and either allow the pool size to be limited by the amount of DDT storage 
available, or overflow into the main pool if the DDT device got full.
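
Purely to illustrate that proposal, such a configuration might look like the
following; the "dedup" vdev class shown here is hypothetical for the releases
discussed in this thread, and the device names are invented:

  zpool add tank dedup mirror c5t0d0 c5t1d0   # hypothetical: DDT blocks land on the SSD
                                              # mirror, spilling into the main pool only
                                              # if the SSDs fill up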
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Can the ZFS "copies" attribute substitute HW disk redundancy?

2012-08-01 Thread opensolarisisdeadlongliveopensolaris
> From: Sašo Kiselkov [mailto:skiselkov...@gmail.com]
> Sent: Wednesday, August 01, 2012 9:56 AM
> 
> On 08/01/2012 03:35 PM, opensolarisisdeadlongliveopensolaris wrote:
> >> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> >> boun...@opensolaris.org] On Behalf Of Jim Klimov
> >>
> >> Availability of the DDT is IMHO crucial to a deduped pool, so
> >> I won't be surprised to see it forced to triple copies.
> >
> > IMHO, the more important thing for dedup moving forward is to create an
> option to dedicate a fast device (SSD or whatever) to the DDT.  So all those
> little random IO operations never hit the rusty side of the pool.
> 
> That's something you can already do with an L2ARC. In the future I plan
> on investigating implementing a set of more fine-grained ARC and L2ARC
> policy tuning parameters that would give more control into the hands of
> admins over how the ARC/L2ARC cache is used.

L2ARC is a read cache.  Hence the "R" and "C" in "L2ARC."
This means two major things:
#1  Writes don't benefit, 
and
#2  There's no way to load the whole DDT into the cache anyway.  So you're 
guaranteed to have performance degradation with the dedup.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] online increase of zfs after LUN increase ?

2012-08-01 Thread Cindy Swearingen

Hi--

If the S10 patch is installed on this system...

Can you remind us whether you ran the zpool online -e command after the
LUN was expanded and the autoexpand property was set?

I hear that some storage doesn't generate the correct codes
in response to a LUN expansion, so you might need to run this command
even if autoexpand is set.

Thanks,

Cindy



On 07/26/12 07:04, Habony, Zsolt wrote:

Here is the bug that I mentioned:  SUNBUG:6430818  Solaris Does Not Automatically
Handle an Increase in LUN Size
Patch for that is: 148098-03

Its readme says:
Synopsis: Obsoleted by: 147440-15 SunOS 5.10: scsi patch

Looking at the current version, 147440-21, there is a reference to the incorporated
patch, and to the bug ID as well.

(from 148098-03)

6228435 undecoded command in var/adm/messages - Error for Command: undecoded 
cmd 0x5a
6241086 format should allow label adjustment when disk/LUN size changes
6430818 Solaris needs mechanism of dynamically increasing LUN size


-Original Message-
From: Hung-Sheng Tsao (LaoTsao) Ph.D [mailto:laot...@gmail.com]
Sent: 2012. július 26. 14:49
To: Habony, Zsolt
Cc: Cindy Swearingen; Sašo Kiselkov; zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] online increase of zfs after LUN increase ?

IMHO, the 147440-21 does not list the bugs that were solved by 148098- even though
it obsoletes the 148098.




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Pool Unavailable

2012-08-01 Thread Richard Elling
On Aug 1, 2012, at 8:04 AM, Jesse Jamez wrote:

> Hello,
> 
> I recently rebooted my workstation and the disk names changed causing my ZFS 
> pool to be unavailable.

What OS and release?

> 
> I did not make any hardware changes.  My first question is the obvious one:
> did I lose my data?  Can I recover it?

Yes, just import the pool.

> 
> What would cause the names to change? Delay in the order that the HBA brought 
> them up?

It depends on your OS and OBP (or BIOS).

> 
> How can I correct this problem going forward?

The currently imported pool configurations are recorded in the 
/etc/zfs/zpool.cache
file for Solaris-like OSes. At boot time, the system will try to import the 
pools in the
cache. If the cache contents no longer match reality for non-root pools, then 
the
safest action is to not automatically import the pool. An error message is 
displayed
and should point to a website that tells you how to correct this (NB: depending
on the OS, that URL may or may not exist at Oracle (née Sun)).
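
For what it's worth, on a Solaris-like OS the cache and the re-import can be
inspected and driven along these lines (pool name is illustrative):

  ls -l /etc/zfs/zpool.cache   # where the cached configurations live
  zdb                          # with no arguments, dumps the cached pool configs
  zpool import                 # scans devices and lists pools that can be imported
  zpool import tank            # imports by name and refreshes the cache entry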
 -- richard

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422







___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Can the ZFS "copies" attribute substitute HW disk redundancy?

2012-08-01 Thread Nigel W
On Wed, Aug 1, 2012 at 8:33 AM, Sašo Kiselkov  wrote:
> On 08/01/2012 04:14 PM, Jim Klimov wrote:
>> chances are that
>> some blocks of userdata might be more popular than a DDT block and
>> would push it out of L2ARC as well...
>
> Which is why I plan on investigating implementing some tunable policy
> module that would allow the administrator to get around this problem.
> E.g. administrator dedicates 50G of ARC space to metadata (which
> includes the DDT) or only the DDT specifically. My idea is still a bit
> fuzzy, but it revolves primarily around allocating and policing min and
> max quotas for a given ARC entry type. I'll start a separate discussion
> thread for this later on once I have everything organized in my mind
> about where I plan on taking this.
>

Yes. +1

The L2ARC as it is currently implemented is not terribly useful for
storing the DDT anyway, because each DDT entry is 376 bytes while the
L2ARC reference is 176 bytes; so, best case, you get just over double
the DDT entries in the L2ARC compared to what you would fit in the ARC,
but then you also have no ARC left for anything else :(.
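
Rough arithmetic behind "just over double", using the entry sizes above and an
assumed 8 GiB of ARC spent either way:

  echo "scale=1; (8*1024^3)/376/10^6" | bc   # ~22.8 million DDT entries held directly in ARC
  echo "scale=1; (8*1024^3)/176/10^6" | bc   # ~48.8 million entries referenced via L2ARC headers

so roughly 2.1x as many entries - and that ARC is then consumed by L2ARC
headers alone.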

I think a fantastic idea for dealing with the DDT (and all other
metadata for that matter) would be an option to put (a copy of)
metadata exclusively on a SSD.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Pool Unavailable

2012-08-01 Thread Krunal Desai
On Aug 1, 2012, at 11:06, Jesse Jamez  wrote:

> Hello,
>
> I recently rebooted my workstation and the disk names changed causing my ZFS 
> pool to be unavailable.
>
> I did not make any hardware changes.  My first question is the obvious one:
> did I lose my data?  Can I recover it?
>
> What would cause the names to change? Delay in the order that the HBA brought 
> them up?
>
> How can I correct this problem going forward?
>
> Thanks - - - JesseJ

Perhaps some removable drives caused the change in drive names?
Regardless, I believe ZFS stores labels on each disk and is clever
enough to figure out what is what even if the operating system name
has changed.

If I recall correctly, a zpool export (if possible) followed by a
zpool import has always corrected this for me.

Barring an actual disk failure (a failed disk could fail to enumerate,
thereby throwing off the naming), your data should be safe.
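
For reference, the sequence that usually does the trick (pool name is just an
example):

  zpool export tank
  zpool import tank              # rescans the devices and picks up the new names
  zpool import -d /dev/dsk tank  # variant if the devices live in a non-default directory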

--khd (mobile)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS Pool Unavailable

2012-08-01 Thread Jesse Jamez
Hello,

I recently rebooted my workstation and the disk names changed causing my
ZFS pool to be unavailable.

I did not make any hardware changes.  My first question is the obvious one:
did I lose my data?  Can I recover it?

What would cause the names to change? Delay in the order that the HBA
brought them up?

How can I correct this problem going forward?

Thanks - - - JesseJ
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Can the ZFS "copies" attribute substitute HW disk redundancy?

2012-08-01 Thread Sašo Kiselkov
On 08/01/2012 04:14 PM, Jim Klimov wrote:
> 2012-08-01 17:55, Sašo Kiselkov wrote:
>> On 08/01/2012 03:35 PM, opensolarisisdeadlongliveopensolaris wrote:
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Jim Klimov

 Availability of the DDT is IMHO crucial to a deduped pool, so
 I won't be surprised to see it forced to triple copies.
>>>
>>> IMHO, the more important thing for dedup moving forward is to create
>>> an option to dedicate a fast device (SSD or whatever) to the DDT.  So
>>> all those little random IO operations never hit the rusty side of the
>>> pool.
>>
>> That's something you can already do with an L2ARC. In the future I plan
>> on investigating implementing a set of more fine-grained ARC and L2ARC
>> policy tuning parameters that would give more control into the hands of
>> admins over how the ARC/L2ARC cache is used.
> 
> 
> Unfortunately, as of current implementations, L2ARC starts up cold.

Yes, that's by design, because the L2ARC is simply a secondary backing
store for ARC blocks. If the memory pointer isn't valid, chances are,
you'll still be able to find the block on the L2ARC devices. You can't
scan an L2ARC device and discover some usable structures, as there
aren't any. It's literally just a big pile of disk blocks and their
associated ARC headers only live in RAM.

> chances are that
> some blocks of userdata might be more popular than a DDT block and
> would push it out of L2ARC as well...

Which is why I plan on investigating implementing some tunable policy
module that would allow the administrator to get around this problem.
E.g. administrator dedicates 50G of ARC space to metadata (which
includes the DDT) or only the DDT specifically. My idea is still a bit
fuzzy, but it revolves primarily around allocating and policing min and
max quotas for a given ARC entry type. I'll start a separate discussion
thread for this later on once I have everything organized in my mind
about where I plan on taking this.
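
For context, the coarse knobs that already exist are the per-dataset cache
policies and the ARC metadata ceiling; the values below are examples, not
recommendations:

  zfs set secondarycache=metadata tank   # L2ARC devices cache only metadata for this dataset tree
  zfs set primarycache=all tank          # the ARC itself keeps caching both data and metadata

  # On Solaris/illumos the ARC metadata ceiling can be raised in /etc/system, e.g.:
  #   set zfs:zfs_arc_meta_limit = 0xC80000000    (50 GiB)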

Cheers,
--
Saso
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Can the ZFS "copies" attribute substitute HW disk redundancy?

2012-08-01 Thread Jim Klimov

2012-08-01 17:55, Sašo Kiselkov wrote:

On 08/01/2012 03:35 PM, opensolarisisdeadlongliveopensolaris wrote:

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
boun...@opensolaris.org] On Behalf Of Jim Klimov

Availability of the DDT is IMHO crucial to a deduped pool, so
I won't be surprised to see it forced to triple copies.


IMHO, the more important thing for dedup moving forward is to create an option 
to dedicate a fast device (SSD or whatever) to the DDT.  So all those little 
random IO operations never hit the rusty side of the pool.


That's something you can already do with an L2ARC. In the future I plan
on investigating implementing a set of more fine-grained ARC and L2ARC
policy tuning parameters that would give more control into the hands of
admins over how the ARC/L2ARC cache is used.



Unfortunately, as of current implementations, L2ARC starts up cold.
That is, upon every import of the pool the L2ARC is empty, and the
DDT (as in the example above) would have to migrate into the cache
via read-from-rust to RAM ARC and expiration from the ARC. Getting
it to be hot and fast again takes some time, and chances are that
some blocks of userdata might be more popular than a DDT block and
would push it out of L2ARC as well...
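
One way to watch that warm-up on Solaris/illumos is via the arcstats kstats:

  kstat -p zfs:0:arcstats | grep '^zfs:0:arcstats:l2_'   # l2_size, l2_hits, l2_misses, ...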

//Jim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Can the ZFS "copies" attribute substitute HW disk redundancy?

2012-08-01 Thread Jim Klimov

2012-08-01 17:35, opensolarisisdeadlongliveopensolaris wrote:

Personally, I've never been supportive of the whole "copies" idea.  If you need more than 
one redundant copy of some data, that's why you have pool redundancy.  You're just hurting 
performance by using "copies."  And protecting against failure conditions that are 
otherwise nearly nonexistent...  And just as easily solved (without performance penalty) via pool 
redundancy.


Well, there is at least a couple of failure scenarios where
copies>1 are good:

1) A single-disk pool, as in a laptop. Noise on the bus,
   media degradation, or any other reason to misread or
   miswrite a block can result in a failed pool. One of
   my older test boxes has an untrustworthy 80Gb HDD for
   its rpool, and the system did crash into an unbootable
   image with just half a dozen CKSUM errors.
   I remade the rpool with copies=2 enforced from the
   start and rsynced the rootfs files back into the new
   pool (a minimal command sketch follows after this
   list) - and it has worked well since then, despite
   finding several errors upon each weekly scrub.

2) The data pool on the same box experienced some errors
   where raidz2 failed to recreate a userdata block, thus
   invalidating a file despite having a 2-disk redundancy.
   There was some discussion of that on the list, and my
   ultimate guess is that the six disks' heads were over
   similar locations of the same file - i.e. during scrub -
   and a power surge or some similar event caused them to
   scramble portions of the disk pertaining to the same
   ZFS block. At least, this could have induced many
   enough errors to make raidz2 protection irrelevant.
   If the pool had copies=2, there would be another replica
   of the same block that would have been not corrupted
   by such assumed failure mechanism - because the disk
   heads were elsewhere.
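
A minimal sketch of the copies=2 rebuild mentioned in (1); the device name,
dataset layout and rsync source are illustrative, and a real root-pool rebuild
also involves boot blocks, swap/dump volumes and so on:

  zpool create -O copies=2 rpool2 c0t1d0s0   # copies=2 is inherited by every dataset created below
  zfs create rpool2/ROOT
  zfs create rpool2/ROOT/be1
  rsync -a /a/ /rpool2/ROOT/be1/             # copy the old root from its alternate mount (here /a)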

Hmmm... now I wonder if ZFS checksum validation can try
permutations of should-be-identical sectors from different
copies of a block - in case both copies have received some
non-overlapping errors, and together contain enough data to
reconstruct a ZFS block (and rewrite both its copies now).

//Jim

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] unable to import the zpool

2012-08-01 Thread Hung-sheng Tsao
Hi
Try:
zpool import -nF tXstpool
to see if it can roll back to some good state.
If you can afford some lost data:
zpool import -F tXstpool



Sent from my iPhone

On Aug 1, 2012, at 3:21 AM, Suresh Kumar  wrote:

> Dear ZFS-Users,
> 
> I am using Solaris x86 10u10. All the devices which belong to my zpool
> are in the available state.
> But I am unable to import the zpool.
> 
> #zpool import tXstpool
> cannot import 'tXstpool': one or more devices is currently unavailable
> ==
> bash-3.2# zpool import
>   pool: tXstpool
> id: 13623426894836622462
>  state: UNAVAIL
> status: One or more devices are missing from the system.
> action: The pool cannot be imported. Attach the missing
> devices and try again.
>see: http://www.sun.com/msg/ZFS-8000-6X
> config:
> 
> tXstpool UNAVAIL  missing device
>   mirror-0   DEGRADED
> c2t210100E08BB2FC85d0s0  FAULTED  corrupted data
> c2t21E08B92FC85d2ONLINE
> 
> Additional devices are known to be part of this pool, though their
> exact configuration cannot be determined.
> =
> Any suggestion regarding this case is very helpful.
> 
> Regards,
> Suresh.
> 
>  
> 
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Can the ZFS "copies" attribute substitute HW disk redundancy?

2012-08-01 Thread Sašo Kiselkov
On 08/01/2012 03:35 PM, opensolarisisdeadlongliveopensolaris wrote:
>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of Jim Klimov
>>  
>> Availability of the DDT is IMHO crucial to a deduped pool, so
>> I won't be surprised to see it forced to triple copies. 
> 
> IMHO, the more important thing for dedup moving forward is to create an 
> option to dedicate a fast device (SSD or whatever) to the DDT.  So all those 
> little random IO operations never hit the rusty side of the pool.

That's something you can already do with an L2ARC. In the future I plan
on investigating implementing a set of more fine-grained ARC and L2ARC
policy tuning parameters that would give more control into the hands of
admins over how the ARC/L2ARC cache is used.

Cheers,
--
Saso
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Can the ZFS "copies" attribute substitute HW disk redundancy?

2012-08-01 Thread opensolarisisdeadlongliveopensolaris
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Jim Klimov
>  
> Availability of the DDT is IMHO crucial to a deduped pool, so
> I won't be surprised to see it forced to triple copies. 

Agreed, although the DDT is also paramount to performance.  In theory, an
online dedup'd pool could be much faster than non-dedup'd pools, or offline
dedup'd pools.  So there's a lot of potential here - lost potential, at
present.

IMHO, the more important thing for dedup moving forward is to create an option 
to dedicate a fast device (SSD or whatever) to the DDT.  So all those little 
random IO operations never hit the rusty side of the pool.

Personally, I've never been supportive of the whole "copies" idea.  If you need 
more than one redundant copy of some data, that's why you have pool redundancy. 
 You're just hurting performance by using "copies."  And protecting against 
failure conditions that are otherwise nearly nonexistent...  And just as easily 
solved (without performance penalty) via pool redundancy.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Can the ZFS "copies" attribute substitute HW disk redundancy?

2012-08-01 Thread Jim Klimov

2012-08-01 16:22, Sašo Kiselkov wrote:

On 08/01/2012 12:04 PM, Jim Klimov wrote:

Probably DDT is also stored with 2 or 3 copies of each block,
since it is metadata. It was not in the last ZFS on-disk spec
from 2006 that I found, for some apparent reason ;)



The idea of the pun was that the latest available full spec is
over half a decade old, alas. At least, I failed to find anything
newer when I searched last winter. And back in 2006 there was
no dedup nor any mention of it in the spec (surprising, huh? ;)

Hopefully with all the upcoming changes - including integration
of feature flags and new checksum and compression algorithms,
the consistent textual document of "Current ZFS On-Disk spec in
illumos(/FreeBSD/...)" would appear and be maintained up-to-date.



That's probably because it's extremely big (dozens, hundreds or even
thousands of GB).


Availability of the DDT is IMHO crucial to a deduped pool, so
I won't be surprised to see it forced to triple copies. Not
that it is very difficult to check with ZDB, though finding
the DDT "dataset" for inspection (when I last tried) was not
an obvious task.
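
For reference, zdb can show the DDT without hunting for its object by hand
(pool name illustrative; more -D's mean more detail):

  zdb -DD tank   # dedup-table statistics and histogram
  zdb -S tank    # simulates dedup to estimate how large the DDT would become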

//Jim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Can the ZFS "copies" attribute substitute HW disk redundancy?

2012-08-01 Thread Sašo Kiselkov
On 08/01/2012 12:04 PM, Jim Klimov wrote:
> Probably DDT is also stored with 2 or 3 copies of each block,
> since it is metadata. It was not in the last ZFS on-disk spec
> from 2006 that I found, for some apparent reason ;)

That's probably because it's extremely big (dozens, hundreds or even
thousands of GB).

Cheers,
--
Saso
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Can the ZFS "copies" attribute substitute HW disk redundancy?

2012-08-01 Thread Jim Klimov

2012-07-31 17:55, opensolarisisdeadlongliveopensolaris wrote:

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
boun...@opensolaris.org] On Behalf Of Nico Williams

The copies thing is a really only for laptops, where the likelihood of
redundancy is very low


ZFS also stores multiple copies of things that it considers "extra important."  
I'm not sure what exactly - uber block, or stuff like that...

When you set the "copies" property, you're just making it apply to other stuff, 
that otherwise would be only 1.


IIRC, the "copies" defaults are:
1 copy for userdata
2 copies for regular metadata (block-pointer tree)
3 copies for higher-level metadata (metadata tree root, dataset
  definitions)

The "Uberblock" I am not so sure about, from the top of my head.
There is a record in the ZFS labels, and that is stored 4 times
on each leaf VDEV, and points to a ZFS block with the tree root
for the current (newest consistent flushed-to-pool) TXG number.
Which one of these concepts is named The 00bab10c - *that* I am
a bit vague about ;)

Probably DDT is also stored with 2 or 3 copies of each block,
since it is metadata. It was not in the last ZFS on-disk spec
from 2006 that I found, for some apparent reason ;)

Also, I am not sure whether bumping the copies attribute to,
say, "3" increases only the redundancy of userdata, or of
regular metadata as well.
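
One way to see how many physical copies (DVAs) a given data block actually
received - a sketch; the dataset, the file and the use of ls -i for the object
number are illustrative:

  zfs set copies=3 tank/test
  cp /etc/release /tank/test/r ; sync
  zdb -ddddd tank/test $(ls -i /tank/test/r | awk '{print $1}')
  # count the DVA[n] entries in the L0 block-pointer lines: 3 DVAs = 3 copies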

//Jim

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] known membership of the ZFS Working Group

2012-08-01 Thread Graham Perrin
I understand that some participants would prefer to keep their participation
non-public, and so I do not expect the Group to have a home page at this time.

In the absence of a web page, please can we list here the individual and 
organisational members who are *happy* for their membership to be known?

This should not distract from technical discussion, nor is it intended to
detract from this list. However, I do see people wondering about membership –
especially since the transfer of ZEVO – so a publicly available shortlist would
be useful.

Thank you. 



What's below is gleaned mostly from 
Illumos: the successor to the OpenSolaris community [LWN.net]
 (2011-06-02), highlights at 
. 

Elsewhere I see other well-known names mentioned, but it's not clear whether 
they're members of the Group. 

I have so far, in alphabetical order by forename or name of organisation: 

Adam Leventhal
Garrett D'Amore
Illumos
Matthew (Matt) Ahrens
Oracle
representatives of FreeBSD
representatives of Linux
representatives of OpenIndiana
representatives of Solaris
Ten's Complement LLC
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] unable to import the zpool

2012-08-01 Thread Suresh Kumar
Dear ZFS-Users,

I am using Solaris x86 10u10. All the devices which belong to my zpool
are in the available state.
But I am unable to import the zpool.

#zpool import tXstpool
cannot import 'tXstpool': one or more devices is currently unavailable
==
bash-3.2# zpool import
  pool: tXstpool
id: 13623426894836622462
 state: UNAVAIL
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
devices and try again.
   see: http://www.sun.com/msg/ZFS-8000-6X
config:

tXstpool UNAVAIL  missing device
  mirror-0   DEGRADED
c2t210100E08BB2FC85d0s0  FAULTED  corrupted data
c2t21E08B92FC85d2ONLINE

Additional devices are known to be part of this pool, though their
exact configuration cannot be determined.
=
Any suggestion regarding this case is very helpful.

Regards,
Suresh.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss