Re: [zfs-discuss] Resilver w/o errors vs. scrub with errors

2013-01-20 Thread Edward Ned Harvey (opensolarisisdeadlongliveopensolaris)
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Stephan Budach
 
 I am always experiencing chksum errors while scrubbing my zpool(s), but
 I never experienced chksum errors while resilvering. Does anybody know
 why that would be? 

When you resilver, you're not reading all the data on all the drives.  Only 
just enough to resilver, which doesn't include all the data that was previously 
in-sync (maybe a little of it, but mostly not).  Even if you have a completely 
failed drive, replaced with a completely new empty drive, if you have a 3-way 
mirror, you only need to read one good copy of the data in order to write the 
resilver'd data onto the new drive.  So you could still be failing to detect 
cksum errors on the *other* side of the mirror, which wasn't read during the 
resilver.

What's more, when you resilver, the system is just going to write the target 
disk.  Not go back and verify every written block of the target disk.

So, think of a scrub as a complete, thorough resilver, whereas a resilver is 
just a lightweight version, doing only the parts that are known to be 
out-of-sync, and without subsequent read verification.
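
To make the contrast concrete, here is a rough sketch in Python-flavored 
pseudocode (illustration only, with made-up helper names - this is not the 
actual ZFS code):

    # Illustrative sketch only -- not the real ZFS scrub/resilver paths.
    def scrub(pool):
        # Read EVERY copy of every block and verify its checksum.
        for block in pool.all_blocks():
            for copy in block.all_copies():   # each mirror side, each parity column
                data = copy.read()
                if checksum(data) != block.expected_checksum:
                    report_cksum_error(copy)
                    repair_from_good_copy(block, copy)

    def resilver(pool, target_disk):
        # Read just ONE good copy of each block the target is missing,
        # write it to the target, and never read the other copies or
        # re-read what was just written.
        for block in pool.blocks_missing_on(target_disk):
            data = block.read_any_good_copy()
            target_disk.write(block, data)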


 This happens on all of my servers, Sun Fire 4170M2,
 Dell PE 650 and on any FC storage that I have.

While you apparently have been able to keep the system in production for a 
while, consider yourself lucky.  You have a real problem, and solving it 
probably won't be easy.  Your problem is either hardware, firmware, or drivers. 
 If you have a support contract on the Sun, I would recommend starting there.  
Because the Dell is definitely a configuration that you won't find official 
support for - just a lot of community contributors, who will likely not provide 
a super awesome answer for you super soon. 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Resilver w/o errors vs. scrub with errors

2013-01-20 Thread Edward Ned Harvey (opensolarisisdeadlongliveopensolaris)
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Jim Klimov
 
 And regarding the considerable activity - AFAIK there is little way
 for ZFS to reliably read and test TXGs newer than X 

My understanding is like this:  When you make a snapshot, you're just creating 
a named copy of the present latest TXG.  When you zfs send incremental from one 
snapshot to another, you're creating the delta between two TXG's, that happen 
to have names.  So when you break a mirror and resilver, it's exactly the same 
operation as an incremental zfs send, it needs to calculate the delta between 
the latest (older) TXG on the previously UNAVAIL device, up to the latest TXG 
on the current pool.  Yes this involves examining the meta tree structure, and 
yes the system will be very busy while that takes place.  But the work load is 
very small relative to whatever else you're likely to do with your pool during 
normal operation, because that's the nature of the meta tree structure ... very 
small relative to the rest of your data.
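
A rough way to picture that delta walk, as I understand it (Python-flavored 
pseudocode with made-up helper names, not the actual resilver code):

    # Conceptual sketch: prune the block-pointer tree walk by birth TXG.
    def resilver_delta(node, last_synced_txg):
        # Anything born at or before the TXG the stale device already has
        # cannot have changed, so whole subtrees get skipped.
        if node.birth_txg <= last_synced_txg:
            return
        if node.is_leaf():
            copy_to_target(node)              # rewrite the changed block on the new disk
        else:
            for child in node.children():     # descend only into changed branches
                resilver_delta(child, last_synced_txg)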

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RFE: Un-dedup for unique blocks

2013-01-20 Thread Edward Ned Harvey (opensolarisisdeadlongliveopensolaris)
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Nico Williams
 
 I've wanted a system where dedup applies only to blocks being written
 that have a good chance of being dups of others.
 
 I think one way to do this would be to keep a scalable Bloom filter
 (on disk) into which one inserts block hashes.
 
 To decide if a block needs dedup one would first check the Bloom
 filter, then if the block is in it, use the dedup code path, 

How is this different from or better than the existing dedup architecture?  If 
you found that some block about to be written in fact matches the hash of an 
existing block on disk, then you've already determined it's a duplicate block, 
exactly as you would if you had dedup enabled.  In that situation, gosh, it 
sure would be nice to have the extra information, like the reference count and 
the pointer to the duplicate block, which exists in the dedup table.

In other words, exactly the way existing dedup is already architected.


 The nice thing about this is that Bloom filters can be sized to fit in
 main memory, and will be much smaller than the DDT.

If you're storing all the hashes of all the blocks, how is that going to be 
smaller than the DDT storing all the hashes of all the blocks?

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RFE: Un-dedup for unique blocks

2013-01-20 Thread Edward Harvey
So ... The way things presently are, ideally you would know in advance what 
stuff you were planning to write that has duplicate copies.  You could enable 
dedup, then write all the stuff that's highly duplicated, then turn off dedup 
and write all the non-duplicate stuff.  Obviously, however, this is a fairly 
implausible actual scenario.

In reality, while you're writing, you're going to have duplicate blocks mixed 
in with your non-duplicate blocks, which fundamentally means the system needs 
to be calculating the cksums and entering them into the DDT, even for the 
unique blocks...  Just because the first time the system sees each duplicate 
block, it doesn't yet know that it's going to be duplicated later.

But as you said, after data is written, and sits around for a while, the 
probability of duplicating unique blocks diminishes over time.  So they're just 
a burden.

I would think the ideal situation would be to take your idea of un-dedup for 
unique blocks, and take it a step further:  un-dedup unique blocks that are 
older than some configurable threshold.  Maybe you could have a command for a 
sysadmin to run, to scan the whole pool performing this operation, but it's the 
kind of maintenance that really should be done upon access, too.  Somebody goes 
back and reads a jpg from last year; the system reads it, consequently loads 
the DDT entry, discovers that it's unique and has been for a long time, and so 
throws out the DDT entry.
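
As a sketch of what that policy might look like (hypothetical throughout - no 
such feature or on-access hook exists today, and I'm assuming for illustration 
that a DDT entry carries a birth timestamp):

    # Hypothetical "un-dedup on access" policy sketch, made-up helpers.
    import time

    UNIQUE_AGE_THRESHOLD = 180 * 24 * 3600    # e.g. 180 days; made-up tunable

    def on_block_read(block, ddt):
        entry = ddt.lookup(block.checksum)
        if entry is None:
            return                            # block isn't tracked in the DDT
        age = time.time() - entry.birth_time
        if entry.refcount == 1 and age > UNIQUE_AGE_THRESHOLD:
            # Still unique after a long time: rewrite the block pointer
            # without the dedup bit and drop the DDT entry.
            clear_dedup_bit(block)
            ddt.remove(entry)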

But, by talking about it, we're just smoking pipe dreams.  Cuz we all know zfs 
is developmentally challenged now.  But one can dream...

finglonger


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] iSCSI access patterns and possible improvements?

2013-01-20 Thread Edward Ned Harvey (opensolarisisdeadlongliveopensolaris)
 From: Richard Elling [mailto:richard.ell...@gmail.com]
 Sent: Saturday, January 19, 2013 5:39 PM
 
 the space allocation more closely resembles a variant
 of mirroring,
 like some vendors call RAID-1E

Awesome, thank you.   :-)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RFE: Un-dedup for unique blocks

2013-01-20 Thread Nico Williams
Bloom filters are very small; that's the difference.  You might only need a
few bits per block for a Bloom filter.  Compare that to the size of a DDT entry.
A Bloom filter could be cached entirely in main memory.
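
For a sense of scale, the standard sizing formula for an optimally built Bloom 
filter is bits-per-entry = -ln(p) / (ln 2)^2 for a target false-positive rate 
p.  A quick back-of-the-envelope in Python (the "few hundred bytes per DDT 
entry" used for comparison is the often-quoted ballpark, not an exact figure):

    import math

    def bloom_bits_per_entry(p):
        # bits per element for an optimally sized Bloom filter with
        # false-positive probability p
        return -math.log(p) / (math.log(2) ** 2)

    for p in (0.01, 0.001):
        bits = bloom_bits_per_entry(p)
        print("p = %.3f -> %.1f bits (%.2f bytes) per block" % (p, bits, bits / 8))

    # Prints roughly 9.6 bits at 1% and 14.4 bits at 0.1%, i.e. one or two
    # bytes per block, versus a few hundred bytes per block for a DDT entry.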
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RFE: Un-dedup for unique blocks

2013-01-20 Thread Edward Ned Harvey (opensolarisisdeadlongliveopensolaris)
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Nico Williams
 
 To decide if a block needs dedup one would first check the Bloom
 filter, then if the block is in it, use the dedup code path, else the
 non-dedup codepath and insert the block in the Bloom filter.  

Sorry, I didn't know what a Bloom filter was when I replied before - now I've 
read the Wikipedia article and am consequently an expert.   *sic*   ;-)

It sounds like, what you're describing...  The first time some data gets 
written, it will not produce a hit in the Bloom filter, so it will get written 
to disk without dedup.  But now it has an entry in the Bloom filter.  So the 
second time the data block gets written (the first duplicate) it will produce a 
hit in the Bloom filter, and consequently get a dedup DDT entry.  But since the 
system didn't dedup the first one, it means the second one still needs to be 
written to disk independently of the first one.  So in effect, you'll always 
miss the first duplicated block write, but you'll successfully dedup n-1 
duplicated blocks.  Which is entirely reasonable, although not strictly 
optimal.  And sometimes you'll get a false positive out of the Bloom filter, so 
sometimes you'll be running the dedup code on blocks which are actually unique, 
but with some intelligently selected parameters such as Bloom table size, you 
can get this probability to be reasonably small, like less than 1%.
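
Put into code, the write path being described might look roughly like this (a 
toy sketch that uses a set and a dict as stand-ins for the Bloom filter and 
the DDT; none of these names correspond to real ZFS structures):

    import hashlib

    def checksum(data):
        return hashlib.sha256(data).hexdigest()

    def write_block(data, bloom, ddt, disk):
        # bloom: set of hashes, ddt: dict hash -> {"bp", "refcount"},
        # disk: list standing in for allocated blocks.
        h = checksum(data)
        if h in bloom:                        # (probably) seen before
            if h in ddt:
                ddt[h]["refcount"] += 1       # true duplicate: no new write
                return ddt[h]["bp"]
            # First repeat (or a Bloom false positive): the original copy was
            # written without dedup, so this copy must be written out too,
            # but it now gets a DDT entry so later copies can be deduped.
            disk.append(data)
            ddt[h] = {"bp": len(disk) - 1, "refcount": 1}
            return ddt[h]["bp"]
        # Never seen before: plain non-dedup write; just remember the hash.
        bloom.add(h)
        disk.append(data)
        return len(disk) - 1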

In the wikipedia article, they say you can't remove an entry from the Bloom 
filter table, which would over time cause consistent increase of false positive 
probability (approaching 100% false positives) from the Bloom filter and 
consequently high probability of dedup'ing blocks that are actually unique; but 
with even a minimal amount of thinking about it, I'm quite sure that's a 
solvable implementation detail.  Instead of storing a single bit for each entry 
in the table, store a counter.  Every time you create a new entry in the table, 
increment the different locations; every time you remove an entry from the 
table, decrement.  Obviously a counter requires more bits than a bit, but it's 
a linear increase of size, exponential increase of utility, and within the 
implementation limits of available hardware.  But there may be a more 
intelligent way of accomplishing the same goal.  (Like I said, I've only 
thought about this minimally).
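
What's being described is essentially a counting Bloom filter, which is a 
standard variation.  A minimal sketch (simplified; real implementations bound 
the counter width and deal with overflow):

    import hashlib

    class CountingBloom:
        def __init__(self, num_slots, num_hashes):
            self.counters = [0] * num_slots   # small counters instead of single bits
            self.num_hashes = num_hashes

        def _slots(self, key):                # key is bytes, e.g. a block checksum
            # derive num_hashes slot indices from one digest
            d = hashlib.sha256(key).digest()
            for i in range(self.num_hashes):
                yield int.from_bytes(d[4 * i:4 * i + 4], "big") % len(self.counters)

        def add(self, key):
            for s in self._slots(key):
                self.counters[s] += 1

        def remove(self, key):                # possible because we count, not just set bits
            for s in self._slots(key):
                if self.counters[s] > 0:
                    self.counters[s] -= 1

        def might_contain(self, key):
            return all(self.counters[s] > 0 for s in self._slots(key))

    cbf = CountingBloom(num_slots=1 << 20, num_hashes=4)
    cbf.add(b"some-block-checksum")
    cbf.remove(b"some-block-checksum")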

Meh, well.  Thanks for the interesting thought.  For whatever it's worth.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RFE: Un-dedup for unique blocks

2013-01-20 Thread Tomas Forsman
On 19 January, 2013 - Jim Klimov sent me these 2,0K bytes:

 Hello all,

   While revising my home NAS which had dedup enabled before I gathered
 that its RAM capacity was too puny for the task, I found that there is
 some deduplication among the data bits I uploaded there (makes sense,
 since it holds backups of many of the computers I've worked on - some
 of my homedirs' contents were bound to intersect). However, a lot of
 the blocks are in fact unique - have entries in the DDT with count=1
 and the blkptr_t bit set. In fact they are not deduped, and with my
 pouring of backups complete - they are unlikely to ever become deduped.

Another RFE would be 'zfs dedup mypool/somefs' and basically go through
and do a one-shot dedup. Would be useful in various scenarios. Possibly
go through the entire pool at once, to make dedups intra-datasets (like
the real thing).

/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Resilver w/o errors vs. scrub with errors

2013-01-20 Thread Stephan Budach
On 20.01.13 16:51, Edward Ned Harvey 
(opensolarisisdeadlongliveopensolaris) wrote:

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
boun...@opensolaris.org] On Behalf Of Stephan Budach

I am always experiencing chksum errors while scrubbing my zpool(s), but
I never experienced chksum errors while resilvering. Does anybody know
why that would be?

When you resilver, you're not reading all the data on all the drives.  Only 
just enough to resilver, which doesn't include all the data that was previously 
in-sync (maybe a little of it, but mostly not).  Even if you have a completely 
failed drive, replaced with a completely new empty drive, if you have a 3-way 
mirror, you only need to read one good copy of the data in order to write the 
resilver'd data onto the new drive.  So you could still be failing to detect 
cksum errors on the *other* side of the mirror, which wasn't read during the 
resilver.

What's more, when you resilver, the system is just going to write the target 
disk.  Not go back and verify every written block of the target disk.

So, think of a scrub as a complete, thorough, resilver whereas resilver is 
just a lightweight version, doing only the parts that are known to be out-of sync, and without 
subsequent read verification.
Well, I always used to issue a scrub after resilver, but since we 
completely re-designed our server room, things started to act up and 
each scrub would at least come up with chksum errors. On the Fire 4170 I 
only noticed these chksum errors, while on the Dell sometimes the whole 
thing broke down and ZFS would mark numerous disks as faulted.



This happens on all of my servers, Sun Fire 4170M2,
Dell PE 650 and on any FC storage that I have.

While you apparently have been able to keep the system in production for a 
while, consider yourself lucky.  You have a real problem, and solving it 
probably won't be easy.  Your problem is either hardware, firmware, or drivers. 
 If you have a support contract on the Sun, I would recommend starting there.  
Because the Dell is definitely a configuration that you won't find official 
support for - just a lot of community contributors, who will likely not provide 
a super awesome answer for you super soon.

I know; I have dedicated quite a bit of my time to keeping this setup up and 
running. I do have support coverage for my two Sun Solaris servers, but 
as you may have experienced as well, you're sometimes better off asking 
here first… ;)


I have gone over our SAN setup/topology and maybe I have found at least 
one issue worth looking at: we do have five QLogic 5600 SanBoxes and one 
of them basically operates as a core switch, where all the other ISLs are 
hooked up. That is, this switch has 4 ISLs and 12 storage array 
connects, while the Dell sits on another Sanbox and thus all traffic is 
routed through that core switch.


I don't know, but maybe this is a bit too much for this setup; the Dell 
hosts around 240 drives, which are mostly located on a neighbouring switch. 
I will try to tweak this setup so that the Dell gets a connection 
on that Sanbox directly, which will vastly reduce the inter-switch traffic.


I am also seeing these warnings in /var/adm/messages on both the Dell 
and my new Sun Server X2:


Jan 20 18:22:10 solaris11b scsi: [ID 243001 kern.warning] WARNING: 
/pci@0,0/pci8086,3c08@3/pci1077,171@0,1/fp@0,0 (fcp0):
Jan 20 18:22:10 solaris11b  SCSI command to d_id=0x10601 lun=0x0 
failed, Bad FCP response values: rsvd1=0, rsvd2=0, sts-rsvd1=0, 
sts-rsvd2=0, rsplen=0, senselen=0
Jan 20 18:22:10 solaris11b scsi: [ID 243001 kern.warning] WARNING: 
/pci@0,0/pci8086,3c08@3/pci1077,171@0,1/fp@0,0 (fcp0):
Jan 20 18:22:10 solaris11b  SCSI command to d_id=0x30e01 lun=0x1 
failed, Bad FCP response values: rsvd1=0, rsvd2=0, sts-rsvd1=0, 
sts-rsvd2=0, rsplen=0, senselen=0


These are always targeted at LUNs on remote Sanboxes…



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RFE: Un-dedup for unique blocks

2013-01-20 Thread Jim Klimov

On 2013-01-20 19:55, Tomas Forsman wrote:

On 19 January, 2013 - Jim Klimov sent me these 2,0K bytes:


Hello all,

   While revising my home NAS which had dedup enabled before I gathered
that its RAM capacity was too puny for the task, I found that there is
some deduplication among the data bits I uploaded there (makes sense,
since it holds backups of many of the computers I've worked on - some
of my homedirs' contents were bound to intersect). However, a lot of
the blocks are in fact unique - have entries in the DDT with count=1
and the blkptr_t bit set. In fact they are not deduped, and with my
pouring of backups complete - they are unlikely to ever become deduped.


Another RFE would be 'zfs dedup mypool/somefs' and basically go through
and do a one-shot dedup. Would be useful in various scenarios. Possibly
go through the entire pool at once, to make dedups intra-datasets (like
the real thing).


Yes, but that was asked before =)

Actually, the pool's metadata does contain all the needed bits (i.e.
checksum and size of blocks), such that a scrub-like procedure could
try to find identical blocks among the unique ones (perhaps filtered
to blocks referenced from a dataset that currently wants dedup),
throw one copy out and add a DDT entry pointing to the other.

On 2013-01-20 17:16, Edward Harvey wrote:
 So ... The way things presently are, ideally you would know in
 advance what stuff you were planning to write that has duplicate
 copies.  You could enable dedup, then write all the stuff that's
 highly duplicated, then turn off dedup and write all the
 non-duplicate stuff.  Obviously, however, this is a fairly
 implausible actual scenario.

Well, I guess I could script a solution that uses ZDB to dump the
blockpointer tree (about 100Gb of text on my system), and some
perl or sort/uniq/grep parsing over this huge text to find blocks
that are the same but not deduped - as well as those single-copy
deduped ones, and toggle the dedup property while rewriting the
block inside its parent file with DD.

This would all be within current ZFS's capabilities and ultimately
reach the goals of deduping pre-existing data as well as dropping
unique blocks from the DDT. It would certainly not be a real-time
solution (likely might take months on my box - just fetching the
BP tree took a couple of days) and would require more resources
than needed otherwise (rewrites of same userdata, storing and
parsing of addresses as text instead of binaries, etc.)
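
The grouping step of such a script is simple once the dump exists.  A sketch,
assuming a pre-digested text dump with one block per line (the field layout
shown is an assumption for illustration only - real ZDB output would need
its own parsing):

    # Group dumped block records by (checksum, size) to find dedup candidates.
    # Assumed input: whitespace-separated fields per line:
    #   <dataset> <object> <offset> <size> <checksum>
    from collections import defaultdict

    groups = defaultdict(list)
    with open("bp_dump.txt") as f:
        for line in f:
            dataset, obj, offset, size, cksum = line.split()
            groups[(cksum, size)].append((dataset, obj, offset))

    for (cksum, size), blocks in groups.items():
        if len(blocks) > 1:
            # same checksum and size, more than one on-disk copy: candidate
            print(cksum, size, len(blocks), "copies")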

But I do see how this is doable even today, even by a non-expert ;)
(Not sure I'd ever get around to actually doing it this way, though -
it is not a very clean solution, nor a performant one).

As a bonus, however, this ZDB dump would also provide an answer
to a frequently-asked question: which files on my system intersect
or are the same - and have some/all blocks in common via dedup?
Knowing this might help admins with some policy decisions, be it
a witch-hunt for hoarders of duplicate files or some pattern-making
to determine which datasets should keep dedup=on...

My few cents,
//Jim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Resilver w/o errors vs. scrub with errors

2013-01-20 Thread Jim Klimov
On 2013-01-20 16:56, Edward Ned Harvey 
(opensolarisisdeadlongliveopensolaris) wrote:

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
boun...@opensolaris.org] On Behalf Of Jim Klimov

And regarding the considerable activity - AFAIK there is little way
for ZFS to reliably read and test TXGs newer than X


My understanding is like this:  When you make a snapshot, you're just creating 
a named copy of the present latest TXG.  When you zfs send incremental from one 
snapshot to another, you're creating the delta between two TXG's, that happen 
to have names.  So when you break a mirror and resilver, it's exactly the same 
operation as an incremental zfs send, it needs to calculate the delta between 
the latest (older) TXG on the previously UNAVAIL device, up to the latest TXG 
on the current pool.  Yes this involves examining the meta tree structure, and 
yes the system will be very busy while that takes place.  But the work load is 
very small relative to whatever else you're likely to do with your pool during 
normal operation, because that's the nature of the meta tree structure ... very 
small relative to the rest of your data.


Hmmm... Given that many people use automatic snapshots, those do
provide us with many roots for branches of the block-pointer tree after
a certain TXG (the creation of a snapshot and the next live variant of
the dataset).

This might allow resilvering to quickly select only those branches
of the metadata tree that are known or assumed to have changed after
a disk was temporarily lost - and not go over datasets (snapshots)
that are known to have been committed and closed (became read-only)
while that disk was online.

I have no idea if this optimization does take place in ZFS code,
but it seems bound to be there... if not - a worthy RFE, IMHO ;)

//Jim

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Resilver w/o errors vs. scrub with errors

2013-01-20 Thread Jim Klimov

Did you try replacing the patch-cables and/or SFPs on the path
between servers and disks, or at least cleaning them? A speck
of dust (or, God forbid, a pixel of body fat from a fingerprint)
caught between the two optic cable cutoffs might cause any kind
of signal weirdness from time to time... and lead to improper
packets of that optic protocol.

Are there switch stats on whether it has seen media errors?

//Jim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RFE: Un-dedup for unique blocks

2013-01-20 Thread Richard Elling
On Jan 20, 2013, at 8:16 AM, Edward Harvey imaginat...@nedharvey.com wrote:
 But, by talking about it, we're just smoking pipe dreams.  Cuz we all know 
 zfs is developmentally challenged now.  But one can dream...

I disagree that ZFS is developmentally challenged. There is more development
now than ever in every way: # of developers, companies, OSes, KLOCs, features.
Perhaps the level of maturity makes progress appear to be moving slower than 
it did early in the project's life?

 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RFE: Un-dedup for unique blocks

2013-01-20 Thread Jim Klimov

On 2013-01-20 17:16, Edward Harvey wrote:

But, by talking about it, we're just smoking pipe dreams.  Cuz we all know zfs 
is developmentally challenged now.  But one can dream...


I beg to disagree. Most of my contribution so far has been about
learning stuff and sharing it with others, as well as planting some
new ideas and (hopefully, seen as constructive) doubting of others' -
including the implementation we have now - and I have yet to see
someone pick up my ideas and turn them into code (or prove why they
are rubbish). But overall I can't say that development has stagnated,
by any metric of stagnation or activity.

Yes, maybe there were more cool new things per year popping up
with Sun's concentrated engineering talent and financing, but now
it seems that most players - wherever they work now - took a pause
from the marathon, to refine what was done in the decade before.
And this is just as important as churning out innovations faster
than people can comprehend or audit or use them.

As a prominent example of present active development, take the LZ4
quest completed by Saso recently. From what I gather, this was a
single man's job, done on-line in view of fellow list members
over a few months, almost like a reality show; and I guess anyone
with enough concentration, time and devotion could do likewise.

I suspect many of my proposals to the list might also take some
half of a man-year to complete. Unfortunately for the community
and for part of myself, I now have some higher daily priorities
so that I likely won't sit down and code lots of stuff in the
nearest years (until that Priority goes to school, or so). Maybe
that's why I'm eager to suggest quests for brilliant coders here
who can complete the job better and faster than I ever would ;)
So I'm doing the next best things I can do to help the progress :)

And I don't believe this is in vain, that the development ceased
and my writings are only destined to be stuffed under the carpet.
Be it these RFEs or some others, better and more useful, I believe
they shall be coded and published in common ZFS code. Sometime...

//Jim

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RFE: Un-dedup for unique blocks

2013-01-20 Thread Tim Cook
On Sun, Jan 20, 2013 at 6:19 PM, Richard Elling richard.ell...@gmail.com wrote:

 On Jan 20, 2013, at 8:16 AM, Edward Harvey imaginat...@nedharvey.com
 wrote:
  But, by talking about it, we're just smoking pipe dreams.  Cuz we all
 know zfs is developmentally challenged now.  But one can dream...

 I disagree the ZFS is developmentally challenged. There is more development
 now than ever in every way: # of developers, companies, OSes, KLOCs,
 features.
 Perhaps the level of maturity makes progress appear to be moving slower
 than
 it seems in early life?

  -- richard


Well, perhaps a part of it is marketing.   Maturity isn't really an excuse
for not having a long-term feature roadmap.  It seems as though maturity
in this case equals stagnation.  What are the features being worked on that we
aren't aware of?  The big ones that come to mind, which everyone else is
talking about for not just ZFS but OpenIndiana as a whole and other storage
platforms, would be:
1. SMB3 - Hyper-V WILL be gaining market share over the next couple of years;
not supporting it means giving up a sizeable portion of the market.  Not to
mention finally being able to run SQL (again) and Exchange on a file share.
2. VAAI support.
3. the long-sought bp-rewrite.
4. full drive encryption support.
5. tiering (although I'd argue caching is superior, it's still a checkbox).

There's obviously more, but those are just ones off the top of my head that
others are supporting/working on.  Again, it just feels like all the work
is going into fixing bugs and refining what is there, not adding new
features.  Obviously Saso personally added features, but overall there
don't seem to be a ton of announcements to the list about features that
have been added or are being actively worked on.  It feels like all these
companies are just adding niche functionality they need that may or may not
be getting pushed back to mainline.

/debbie-downer
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RFE: Un-dedup for unique blocks

2013-01-20 Thread Richard Elling
On Jan 20, 2013, at 4:51 PM, Tim Cook t...@cook.ms wrote:

 On Sun, Jan 20, 2013 at 6:19 PM, Richard Elling richard.ell...@gmail.com 
 wrote:
 On Jan 20, 2013, at 8:16 AM, Edward Harvey imaginat...@nedharvey.com wrote:
  But, by talking about it, we're just smoking pipe dreams.  Cuz we all know 
  zfs is developmentally challenged now.  But one can dream...
 
 I disagree the ZFS is developmentally challenged. There is more development
 now than ever in every way: # of developers, companies, OSes, KLOCs, features.
 Perhaps the level of maturity makes progress appear to be moving slower than
 it seems in early life?
 
  -- richard
 
 Well, perhaps a part of it is marketing.  

A lot of it is marketing :-/

 Maturity isn't really an excuse for not having a long-term feature roadmap.  
 It seems as though maturity in this case equals stagnation.  What are the 
 features being worked on we aren't aware of?

Most of the illumos-centric discussion is on the developers' list. The
ZFSonLinux and BSD communities are also quite active. Almost none of the ZFS
developers hang out on this zfs-discuss@opensolaris.org list anymore. In fact,
I wonder why I'm still here...

  The big ones that come to mind that everyone else is talking about for not 
 just ZFS but openindiana as a whole and other storage platforms would be:
 1. SMB3 - hyper-v WILL be gaining market share over the next couple years, 
 not supporting it means giving up a sizeable portion of the market.  Not to 
 mention finally being able to run SQL (again) and Exchange on a fileshare.

I know of at least one illumos community company working on this. However, I do
not know their public plans.

 2. VAAI support.  

VAAI has 4 features, 3 of which have been in illumos for a long time. The
remaining feature (SCSI UNMAP) was done by Nexenta and exists in their
NexentaStor product, but the CEO made a conscious (and unpopular) decision to
keep that code from the community. Over the summer, another developer picked up
the work in the community, but I've lost track of the progress and haven't seen
an RTI yet.

 3. the long-sought bp-rewrite.

Go for it!

 4. full drive encryption support.

This is a key management issue mostly. Unfortunately, the open source code for
handling this (trousers) covers much more than keyed disks and can be unwieldy.
I'm not sure which distros picked up trousers, but it doesn't belong in the
illumos-gate and it doesn't expose itself to ZFS.

 5. tiering (although I'd argue caching is superior, it's still a checkbox).

You want to add tiering to the OS? That has been available for a long time via 
the
(defunct?) SAM-QFS project that actually delivered code
http://hub.opensolaris.org/bin/view/Project+samqfs/

If you want to add it to ZFS, that is a different conversation.
 -- richard

 
 There's obviously more, but those are just ones off the top of my head that 
 others are supporting/working on.  Again, it just feels like all the work is 
 going into fixing bugs and refining what is there, not adding new features.  
 Obviously Saso personally added features, but overall there don't seem to be 
 a ton of announcements to the list about features that have been added or are 
 being actively worked on.  It feels like all these companies are just adding 
 niche functionality they need that may or may not be getting pushed back to 
 mainline.
 
 /debbie-downer
 

--

richard.ell...@richardelling.com
+1-760-896-4422

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Resilver w/o errors vs. scrub with errors

2013-01-20 Thread Stephan Budach

On 21.01.13 00:21, Jim Klimov wrote:

Did you try replacing the patch-cables and/or SFPs on the path
between servers and disks, or at least cleaning them? A speck
of dust (or, God forbid, a pixel of body fat from a fingerprint)
caught between the two optic cable cutoffs might cause any kind
of signal weirdness from time to time... and lead to improper
packets of that optic protocol.
I cleaned the patch cables that run from the Dell to its Sanbox, but not 
the other ones - especially not the ISLs, since cleaning those would 
pretty much mean interrupting our SAN.


Are there switch stats on whether it has seen media errors?
Has anybody gotten QLogic's SanSurfer to work with anything newer than 
Java 1.4.2? ;) I checked the logs on my switches and they don't seem to 
indicate such issues, but I am lacking the real-time monitoring that the 
old SanSurfer provides.


Stephan
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss