There is no substitute for cord-yank tests - many and often. The
weird part is, the ZFS design team simulated millions of them.
So the full explanation remains to be uncovered?
We simulated power failure; we did not simulate disks that simply
blow off write ordering. Any disk that
Hi,
There was a requirement to measure all the OSYNC writes.
Attached is a simple DTrace script which does this using the
fsinfo provider and fbt::fop_write.
I was wondering if this is accurate enough or if I missed any other cases.
I am sure this can be improved in many ways.
Thanks and regards,
On Mon, 09 Feb 2009 01:46:01 PST
D. Eckert cont...@desystems.cc wrote:
after working for 1 month with ZFS on 2 external USB drives I have
experienced, that the all new zfs filesystem is the most unreliable
FS I have ever seen.
Since working with the zfs, I have lost data from:
1 80 GB
On Mon, 09 Feb 2009 01:46:01 PST
D. Eckert cont...@desystems.cc wrote:
after working for 1 month with ZFS on 2 external USB drives I have
experienced, that the all new zfs filesystem is the most unreliable
FS I have ever seen.
Since working with the zfs, I have lost data from:
What filesystem likes it when disks are pulled out from a LIVE
filesystem? Try that on UFS and you're f** up too.
Pulling a disk from a live filesystem is the same as pulling the power
from the computer. All modern filesystems can handle that just fine.
UFS with logging on does not even need
On Mon, Feb 09, 2009 at 11:56:54AM -0800, Gordon Johnson wrote:
I hope this thread catches someone's attention. I've reviewed the root pool
recovery guide as posted. It presupposes a certain level of network support,
for backup and restore, that many opensolaris users may not have.
I did
Need help on removing a faulted spare. We tried following with no
success. There is no resilvering active as shown below:
# zpool clear sybdump_pool c7t0d0 spare device
cannot clear errors for c7t0d0: device is reserved as a hot spare
# zpool remove sybdump_pool c7t0d0
# zpool status -xv
The good news is that ZFS is getting popular enough on consumer-grade
hardware. The bad news is that said hardware has a different set of
failure modes, so it takes a bit of work to become resilient to them.
This is pretty high on my short list.
So does this basically mean zfs rolls-back
Sanjeev wrote:
Hi,
There was a requirement to measure all the OSYNC writes.
Attached is a simple DTrace script which does this using the
fsinfo provider and fbt::fop_write.
I was wondering if this is accurate enough or if I missed any other cases.
I am sure this can be improved in many ways.
However, I just want to state a warning, that ZFS is far from being that
what it is promising, and so far from my sum of experience I can't
recommend at all to use zfs on a professional system.
Or, perhaps, you've given ZFS disks which are so broken that they are
really unusable; it
Jeff, what do you mean by "disks that simply blow off write ordering"?
My experience is that most enterprise disks are some flavor of SCSI, and
host SCSI drivers almost ALWAYS use simple queue tags, implying the
target is free to re-order the commands for performance. Are you talking
about something
YES! I recently discovered that VirtualBox apparently defaults to
ignoring flushes, which would, if true, introduce a failure mode
generally absent from real hardware (and eventually resulting in
consistency problems quite unexpected to the user who carefully
configured her journaled
And again: Why should a 2 weeks old Seagate HDD suddenly be damaged, if there
was no shock, hit or any other event like that?
I have no information about your particular situation, but you have to
remember the ZFS uncovers problems that otherwise go unnoticed. Just
personally on my private
I have updated zilstat to add observability of the size of the sync
write. Details are at:
http://www.goldensrule.com/zilstat-intro
Enjoy!
-- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
on a UFS or reiserfs such errors could be corrected.
In general, UFS has zero capability to actually fix real corruption in
any reliable way.
What you normally do with fsck is repairing *expected* inconsistencies
that the file system was *designed* to produce in the event of e.g. a
sudden
On 10-Feb-09, at 1:03 PM, Charles Binford wrote:
Jeff, what do you mean by "disks that simply blow off write
ordering"?
My experience is that most enterprise disks are some flavor of
SCSI, and
host SCSI drivers almost ALWAYS use simple queue tags, implying the
target is free to re-order the
jb == Jeff Bonwick jeff.bonw...@sun.com writes:
jb We simulated power failure; we did not simulate disks that
jb simply blow off write ordering. Any disk that you'd ever
jb deploy in an enterprise or storage appliance context gets this
jb right.
Did you simulate power failure
All,
I've been following the thread titled 'ZFS: unreliable for professional use'
and I've learned a few things. Put simply, external devices don't behave like
internal ones.
From JB :
The good news is that ZFS is getting popular enough on consumer-grade
hardware. The bad news is that
g == Gino dandr...@gmail.com writes:
g we lost many zpools with multimillion$ EMC, Netapp and
g HDS arrays just simulating fc switches power fails.
g The problem is that ZFS can't properly recover itself.
I don't like what you call ``the problem''---I think it assumes too
ps == Peter Schuller peter.schul...@infidyne.com writes:
ps This is a recommendation I would give even when you purchase
ps non-cheap battery backed hardware RAID controllers (I won't
ps mention any names or details to avoid bashing as I'm sure it's
ps not specific to the
ps == Peter Schuller peter.schul...@infidyne.com writes:
ps A test I did was to write a minimalistic program that simply
ps appended one block (8k in this case), fsync():ing in between,
ps timing each fsync().
were you the one that suggested writing backwards to make the
difference
On 10 Feb 2009, at 18:35, Bryant Eadon wrote:
Given that ZFS is planned to be used in Snow Leopard, is it worth
setting something up for consumer grade appliance vendors to
'certify' against? (Ok, you play nice with ZFS by doing the right
things, etc.. ) Maybe you can give them a 'Gold
(..)
Dave made a mistake pulling out the drives without exporting them first.
For sure also UFS/XFS/EXT4/.. doesn't like that kind of operations but only
with ZFS you risk to lose ALL your data.
that's the point!
(...)
I did that many times after performing the umount cmd with ufs/reiserfs
I disagree, see posting above.
ZFS just accepts it 2 or 3 times. after that, your data are passed away to
nirvana for no reason.
And it should be legal, to have an external USB drive with a ZFS. with all
respect, why should a user always care for redundancy, e. g. setup a mirror on
a single
On 2/10/2009 2:50 PM, D. Eckert wrote:
(..)
Dave made a mistake pulling out the drives without exporting them first.
For sure also UFS/XFS/EXT4/.. doesn't like that kind of operations but only
with ZFS you risk to lose ALL your data.
that's the point!
(...)
I did that many times after
(...)
If anyone asks questions, they get no actual information, but a huge
amount of blame heaped on the sysadmin. Your post is a great example
of the typical way this problem is handled because it does both: deny
information and blame the sysadmin. Though I'm really picking on you
way too much
On Feb 9, 2009, at 7:06 PM, Jeff Bonwick wrote:
There is no substitute for cord-yank tests - many and often. The
weird part is, the ZFS design team simulated millions of them.
So the full explanation remains to be uncovered?
We simulated power failure; we did not simulate disks that simply
blow
On 2/10/2009 2:54 PM, D. Eckert wrote:
I disagree, see posting above.
ZFS just accepts it 2 or 3 times. after that, your data are passed away to
nirvana for no reason.
And it should be legal, to have an external USB drive with a ZFS. with all
respect, why should a user always care for
Hi,
i've followed this thread a bit and I think there are some correct
points on any side of the discussion, but here I see a misconception (at
least I think it is):
D. Eckert schrieb:
(..)
Dave made a mistake pulling out the drives without exporting them first.
For sure also UFS/XFS/EXT4/..
On Tue, Feb 10, 2009 at 12:46 PM, Miles Nordin car...@ivy.net wrote:
It's likely other filesystems are affected by ``the problem'' as I
define it, just much less so. If that's the case, it'd be much better
IMHO to fix the real problem once and for all, and find it so that it
stays fixed,
On Tue, Feb 10, 2009 at 1:27 PM, Chris Ridd chrisr...@mac.com wrote:
On 10 Feb 2009, at 18:35, Bryant Eadon wrote:
Given that ZFS is planned to be used in Snow Leopard, is it worth setting
something up for consumer grade appliance vendors to 'certify' against?
(Ok, you play nice with ZFS
rs == Roman Shaposhnik r...@sun.com writes:
rs1. as a forensics tool that would let you retrieve as much
rs information as possible from a physically ill device
a nit, but I've never found fsck alone useful for this. Maybe for ``a
filesystem trashed by bad RAM/CPU/bugs'' it is
(...)
You don't move a pool with 'zfs umount', that only unmounts a single zfs
filesystem within a pool, but the pool is still active.. 'zpool export'
releases the pool from the OS, then 'zpool import' on the other machine.
(...)
with all respect: I never read such a non logic ridiculous .
I
(...)
Possibly so. But if you had that ufs/reiserfs on a LVM or on a RAID0
spanning removable drives, you probably wouldn't have been so lucky.
(...)
we are not talking about a RAID 5 array or an LVM. We are talking about a
single FS setup as a zpool over the entire available disk space on an
D. Eckert wrote:
(...)
Possibly so. But if you had that ufs/reiserfs on a LVM or on a RAID0
spanning removable drives, you probably wouldn't have been so lucky.
(...)
we are not talking about a RAID 5 array or an LVM. We are talking about a
single FS setup as a zpool over the entire available
D. Eckert wrote:
(...)
You don't move a pool with 'zfs umount', that only unmounts a single zfs
filesystem within a pool, but the pool is still active.. 'zpool export'
releases the pool from the OS, then 'zpool import' on the other machine.
(...)
with all respect: I never read such a non
On 10-Feb-09, at 1:05 PM, Peter Schuller wrote:
YES! I recently discovered that VirtualBox apparently defaults to
ignoring flushes, which would, if true, introduce a failure mode
generally absent from real hardware (and eventually resulting in
consistency problems quite unexpected to the user
Dear All,
Is there any way to figure out which piece is at fault? Sun SAS RAID
(Adaptec/Intel) controller is reporting that drives are good, but ZFS is not
happy about checksum errors. Is there any way to figure out which component
introduced the error?
Leonid
The good news is that ZFS is getting popular enough on consumer-grade
hardware. The bad news is that said hardware has a different set of
failure modes, so it takes a bit of work to become resilient to them.
This is pretty high on my short list.
One thing I'd like to see is an _easy_ option
DE - could you please post the output of your 'zpool umount usbhdd1'
command? I believe the output will prove useful to the point being
discussed below.
Charles
D. Eckert wrote:
(...)
You don't move a pool with 'zfs umount', that only unmounts a single zfs
filesystem within a pool, but the
On Tue, Feb 10, 2009 at 12:31:05PM -0800, D. Eckert wrote:
(...)
You don't move a pool with 'zfs umount', that only unmounts a single zfs
filesystem within a pool, but the pool is still active.. 'zpool export'
releases the pool from the OS, then 'zpool import' on the other machine.
(...)
I think you are not reading carefully enough, and I
can trace from your reply a typically American
arrogant behavior.
WE, THE PROUDEST AND infallibles on earth DID NEVER MAKE
a mistake. It is just the stupid user who did not read the
fucking manual carefully enough.
Hello? Did you
ps This is a recommendation I would give even when you purchase
ps non-cheap battery backed hardware RAID controllers (I won't
ps mention any names or details to avoid bashing as I'm sure it's
ps not specific to the particular vendor I had problems with most
ps recently).
I'll make a meta comment on the thread itself, not on the ZFS issue.
There is more bashing and broad accusations than it would normally happen on a
professional usage situation. Maybe a board admin can run a script on the ip
addresses logged and find a more subtle meaning... I don't know, I'm
ps A test I did was to write a minimalistic program that simply
ps appended one block (8k in this case), fsync():ing in between,
ps timing each fsync().
were you the one that suggested writing backwards to make the
difference bigger? I guess you found that trick
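The test quoted above (append one 8k block, fsync(), time each fsync()) could be sketched roughly as follows. This is not Peter Schuller's program, which is not shown in the thread; the function name, file name, and block count are made up for illustration. One hedged way to read the numbers: if fsync() consistently returns much faster than the disk could physically commit a write, some layer below may be acknowledging the flush early.

```python
import os
import tempfile
import time

BLOCK_SIZE = 8 * 1024  # 8k per append, as in the test described above


def time_fsync_appends(path, count, block_size=BLOCK_SIZE):
    """Append `count` blocks to `path`, calling fsync() after each one.

    Returns the list of per-fsync durations in seconds.
    (Hypothetical helper, not taken from the thread.)
    """
    block = b"\0" * block_size
    durations = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        for _ in range(count):
            os.write(fd, block)
            start = time.perf_counter()
            os.fsync(fd)  # only the flush is timed, not the write
            durations.append(time.perf_counter() - start)
    finally:
        os.close(fd)
    return durations


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        timings = time_fsync_appends(os.path.join(tmp, "fsync_test.dat"), 20)
        print(f"min {min(timings) * 1e3:.3f} ms, max {max(timings) * 1e3:.3f} ms")
```

To test a particular pool or device, point `path` at a file on that filesystem rather than at a temporary directory.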
if you are interested in my IP Address: no problem:
83.236.164.80
it just exactly approves my assumption, that's best and easier for someone - if
he's in the right position - to adhere a big pavement on someone's mouth to
avoid hearing a legal critique instead of discussing out the problem to
Leonid,
You could use the fmdump -eV command to look for problems with these
disks. This command might generate a lot of output, but it should be
clear if the root cause is a problem accessing these devices.
I would also check /var/adm/messages for any driver-related messages.
Cindy
Leonid
tcook,
...
Regarding the GUI, I don't know how to disable it.
There are no virtual consoles, and unlike older
versions of SunOS and Solaris, it comes up in XDM
and there is no [apparent] way to get a shell
without running gnome. I am sure that there is, but
again, I come from the
de == D Eckert cont...@desystems.cc writes:
de from your reply a typically American arrogant behavior.
de WE, THE PROUDEST AND infallibles on earth DID NEVER MAKE a
de mistake.
Maybe I should speak up since I defended you at the start. To my
view:
REASONABLE:
* expect that
Mario Goebbels wrote:
The good news is that ZFS is getting popular enough on consumer-grade
hardware. The bad news is that said hardware has a different set of
failure modes, so it takes a bit of work to become resilient to them.
This is pretty high on my short list.
One thing I'd like
Peter Schuller wrote:
It would actually be nice in general I think, not just for ZFS, to
have some standard "run this tool" that will give you a check list of
successes/failures that specifically target storage
correctness. Though correctness cannot be proven, you can at least
test for common
On Tue, 10 Feb 2009 13:14:57 PST
D. Eckert cont...@desystems.cc wrote:
Hello? Did you already recognized the sound of the shot??
I learned my lesson well, and in future this won't happen
again, because we will no longer use zfs, but we have a legal
interest, to get back our data we stored in
Roman V. Shaposhnik wrote:
On Wed, 2009-02-11 at 09:49 +1300, Ian Collins wrote:
These posts do sound like someone who is blaming their parents after
breaking a new toy before reading the instructions.
It looks like there's a serious denial of the fact that bad things
do happen to
On February 10, 2009 1:14:57 PM -0800 D. Eckert cont...@desystems.cc
wrote:
I hope I've made myself very clear.
Very. Rarely has the adage "what one says reveals more about the
speaker than the subject" been more evident.
And as more postings we have to read in the sound of yours as more we
ps == Peter Schuller peter.schul...@infidyne.com writes:
ps This is something I'm interested in, since my perception so
ps far has been that there is only one. Some driver writer has
ps the opinion that flush cache means to flush the cache, while
ps the file system writer uses
Hi again everyone,
OK... I'm even more confused at what is happening here when I try to rejoin the
split zfs send file...
When I cat the split files and pipe through cksum, I get the same cksum as the
original (unsplit) zfs send snapshot:
# cat mypictures.zfssnap.split.a[a-d] | cksum
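For anyone wanting to repeat the comparison above without the shell, a rough Python equivalent follows. Two assumptions to flag: it uses SHA-256 rather than the CRC that cksum computes, so the values will not match cksum output, and `digest_of_files` is a made-up helper name. The point is only that the pieces, read in order, should hash the same as the unsplit stream.

```python
import hashlib


def digest_of_files(paths, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the files in `paths`, read in the
    given order as if they had been concatenated with cat.

    (Hypothetical helper; SHA-256 is a stand-in, not what cksum uses.)
    """
    h = hashlib.sha256()
    for path in paths:
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                h.update(chunk)
    return h.hexdigest()
```

The pieces must be passed in the same order the shell glob would expand them, for example `sorted(glob.glob("mypictures.zfssnap.split.a[a-d]"))`, and the result compared against `digest_of_files([original])`, where `original` is whatever the unsplit send file is called (its name is not given in the post).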
well, if you want a write barrier, you can issue a flush-cache and
wait for a reply before releasing writes behind the barrier. You will
get what you want by doing this for certain. so a flush-cache is more
forceful than a barrier, as long as you wait for the reply.
Yes, this is another
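The flush-then-wait idea quoted above can be illustrated at the application level, with a large caveat: the sketch below uses fsync() on a file as a stand-in for the flush-cache command a filesystem would send to a disk, and `write_with_barrier` is a made-up name. It shows only the ordering pattern, and it gives the intended guarantee only if every layer underneath actually honors the flush.

```python
import os


def write_with_barrier(path, payload, commit_record):
    """Write `payload`, wait for the flush to return, then write
    `commit_record`.

    The fsync() between the two writes plays the role of the barrier:
    the commit record is not issued until the flush of the payload has
    been acknowledged. (Hypothetical helper, not ZFS code.)
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, payload)
        os.fsync(fd)  # barrier: block until the flush is acknowledged
        os.write(fd, commit_record)
        os.fsync(fd)  # make the commit record itself durable
    finally:
        os.close(fd)
```

If a device reports success for the flush without writing anything, the second write can reach the platter before the first, and nothing in this code can detect that.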
On Tue, 10 Feb 2009, Tim wrote:
You apparently have not used apple's disk. It's nothing remotely resembling
enterprise-type disk.
That is not true of Apple's only server system Xserve. It uses SAS
disks similar to the ones in the enterprise offerings of Sun, IBM,
etc, and at a similar
well, if you want a write barrier, you can issue a flush-cache and
wait for a reply before releasing writes behind the barrier. You will
get what you want by doing this for certain.
Not if the disk drive just *ignores* barrier and flush-cache commands
and returns success. Some consumer
On 10-Feb-09, at 7:41 PM, Jeff Bonwick wrote:
well, if you want a write barrier, you can issue a flush-cache and
wait for a reply before releasing writes behind the barrier. You will
get what you want by doing this for certain.
Not if the disk drive just *ignores* barrier and
Could this be relevant? Notice sd_cache_control mismatch message. Thank
you everybody for any ideas or help. I really appreciate it.
Feb 06 2009 23:14:07.704531935 ereport.io.scsi.cmd.disk.dev.uderr
nvlist version: 0
class = ereport.io.scsi.cmd.disk.dev.uderr
ena =
jb == Jeff Bonwick jeff.bonw...@sun.com writes:
tt == Toby Thain t...@telegraphics.com.au writes:
jb Not if the disk drive just *ignores* barrier and flush-cache
jb commands and returns success. Some consumer drives really do
jb exactly that. That's the issue that people are
On February 10, 2009 4:41:35 PM -0800 Jeff Bonwick jeff.bonw...@sun.com
wrote:
Not if the disk drive just *ignores* barrier and flush-cache commands
and returns success. Some consumer drives really do exactly that.
ouch.
If it were possible to detect such disks, I'd add code to ZFS that
On 10-Feb-09, at 10:36 PM, Frank Cusack wrote:
On February 10, 2009 4:41:35 PM -0800 Jeff Bonwick
jeff.bonw...@sun.com wrote:
Not if the disk drive just *ignores* barrier and flush-cache commands
and returns success. Some consumer drives really do exactly that.
ouch.
If it were possible
We have seen some unfortunate miscommunication here, and misinterpretation.
This extends into differences of culture. One of the vocal persons in here is
surely not 'Anti-xyz'; rather I sense his intense desire to further the
progress by pointing his finger to some potential wounds.
May I repeat
On Tue, Feb 10, 2009 at 4:14 PM, D. Eckert cont...@desystems.cc wrote:
I think you are not reading carefully enough, and I
can trace from your reply a typically American
arrogant behavior.
WE, THE PROUDEST AND infallibles on earth DID NEVER MAKE
a mistake. It is just the stupid user who did
Good. It looks like this thread can finally die. I received the
following in response to my message below:
This is an automatically generated Delivery Status Notification
Delivery to the following recipient failed permanently:
cont...@desystems.cc
Technical details of permanent failure:
Toby Thain wrote:
On 10-Feb-09, at 10:36 PM, Frank Cusack wrote:
On February 10, 2009 4:41:35 PM -0800 Jeff Bonwick
jeff.bonw...@sun.com wrote:
Not if the disk drive just *ignores* barrier and flush-cache commands
and returns success. Some consumer drives really do exactly that.
ouch.
If
In other words:
Don't feed the troll.
Greets
Jan Dreyer
zfs-discuss-boun...@opensolaris.org wrote :
Good. It looks like this thread can finally die. I received the
following in response to my message below:
This is an automatically generated Delivery Status Notification
Delivery
Fsck can only repair known faults; known
discrepancies in the meta data.
Since ZFS doesn't have such known discrepancies,
there's nothing to repair.
I'm rather tired of hearing this mantra.
If ZFS detects an error in part of its data structures, then there is clearly
something to repair.