... backup storage.
To tell the truth, we have had no corruption since snv_60, but we can't
seriously consider using ZFS without a "zfschk" :(
gino
(will buy Netapp/HDS until zfschk comes out)
... the failure mode described by Jeff.
> These are both bugs in ZFS and will be fixed.
I totally agree; these cover most of the corruption we have had in the past.
Any news about those bugs in a recent Nevada release?
Can anyone provide us a detailed procedure to "go back to a previous
uberblock" on disk?
> I'll run tests with known-broken disks to determine how far back we
> need to go in practice -- I'll bet one txg is almost always enough.
>
> Jeff
Hi Jeff,
we just lost 2 pools on snv_91.
Any news about your workaround to recover pools by discarding the last txg
of data?
ZFS should at least be able to recover pools by discarding the last txg, as you
suggested months ago. Any news about that?
thanks
gino
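The txg-rollback recovery discussed in this thread later shipped as the rewind
option to "zpool import" (around build snv_128). On a build that has it, a
minimal sketch, reusing the pool name from the examples further down, would be:

# zpool import -nF zpool3    (dry run: check whether discarding the last few txgs would allow the import)
# zpool import -F zpool3     (discard the trailing txgs and import the pool)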
> ... export/import does what you want?
Dick,
Dave made a mistake pulling out the drives without exporting them first.
Sure, UFS/XFS/EXT4/... don't like that kind of operation either, but only with
ZFS do you risk losing ALL your data.
That's the point!
gino
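For the record, the sequence that avoids this situation is a clean export
before pulling the drives and an import on the other side; a minimal sketch
with a hypothetical pool name:

# zpool export tank      (quiesce the pool and mark it exported before removing the drives)
# zpool import           (on the target host: list pools visible on the attached devices)
# zpool import tank      (import it; -f is only needed if the pool was not cleanly exported)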
> ... esoteric parameters supplied to zdb.
>
> This is CR 6667683
> http://bugs.opensolaris.org/view_bug.do?bug_id=6667683
I think that would solve 99% of ZFS corruption problems!
Is there any EDT for this patch?
tnx
gino
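As a side note on zdb: before attempting any recovery it can help to see what
state the on-disk labels are in. A sketch only, with a hypothetical device path:

# zdb -l /dev/rdsk/c1t0d0s0    (print the vdev labels stored on that device)
# zdb -e zpool3                (examine an exported or unimportable pool without importing it)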
> >>>>> "g" == Gino writes:
>
> g> we lost many zpools with multimillion$ EMC,
> Netapp and
> g> HDS arrays just simulating fc switches power
> fails.
> g> The problem is that ZFS can't properly
> recover itself.
> I don'
> > > This is CR 6667683
> > > http://bugs.opensolaris.org/view_bug.do?bug_id=6667683
> >
> > I think that would solve 99% of ZFS corruption problems!
>
> Based on the reports I've seen to date, I think you're right.
>
> > Is there any EDT for this patch?
>
> Well, because of this thread, ...
... Space has not been freed up!
gino
> >> dr/netapp11bkpVOL34   1.34T   158G   1.34T   /dr/netapp11bkpVOL34
> >>
> >> Space has not been freed up!
> >
> > Are there hidden files in that directory? Try "ls -la".
> > What happens when you export-import that pool?
>
> On Thu, September 10, 2009 04:27, Gino wrote:
>
> > # cd /dr/netapp11bkpVOL34
> > # rm -r *
> > # ls -la
> >
> > Now there are no files in /dr/netapp11bkpVOL34, but
> >
> > # zfs list|egrep netapp11bkpVOL34
> > dr/netapp11bkpV...
Francois, you're right!
We just found that it happens only with files >100GB and on S10U7.
We have no problems with snv_101a.
gino
> Actually there is a great chance that you are hitting this bug:
>
> "6792701 Removing large holey file does not free space"
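Before blaming that bug, it is worth ruling out snapshots pinning the freed
blocks; a quick check, using the dataset name from above:

# zfs list -t snapshot -r dr/netapp11bkpVOL34
# zfs get usedbysnapshots,usedbydataset dr/netapp11bkpVOL34

If usedbysnapshots is near zero and the space still does not come back, the
holey-file bug above is the more likely explanation.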
> Gino,
> I just had a similar experience and was able to import the pool when I
> added the readonly option (zpool import -f -o ro)
>
No way ... we still get a panic :(
gino
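For anyone else trying the read-only route, the usual sequence is the import
quoted above followed by copying off whatever is still readable; a sketch, with
a hypothetical destination path:

# zpool import -f -o ro zpool3
# rsync -aHx /zpool3/ /rescue/zpool3/      (evacuate what can still be read)
# zpool export zpool3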
Is there anyone interested in a kernel dump?
We are still unable to import the corrupted zpool, even in read-only mode ..
Apr 5 22:27:34 SERVER142 panic[cpu2]/thread=fffec9eef0e0:
Apr 5 22:27:34 SERVER142 genunix: [ID 603766 kern.notice] assertion failed:
ss->ss_start <= start (0x67b80...
Great news, Pawel!
Waiting to test it :)
Today we lost another zpool!
Fortunately it was only a backup repository.
SERVER144@/# zpool import zpool3
internal error: unexpected error 5 at line 773 of ../common/libzfs_pool.c
This zpool was a RAID10 built from 4 HDS LUNs.
Trying to import it into snv_60 (recovery mode) doesn't work.
... 1677983744, content: kernel
Apr 7 22:54:04 SERVER140 genunix: [ID 409368 kern.notice] 100% done: 644612
pages dumped, compression ratio 4.30,
Apr 7 22:54:04 SERVER140 genunix: [ID 851671 kern.notice] dump succeeded
gino
> Gino,
>
> Were you able to recover by setting zfs_recover?
>
Unfortunately no :(
Using zfs_recover did not allow us to recover any of the 5 corrupted zpools we
had ..
Please note that we lost this pool after a panic caused by trying to import a
corrupted zpool!
tnx,
gino
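For reference, the zfs_recover knob discussed here is set in /etc/system and
only takes effect after a reboot; a minimal sketch is below. The aok line is an
often-suggested companion that downgrades assertion panics to warnings -- that
pairing is an assumption on my part, not something confirmed in this thread:

* /etc/system (reboot required)
set zfs:zfs_recover=1
set aok=1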
> Gino,
>
> Can you send me the corefile from the zpool command?
This is the only case where we are unable to import a corrupted zpool but do
not get a kernel panic:
SERVER144@/# zpool import zpool3
internal error: unexpected error 5 at line 773 of ../common/libzfs_pool.c
SERVER144@/
1) We need it to just fail with an error message, but please stop crashing the
kernel.
2) We need a way to recover a corrupted ZFS pool, discarding the last
incomplete transactions.
Please give us "zfsck" :)
Waiting for comments,
gino
> 6322646 ZFS should gracefully handle all devices failing (when writing)
>
> Which is being worked on. Using a redundant configuration prevents this
> from happening.
What do you mean by "redundant"? All our servers have 2 or 4 HBAs, 2 or 4 FC
switches, and storage arrays with redundant controllers ...
... kernel.
>
> This is:
>
> 6322646 ZFS should gracefully handle all devices failing (when writing)
With S10U3 we are still getting kernel panics when trying to import a corrupted
zpool (RAID10).
gino
... a few times:
- create a RAID10 zpool (dual-path LUNs)
- put a high write load on that zpool
- disable FC ports on both FC switches
Each time we get a kernel panic, probably because of 6322646, and sometimes we
end up with a corrupted zpool.
gino
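A rough shape of that test as a sketch only -- the device names and the load
generator are hypothetical, and the port disable is done on the switches rather
than on the host:

# zpool create testpool mirror c6t0d0 c7t0d0 mirror c6t1d0 c7t1d0
# while true; do dd if=/dev/zero of=/testpool/bigfile bs=1024k count=4096; done &
(then administratively disable the FC ports on both switches while the load is
running and watch for the panic / pool corruption described above)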
... LUN used on a zpool that was exported 2 days ago ...
gino
> ... problems. It doesn't matter if it's due to an I/O error or a space map
> inconsistency if we can't propagate the error.
Any EDT for 6322646 and 6417779?
6417779 ZFS: I/O failure (write on ...)
6322646 ZFS should gracefully handle all devices failing (when writing)
... load.
We are now testing snv_60.
gino
Hello Robert,
it would be really interesting if you could add a HD RAID10 LUN with UFS to
your comparison.
gino
> On Mon, Apr 23, 2007 at 09:38:47AM -0700, Gino wrote:
> >
> > we had 5 corrupted zpools (on different servers and different SANs)!
> > With Solaris up to S10U3 and Nevada up to snv_59 we are able to corrupt
> > a zpool easily just by disconnecting a few times ...
> ... of 40-50 TB?
We are using a lot of EMC DAE2s. They work well with ZFS.
gino
Hello Robert,
> G> We are using a lot of EMC DAE2. Works well with ZFS.
>
> Without head units?
Yes. Just make sure to format the disks to 512 bytes per sector if they come
from EMC.
> Dual-pathed connections to hosts + MPxIO?
Sure. We are also using some Xyratex JBOD boxes.
ZFS will panic on I/O failure if the zpool is not fully redundant.
So you need 2 HBAs, 2 switches, and a RAID10 zpool to keep your server running.
Also upgrade to snv_60 or newer; older releases can corrupt your zpool!
gino
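On the 512 bytes/sector point: EMC-sourced FC disks are typically low-level
formatted at 520 bytes/sector and have to be reformatted to 512 before ZFS (or
UFS) can use them. One way to do that is sg_format from sg3_utils -- this is an
assumption about tooling, not something covered in this thread, and the device
path is hypothetical:

# sg_format --format --size=512 /dev/rdsk/c3t16d0s2
(destroys all data on the disk and can take a long time per drive)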
Upgrade to SNV60 or better ..
gino
Same problem here (snv_60).
Robert, did you find any solutions?
gino
Hi,
your pool could be corrupted.
Was it created with snv_64?
Can you post the kernel panic you are getting?
gino
... devices failing (when writing)
We also found that our backup servers using ZFS, after 40-60 days of uptime,
start to show system CPU time > 50%, often with one or two CPUs at 100%.
After a reboot, system CPU time goes back to 7-11%.
gino
... (sd52):
Sep 3 15:20:10 fb2 SCSI transport failed: reason 'timeout': giving up
"cfgadm -al" or "devfsadm -C" didn't solve the problem.
After a reboot ZFS recognized the drive as failed and all worked well.
Do we need to restart Solaris after a drive failure?
Hi Mark,
the drive (147GB, FC 2Gb) failed in a Xyratex JBOD.
We also had the same problem in the past with a drive that failed in an EMC CX
JBOD.
Anyway, I can't understand why rebooting Solaris fixed the situation ..
Thank you,
Gino
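What one would normally expect to work without a reboot, as a sketch only (pool
and device names are hypothetical, and this was not verified on the box in
question):

# zpool status -x                          (identify the faulted device)
# zpool offline tank c4t52d0               (take the dying drive out of service)
# cfgadm -c unconfigure c4::dsk/c4t52d0    (release it from the controller)
  ... physically replace the drive ...
# cfgadm -c configure c4::dsk/c4t52d0
# zpool replace tank c4t52d0               (resilver onto the replacement)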
> ... the ZFS diagnosis engine was enhanced to look for per-vdev soft error
> rate discriminator (SERD) engines.
Richard, thank you for your detailed reply.
Unfortunately, another reason to stay with UFS in production ..
Gino
... has integrated ZFS in Solaris and declared it stable.
Sun Sales tell you to trash your old redundant arrays and go with JBODs and
ZFS...
but they don't tell you that you will probably need to reboot your SF25k
because of a disk failure!! :(
Gino
... ZFS with S10U3 and then soon went back to UFS because of the same problems.
Sure, for our home server with cheap ATA drives ZFS is unbeatable and free :)
Gino
> On Tue, 2007-09-11 at 13:43 -0700, Gino wrote:
> > - ZFS + FC JBOD: a failed hard disk needs a reboot :(
> >   (frankly unbelievable in 2007!)
>
> So, I've been using ZFS with some creaky old FC JBODs (A5200's) and old
> disks which have been failing ...
... I think you're right.
The failed disk is still working, but it has no spare space left for remapping
bad sectors...
Gino
> ... all the data.
Hi Paul,
may I ask what your average file size is? Have you done any tuning, e.g. the
ZFS recordsize?
Did your test also include writing 1 million files?
Gino
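On the recordsize question: it is a per-filesystem property, only affects newly
written files, and defaults to 128K; a minimal sketch with a hypothetical
dataset name:

# zfs get recordsize tank/smallfiles
# zfs set recordsize=16K tank/smallfiles    (match the dominant I/O size of the workload)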
> > ... to tell you the truth, we are keeping 2 large zpools in sync on each
> > system because we fear another zpool corruption.
>
> May I ask how you accomplish that?
During the day we sync pool1 with pool2, then we "umount pool2" duri...
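A sketch of that kind of daily sync under stated assumptions (the mountpoints
and rsync flags are illustrative, not the exact script used here; note that at
the time of this thread a recursive zfs send was not yet available, see the end
of this archive):

# zfs mount pool2
# rsync -aHx --delete /pool1/ /pool2/    (bring the standby pool up to date)
# zfs umount pool2                       (keep the copy quiesced until the next sync)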
> Gino wrote:
> > The real problem is that ZFS should stop forcing kernel panics.
> >
> I found these panics very annoying, too. And even more so that the zpool
> was faulted afterwards. But my problem is that when someone asks me what
> ZFS should do ins...
... our central backup servers (for many applications, systems, customers, ...).
We also manage a lot of other systems and always try to migrate customers to
Solaris because of its stability, resource control, DTrace .. but we find ZFS
disappointing as of today (probably tomorrow ...)
http://sunsolve.sun.com/search/document.do?assetkey=1-26-57773-1 ??
Gino
Hello,
upgrade to snv_60 or later if you care about your data :)
Gino
> Tomas Ögren wrote:
> > On 18 September, 2007 - Gino sent me these 0,3K bytes:
> >
> >> Hello,
> >> upgrade to snv_60 or later if you care about your data :)
> >
> > If there are known serious data loss bug fixes that have gone int...
> ...; released ...
>
> FabOS is FabOS, the nature of the issue is not hardware related, it's
> software related. 2850 or 3850 makes no difference.
I think the same.
What other problems could make these errors happen?
gino
> It's safe as long as the pool is safe... but we've lost multiple pools.
Hello Tim,
did you try SNV60+ or S10U4 ?
Gino
... Robert,
I saw in your post that you had problems with a 3510 JBOD and multipath.
We are building a poor man's filer with 2x DL360 and 4x 3510 JBOD and have had
no problems so far.
Multipath is working fine for us ..
Can you tell me what you found?
tnx,
Gino
Same problem here after some patching :(((
42GB free in a 4.2TB zpool.
We can't upgrade to U3 without planning it.
Is there any way to solve the problem? Remove the latest patches?
Our uptime with ZFS is getting very low ...
thanks
Gino
Hi All,
this morning one of our edge FC switches died. Thanks to multipath, all the
nodes using that switch kept running, except the ONLY ONE still using ZFS (we
went back to UFS on all our production servers)!
On that one we had one path fail and then a CPU panic :(
Here are the logs:
Feb 23 ...
Hi Jason,
on Saturday we ran some tests and found that disabling an FC port under heavy
load (MPxIO enabled) often leads to a panic (using a RAID-Z!).
No problems with UFS ...
later,
Gino
Hi Jason,
we did the tests using S10U2, two FC cards, and MPxIO.
5 LUNs in a RAID-Z group.
Each LUN was visible to both FC cards.
Gino
> Hi Gino,
>
> Was there more than one LUN in the RAID-Z using the port you disabled?
>
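For anyone reproducing this setup: MPxIO for FC HBAs on S10 is normally enabled
with stmsboot (a reboot is required), and the path count per LUN can be checked
afterwards; a generic sketch, not specific to this configuration:

# stmsboot -e          (enable MPxIO on supported FC controllers, then reboot)
# mpathadm list lu     (each LUN should show two operational paths)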
... genunix: [ID 672855 kern.notice] syncing file systems...
Feb 28 05:47:32 server141 genunix: [ID 733762 kern.notice] 1
Feb 28 05:47:33 server141 genunix: [ID 904073 kern.notice] done
What happened this time? Any suggestions?
thanks,
gino
... 0+ MB/s write performance with 1 link, sometimes just around 40MB/s.
Using only one FC link always gives us stable performance at 160MB/s.
Conclusion:
after a day of tests we are starting to think that ZFS doesn't work well with
MPxIO.
awaiting ...
... same problems with a JBOD (EMC DAE2), a StorageWorks EVA and an old
StorageWorks EMA.
Gino
> What makes you think that these arrays work with mpxio? Every array does
> not automatically work.
They are working rock solid with mpxio and UFS!
gino
> ... too much io and causes the machine to do a bus reset.
This sounds interesting to me!
Did you find that a SCSI bus reset leads to a kernel panic?
What do you get in the logs?
Thanks,
Gino
Hi all.
One of our servers had a panic and now can't mount the zpool anymore!
Here is what I get at boot:
Mar 21 11:09:17 SERVER142 panic[cpu1]/thread=90878200:
Mar 21 11:09:17 SERVER142 genunix: [ID 603766 kern.notice] assertion failed:
ss->ss_start <= start (0x67b800 <= 0x67...
... 970GB.
tnx,
Gino
I forgot to mention we are using S10U2.
Gino
On HDS arrays we set sd_max_throttle to 8.
gino
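For reference, that throttle is an /etc/system tunable and needs a reboot; sd
covers parallel-SCSI/x86 disks and ssd covers FC-attached disks on SPARC. A
minimal sketch of the value mentioned above -- check the array vendor's
planning guide before copying it:

* /etc/system (reboot required)
set sd:sd_max_throttle=8
set ssd:ssd_max_throttle=8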
> Gino Ruopolo wrote:
> > On HDS arrays we set sd_max_throttle to 8.
>
> HDS provides an algorithm for estimating sd[d]_max_throttle in their
> planning docs. It will vary based on a number of different parameters.
> AFAIK, EMC just sets it to 20.
> -- rich
> Gino Ruopolo wrote:
> > Hi All,
> >
> > Last week we had a panic caused by ZFS and then we had a corrupted
> > zpool! Today we are doing some tests with the same data, but on a
> > different server/storage array. While copying the data ... pani...
Unfortunately we don't have experience with NexSAN.
HDS is quite conservative, and with a value of 8 we run quite stably (with UFS).
We also found that value appropriate for old HP EMA arrays (old units, but very,
very reliable! Digital products were rock solid).
gino
Hi Matt,
trying to import our corrupted zpool with snv_60 and 'set zfs:zfs_recover=1' in
/etc/system gives us:
Apr 3 20:35:56 SERVER141 panic[cpu3]/thread=fffec3860f20:
Apr 3 20:35:56 SERVER141 genunix: [ID 603766 kern.notice] assertion failed:
ss->ss_start <= start (0x67b800 <= 0x67...
Other test, same setup.
Solaris 10:
zpool/a  filesystem containing over 10 million subdirs, each containing 10
files of about 1k
zpool/b  empty filesystem
rsync -avx /zpool/a/* /zpool/b
time: 14 hours (iostat showing %b = 100 for each LUN in the zpool)
FreeBSD:
/vol1/a dir ...
CPU load was really low under the tests.
During the tests on FreeBSD we found I/O on the stressed filesystem slow but
not frozen!
later,
Gino
>
> Hi Gino,
>
> Can you post the 'zpool status' for each pool and 'zfs get all'
> for each fs? Any interesting data in the dmesg output?
Sure.
1) Nothing in dmesg (are you thinking about shared IRQs?)
2) Only using one pool for tests:
# zpool st...
> We use FSS, but CPU load was really load under the tests.
errata: We use FSS, but CPU load was really LOW under the tests.
> Looks like you have compression turned on?
We made tests with compression on and off and found almost no difference.
CPU load was under 3% ...
Update ...
iostat output during "zpool scrub":
                 extended device statistics
device    r/s    w/s  Mr/s  Mw/s  wait  actv  svc_t  %w  %b
sd34      2.0  395.2   0.1   0.6   0.0  34.8   87.7   0 100
sd35     21.0  312.2   1.2   2.9   0.0  26.0   78.0   0  79
> Update ...
>
> iostat output during "zpool scrub"
>
>                  extended device statistics
> device    r/s    w/s  Mr/s  Mw/s  wait  actv  svc_t  %w  %b
> sd34      2.0  395.2   0.1   0.6   0.0  34.8   87.7   0 100
> sd35     21.0  312.2   1.2   2.9   0.0  26.0   78.0   0  79
> sd36 ...
Another example, rsyncing from/to the same zpool:
device    r/s    w/s  Mr/s  Mw/s  wait  actv  svc_t  %w  %b
c6       25.0  276.5   1.3   3.8   1.9  16.5   61.1   0 135
sd44      6.0  158.3   0.3   0.4   1.9  15.5  106.2  33 100
sd45      6.0   37.1   0.3   1.1   0.0   0.3 ...
... because of the rsync I/O load.
Even if we're using FSS, Solaris seems unable to give a small amount of I/O
resources to ZONEX's activity ...
I know that FSS doesn't deal with I/O, but I think Solaris should be smarter ..
To draw a comparison, FreeBSD jails don't suffer from t...
>
> More generally, I could suggest that we use an odd number of vdevs
> for raidz and an even number for mirrors and raidz2. Thoughts?
Uhm ... we found serious performance problems also with a RAID-Z of 3 LUNs ...
Gino
... I'm not referring to "network I/O" but to "storage I/O" ...
thanks,
Gino
... Is that possible?
tnx,
gino
>
> Not right now (without a bunch of shell-scripting). I'm working on
> being able to "send" a whole tree of filesystems & their snapshots.
> Would that do what you want?
Exactly! When do you think that -really useful- feature will be available?
thanks
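For readers of the archive: that capability later landed as recursive
replication in zfs send/recv, so the shell-scripting mentioned above collapses
to roughly this (snapshot and pool names are hypothetical):

# zfs snapshot -r tank@backup1
# zfs send -R tank@backup1 | zfs recv -Fdu backuppool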