not been freed up!
gino
Are there hidden files in that directory? Try ls -la
What happens when you export-import that pool?
Also are there snapshots? zfs list -t all
Cheers,
Chris
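A minimal sketch of those checks, using the pool and dataset names from this thread ('rm -r *' does not match dot-files, which is why the ls -la check matters):
# ls -la /dr/netapp11bkpVOL34
# zfs list -t all | grep netapp11bkpVOL34
# zpool export dr && zpool import dr
If neither listing turns anything up and the space is still not released after the export/import, it is probably not a hidden-file or snapshot issue.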
no snapshot at all
gino
On Thu, September 10, 2009 04:27, Gino wrote:
# cd /dr/netapp11bkpVOL34
# rm -r *
# ls -la
Now there are no files in /dr/netapp11bkpVOL34, but
# zfs list|egrep netapp11bkpVOL34
dr/netapp11bkpVOL34   1.34T   158G   1.34T   /dr/netapp11bkpVOL34
Space has not been freed up!
Francois, you're right!
We just found that it's happening only with files larger than 100GB, and only on S10U7.
We have no problem with SNV_101a.
gino
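If it is that bug, one way to test on a scratch dataset is to build a large sparse ('holey') file by hand and watch the space accounting around the rm; a sketch with a hypothetical path, and whether it actually triggers the leak depends on the build:
# mkfile -n 200g /dr/test/holey
# dd if=/dev/urandom of=/dr/test/holey bs=1024k count=1024 conv=notrunc
# zfs list dr/test
# rm /dr/test/holey
# zfs list dr/test
mkfile -n reserves the size without allocating blocks, the dd writes about 1GB of real data into the start of the file and leaves the rest as a hole, and on an affected build the USED column does not drop back after the rm.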
Actually there is a great chance that you are hitting this bug:
6792701 Removing large holey file does not free space
To check, run:
# zdb - name
that would solve 99% of ZFS corruption problems!
Is there any EDT for this patch?
tnx
gino
g == Gino dandr...@gmail.com writes:
g we lost many zpools with multimillion$ EMC, Netapp and HDS arrays just by simulating fc switch power failures.
g The problem is that ZFS can't properly recover itself.
I don't like what you call ``the problem''---I think it assumes too much
This is CR 6667683
http://bugs.opensolaris.org/view_bug.do?bug_id=6667683
I think that would solve 99% of ZFS corruption problems!
Based on the reports I've seen to date, I think you're right.
Is there any EDT for this patch?
Well, because of this thread, this has gone from
about that?
thanks
gino
your data.
that's the point!
gino
2 pools on snv91.
Any news about your workaround to recover pools discarding last txg?
thanks
gino
storage.
To tell the truth we had no corruption starting from snv_60, but we can't seriously consider using ZFS without zfschk :(
gino
(will buy Netapp/HDS until zfschk comes out)
with 3510JBOD and multipath.
We are building a poor man's filer with 2xDL360 and 4x3510JBOD and had no
problems so far.
Multipath is working fine for us ..
Can you tell me what you found?
tnx,
Gino
to http://sunsolve.sun.com/search/document.do?assetkey=1-26-57773-1 ??
Gino
On Tue, 2007-09-11 at 13:43 -0700, Gino wrote:
- ZFS+FC JBOD: a failed hard disk needs a reboot :(
(frankly unbelievable in 2007!)
So, I've been using ZFS with some creaky old FC JBODs (A5200's) and old disks which have been failing regularly and haven't seen that; the worst I've
disk is still working but it has no space for bad sectors...
Gino
model (many zpools and datasets within them) and still present the end users with a single view of all the data.
Hi Paul,
may I ask what your average file size is? Have you done some optimization? ZFS recordsize?
Did your test also include writing 1 million files?
Gino
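For what it's worth, recordsize is a per-dataset property and only affects files written after the change; a sketch with a hypothetical dataset name:
# zfs get recordsize tank/backup
# zfs set recordsize=32K tank/backup
Any power of two from 512 bytes up to 128K is accepted on these releases.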
have 4 backup hosts. Soon we'll move to 10G network and we'll replicate on
different hosts, as you pointed out.
Gino
Gino wrote:
The real problem is that ZFS should stop forcing kernel panics.
I found these panics very annoying, too. And even more that the zpool was faulted afterwards. But my problem is that when someone asks me what ZFS should do instead, I have no idea.
well, what about just
systems and always try to migrate customers to Solaris because of stability, resource control, DTrace .. but found ZFS disappointing today (probably tomorrow it will be THE filesystem).
Gino
home server with cheap ata drives ZFS is unbeatable and free :)
Gino
old redundant arrays and go with jbod and ZFS...
but they don't tell you that you probably will need to reboot your SF25k because of a disk failure!! :(
Gino
] (sd52):
Sep 3 15:20:10 fb2 SCSI transport failed: reason 'timeout': giving up
cfgadm -al or devfsadm -C didn't solve the problem.
After a reboot ZFS recognized the drive as failed and all worked well.
Do we need to restart Solaris after a drive failure??
Gino
Hi Mark,
the drive (147GB, FC 2Gb) failed on a Xyratex JBOD.
Also in the past we had the same problem with a drive that failed on an EMC CX JBOD.
Anyway I can't understand why rebooting Solaris resolved the situation ..
Thank you,
Gino
devices failing (when writing)
Also we found that our backup servers using ZFS, after 40-60 days of uptime, start to show system CPU time around 50%, often with one or two CPUs at 100%.
After a reboot, CPU system time goes back to 7-11%.
gino
Hi,
your pool could be corrupted.
Was it created with SNV_64?
Can you post the kernel panic you are getting?
gino
Same problem here (snv_60).
Robert, did you find any solutions?
gino
Upgrade to snv_60 or later ..
gino
ZFS will panic during an I/O failure if the zpool is not fully redundant.
So you need 2 HBAs, 2 switches and a RAID10 zpool to keep your server running.
Also upgrade to snv_60 or newer. Older releases can corrupt your zpool!
gino
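A sketch of the kind of fully redundant pool being described, with hypothetical device names; the point is that the two sides of each mirror should sit behind different HBAs/switches:
# zpool create tank mirror c1t0d0 c2t0d0 mirror c1t1d0 c2t1d0
# zpool status tank
Striping across mirrored pairs gives the RAID10 layout mentioned above, so a failed path or LUN leaves every top-level vdev with a healthy side.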
Hello Robert,
G We are using a lot of EMC DAE2. Works well with ZFS.
Without head units?
Yes. Just make sure to format disks to 512 bytes per sector if they are from EMC.
Dual-pathed connections to hosts + MPxIO?
sure. Also we are using some Xyratex JBOD boxes.
gino
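For reference, MPxIO for FC HBAs is normally switched on system-wide with stmsboot; a sketch (mpathadm is only there on newer updates, older ones can check paths with luxadm instead):
# stmsboot -e
# mpathadm list lu
stmsboot -e asks for a reboot; after it, each LUN should show up once with two operational paths.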
We are using a lot of EMC DAE2. Works well with ZFS.
gino
On Mon, Apr 23, 2007 at 09:38:47AM -0700, Gino wrote:
we had 5 corrupted zpools (on different servers and different SANs)!
With Solaris up to S10U3 and Nevada up to snv_59 we are able to easily corrupt a zpool just by disconnecting one or more LUNs of a zpool a few times under high I/O
Hello Robert,
it would be really interesting if you could add a HD RAID 10 LUN with UFS to your comparison.
gino
on a zpool that was exported 2 days ago ...
gino
gracefully handle all devices failing (when writing)
Isn't there a way to increase the timeout to have Solaris just hang
when a lun is not available and have it retry the i/o more times?
gino
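For what it's worth, the closest standard knob is the per-command timeout of the disk drivers in /etc/system; a sketch with example values only, and it only stretches how long each command is retried before the transport finally gives up, it does not change what ZFS does once the LUN is declared gone:
* /etc/system
set sd:sd_io_time = 0x78
set ssd:ssd_io_time = 0x78
0x78 is 120 seconds instead of the 60-second default; the ssd line applies where the FC disks are bound to the ssd driver.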
6322646 ZFS should gracefully handle all devices failing (when writing)
Which is being worked on. Using a redundant configuration prevents this from happening.
What do you mean by redundant? All our servers have 2 or 4 HBAs, 2 or 4 fc switches and storage arrays with redundant
gracefully handle all devices failing (when writing)
With S10U3 we are still getting kernel panics when trying to import a corrupted zpool (RAID10)
gino
Each time we get a kernel panic, probably because of 6322646, and sometimes we get a corrupted zpool.
gino
Gino,
Were you able to recover by setting zfs_recover?
Unfortunately no :(
Using zfs_recover didn't allow us to recover any of the 5 corrupted zpools we had ..
Please note that we lost this pool after a panic caused by trying to import a corrupted zpool!
tnx,
gino
Gino,
Can you send me the corefile from the zpool command?
This is the only case where we are unable to import a corrupted zpool but do not get a kernel panic:
SERVER144@/# zpool import zpool3
internal error: unexpected error 5 at line 773 of ../common/libzfs_pool.c
SERVER144
it should just fail with an error message, but please stop crashing the kernel.
2) We need a way to recover a corrupted ZFS, trashing the last incomplete transactions.
Please give us zfsck :)
Waiting for comments,
gino
Today we lost another zpool!
Fortunately it was only a backup repository.
SERVER144@/# zpool import zpool3
internal error: unexpected error 5 at line 773 of ../common/libzfs_pool.c
this zpool was a RAID10 of 4 HDS LUNs.
Trying to import it under snv_60 (recovery mode) doesn't work.
gino
, content: kernel
Apr 7 22:54:04 SERVER140 genunix: [ID 409368 kern.notice] 100% done: 644612 pages dumped, compression ratio 4.30,
Apr 7 22:54:04 SERVER140 genunix: [ID 851671 kern.notice] dump succeeded
gino
Great news, Pawel!
Waiting to test it :)
Is there anyone interested in a kernel dump?
We are still unable to import the corrupted zpool, even in readonly mode ..
Apr 5 22:27:34 SERVER142 panic[cpu2]/thread=fffec9eef0e0:
Apr 5 22:27:34 SERVER142 genunix: [ID 603766 kern.notice] assertion failed: ss->ss_start <= start (0x67b800
Gino,
I just had a similar experience and was able to import the pool when I added the readonly option (zpool import -f -o ro )
no way ... We still get a panic :(
gino
Hi Matt,
trying to import our corrupted zpool with snv_60 and 'set zfs:zfs_recover=1' in /etc/system gives us:
Apr 3 20:35:56 SERVER141 panic[cpu3]/thread=fffec3860f20:
Apr 3 20:35:56 SERVER141 genunix: [ID 603766 kern.notice] assertion failed: ss->ss_start <= start (0x67b800 <=
Unfortunately we don't have experience with NexSAN.
HDS are quite conservative and with a value of 8 we run quite stable (with UFS).
Also we found that value appropriate for old HP EMA arrays (old units but very very reliable! Digital products were rock solid)
gino
70GB.
tnx,
Gino
I forgot to mention we are using S10U2.
Gino
On HDS arrays we set sd_max_throttle to 8.
gino
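For reference, that throttle goes into /etc/system; a sketch (add the ssd variant where the array LUNs are bound to the ssd driver):
* /etc/system -- limit outstanding commands per LUN
set sd:sd_max_throttle = 8
set ssd:ssd_max_throttle = 8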
Gino Ruopolo wrote:
On HDS arrays we set sd_max_throttle to 8.
HDS provides an algorithm for estimating sd[d]_max_throttle in their planning docs. It will vary based on a number of different parameters.
AFAIK, EMC just sets it to 20.
-- richard
you're right but after -a lot
Gino Ruopolo wrote:
Hi All,
Last week we had a panic caused by ZFS and then we had a corrupted zpool! Today we are doing some tests with the same data, but on a different server/storage array. While copying the data ... panic! And again we had a corrupted zpool!!
This is bug
and causes the machine to do a bus reset.
This sound interesting to me!
Did you find that a SCSI bus reset leads to a kernel panic?
What do you get in the logs?
Thanks,
Gino
and an old Storageworks EMA.
Gino
, sometimes just around 40MB/s.
Using only one fc link always gives us stable performance at 160MB/s.
Conclusion:
After a day of tests we are starting to think that ZFS doesn't work well with MPxIO.
awaiting your comments,
Gino
genunix: [ID 672855 kern.notice] syncing file systems...
Feb 28 05:47:32 server141 genunix: [ID 733762 kern.notice] 1
Feb 28 05:47:33 server141 genunix: [ID 904073 kern.notice] done
What happened this time? Any suggestions?
thanks,
gino
Hi Jason,
we did the tests using S10U2, two fc cards, MPxIO.
5 LUNs in a raidZ group.
Each LUN was visible to both fc cards.
Gino
Hi Gino,
Was there more than one LUN in the RAID-Z using the port you disabled?
Hi Jason,
Saturday we made some tests and found that disabling a FC port under heavy load (MPxIO enabled) often leads to a panic. (using a RAID-Z!)
No problems with UFS ...
later,
Gino
Hi All,
this morning one of our edge fc switches died. Thanks to multipath all the nodes using that switch kept running except the ONLY ONE still using ZFS (we went back to UFS on all our production servers)!
In that one we had one path fail and then a CPU panic :(
Here are the logs:
Feb
Same problem here after some patching :(((
42GB free in a 4.2TB zpool
We can't upgrade to U3 without planning it.
Is there any way to solve the problem?? Remove the latest patches?
Our uptime with ZFS is getting very low ...
thanks
Gino
!). Is that possible?
tnx,
gino
...
thanks,
Gino
load.
Even though we're using FSS, Solaris seems unable to give a small amount of I/O resource to ZONEX's activity ...
I know that FSS doesn't deal with I/O but I think Solaris should be smarter ..
To draw a comparison, FreeBSD Jail doesn't suffer from this problem ...
thanks,
Gino
other example:
rsyncing from/to the same zpool:
device     r/s    w/s  Mr/s  Mw/s  wait  actv  svc_t  %w   %b
c6        25.0  276.5   1.3   3.8   1.9  16.5   61.1   0  135
sd44       6.0  158.3   0.3   0.4   1.9  15.5  106.2  33  100
sd45       6.0   37.1   0.3   1.1   0.0
Update ...
iostat output during zpool scrub
extended device statistics
device     r/s    w/s  Mr/s  Mw/s  wait  actv  svc_t  %w   %b
sd34       2.0  395.2   0.1   0.6   0.0  34.8   87.7   0  100
sd35      21.0  312.2   1.2   2.9   0.0  26.0   78.0   0   79
sd36      20.0    1.0
Looks like you have compression turned on?
we made tests with compression on and off and found almost no difference.
CPU load was under 3% ...
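For reference, this is the property being toggled between runs (dataset name is hypothetical; only blocks written after the change are affected):
# zfs set compression=on zpool1/test
# zfs get compression zpool1/test
# zfs set compression=off zpool1/test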
, but CPU load was really low under the tests.
During the tests on FreeBSD we found I/O on the stressed filesystem slow but not frozen!
later,
Gino
Hi Gino,
Can you post the 'zpool status' for each pool and 'zfs get all' for each fs; Any interesting data in the dmesg output?
sure.
1) nothing on dmesg (are you thinking about shared IRQ?)
2) Only using one pool for tests:
# zpool status
pool: zpool1
state: ONLINE
scrub: none
We use FSS, but CPU load was really load under the tests.
errata: We use FSS, but CPU load was really LOW under the tests.
Other test, same setup.
SOLARIS10:
zpool/a filesystem containing over 10 million subdirs, each containing 10 files of about 1k
zpool/b empty filesystem
rsync -avx /zpool/a/* /zpool/b
time: 14 hours (iostat showing %b = 100 for each lun in the zpool)
FreeBSD:
/vol1/a dir