The latter; we run these VMs over NFS anyway and had
ESXi boxes under test already. We were already
separating data exports from VM exports. We use
an in-house-developed configuration management/bare
metal system which allows us to install new machines
pretty easily. In this case we just
How did your migration to ESXi go? Are you using it on the same hardware or
did you just switch that server to an NFS server and run the VMs on another
box?
I'm running nv126 XvM right now. I haven't tried it
without XvM.
Without XvM we do not see these issues. We're running the VMs through NFS now
(using ESXi)...
--
This message posted from opensolaris.org
Interesting. It sounds like it might be an XvM specific bug. I'm glad I
mentioned that in my bug report to Sun. Hopefully they
We see the same issue on a x4540 Thor system with 500G disks:
lots of:
...
Nov 3 16:41:46 uva.nl scsi: [ID 107833 kern.warning] WARNING:
/p...@3c,0/pci10de,3...@f/pci1000,1...@0 (mpt5):
Nov 3 16:41:46 encore.science.uva.nl Disconnected command timeout for Target 7
...
This system is
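Warnings in the format shown above can be tallied mechanically when sifting large /var/adm/messages files; here is a minimal sketch, assuming the two-line warning layout from the excerpt (the regexes are illustrative, not exhaustive):

```python
import re
from collections import Counter

# The warning header names the mpt instance at the end, e.g. "(mpt5):".
HEADER_RE = re.compile(r"\((mpt\d+)\):\s*$")
# The follow-up line names the target that timed out.
TIMEOUT_RE = re.compile(r"Disconnected command timeout for Target (\d+)")

def count_timeouts(lines):
    """Count timeout warnings per (controller instance, target)."""
    counts = Counter()
    instance = None
    for line in lines:
        header = HEADER_RE.search(line)
        if header:
            instance = header.group(1)  # remember which mpt instance warned
            continue
        timeout = TIMEOUT_RE.search(line)
        if timeout and instance is not None:
            counts[(instance, timeout.group(1))] += 1
            instance = None
    return counts
```

Counting per target and per instance makes it easy to see whether the timeouts cluster on one disk (bad drive) or spread across a whole controller (HBA/driver problem).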
I am also running 2 of the Supermicro cards. I just upgraded to b126 and it
seems improved. I am running a large file copy locally. I get these warnings in
the dmesg log. When I do, I/O seems to stall for about 60sec. It comes back up
fine, but it's very annoying. Any hints? I have 4 disks per
I'm having similar issues with two AOC-USAS-L8i Supermicro 1068e
cards (mpt2 and mpt3), running firmware 1.26.00.00IT
It seems to only affect a specific revision of disk. (???)
sd67 Soft Errors: 0 Hard Errors: 127 Transport Errors: 3416
Vendor: ATA Product: WDC WD10EACS-00D Revision: 1A01
So, while we are working on resolving this issue with Sun, let me approach this
from another perspective: what kind of controller/drive ratio would be the
minimum recommended to support a functional OpenSolaris-based archival
solution? Given the following:
- the vast majority of IO to the
From: on behalf of Richard Elling [richard.ell...@gmail.com]
Sent: 24 October 2009 7:36
To: Adam Cheal
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] SNV_125 MPT warning in logfile
The iostat I posted previously was from a system we had already tuned the
zfs:zfs_vdev_max_pending depth down to 10 (as visible by the max of about 10 in
actv per disk).
I reset this value in /etc/system to 7, rebooted, and started a scrub. iostat
output showed busier disks (%b is higher,
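The tunable being adjusted here lives in /etc/system; a sketch of the relevant fragment on builds of this vintage, using the value 7 tried above:

```
* /etc/system fragment: cap the number of concurrent I/Os ZFS keeps
* outstanding per disk (takes effect after a reboot).
set zfs:zfs_vdev_max_pending = 7
```

Comments in /etc/system start with `*`. On a live system the current value can be inspected with `echo zfs_vdev_max_pending/D | mdb -k` before committing to a reboot.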
From: zfs-discuss-boun...@opensolaris.org
[zfs-discuss-boun...@opensolaris.org] on behalf of Adam Cheal
[ach...@pnimedia.com]
Sent: 24 October 2009 12:49
To: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] SNV_125 MPT
On 10/24/09 9:43 AM, Richard Elling wrote:
OK, here we see 4 I/Os pending outside of the host. The host has
sent them on and is waiting for them to return. This means they are
getting dropped either at the disk or somewhere between the disk
and the controller.
When this happens, the sd driver
On Sat, Oct 24, 2009 at 12:30 PM, Carson Gaspar car...@taltos.org wrote:
I saw this with my WD 500GB SATA disks (HDS725050KLA360) and LSI firmware
1.28.02.00 in IT mode, but I (almost?) always had exactly 1 stuck I/O.
Note that my disks were one per channel, no expanders. I have _not_ seen it
The controller connects to two disk shelves (expanders), one per port on the
card. If you look back in the thread, you'll see our zpool config has one vdev
per shelf. All of the disks are Western Digital (model WD1002FBYS-18A6B0) 1TB
7.2K, firmware rev. 03.00C06. Without actually matching up
Hi Cindy,
I have a couple of questions about this issue:
1. I have exactly the same LSI controller in another server running
OpenSolaris snv_101b, and so far no errors like these have been
seen on that system
2. Up to snv_118 I hadn't seen any problems; only now with snv_125
Hi Adam,
How many disks and zpool/zfs's do you have behind that LSI?
I have a system with 22 disks and 4 zpools with around 30 zfs's and so
far it works like a charm, even during heavy load. The opensolaris
release is snv_101b .
Bruno
Adam Cheal wrote:
Cindy: How can I view the bug report
Our config is:
OpenSolaris snv_118 x64
1 x LSISAS3801E controller
2 x 23-disk JBOD (fully populated, 1TB 7.2k SATA drives)
Each of the two external ports on the LSI connects to a 23-disk JBOD. ZFS-wise
we use 1 zpool with 2 x 22-disk raidz2 vdevs (1 vdev per JBOD). Each zpool has
one ZFS
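A layout like the one described might be created along these lines; this is a hypothetical sketch, with placeholder device names rather than the poster's actual ones:

```
# Hypothetical sketch: one pool, two 22-disk raidz2 vdevs (one per JBOD).
# c1t0d0 ... c2t21d0 are placeholder device names.
zpool create pool002 \
  raidz2 c1t0d0 c1t1d0 c1t2d0 ... c1t21d0 \
  raidz2 c2t0d0 c2t1d0 c2t2d0 ... c2t21d0
```

With this shape, each raidz2 vdev can survive two disk failures, but a scrub drives all 44 data disks through the single controller at once.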
What bug# is this under? I'm having what I believe is the same problem. Is
it possible to just take the mpt driver from a prior build for the time
being?
The below is from the load the zpool scrub creates. This is on a dell t7400
workstation with a 1068E oemed lsi. I updated the firmware to the
Sorry, running snv_123, indiana
Just submitted the bug yesterday, under advice of James, so I don't have a
number you can refer to yet...the change request number is 6894775 if that
helps or is directly related to the future bugid.
From what I've seen/read this problem has been around for a while but only
rears its ugly head
Could Sun's reason for putting 6 LSI's in the x4540 Thumper be some
sort of hidden problem found by Sun where the HBA resets, and due to
time-to-market pressure the quick and dirty solution was to spread the
load over multiple HBA's instead of a software fix?
Just my 2 cents..
Bruno
Hi Cindy,
Thank you for the update, but it seems like I can't see any information
specific to that bug.
I can only see bug numbers 6702538 and 6615564, but according to their
history, they have been fixed quite some time ago.
Can you by any chance present the information about bug 6694909?
I don't think there was any intention on Sun's part to ignore the
problem...obviously their target market wants a performance-oriented box and
the x4540 delivers that. Each 1068E controller chip supports 8 SAS PHY channels
= 1 channel per drive = no contention for channels. The x4540 is a
LSI's sales literature on that card specs 128 devices which I take with a few
hearty grains of salt. I agree that with all 46 drives pumping out streamed
data, the controller would be overworked BUT the drives will only deliver data
as fast as the OS tells them to. Just because the speedometer
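The speedometer point is easy to illustrate with back-of-the-envelope numbers; the per-drive streaming rate below is an assumed round figure, not a measurement:

```python
# SAS x4 wide port, 3 Gb/s per lane, 8b/10b line coding (10 bits per byte).
lanes = 4
gbit_per_lane = 3.0
port_mb_per_s = lanes * gbit_per_lane * 1000 / 10  # ~1200 MB/s usable

# 46 drives, each assumed to stream ~100 MB/s under a scrub-style workload.
drives = 46
drive_mb_per_s = 100
aggregate_demand = drives * drive_mb_per_s  # 4600 MB/s

# The drives can collectively offer several times what one port can move,
# so the OS, not the drives, decides how hard the controller gets pushed.
ratio = aggregate_demand / port_mb_per_s
```

Under these assumptions the drives can offer roughly 3.8x the bandwidth of a single wide port, which is exactly why the OS-side queue depth matters so much here.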
On Fri, Oct 23, 2009 at 7:17 PM, Richard Elling richard.ell...@gmail.com wrote:
Tim has a valid point. By default, ZFS will queue 35 commands per disk.
For 46 disks that is 1,610 concurrent I/Os. Historically, it has proven to
be
relatively easy to crater performance or cause problems with
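The arithmetic behind that figure:

```python
# Default per-disk queue depth in ZFS builds of this era, times disk count.
zfs_vdev_max_pending = 35
disks = 46
outstanding = zfs_vdev_max_pending * disks
print(outstanding)  # 1610 concurrent I/Os in flight at the default setting
```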
And therein lies the issue. The excessive load that causes the IO issues is
almost always generated locally from a scrub or a local recursive ls used to
warm up the SSD-based zpool cache with metadata. The regular network IO to the
box is minimal and is very read-centric; once we load the box
Here is an example of the pool config we use:
# zpool status
pool: pool002
state: ONLINE
scrub: scrub stopped after 0h1m with 0 errors on Fri Oct 23 23:07:52 2009
config:
NAME STATE READ WRITE CKSUM
pool002 ONLINE 0 0 0
raidz2 ONLINE
Hi Bruno,
I see some bugs associated with these messages (6694909) that point to
an LSI firmware upgrade that causes these harmless errors to display.
According to the 6694909 comments, this issue is documented in the
release notes.
As they are harmless, I wouldn't worry about them.
Maybe
Cindy: How can I view the bug report you referenced? Standard methods show me
the bug number is valid (6694909) but no content or notes. We are having
similar messages appear with snv_118 with a busy LSI controller, especially
during scrubbing, and I'd be interested to see what they mentioned
James: We are running Phase 16 on our LSISAS3801E's, and have also tried the
recently released Phase 17 but it didn't help. All firmware NVRAM settings are
default. Basically, when we put the disks behind this controller under load
(e.g. scrubbing, recursive ls on large ZFS filesystem) we get
I've filed the bug, but was unable to include the prtconf -v output as the
comments field only accepted 15000 chars total. Let me know if there is
anything else I can provide/do to help figure this problem out as it is
essentially preventing us from doing any kind of heavy IO to these pools,
On 10/22/09 4:07 PM, James C. McPherson wrote:
Adam Cheal wrote:
It seems to be timing out accessing a disk, retrying, giving up and then
doing a bus reset?
...
ugh. New bug time - bugs.opensolaris.org, please select
Solaris / kernel / driver-mpt. In addition to the error
messages and