Re: [zfs-discuss] Running on Dell hardware?

2011-01-12 Thread Ben Rockwood
If you're still having issues, go into the BIOS and disable C-states, if you
haven't already.  They are responsible for most of the problems with 11th Gen
PowerEdge.
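
For reference, the same workaround can be sketched from the OS side as well
(an untested sketch - it assumes a Solaris 10u8+/OpenSolaris build whose
/etc/power.conf understands the cpupm and cpu-deep-idle keywords; check your
release first):

  # append to /etc/power.conf, then re-read it without a reboot
  echo 'cpupm disable' >> /etc/power.conf
  echo 'cpu-deep-idle disable' >> /etc/power.conf
  pmconfig

The BIOS switch remains the more thorough fix, since it keeps the deep
C-states hidden from the OS entirely.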


Re: [zfs-discuss] Running on Dell hardware?

2011-01-12 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Ben Rockwood
 
 If you're still having issues, go into the BIOS and disable C-states, if
 you haven't already.  They are responsible for most of the problems with
 11th Gen PowerEdge.

I did that with no benefit on my R710.  For me the main problem was the
Broadcom NIC: I needed to disable it in the BIOS and add on an Intel NIC
instead.



Re: [zfs-discuss] Running on Dell hardware?

2011-01-03 Thread Stephan Budach

Am 22.12.10 18:47, schrieb Lasse Osterild:

I've just noticed that Dell has a 6.0.1 firmware upgrade available, at least
for my R610's (they are about 3 months old).  Oddly enough it doesn't show up
on support.dell.com when I search using my service code, but if I check
through System Services / Lifecycle Controller it does find them.

Two of the same servers are running Ubuntu 10.04.1 and RHEL 5.4; several TBs
of data have gone through the interfaces on those two boxes without a single
glitch.

So has anyone tried 6.0.1 yet, or is it simply v4.x repackaged with a new 
version number?

  - Lasse

On 12/12/2010, at 14.39, Markus Kovero wrote:


Oh well.  Already, the weird crash has happened again.  So we're concluding
two things:
-1-  The Broadcom NIC is definitely the cause of the crash.
and
-2-  Even with the new "upgrade" (really a downgrade), the problem is not solved.
So the solution is: add on an Intel NIC, and disable the integrated Broadcom NIC.

And if I may summarize my own findings:

Random crashes and Broadcom issues are separate, unrelated problems AFAIK;
we have some R710's with Broadcom NICs that seem to be stable over several
months, and other R710's which cannot keep it together for even a week or so.
Both have identical fw/bios versions.

1) there is/was a problem with Broadcom NICs losing network connectivity under
every OS, including Solaris; this was fixed by software patches in sol10 and
sol11 express, and an unofficial driver update was made for snv_134.
The workaround for this issue was to disable C-states in the BIOS, under
processor configuration.
2) there is some kind of instability issue, not related to the NICs, with the
latest batch of R710-series servers; crashes occur randomly, but it seemed to
get fixed in Solaris 11 Express, no idea about sol10 though. We have yet to
test whether this is somehow related to the processor/memory configuration
being used; mind you, software and firmware versions are identical on the
stable R710's and the crashing ones.
3) there were also problems with the system disk suddenly going missing (when
using a SAS 6/iR); I think it's somewhat related to problem 2), though it
happens rarely.
4) Solaris 11 Express and the latest R710's introduced a new Broadcom problem:
random network hiccups. Disabling C-states does not help. Planning to open an
SR for this; it seems very different from the original problem (the OS is not
aware of what happens at all).

My solution would be not to use the R710 for anything more serious; it is
definitely a platform with more problems than I'm interested in debugging. (:

Yours
Markus Kovero
Well, a couple of weeks before Christmas I enabled the onboard bcom NICs
on my R610 again, to use them as IPMI ports - I didn't even use them in
Sol11 - but as of this morning the system has entered the state again
in which a successful login to the system is no longer possible.


Logging in at the local console wasn't possible either - the system didn't
even prompt for the password.
So just having the bcom NICs present on the host seems to cause this
trouble, even though Sol11 doesn't have to deal with the NICs at all.


Cheers,
budy


Re: [zfs-discuss] Running on Dell hardware?

2011-01-03 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Stephan Budach
 
 Well, a couple of weeks before Christmas I enabled the onboard bcom NICs
 on my R610 again, to use them as IPMI ports - I didn't even use them in

You don't have to enable the Broadcom NICs in order for them to do IPMI.  In
my R710, I went into the BIOS and disabled all the bcom NICs.  The primary NIC
doesn't allow you to *fully* disable it; it says something like Disabled
(OS)...  This means the OS can't see it, but it's still doing IPMI, assuming
you configured IPMI in the BIOS interface (Ctrl-E).

It seems to work fine in this configuration.
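
A quick way to sanity-check that the BMC still answers on that shared port
is to poke it from another box with ipmitool - a rough sketch, assuming
ipmitool is installed and using 192.168.0.120 as a stand-in for your iDRAC
address:

  # query the BMC over the LAN, bypassing the host OS entirely
  ipmitool -I lanplus -H 192.168.0.120 -U root chassis status
  # show the BMC's own LAN settings (IP source, MAC, VLAN, ...)
  ipmitool -I lanplus -H 192.168.0.120 -U root lan print 1

If those answer while the OS sees no bnx interface, the Disabled (OS) port
is doing IPMI only, as described above.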



Re: [zfs-discuss] Running on Dell hardware?

2011-01-03 Thread Stephan Budach

Am 03.01.11 19:41, schrieb Edward Ned Harvey:

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
boun...@opensolaris.org] On Behalf Of Stephan Budach

Well, a couple of weeks before Christmas I enabled the onboard bcom NICs
on my R610 again, to use them as IPMI ports - I didn't even use them in

You don't have to enable the Broadcom NICs in order for them to do IPMI.  In
my R710, I went into the BIOS and disabled all the bcom NICs.  The primary NIC
doesn't allow you to *fully* disable it; it says something like Disabled
(OS)...  This means the OS can't see it, but it's still doing IPMI, assuming
you configured IPMI in the BIOS interface (Ctrl-E).

It seems to work fine in this configuration.


That's worth a try. I will check that tomorrow.

Thanks,
budy


Re: [zfs-discuss] Running on Dell hardware?

2010-12-22 Thread Lasse Osterild
I've just noticed that Dell has a 6.0.1 firmware upgrade available, at least
for my R610's (they are about 3 months old).  Oddly enough it doesn't show up
on support.dell.com when I search using my service code, but if I check
through System Services / Lifecycle Controller it does find them.

Two of the same servers are running Ubuntu 10.04.1 and RHEL 5.4; several TBs
of data have gone through the interfaces on those two boxes without a single
glitch.

So has anyone tried 6.0.1 yet, or is it simply v4.x repackaged with a new 
version number?

 - Lasse
 
On 12/12/2010, at 14.39, Markus Kovero wrote:

 Oh well.  Already, the weird crash has happened again.  So we're concluding
 two things:
 -1-  The Broadcom NIC is definitely the cause of the crash.
 and
 -2-  Even with the new "upgrade" (really a downgrade), the problem is not solved.
 
 So the solution is: add on an Intel NIC, and disable the integrated Broadcom NIC.
 
 And if I may summarize my own findings:
 
 Random crashes and Broadcom issues are separate, unrelated problems AFAIK;
 we have some R710's with Broadcom NICs that seem to be stable over several
 months, and other R710's which cannot keep it together for even a week or
 so. Both have identical fw/bios versions.
 
 1) there is/was a problem with Broadcom NICs losing network connectivity
 under every OS, including Solaris; this was fixed by software patches in
 sol10 and sol11 express, and an unofficial driver update was made for
 snv_134. The workaround for this issue was to disable C-states in the BIOS,
 under processor configuration.
 2) there is some kind of instability issue, not related to the NICs, with
 the latest batch of R710-series servers; crashes occur randomly, but it
 seemed to get fixed in Solaris 11 Express, no idea about sol10 though. We
 have yet to test whether this is somehow related to the processor/memory
 configuration being used; mind you, software and firmware versions are
 identical on the stable R710's and the crashing ones.
 3) there were also problems with the system disk suddenly going missing
 (when using a SAS 6/iR); I think it's somewhat related to problem 2), though
 it happens rarely.
 4) Solaris 11 Express and the latest R710's introduced a new Broadcom
 problem: random network hiccups. Disabling C-states does not help. Planning
 to open an SR for this; it seems very different from the original problem
 (the OS is not aware of what happens at all).
 
 My solution would be not to use the R710 for anything more serious; it is
 definitely a platform with more problems than I'm interested in debugging. (:
 
 Yours
 Markus Kovero
 
 
 



Re: [zfs-discuss] Running on Dell hardware?

2010-12-11 Thread Stephan Budach

Am 10.12.10 19:13, schrieb Edward Ned Harvey:

From: Edward Ned Harvey [mailto:sh...@nedharvey.com]

It has been over 3 weeks now, with no crashes, and with me doing everything I
can to get it to crash again.  So I'm going to call this one resolved...

All I did was disable the built-in Broadcom network cards, and buy an add-
on Intel network card (EXPI9400PT).

Wow, I can't believe this topic continues...

Yes, I am entirely confident now saying it was the fault of the bcom card.
However, if you recall, people who started with bcom firmware v4.x were
stable, then they upgraded to v5.x and became unstable, so they downgraded
and returned to stable.  Unfortunately for me, I have an R710, which shipped
with v5 factory installed, and there was no option to downgrade...

But a few days ago, Dell released a new firmware upgrade, from version 5.x
to 4.x.  That's right.  The new firmware is a downgrade to 4.

I am going to remove my intel add-on card, and resume using my integrated
broadcom nic.  I am quite certain the system will continue to be stable, and
at last we can call this issue resolved permanently.

Wow - that's interesting. I will certainly update my current bcom fw 
to get to 4.x.


Thanks for the heads-up.

Cheers,
budy


Re: [zfs-discuss] Running on Dell hardware?

2010-12-11 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Edward Ned Harvey
 
 But a few days ago, Dell released a new firmware upgrade, from version 5.x
 to 4.x.  That's right.  The new firmware is a downgrade to 4.
 
 I am going to remove my intel add-on card, and resume using my integrated
 broadcom nic.  I am quite certain the system will continue to be stable,
and
 at last we can call this issue resolved permanently.

Oh well.  Already, the weird crash has happened again.  So we're concluding
two things:
-1-  The Broadcom NIC is definitely the cause of the crash.
and
-2-  Even with the new "upgrade" (really a downgrade), the problem is not solved.

So the solution is: add on an Intel NIC, and disable the integrated Broadcom NIC.



Re: [zfs-discuss] Running on Dell hardware?

2010-12-10 Thread Edward Ned Harvey
 From: Edward Ned Harvey [mailto:sh...@nedharvey.com]
 
 It has been over 3 weeks now, with no crashes, and with me doing everything I
 can to get it to crash again.  So I'm going to call this one resolved...
 
 All I did was disable the built-in Broadcom network cards, and buy an add-
 on Intel network card (EXPI9400PT).  

Wow, I can't believe this topic continues...

Yes, I am entirely confident now saying it was the fault of the bcom card.
However, if you recall, people who started with bcom firmware v4.x were
stable, then they upgraded to v5.x and became unstable, so they downgraded
and returned to stable.  Unfortunately for me, I have an R710, which shipped
with v5 factory installed, and there was no option to downgrade...

But a few days ago, Dell released a new firmware upgrade, from version 5.x
to 4.x.  That's right.  The new firmware is a downgrade to 4. 

I am going to remove my intel add-on card, and resume using my integrated
broadcom nic.  I am quite certain the system will continue to be stable, and
at last we can call this issue resolved permanently.



Re: [zfs-discuss] Running on Dell hardware?

2010-11-20 Thread Edward Ned Harvey
 From: Edward Ned Harvey [mailto:sh...@nedharvey.com]
 
 I have a Dell R710 which has been flaky for some time.  It crashes about
once
 per week.  I have literally replaced every piece of hardware in it, and
 reinstalled Sol 10u9 fresh and clean.

It has been over 3 weeks now, with no crashes, and with me doing everything I
can to get it to crash again.  So I'm going to call this one resolved...
tentatively acknowledging the remote possibility that the problem could
still come back.

All I did was disable the built-in Broadcom network cards, and buy an add-on
Intel network card (EXPI9400PT).  It is worth noting that the built-in bcom
card cannot be completely disabled if you want to use IPMI...  It's disabled
for the OS only, but the iDRAC IPMI traffic still goes across the bcom
interface.  So now I have two network cables running to the machine, one of
which is only used for IPMI.  No big deal.  I had ports to spare on my
switch, and the system is stable.
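
For anyone repeating this, the OS-side half of the swap is mundane on
Solaris - a sketch, where the bnx0/e1000g0 names and the address are
assumptions (check yours with dladm show-link):

  # stop using the onboard Broadcom (bnx) and move the config to the
  # Intel (e1000g) card; the hostname.* file makes it persistent
  ifconfig bnx0 down unplumb
  mv /etc/hostname.bnx0 /etc/hostname.e1000g0
  ifconfig e1000g0 plumb
  ifconfig e1000g0 192.168.1.10 netmask 255.255.255.0 up

Then disable the onboard ports in the BIOS so the OS never attaches bnx at
all (apart from the primary port's Disabled (OS) mode noted above).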

It's the fault of the Broadcom card.  Rumor has it (from a Dell support
technician) that the bcom cards have been problematic under other OSes too...
It's not isolated to Solaris.



Re: [zfs-discuss] Running on Dell hardware?

2010-11-01 Thread Michael Sullivan
Congratulations Ed, and welcome to open systems…

Ah, but Nexenta is open and has no vendor lock-in.  What you probably should
have done is bank everything on Illumos and Nexenta - a winning combination
by all accounts.

But then again, you could have used Linux on any hardware as well.  Then your 
hardware and software issues would probably be multiplied even more.

Cheers,

Mike

---
Michael Sullivan   
michael.p.sulli...@me.com
http://www.kamiogi.net/
Japan Mobile: +81-80-3202-2599
US Phone: +1-561-283-2034

On 23 Oct 2010, at 12:53, Edward Ned Harvey wrote:

 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Kyle McDonald
 
 I'm currently considering purchasing 1 or 2 Dell R515's.
 
 With up to 14 drives, and up to 64GB of RAM, it seems like it's well
 suited for a low-end ZFS server.
 
 I know this box is new, but I wonder if anyone out there has any
 experience with it?
 
 How about the H700 SAS controller?
 
 Anyone know where to find the Dell 3.5" sleds that take 2.5" drives? I
 want to put some SSD's in a box like this, but there's no way I'm
 going to pay Dell's SSD prices. $1300 for a 50GB 'mainstream' SSD? Are
 they kidding?
 
 You are asking for a world of hurt.  You may luck out, and it may work
 great, thus saving you money.  Take my case, for example ... I took the
 safe approach (as far as any non-sun hardware is concerned.)  I bought an
 officially supported dell server, with all dell blessed and solaris
 supported components, with support contracts on both the hardware and
 software, fully patched and updated on all fronts, and I am getting system
 failures approx once per week.  I have support tickets open with both dell
 and oracle right now ... Have no idea how it's all going to turn out.  But
 if you have a problem like mine, using unsupported hardware, you have no
 alternative.  You're up a tree full of bees, naked, with a hunter on the
 ground trying to shoot you.  And IMHO, I think the probability of having a
 problem like mine is higher when you use the unsupported hardware.  But of
 course there's no definable way to quantify that belief.
 
 My advice to you is:  buy the supported hardware, and the support contracts
 for both the hardware and software.  But of course, that's all just a
 calculated risk, and I doubt you're going to take my advice.  ;-)
 



Re: [zfs-discuss] Running on Dell hardware?

2010-11-01 Thread Brian Kolaci
I've been having the same problems, and it appears to be from a remote
monitoring app that calls zpool status and/or zfs list.  I've also found
problems with the PERC, and I'm finally replacing the PERC cards with SAS 5/E
controllers (which are much cheaper anyway).  Every time I reboot, the PERC
tells me a foreign import is required, so the PERC cards and ZFS just don't
go together...


On Oct 24, 2010, at 8:14 PM, Edward Ned Harvey wrote:

 From: Stephan Budach [mailto:stephan.bud...@jvm.de]
 
 What sort of problems did you have with the bcom NICs in your R610?
 
 Well, basically the boxes would hang themselves up, after a week or so.
 And by hanging up, I mean becoming inaccessible by either the network
 via ssh or the local console. It seemed that, for some reason, the
 authentication didn't work anymore.
 
 That's precisely what I'm experiencing.  System still responds to ping.
 Anything that was already running in memory via network stays alive (cron
 jobs continue to run) but remote access is impossible (ssh, vnc, even local
 physical console...)  And eventually the system will stop completely.
 
 There's a high correlation between the problem and doing some sort of
 low-level storage operation (zpool import/export, MegaCli offline, zpool
 status, scrub, zfs send, etc.)  So I thought the problem was somehow related
 to the perc or something ... Maybe there's a bug where the perc conflicts
 with the nic.  I don't care.  Swapping the NIC is cheap enough, I'll try it
 now, just to see if it works.
 
 Thanks for the suggestion...
 



Re: [zfs-discuss] Running on Dell hardware?

2010-10-26 Thread Jens Elkner
On Tue, Oct 26, 2010 at 08:06:53AM +1300, Ian Collins wrote:
 On 10/26/10 01:38 AM, Edward Ned Harvey wrote:
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Ian Collins
 
 Sun hardware?  Then you get all your support from one vendor.
  
 +1
 
 Sun hardware costs more, but it's worth it, if you want to simply assume
 your stuff will work.  In my case, I'd say the sun hardware was approx 50%
 to 2x higher cost than the equivalent dell setup.
 

 I find that claim odd.  Whenever we bought kit down here in NZ, Sun has
 been the best on price.  Maybe that's changed under the new order.

Add about 50% to the last price list from Sun and you will get the price
it costs now ...

Have fun,
jel.
-- 
Otto-von-Guericke University http://www.cs.uni-magdeburg.de/
Department of Computer Science   Geb. 29 R 027, Universitaetsplatz 2
39106 Magdeburg, Germany Tel: +49 391 67 12768


Re: [zfs-discuss] Running on Dell hardware?

2010-10-26 Thread Markus Kovero

 Add about 50% to the last price list from Sun and you will get the price
 it costs now ...

It seems Oracle does not really want to sell its hardware: several months of
delays with the sales rep providing prices, and pricing nowhere close to its
competitors.

Yours
Markus Kovero


Re: [zfs-discuss] Running on Dell hardware?

2010-10-26 Thread Eugen Leitl
On Tue, Oct 26, 2010 at 12:50:16PM +0000, Markus Kovero wrote:
 
  Add about 50% to the last price list from Sun and you will get the price
  it costs now ...
 
 Seems oracle does not want to sell its hardware so much, several 
 month delays with sales rep providing prices and pricing nowhere 
 close to its competitors.

Yeah, no more Sun hardware for us, either. Mostly Supermicro,
Dell, HP.

-- 
Eugen* Leitl http://leitl.org
__
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE


Re: [zfs-discuss] Running on Dell hardware?

2010-10-25 Thread Markus Kovero
 That's precisely what I'm experiencing.  System still responds to ping.
 Anything that was already running in memory via network stays alive (cron
 jobs continue to run) but remote access is impossible (ssh, vnc, even local
 physical console...)  And eventually the system will stop completely.

Hi, the Broadcom issues show up as loss of network connectivity, i.e. the
system stops responding to ping.
This is a different issue; it's as if the system runs out of memory or loses
its system disks (which we have seen lately).

Yours
Markus Kovero



Re: [zfs-discuss] Running on Dell hardware?

2010-10-25 Thread Markus Kovero
 You are asking for a world of hurt.  You may luck out, and it may work
 great, thus saving you money.  Take my case, for example ... I took the
 safe approach (as far as any non-sun hardware is concerned.)  I bought an
 officially supported dell server, with all dell blessed and solaris
 supported components, with support contracts on both the hardware and
 software, fully patched and updated on all fronts, and I am getting system
 failures approx once per week.  I have support tickets open with both dell
 and oracle right now ... Have no idea how it's all going to turn out.  But
 if you have a problem like mine, using unsupported hardware, you have no
 alternative.  You're up a tree full of bees, naked, with a hunter on the
 ground trying to shoot you.  And IMHO, I think the probability of having a
 problem like mine is higher when you use the unsupported hardware.  But of
 course there's no definable way to quantify that belief.

 My advice to you is:  buy the supported hardware, and the support contracts
 for both the hardware and software.  But of course, that's all just a
 calculated risk, and I doubt you're going to take my advice.  ;-)


Are there any other feasible alternatives to Dell hardware? I wonder whether
these issues are mostly related to Nehalem architectural problems, e.g.
C-states.
So, is there anything to be gained by switching hardware vendor? HP, anyone?

Yours
Markus Kovero




Re: [zfs-discuss] Running on Dell hardware?

2010-10-25 Thread Ian Collins

On 10/25/10 08:39 PM, Markus Kovero wrote:

You are asking for a world of hurt.  You may luck out, and it may work
great, thus saving you money.  Take my case, for example ... I took the
safe approach (as far as any non-sun hardware is concerned.)  I bought an
officially supported dell server, with all dell blessed and solaris
supported components, with support contracts on both the hardware and
software, fully patched and updated on all fronts, and I am getting system
failures approx once per week.  I have support tickets open with both dell
and oracle right now ... Have no idea how it's all going to turn out.  But
if you have a problem like mine, using unsupported hardware, you have no
alternative.  You're up a tree full of bees, naked, with a hunter on the
ground trying to shoot you.  And IMHO, I think the probability of having a
problem like mine is higher when you use the unsupported hardware.  But of
course there's no definable way to quantify that belief.
 
My advice to you is:  buy the supported hardware, and the support contracts

for both the hardware and software.  But of course, that's all just a
calculated risk, and I doubt you're going to take my advice.  ;-)
 


Are there any other feasible alternatives to Dell hardware? I wonder whether
these issues are mostly related to Nehalem architectural problems, e.g.
C-states.
So, is there anything to be gained by switching hardware vendor? HP, anyone?

   

Sun hardware?  Then you get all your support from one vendor.

--
Ian.



Re: [zfs-discuss] Running on Dell hardware?

2010-10-25 Thread Edward Ned Harvey
 From: Markus Kovero [mailto:markus.kov...@nebula.fi] 
 
 Are there any other feasible alternatives to Dell hardware? I wonder whether
 these issues are mostly related to Nehalem architectural problems, e.g.
 C-states.
 So, is there anything to be gained by switching hardware vendor? HP, anyone?

Googling around, many people are having this type of problem on HP as well,
so it's not just Dell.

Most people are able to fix or work around it by disabling C-states in the
BIOS, or by fidgeting with their NIC (swapping out Broadcom in favor of
Intel, or downgrading the bcom firmware).  But I already disabled C-states
(didn't help), and my system ships with a minimum bcom FW version of 5, which
means I can't downgrade to the v4 that has sometimes solved the problem for
people.

So again - I have support tickets open with Dell and Oracle...  Don't know
the result yet.



Re: [zfs-discuss] Running on Dell hardware?

2010-10-25 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Ian Collins
 
 Sun hardware?  Then you get all your support from one vendor.

+1

Sun hardware costs more, but it's worth it, if you want to simply assume
your stuff will work.  In my case, I'd say the sun hardware was approx 50%
to 2x higher cost than the equivalent dell setup.



Re: [zfs-discuss] Running on Dell hardware?

2010-10-25 Thread Kyle McDonald



On 10/25/2010 3:39 AM, Markus Kovero wrote:

 Are there any other feasible alternatives to Dell hardware? I wonder whether
 these issues are mostly related to Nehalem architectural problems, e.g.
 C-states.
 So, is there anything to be gained by switching hardware vendor? HP, anyone?

Note that while it was a Dell I was asking about, it's an AMD Opteron
system (the R515).

I doubt, with an architecture that different, that the same 'c-states'
corner case will appear. Aren't there too many variables changing
between AMD and Intel to have the exact same problem?

Not that there won't be a different problem, though. :)

  -Kyle

 Yours
 Markus Kovero





Re: [zfs-discuss] Running on Dell hardware?

2010-10-25 Thread David Magda
On Mon, October 25, 2010 08:38, Edward Ned Harvey wrote:
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Ian Collins

 Sun hardware?  Then you get all your support from one vendor.

 +1

 Sun hardware costs more, but it's worth it, if you want to simply assume
 your stuff will work.  In my case, I'd say the sun hardware was approx 50%
 to 2x higher cost than the equivalent dell setup.

I agree with the general sentiment, but it can get prohibitive if you also
have a sizable DEV/QA/STG environment that you want to have the same as
PRD.

I don't mind gold-plated support for PRD, but for the rest, it'd be handy
budget-wise if Oracle simply had a basic parts-only warranty for hardware,
and a patches-only support for software. It's all we need for the majority
of our environment, but it no longer seems available under Larry Ellison's
watch.




Re: [zfs-discuss] Running on Dell hardware?

2010-10-25 Thread Ian Collins

On 10/26/10 01:38 AM, Edward Ned Harvey wrote:

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
boun...@opensolaris.org] On Behalf Of Ian Collins

Sun hardware?  Then you get all your support from one vendor.
 

+1

Sun hardware costs more, but it's worth it, if you want to simply assume
your stuff will work.  In my case, I'd say the sun hardware was approx 50%
to 2x higher cost than the equivalent dell setup.

   
I find that claim odd.  Whenever we bought kit down here in NZ, Sun has
been the best on price.  Maybe that's changed under the new order.


--
Ian.



Re: [zfs-discuss] Running on Dell hardware?

2010-10-25 Thread Stephan Budach

Am 25.10.10 21:06, schrieb Ian Collins:

On 10/26/10 01:38 AM, Edward Ned Harvey wrote:

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
boun...@opensolaris.org] On Behalf Of Ian Collins

Sun hardware?  Then you get all your support from one vendor.

+1

Sun hardware costs more, but it's worth it, if you want to simply assume
your stuff will work.  In my case, I'd say the sun hardware was 
approx 50%

to 2x higher cost than the equivalent dell setup.

I find that claim odd.  Whenever we bought kit down here in NZ, Sun
has been the best on price.  Maybe that's changed under the new order.




I am currently investigating buying Sun/Oracle HW. If you take into
account Oracle's license/support fees for non-Oracle HW, buying Solaris
with Oracle HW may actually prove cheaper when calculated over three years.


We'll see.


Re: [zfs-discuss] Running on Dell hardware?

2010-10-24 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Stephan Budach
 
 I actually have three Dell R610 boxes running OSol snv134 and since I
 switched from the internal Broadcom NICs to Intel ones, I didn't have
 any issue with them.

I am still using the built-in broadcom NICs in my R710 that's having
problems...

What sort of problems did you have with the bcom NICs in your R610?



Re: [zfs-discuss] Running on Dell hardware?

2010-10-24 Thread Stephan Budach

Am 24.10.10 16:29, schrieb Edward Ned Harvey:

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
boun...@opensolaris.org] On Behalf Of Stephan Budach

I actually have three Dell R610 boxes running OSol snv134 and since I
switched from the internal Broadcom NICs to Intel ones, I didn't have
any issue with them.

I am still using the built-in broadcom NICs in my R710 that's having
problems...

What sort of problems did you have with the bcom NICs in your R610?

Well, basically the boxes would hang themselves up after a week or so.
And by hanging up, I mean becoming inaccessible via either the network
(ssh) or the local console. It seemed that, for some reason, the
authentication didn't work anymore.


Earlier versions of OSol also exhibited a problem where the links of
the Broadcom NICs were reported as up (which they actually were, since
the LED indicators were on and the switch also reported the links as
up), but no network traffic was going through. Disabling/enabling the
ports didn't help and I had to reboot the host as well, but that was
with 2009.06, I think.
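
On newer builds you can at least watch for that state from the shell - a
sketch, assuming a dladm recent enough to show per-link statistics and a
bnx0 interface name (both assumptions):

  dladm show-link                  # is the link reported as up?
  dladm show-link -s -i 5 bnx0     # do RBYTES/OBYTES actually move?

If the link stays up but the byte counters sit still under traffic, you are
in exactly the state described above.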


Since I am still an OSol noob (well, kind of still), I decided to try 
the Intel NICs, which had never caused me any trouble in any other 
server and my boxes have been fine since.


--
Stephan Budach
Jung von Matt/it-services GmbH
Glashüttenstraße 79
20357 Hamburg

Tel: +49 40-4321-1353
Fax: +49 40-4321-1114
E-Mail: stephan.bud...@jvm.de
Internet: http://www.jvm.com

Geschäftsführer: Ulrich Pallas, Frank Wilhelm
AG HH HRB 98380



Re: [zfs-discuss] Running on Dell hardware?

2010-10-24 Thread Edward Ned Harvey
 From: Stephan Budach [mailto:stephan.bud...@jvm.de]
 
  What sort of problems did you have with the bcom NICs in your R610?
 
 Well, basically the boxes would hang themselves up, after a week or so.
 And by hanging up, I mean becoming inaccessible by either the network
 via ssh or the local console. It seemed that, for some reason, the
 authentication didn't work anymore.

That's precisely what I'm experiencing.  The system still responds to ping.
Anything that was already running in memory via the network stays alive (cron
jobs continue to run), but remote access is impossible (ssh, vnc, even the
local physical console...), and eventually the system will stop completely.

There's a high correlation between the problem and doing some sort of
low-level storage operation (zpool import/export, MegaCli offline, zpool
status, scrub, zfs send, etc.).  So I thought the problem was somehow related
to the PERC or something ... Maybe there's a bug where the PERC conflicts
with the NIC.  I don't care; swapping the NIC is cheap enough.  I'll try it
now, just to see if it works.

Thanks for the suggestion...



Re: [zfs-discuss] Running on Dell hardware?

2010-10-23 Thread Henrik Johansen

'Tim Cook' wrote:

[... snip ... ]


Dell requires Dell branded drives as of roughly 8 months ago.  I don't
think there was ever an H700 firmware released that didn't require
this.  I'd bet you're going to waste a lot of money to get a drive the
system refuses to recognize.


This should no longer be an issue as Dell has abandoned that practice
because of customer pressure.


--Tim








--
Med venlig hilsen / Best Regards

Henrik Johansen
hen...@scannet.dk
Tlf. 75 53 35 00

ScanNet Group
A/S ScanNet 


Re: [zfs-discuss] Running on Dell hardware?

2010-10-23 Thread Stephan Budach
I actually have three Dell R610 boxes running OSol snv_134, and since I
switched from the internal Broadcom NICs to Intel ones I haven't had
any issues with them.


budy


Re: [zfs-discuss] Running on Dell hardware?

2010-10-22 Thread Eric D. Mudama

On Wed, Oct 13 at 15:44, Edward Ned Harvey wrote:

From: Henrik Johansen [mailto:hen...@scannet.dk]

The 10g models are stable - especially the R905's are real workhorses.


You would generally consider all your machines stable now?
Can you easily pdsh to all those machines?

kstat | grep current_cstate ; kstat | grep supported_max_cstates


Dell T610, machine has been stable since we got it (relative to the
failure modes you've mentioned)

current_cstate  1
current_cstate  1
current_cstate  1
current_cstate  0
current_cstate  1
current_cstate  1
current_cstate  0
current_cstate  1
supported_max_cstates   1
supported_max_cstates   1
supported_max_cstates   1
supported_max_cstates   1
supported_max_cstates   1
supported_max_cstates   1
supported_max_cstates   1
supported_max_cstates   1


--eric

--
Eric D. Mudama
edmud...@mail.bounceswoosh.org



Re: [zfs-discuss] Running on Dell hardware?

2010-10-22 Thread Kyle McDonald

Hi All,

I'm currently considering purchasing 1 or 2 Dell R515's.

With up to 14 drives, and up to 64GB of RAM, it seems like it's well
suited for a low-end ZFS server.

I know this box is new, but I wonder if anyone out there has any
experience with it?

How about the H700 SAS controller?

Anyone know where to find the Dell 3.5" sleds that take 2.5" drives? I
want to put some SSD's in a box like this, but there's no way I'm
going to pay Dell's SSD prices. $1300 for a 50GB 'mainstream' SSD? Are
they kidding?

  -Kyle




Re: [zfs-discuss] Running on Dell hardware?

2010-10-22 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Kyle McDonald
 
 I'm currently considering purchasing 1 or 2 Dell R515's.
 
 With up to 14 drives, and up to 64GB of RAM, it seems like it's well
 suited for a low-end ZFS server.
 
 I know this box is new, but I wonder if anyone out there has any
 experience with it?
 
 How about the H700 SAS controller?
 
 Anyone know where to find the Dell 3.5" sleds that take 2.5" drives? I
 want to put some SSD's in a box like this, but there's no way I'm
 going to pay Dell's SSD prices. $1300 for a 50GB 'mainstream' SSD? Are
 they kidding?

You are asking for a world of hurt.  You may luck out, and it may work
great, thus saving you money.  Take my case, for example ... I took the
safe approach (as far as any non-sun hardware is concerned.)  I bought an
officially supported dell server, with all dell blessed and solaris
supported components, with support contracts on both the hardware and
software, fully patched and updated on all fronts, and I am getting system
failures approx once per week.  I have support tickets open with both dell
and oracle right now ... Have no idea how it's all going to turn out.  But
if you have a problem like mine, using unsupported hardware, you have no
alternative.  You're up a tree full of bees, naked, with a hunter on the
ground trying to shoot you.  And IMHO, I think the probability of having a
problem like mine is higher when you use the unsupported hardware.  But of
course there's no definable way to quantify that belief.

My advice to you is:  buy the supported hardware, and the support contracts
for both the hardware and software.  But of course, that's all just a
calculated risk, and I doubt you're going to take my advice.  ;-)



Re: [zfs-discuss] Running on Dell hardware?

2010-10-22 Thread Tim Cook
On Fri, Oct 22, 2010 at 10:53 PM, Edward Ned Harvey sh...@nedharvey.com wrote:

  From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
  boun...@opensolaris.org] On Behalf Of Kyle McDonald
 
  I'm currently considering purchasing 1 or 2 Dell R515's.
 
  With up to 14 drives, and up to 64GB of RAM, it seems like it's well
  suited for a low-end ZFS server.
 
  I know this box is new, but I wonder if anyone out there has any
  experience with it?
 
  How about the H700 SAS controller?
 
  Anyone know where to find the Dell 3.5" sleds that take 2.5" drives? I
  want to put some SSD's in a box like this, but there's no way I'm
  going to pay Dell's SSD prices. $1300 for a 50GB 'mainstream' SSD? Are
  they kidding?

 You are asking for a world of hurt.  You may luck out, and it may work
 great, thus saving you money.  Take my case, for example ... I took the
 safe approach (as far as any non-sun hardware is concerned.)  I bought an
 officially supported dell server, with all dell blessed and solaris
 supported components, with support contracts on both the hardware and
 software, fully patched and updated on all fronts, and I am getting system
 failures approx once per week.  I have support tickets open with both dell
 and oracle right now ... Have no idea how it's all going to turn out.  But
 if you have a problem like mine, using unsupported hardware, you have no
 alternative.  You're up a tree full of bees, naked, with a hunter on the
 ground trying to shoot you.  And IMHO, I think the probability of having a
 problem like mine is higher when you use the unsupported hardware.  But of
 course there's no definable way to quantify that belief.

 My advice to you is:  buy the supported hardware, and the support contracts
 for both the hardware and software.  But of course, that's all just a
 calculated risk, and I doubt you're going to take my advice.  ;-)




Dell requires Dell branded drives as of roughly 8 months ago.  I don't think
there was ever an H700 firmware released that didn't require this.  I'd bet
you're going to waste a lot of money to get a drive the system refuses to
recognize.

--Tim


Re: [zfs-discuss] Running on Dell hardware?

2010-10-14 Thread Henrik Johansen

'Edward Ned Harvey' wrote:

From: Henrik Johansen [mailto:hen...@scannet.dk]

The 10g models are stable - especially the R905's are real workhorses.


You would generally consider all your machines stable now?
Can you easily pdsh to all those machines?


Yes - the only problem child has been 1 R610 (the other 2 that we have
in production have not shown any signs of trouble)


kstat | grep current_cstate ; kstat | grep supported_max_cstates

I'd really love to see if "some current_cstate is higher than
supported_max_cstates" is an accurate indicator of system instability.


Here's a little sample from different machines : 


R610 #1

current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  0
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2

R610 #2

current_cstate  3
current_cstate  0
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  3
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2

PE2900

current_cstate  1
current_cstate  1
current_cstate  0
current_cstate  1
current_cstate  1
current_cstate  0
current_cstate  1
current_cstate  1
supported_max_cstates   1
supported_max_cstates   1
supported_max_cstates   1
supported_max_cstates   1
supported_max_cstates   1
supported_max_cstates   1
supported_max_cstates   1
supported_max_cstates   1

PER905

current_cstate  1
current_cstate  1
current_cstate  1
current_cstate  1
current_cstate  0
current_cstate  1
current_cstate  1
current_cstate  1
current_cstate  1
current_cstate  1
current_cstate  1
current_cstate  1
current_cstate  1
current_cstate  0
current_cstate  1
current_cstate  1
supported_max_cstates   0
supported_max_cstates   0
supported_max_cstates   0
supported_max_cstates   0
supported_max_cstates   0
supported_max_cstates   0
supported_max_cstates   0

Re: [zfs-discuss] Running on Dell hardware?

2010-10-14 Thread Jiawei Zhao
We have an R710 + 3 MD1000s running ZFS, with an Intel 10GbE network card.

There was a period when the R710 froze randomly, while we used an OSol b12x
release. I checked Google and there were reports of freezes caused by the new
mpt driver used in the b12x releases, which could be the cause. We changed to
Nexenta based on b134 and the issue is gone; it has been running very stably
ever since. We plan to add 3 more MD1000s. All MD1000s are connected to a
SAS 5/E card.

Not sure what the status of the mpt driver is in sol10u9.


[zfs-discuss] Running on Dell hardware?

2010-10-13 Thread Edward Ned Harvey
I have a Dell R710 which has been flaky for some time.  It crashes about
once per week.  I have literally replaced every piece of hardware in it, and
reinstalled Sol 10u9 fresh and clean.  

 

I am wondering if other people out there are using Dell hardware, with what
degree of success, and in what configuration?

 

The failure seems to be related to the PERC 6/i.  For some period around the
time of the crash, the system still responds to ping, and anything currently
in memory or running from remote storage continues to function fine.  But new
processes that require the local storage ... such as inbound ssh, etc., or
even physical login at the console ... those are all hosed.  And eventually
the system stops responding to ping.  As soon as the problem starts, the
only recourse is a power cycle.

 

I can't seem to reproduce the problem reliably, but it does happen
regularly.  Yesterday it happened several times in one day, but sometimes it
will go 2 weeks without a problem.

 

Again, just wondering what other people are using, and experiencing.  To see
if any more clues can be found to identify the cause.



Re: [zfs-discuss] Running on Dell hardware?

2010-10-13 Thread Markus Kovero

 I have a Dell R710 which has been flaky for some time.  It crashes about once 
 per week.  I have literally replaced every piece of hardware in it, and 
 reinstalled Sol 10u9 fresh and clean.  
 I am wondering if other people out there are using Dell hardware, with what 
 degree of success, and in what configuration?
 The failure seems to be related to the perc 6i.  For some period around the 
 time of crash, the system still responds to ping, and anything currently in 
 memory or running from remote storage continues to function fine.  But new 
 processes that require the local storage ... Such as inbound ssh etc, or even 
 physical login at the console ... those are all hosed.  And eventually the 
 system stops responding to ping.  As soon as the problem starts, the only 
 recourse is power cycle.
 I can't seem to reproduce the problem reliably, but it does happen 
 regularly.  Yesterday it happened several times in one day, but sometimes it 
 will go 2 weeks without a problem.
 Again, just wondering what other people are using, and experiencing.  To see 
 if any more clues can be found to identify the cause.


Hi, we've been running OpenSolaris on Dell R710s with mixed results; some work
better than others, and we've been struggling with the same issue as you on
the latest servers.
I suspect some kind of power-saving issue gone wrong: the system disks go to
sleep and never wake up, or something similar.
Personally, I cannot recommend using them with Solaris; the support is not
even close to what it should be.

Yours
Markus Kovero


Re: [zfs-discuss] Running on Dell hardware?

2010-10-13 Thread Steve Radich, BitShop, Inc.
Do you have dedup on? Remove large files, zfs destroy a snapshot or a zvol,
and you'll see hangs like the ones you are describing.

Turning off dedup is the best option.

If you want dedup, get more RAM, and more, and more, and... add an SSD cache
device... then it usually works OK.

Right now I'm fighting an outage due to a zfs destroy of a zvol that hung
everything; we thought we had identified what not to do, but apparently not.

Dedup only works well with a lot of RAM; otherwise the dedup table is read
from disk (very slowly, especially during I/O) and some operations lock the
whole server - they block other disk I/O.
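
For anyone unsure whether dedup is even in play, a quick sketch (assuming a
dedup-capable build, i.e. zpool version 21+ / snv_128 or later, and a pool
named "tank" - the name is hypothetical):

  zfs get dedup tank        # is dedup on for the dataset?
  zpool list tank           # the DEDUP column shows the current ratio
  zdb -DD tank              # DDT histogram; the table size hints at how
                            # much RAM the dedup table wants
  zfs set dedup=off tank    # affects new writes only; existing deduped
                            # blocks keep the DDT alive

Note that dedup=off does not shrink an existing dedup table, so the
destroy-related hangs can persist until the deduped data is rewritten or
destroyed.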

Good luck,

Steve Radich
www.BitShop.com - Business Innovative Technology Shop


Re: [zfs-discuss] Running on Dell hardware?

2010-10-13 Thread Edward Ned Harvey
 From: Markus Kovero [mailto:markus.kov...@nebula.fi]
 Sent: Wednesday, October 13, 2010 10:43 AM
 
 Hi, we've been running OpenSolaris on Dell R710s with mixed results;
 some work better than others, and we've been struggling with the same
 issue as you on the latest servers.
 I suspect some kind of power-saving issue gone wrong: the system disks
 go to sleep and never wake up, or something similar.
 Personally, I cannot recommend using them with Solaris; the support is
 not even close to what it should be.

How consistent are your problems?  If you change something and things get
better or worse, will you be able to notice?

Right now, I think I have improved matters by changing the PERC to
WriteThrough instead of WriteBack.  Yesterday the system crashed several
times before I changed that, and afterward I can't get it to crash at all.
But as I said before ... sometimes the system goes 2 weeks without a
problem.

Do you have all your disks configured as individual disks?
Do you have any SSD?
WriteBack or WriteThrough?
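
For the record, the cache policy can be flipped without a reboot via the
LSI MegaCli utility that the PERC tooling is based on - a sketch, assuming
MegaCli is installed and that your controller/LD numbering matches the
-aALL and -LALL wildcards:

  MegaCli -LDGetProp -Cache -LALL -aALL   # show the current cache policy
  MegaCli -LDSetProp WT -LALL -aALL       # all logical disks -> WriteThrough
  MegaCli -LDSetProp WB -LALL -aALL       # revert to WriteBack

WriteThrough costs write latency, but it takes the controller cache (and
its battery state) out of the failure equation.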



Re: [zfs-discuss] Running on Dell hardware?

2010-10-13 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Steve Radich, BitShop, Inc.
 
 Do you have dedup on? Removing large files, zfs destroy a snapshot, or
 a zvol and you'll see hangs like you are describing.

Thank you, but no.

I'm running sol 10u9, which does not have dedup yet, because dedup is not
yet considered stable for reasons like you mentioned.

I will admit, when dedup is available in sol 11, I do want it.  ;-)



Re: [zfs-discuss] Running on Dell hardware?

2010-10-13 Thread Markus Kovero

 How consistent are your problems?  If you change something and things get
 better or worse, will you be able to notice?

 Right now, I think I have improved matters by changing the Perc to
 WriteThrough instead of WriteBack.  Yesterday the system crashed several
 times before I changed that, and afterward, I can't get it to crash at all.
 But as I said before ... Sometimes the system goes 2 weeks without a
 problem.

 Do you have all your disks configured as individual disks?
 Do you have any SSD?
 WriteBack or WriteThrough?

I believe the issues are not related to the PERC, as we use a SAS 6/iR with
the system disks, and the disks show up as individual disks.
The system has been crashing with and without I/O load; so far it's been
running best with all the extra PCIe cards removed (10 Gbps NIC, SAS 5/E
controllers), uptime almost two days.
There's no apparent pattern to what triggers the crash: it crashed very
frequently during one day and now it seems more stable (sunspots, anyone?).
We had SSDs at the start, but removed them during testing; no effect there.
Somehow, all this is starting to remind me of the Broadcom NIC issues.
A different (not fully supported) hardware revision causing issues?

Yours
Markus Kovero



Re: [zfs-discuss] Running on Dell hardware?

2010-10-13 Thread Bruno Sousa

Hi,

I have some Dell R710 and Dell R410 servers running OSOL (snv_130 or
snv_134) attached to a Supermicro chassis, and the PERC is only used for
the root disks.
I did get some issues with this type of server, but here's what I did
that made them quite stable:

 - disable virtualization support in the BIOS
 - disable C-STATE in the BIOS (CPU menu, I think)

After those 2 changes the servers became quite stable, but before that
they would hang without any apparent reason... I use compression, some
hosts have SSDs, and I have dedup disabled on all servers.

Good luck!

Bruno

On Wed, 13 Oct 2010 10:56:48 -0400, Edward Ned Harvey wrote:
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Steve Radich, BitShop, Inc.
 
 Do you have dedup on? Removing large files, zfs destroy a snapshot, or
 a zvol and you'll see hangs like you are describing.
 
 Thank you, but no.
 
 I'm running sol 10u9, which does not have dedup yet, because dedup is not
 yet considered stable for reasons like you mentioned.
 
 I will admit, when dedup is available in sol 11, I do want it. ;-)

-- 
Bruno Sousa




Re: [zfs-discuss] Running on Dell hardware?

2010-10-13 Thread Bruno Sousa
The Broadcom NIC was also a problem for me; if you downgrade the FW
to the 4.x series everything is fine...
But I think there's a new, updated driver somewhere...

Bruno

On Wed, 13 Oct 2010 14:58:32 +0000, Markus Kovero markus.kov...@nebula.fi wrote:
 How consistent are your problems?  If you change something and things get
 better or worse, will you be able to notice?
 
 Right now, I think I have improved matters by changing the PERC to
 WriteThrough instead of WriteBack.  Yesterday the system crashed several
 times before I changed that, and afterward I can't get it to crash at all.
 But as I said before ... sometimes the system goes 2 weeks without a
 problem.
 
 Do you have all your disks configured as individual disks?
 Do you have any SSD?
 WriteBack or WriteThrough?
 
 I believe the issues are not related to the PERC, as we use a SAS 6/iR with
 the system disks, and the disks show up as individual disks.
 The system has been crashing with and without I/O load; so far it's been
 running best with all the extra PCIe cards removed (10 Gbps NIC, SAS 5/E
 controllers), uptime almost two days.
 There's no apparent pattern to what triggers the crash: it crashed very
 frequently during one day and now it seems more stable (sunspots, anyone?).
 We had SSDs at the start, but removed them during testing; no effect there.
 Somehow, all this is starting to remind me of the Broadcom NIC issues.
 A different (not fully supported) hardware revision causing issues?
 
 Yours
 Markus Kovero
 

-- 
Bruno Sousa




Re: [zfs-discuss] Running on Dell hardware?

2010-10-13 Thread Max Bruning
Hi Ed,
I have been using the Dell r710 for a while.  You might try
disabling c-states, as the problem you saw is identical to one I
was seeing (disk i/o stops working, other things are ok).  Since
disabling c-states, I haven't seen the problem again.

max


On Oct 13, 2010, at 4:56 PM, Edward Ned Harvey wrote:

 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Steve Radich, BitShop, Inc.
 
 Do you have dedup on? Removing large files, zfs destroy a snapshot, or
 a zvol and you'll see hangs like you are describing.
 
 Thank you, but no.
 
 I'm running sol 10u9, which does not have dedup yet, because dedup is not
 yet considered stable for reasons like you mentioned.
 
 I will admit, when dedup is available in sol 11, I do want it.  ;-)
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Running on Dell hardware?

2010-10-13 Thread Eric D. Mudama

On Wed, Oct 13 at 10:13, Edward Ned Harvey wrote:

  I have a Dell R710 which has been flaky for some time.  It crashes about
  once per week.  I have literally replaced every piece of hardware in it,
  and reinstalled Sol 10u9 fresh and clean.



  I am wondering if other people out there are using Dell hardware, with
  what degree of success, and in what configuration?


Dell T610 w/ the default SAS 6/IR adapter has been working fine for us
for 18 months.  All issues have been software bugs in opensolaris so
far.

Not much of a data point, but I have no reason not to buy another Dell
server in the future.

Out of curiosity, did you run into this:
http://blogs.everycity.co.uk/alasdair/2010/06/broadcom-nics-dropping-out-on-solaris-10/

--eric


--
Eric D. Mudama
edmud...@mail.bounceswoosh.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Running on Dell hardware?

2010-10-13 Thread Edward Ned Harvey
 From: edmud...@mail.bounceswoosh.org
 [mailto:edmud...@mail.bounceswoosh.org] On Behalf Of Eric D. Mudama
 
 Out of curiosity, did you run into this:
 http://blogs.everycity.co.uk/alasdair/2010/06/broadcom-nics-dropping-out-on-solaris-10/

I personally haven't had the broadcom problem.  When my system crashes,
surprisingly, it continues responding to ping, answers on port 22 (but you
can't ssh in), and if there are any cron jobs that run from NFS, they're
able to continue.  This goes on for some period of time, and eventually the
whole thing crashes.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Running on Dell hardware?

2010-10-13 Thread Edward Ned Harvey
Dell R710 ... Solaris 10u9 ... With stability problems ...
Notice that I have several CPUs whose current_cstate is higher than their
supported_max_cstates.

Logically, that sounds like a bad thing.  But I can't seem to find
documentation that defines the meaning of supported_max_cstates, to verify
that this is a bad thing.

I'm looking for other people out there ... with and without problems ... to
try this too, and see if a current_cstate higher than the
supported_max_cstate might be a simple indicator of system instability.

kstat | grep current_cstate ; kstat | grep supported_max_cstate
current_cstate  1
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  1
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  0
current_cstate  3
current_cstate  3
current_cstate  3
current_cstate  1
current_cstate  3
current_cstate  3
current_cstate  3
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
supported_max_cstates   2
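
A rough way to pair the two counters per CPU instead of eyeballing the two
lists. This is only a sketch; it assumes both statistics live under the
cpu_info kstat module and that a POSIX-ish /bin/sh is in use:

# flag any CPU whose current_cstate exceeds supported_max_cstates
kstat -p cpu_info:::current_cstate | while read key cur; do
  inst=`echo $key | cut -d: -f2`
  max=`kstat -p cpu_info:${inst}::supported_max_cstates | awk '{print $2}'`
  [ "$cur" -gt "$max" ] && echo "cpu $inst: current_cstate $cur > max $max"
done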

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Running on Dell hardware?

2010-10-13 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Edward Ned Harvey
 
 Dell R710 ... Solaris 10u9 ... With stability problems ...
 Notice that I have several CPUs whose current_cstate is higher than their
 supported_max_cstates.

One more data point:

Sun x4275 ... Solaris 10u6 fully updated (equivalent of 10u9??) ... No
problems ...
There are no current_cstates higher than supported_max_cstates.

kstat | grep current_cstate ; kstat | grep supported_max_cstate
current_cstate  2
current_cstate  1
current_cstate  1
current_cstate  1
current_cstate  1
current_cstate  1
current_cstate  0
current_cstate  1
current_cstate  1
current_cstate  1
current_cstate  1
current_cstate  1
current_cstate  1
current_cstate  1
current_cstate  2
current_cstate  2
supported_max_cstates   3
supported_max_cstates   3
supported_max_cstates   3
supported_max_cstates   3
supported_max_cstates   3
supported_max_cstates   3
supported_max_cstates   3
supported_max_cstates   3
supported_max_cstates   3
supported_max_cstates   3
supported_max_cstates   3
supported_max_cstates   3
supported_max_cstates   3
supported_max_cstates   3
supported_max_cstates   3
supported_max_cstates   3

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Running on Dell hardware?

2010-10-13 Thread Daniel Taylor


On 13 Oct 2010, at 18:30, Edward Ned Harvey wrote:


From: edmud...@mail.bounceswoosh.org
[mailto:edmud...@mail.bounceswoosh.org] On Behalf Of Eric D. Mudama

Out of curiosity, did you run into this:
http://blogs.everycity.co.uk/alasdair/2010/06/broadcom-nics-dropping-out-on-solaris-10/


I personally haven't had the broadcom problem.  When my system crashes,
surprisingly, it continues responding to ping, answers on port 22 (but you
can't ssh in), and if there are any cron jobs that run from NFS, they're
able to continue.  This goes on for some period of time, and eventually the
whole thing crashes.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

I had that for months! Eventually I found that there was a memory leak
with idmapd.

I now have a cron that restarts it every night, problem solved.
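
For reference, the restart is just a root crontab entry along these lines;
the idmap service FMRI here is an assumption and may differ on your release:

# restart idmap nightly at 03:00 to work around the leak
0 3 * * * /usr/sbin/svcadm restart svc:/system/idmap:default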

I only diagnosed the issue by emailing myself the 'top' output every 5
minutes via cron and watching it slowly creep up.
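
A crontab entry along these lines does the periodic snapshot. Sketch only:
I'm showing the native prstat as a stand-in for top, the address is a
placeholder, and note that Solaris cron wants an explicit minute list
rather than */5:

# mail a memory-sorted process snapshot every 5 minutes
0,5,10,15,20,25,30,35,40,45,50,55 * * * * /usr/bin/prstat -s rss 1 1 | /usr/bin/mailx -s memcheck admin@example.com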


It normally happens when I have a lot of SMB traffic; there's a leak
there somewhere!


- Daniel
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Running on Dell hardware?

2010-10-13 Thread Henrik Johansen

'Edward Ned Harvey' wrote:

I have a Dell R710 which has been flaky for some time.  It crashes
about once per week.  I have literally replaced every piece of hardware
in it, and reinstalled Sol 10u9 fresh and clean.

I am wondering if other people out there are using Dell hardware, with
what degree of success, and in what configuration?


We are running (Open)Solaris on lots of 10g servers (PE2900, PE1950, PE2950,
R905) and some 11g (R610 and soon some R815) with both PERC and non-PERC
controllers and lots of MD1000's.

The 10g models are stable - especially the R905's are real workhorses.

We have had only one 11g server (R610) which caused trouble. The box
froze at least once a week - after replacing almost the entire box I
switched from the old iscsitgt to COMSTAR and the box has been stable
since. Go figure ...

I might add that none of these machines use the onboard Broadcom NICs.


The failure seems to be related to the perc 6i.  For some period around
the time of crash, the system still responds to ping, and anything
currently in memory or running from remote storage continues to
function fine.  But new processes that require the local storage
... Such as inbound ssh etc, or even physical login at the console
... those are all hosed.  And eventually the system stops responding to
ping.  As soon as the problem starts, the only recourse is power cycle.

I can't seem to reproduce the problem reliably, but it does happen
regularly.  Yesterday it happened several times in one day, but
sometimes it will go 2 weeks without a problem.

Again, just wondering what other people are using, and experiencing.
To see if any more clues can be found to identify the cause.



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



--
Med venlig hilsen / Best Regards

Henrik Johansen
hen...@scannet.dk
Tlf. 75 53 35 00

ScanNet Group
A/S ScanNet 
___

zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Running on Dell hardware?

2010-10-13 Thread Edward Ned Harvey
 From: Henrik Johansen [mailto:hen...@scannet.dk]
 
 The 10g models are stable - especially the R905's are real workhorses.

Would you generally consider all your machines stable now?
Can you easily pdsh to all those machines?

kstat | grep current_cstate ; kstat | grep supported_max_cstates

I'd really love to see whether a current_cstate higher than
supported_max_cstates is an accurate indicator of system instability.

So far the two data points I have support this theory.
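
For anyone who can pdsh, something like this would collect both counters
from a whole group in one shot. Sketch only: the hostnames are placeholders,
and it assumes the cstate statistics sit under the cpu_info kstat module:

pdsh -w 'node[01-16]' 'kstat -p cpu_info:::current_cstate cpu_info:::supported_max_cstates' | dshbak -c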

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss