Re: [zones-discuss] Zone in a pset with high load generating high packet loss at the frame level

2009-03-05 Thread Bob Netherton

 1. Do you use set pool= anymore, now that the dedicated-cpu feature exists?

Until Oracle develops a more rational licensing scheme you should
expect this feature to be in use.   I may have many Oracle instances,
each in a separate zone, using the same pool.   The sampling on this
discussion list may not give you a good idea of its use.   Might
pose this question on your blog as well ?

That said, this requires manual configuration of the pool.   I don't
think it would be asking too much for customers using this feature
to also set up a boot time service (SMF or RC) to disable interrupts
on all CPUs in the pool, if needed (it may not always be).
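
A minimal sketch of what such a boot-time helper might look like, assuming the pset IDs are known and supplied by hand. The function name pset_intr_cmds is hypothetical. It only prints psrset(1M) commands (psrset -f disables interrupts on every CPU in a processor set, psrset -n re-enables them), so the output can be reviewed before it is piped to sh from an SMF method or rc script.

```shell
#!/bin/sh
# Sketch only: print the psrset(1M) commands that disable ("off") or
# re-enable ("on") interrupts on every CPU of the given processor sets.
# pset_intr_cmds is a hypothetical helper name; the pset IDs are supplied
# by the administrator (for example, taken from "psrset -i" output).
pset_intr_cmds() {
    mode=$1
    shift
    case "$mode" in
        off) flag=-f ;;   # psrset -f: disable interrupts on the set
        on)  flag=-n ;;   # psrset -n: enable interrupts on the set
        *)   echo "usage: pset_intr_cmds on|off pset_id ..." >&2
             return 2 ;;
    esac
    for id in "$@"; do
        echo "psrset $flag $id"
    done
}

# Example (as root in the global zone, after reviewing the output):
#   pset_intr_cmds off 1 2 | sh
```

The "on" mode is there so that the same helper can turn interrupts back on when CPUs leave the pool.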

 2. Is it sufficient to simply disable interrupts on a zone's pset?

I like your idea of turning off interrupts for dynamic resource pools
under zoneadm/rcapd control, and leaving it a configurable item.   I
would also think that when CPUs are removed from the pool,
interrupts should be turned back on unless the CPUs are given to another
pool with interrupts=disabled.   I would hate for several zone
reboots to turn off interrupts to all CPUs :-(




Bob

___
zones-discuss mailing list
zones-discuss@opensolaris.org


Re: [zones-discuss] Zone in a pset with high load generating high packet loss at the frame level

2009-03-05 Thread Gael
On Wed, Mar 4, 2009 at 9:06 AM, Jeff Victor jeff.j.vic...@gmail.com wrote:


 Some questions:
 1. Do you use set pool= anymore, now that the dedicated-cpu feature
 exists?

We have over one hundred physical frames running zones here, covering nearly
all versions of Solaris 10. We are currently sticking to set pool= until we
can get the whole environment upgraded. Until then, we cannot afford to have
the whole team of admins handling zones differently depending on the OS
version. Headache...



 2. Is it sufficient to simply disable interrupts on a zone's pset?


In our case, we use psets only when licensing requires it (e.g. Oracle,
DataStage, Sybase, Borland apps), or when applications behave poorly
and we keep hearing that, for lack of budget/resources, the issue cannot be
addressed and that, without direct impact on the business itself, nothing
will change.

What about creating an I/O pset, and then disabling interrupts on
everything else while using the rest as an FSS pool or pset pools? Very
similar to LDoms, I would think...

Regards


-- 
Gael Martinez

Re: [zones-discuss] Zone in a pset with high load generating high packet loss at the frame level

2009-03-05 Thread Jeff Victor
Thanks for the great feedback Gael. Comments below.

On Thu, Mar 5, 2009 at 11:00 AM, Gael gael.marti...@gmail.com wrote:

 On Wed, Mar 4, 2009 at 9:06 AM, Jeff Victor jeff.j.vic...@gmail.com wrote:

 Some questions:
 1. Do you use set pool= anymore, now that the dedicated-cpu feature exists?

 We got over one hundred physical frames running zones here, covering nearly
 all versions of Solaris 10, we are currently sticking to set pool until we
 can get the whole environment upgraded. Before that, cannot afford to have
 the whole team of admins handling zones differently depending on the OS
 version. Headache...

It is now clear to me that this feature would need to support
disabling interrupts when a zone uses set pool=. Currently, all pool
attributes are configured using the pool tools (poolcfg, pooladm) and
I don't see any reason to not continue. When I write this up, it will
fulfill that need.

 2. Is it sufficient to simply disable interrupts on a zone's pset?

 In our case, we do pset only when licensing requires it (aka
 oracle,datastage,sybase,borland apps) or when the applications behave poorly
 and we keep hearing that by lack of budget/resources, the issue cannot be
 addressed and without direct impact on the business itself, nothing will
 change.

Gael, I realized that my question was vague. When you use a pool,
you're using a pset. Do you mean that you only use pools and psets
when licensing requires it?

Also, I couldn't tell how the comment responded to the question.

 What about creating an IO pset, and then disabling the interrupt on
 everything else while using it as a FSS pool or psets pools ? Very similar
 to ldom I would think...

Yes, that occurred to me, too. You can do that now, either with a pset
that's being used by a zone or with the default pset. But I'm not
convinced there's enough reason to separate an I/O pset from the
default pset. There's great potential for wasted CPU cycles.


--JeffV


Re: [zones-discuss] Zone in a pset with high load generating high packet loss at the frame level

2009-03-05 Thread Steve Lawrence
On Thu, Mar 05, 2009 at 01:22:25PM -0500, Jeff Victor wrote:
 Thanks for the great feedback Gael. Comments below.
 
 On Thu, Mar 5, 2009 at 11:00 AM, Gael gael.marti...@gmail.com wrote:
 
  On Wed, Mar 4, 2009 at 9:06 AM, Jeff Victor jeff.j.vic...@gmail.com wrote:
 
  Some questions:
  1. Do you use set pool= anymore, now that the dedicated-cpu feature 
  exists?
 
  We got over one hundred physical frames running zones here, covering nearly
  all versions of Solaris 10, we are currently sticking to set pool until we
  can get the whole environment upgraded. Before that, cannot afford to have
  the whole team of admins handling zones differently depending on the OS
  version. Headache...
 
 It is now clear to me that this feature would need to support
 disabling interrupts when a zone uses set pool=. Currently, all pool
 attributes are configured using the pool tools (poolcfg, pooladm) and
 I don't see any reason to not continue. When I write this up, it will
 fulfill that need.

Are you proposing that we add support for pset-interrupt disposition config
to the pools framework?  Such as a property on a pool-pset
boolean pset.interrupts = false??

I think the right solution for pool= is this or similar.  It could also
be a string value, such as:

none  no interrupts handled on cpus in the pool-pset.
zone  Device interrupts for bound zones are serviced.
any   Any device interrupts can be dispatched to the pset.

Zonecfg could make use of these pool-pset properties to implement the
desired behavior for dedicated-cpu.

The default value should be any.  zonecfg should set zone for all
dedicated-cpu zones.  zoneadm could warn if pool= is set, the zone has
dedicated devices, and the pset for that pool has not been configured to
be zone.

legacy psets (psrset) could be extended to support this property via some
new flags.

The other part of this is how to reconcile zonecfg and/or pools settings
for interrupts, with device-cpu mappings that are specified via dladm.
Currently, dladm allows the specification of a list of cpu ids.  Another
way to approach this would be to point dladm directly at the desired pool.

-Steve
 
  2. Is it sufficient to simply disable interrupts on a zone's pset?
 
  In our case, we do pset only when licensing requires it (aka
  oracle,datastage,sybase,borland apps) or when the applications behave poorly
  and we keep hearing that by lack of budget/resources, the issue cannot be
  addressed and without direct impact on the business itself, nothing will
  change.
 
 Gael, I realized that my question was vague. When you use a pool,
 you're using a pset. Do you mean that you only use pools and psets
 when licensing requires it?
 
 Also, I couldn't tell how the comment responded to the question.
 
  What about creating an IO pset, and then disabling the interrupt on
  everything else while using it as a FSS pool or psets pools ? Very similar
  to ldom I would think...
 
 Yes, that occurred to me, too. You can do that now, either with a pset
 that's being used by a zone or with the default pset. But I'm not
 convinced there's enough reason to separate an I/O pset from the
 default pset. There's great potential for wasted CPU cycles.
 
 
 --JeffV


Re: [zones-discuss] Zone in a pset with high load generating high packet loss at the frame level

2009-03-05 Thread Jeff Victor
On Thu, Mar 5, 2009 at 1:48 PM, Steve Lawrence stephen.lawre...@sun.com wrote:
 On Thu, Mar 05, 2009 at 01:22:25PM -0500, Jeff Victor wrote:
 On Thu, Mar 5, 2009 at 11:00 AM, Gael gael.marti...@gmail.com wrote:
  On Wed, Mar 4, 2009 at 9:06 AM, Jeff Victor jeff.j.vic...@gmail.com 
  wrote:
 
  Some questions:
  1. Do you use set pool= anymore, now that the dedicated-cpu feature 
  exists?
 
 It is now clear to me that this feature would need to support
 disabling interrupts when a zone uses set pool=. Currently, all pool
 attributes are configured using the pool tools (poolcfg, pooladm) and
 I don't see any reason to not continue. When I write this up, it will
 fulfill that need.

 Are you proposing that we add support for pset-interrupt disposition config
 to the pools framework?  Such as a property on a pool-pset
boolean pset.interrupts = false??

The short answer is yes.  BobN and I came to the same conclusion
just a few hours ago... :-)

CPUs already have cpu.status which can be  on-line, no-intr (LWPs but
no interrupt handlers), or off-line (no LWPs but still able to handle
interrupts). A pset.interrupts field would allow Solaris to set
cpu.status on CPUs as they enter the pset.  Zones could then use that
so we can increase their isolation. When a CPU re-enters the default
pset, it becomes able to handle interrupts again. When needed, intrd
will give it one (or more).
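
For comparison, the same per-CPU states can be set by hand today with psradm(1M), which is roughly what a pset.interrupts property would automate. A minimal sketch, assuming the CPU IDs are known; cpu_status_cmds is a hypothetical helper name, and it only prints the commands for review.

```shell
#!/bin/sh
# Sketch only: map a desired cpu.status value to the matching psradm(1M)
# flag and print one command per CPU ID. cpu_status_cmds is a hypothetical
# helper name; the CPU IDs are supplied by the administrator.
cpu_status_cmds() {
    state=$1
    shift
    case "$state" in
        on-line)  flag=-n ;;  # runs LWPs and may handle interrupts
        no-intr)  flag=-i ;;  # runs LWPs, no interrupt handlers
        off-line) flag=-f ;;  # no LWPs
        *) echo "usage: cpu_status_cmds on-line|no-intr|off-line cpu_id ..." >&2
           return 2 ;;
    esac
    for cpu in "$@"; do
        echo "psradm $flag $cpu"
    done
}

# Example: CPUs 4-7 enter a zone's pset, so stop them taking interrupts:
#   cpu_status_cmds no-intr 4 5 6 7 | sh
```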



 I think the right solution for pool= is this or similar.  It could also
 be a string value, such as:

none  no interrupts handled on cpus in the pool-pset.
zone  Device interrupts for bound zones are serviced.
any   Any device interrupts can be dispatched to the pset.

I don't see how we could do zone in all situations - there isn't a
1:1 mapping between zone and device (except for exclusive-IP).

 Imagine zoneA and zoneB on a pset (psetAB) with pset.interrupts=zone.
Further, zoneA and zoneC share e1000g0, but zoneB doesn't. Finally,
zoneC has its own pset. Where does the interrupt handler for e1000g0
go - psetAB or psetC?

Or are you suggesting that interrupts from one device can be
intercepted and diverted to a CPU associated with a specific pset,
based on which process the interrupt is/should be associated with?

Or am I misunderstanding the description of zone?


 Zonecfg could make use of these pool-pset properties to implement the
 desired behavior for dedicated-cpu.

Exactly.

 The default value should be any.  zonecfg should set zone for all
 dedicated-cpu zones.  zoneadm could warn if pool= is set, the zone has
 dedicated devices, and the pset for that pool has not been configured to
 be zone.

The only devices we can be sure are dedicated for the boot-session of
a zone are NICs. So this whole "segregate the interrupts per zone/pset"
combo will be limited at best. It would be nice if we could
generalize it like you say, but I don't think it's workable yet.

 legacy psets (psrset) could be extended to support this property via some new 
 flags.

 The other part of this is how to reconcile zonecfg and/or pools settings
 for interrupts, with device-cpu mappings that are specified via dladm.
 Currently, dladm allows the specification of a list of cpu ids.  Another
 way to approach this would be to point dladm directly at the desired pool.

Which "currently" are you on? :-)  I'm on NV94 and I don't see
anything like that in dladm(1M).

I'm beginning to think this is really a two-phase project:
* Phase 1: make it easier to disable interrupts on a zone's pset (one
configured with the pool property or dedicated-cpu resource)
* Phase 2: optimize this by enabling a zone's pset to handle
interrupts from a device which is exclusively bound to this zone.

I think that most people that need any of this only need Phase 1.
Philosophically, shifting interrupt handlers into the default pset is
consistent with the original zones principles: hardware is part of the
platform, not part of a zone. So I'm not even convinced that we should
be allowing zones' psets to selectively attract  interrupt handlers.


Great conversation!

--JeffV

  2. Is it sufficient to simply disable interrupts on a zone's pset?
 
  In our case, we do pset only when licensing requires it (aka
  oracle,datastage,sybase,borland apps) or when the applications behave 
  poorly
  and we keep hearing that by lack of budget/resources, the issue cannot be
  addressed and without direct impact on the business itself, nothing will
  change.

 Gael, I realized that my question was vague. When you use a pool,
 you're using a pset. Do you mean that you only use pools and psets
 when licensing requires it?

 Also, I couldn't tell how the comment responded to the question.

  What about creating an IO pset, and then disabling the interrupt on
  everything else while using it as a FSS pool or psets pools ? Very similar
  to ldom I would think...

 Yes, that occurred to me, too. You can do that now, either with a pset
 that's being used by a zone or with the 

Re: [zones-discuss] Zone in a pset with high load generating high packet loss at the frame level

2009-03-05 Thread Steve Lawrence
On Thu, Mar 05, 2009 at 04:12:19PM -0500, Jeff Victor wrote:
 On Thu, Mar 5, 2009 at 1:48 PM, Steve Lawrence stephen.lawre...@sun.com 
 wrote:
  On Thu, Mar 05, 2009 at 01:22:25PM -0500, Jeff Victor wrote:
  On Thu, Mar 5, 2009 at 11:00 AM, Gael gael.marti...@gmail.com wrote:
   On Wed, Mar 4, 2009 at 9:06 AM, Jeff Victor jeff.j.vic...@gmail.com 
   wrote:
  
   Some questions:
   1. Do you use set pool= anymore, now that the dedicated-cpu feature 
   exists?
  
  It is now clear to me that this feature would need to support
  disabling interrupts when a zone uses set pool=. Currently, all pool
  attributes are configured using the pool tools (poolcfg, pooladm) and
  I don't see any reason to not continue. When I write this up, it will
  fulfill that need.
 
  Are you proposing that we add support for pset-interrupt disposition config
  to the pools framework?  Such as a property on a pool-pset
 boolean pset.interrupts = false??
 
 The short answer is yes.  BobN and I came to the same conclusion
 just a few hours ago... :-)
 
 CPUs already have cpu.status which can be  on-line, no-intr (LWPs but
 no interrupt handlers), or off-line (no LWPs but still able to handle
 interrupts). A pset.interrupts field would allow Solaris to set
 cpu.status on CPUs as they enter the pset.  Zones could then use that
 so we can increase their isolation. When a CPU re-enters the default
 pset, it becomes able to handle interrupts again. When needed, intrd
 will give it one (or more).
 
 
 
  I think the right solution for pool= is this or similar.  It could also
  be a string value, such as:
 
 none  no interrupts handled on cpus in the pool-pset.
 zone  Device interrupts for bound zones are serviced.
 any   Any device interrupts can be dispatched to the pset.
 
 I don't see how we could do zone in all situations - there isn't a
 1:1 mapping between zone and device (except for exclusive-IP).
 
  Imagine zoneA and zoneB on a pset (psetAB) with pset.interrupts=zone.
 Further, zoneA and zoneC share e1000g0, but zoneB doesn't. Finally,
 zoneC has its own pset. Where does the interrupt handler for e1000g0
 go - psetAB or psetC?

I was thinking in the exclusive case.  For shared stack zones, the devices
would all be bound to the global zone's (aka default) pset.

 
 Or are you suggesting that interrupts from one device can be
 intercepted and diverted to a CPU associated with a specific pset,
 based on which process the interrupt is/should be associated with?

No, although I'm not sure what is configurable for vnics.  It may be possible
for shared stack zones using an exclusive vnic (not exclusive stack) to have
some of the vnic workload bound to its pset.

 
 Or am I misunderstanding the description of zone?
 
 
  Zonecfg could make use of these pool-pset properties to implement the
  desired behavior for dedicated-cpu.
 
 Exactly.
 
  The default value should be any.  zonecfg should set zone for all
  dedicated-cpu zones.  zoneadm could warn if pool= is set, the zone has
  dedicated devices, and the pset for that pool has not been configured to
  be zone.
 
 The only devices we can be sure are dedicated for the boot-session of
 a zone are NICs. So this whole segregate the interrupts per zone/pset
 combo will be limited at best. It would be nice if we could
 generalize it like you say, but I don't think it's workable yet.

Agreed.  This is really just for network devices at this point.

 
  legacy psets (psrset) could be extended to support this property via some 
  new flags.
 
  The other part of this is how to reconcile zonecfg and/or pools settings
  for interrupts, with device-cpu mappings that are specified via dladm.
  Currently, dladm allows the specification of a list of cpu ids.  Another
  way to approach this would be to point dladm directly at the desired pool.
 
 Which currently are you on? :-)  I'm on NV94 and I don't see
 anything like that in dladm(1M)

Crossbow went into 105.

http://blogs.sun.com/nitin/entry/resource_allocation_for_network_processing

 
 I'm beginning to think this is really a two-phase project:
 * Phase 1: make it easier to disable interrupts on a zone's pset (one
 configured with the pool property or dedicated-cpu resource)
 * Phase 2: optimize this by enabling a zone's pset to handle
 interrupts from a device which is exclusively bound to this zone.

As long as phase one is compatible with phase two, meaning that
cases such as this one are properly defined:
1. pool mypool has property interrupts=disabled.
2. Zone has pool=mypool
3. Zone property stating to bind network interrupts to pool.

One solution would be to allow this config, and bind the net interrupts to
mypool anyway.  Another would be to only allow auto-net-binding in zonecfg
when using dedicated-cpu.

 
 I think that most people that need any of this only need Phase 1.

Agreed.

 Philosophically, shifting interrupt handlers into the default pset is
 consistent with the 

Re: [zones-discuss] Zone in a pset with high load generating high packet loss at the frame level

2009-03-04 Thread Jeff Victor
I have received several private comments expressing interest in this
topic, so I'd like to generate more discussion and attempt to focus on
a solution that meets most or all of the needs.

Summary of problem:
-
A zone can be configured so that its processes do not run on the CPUs
in the default pset, but in a different pset. The zone can have
exclusive access to those CPUs, or one or more other zones can be
configured to share that pset. Zone configuration is not aware of
interrupt handlers.

When Solaris boots, it must assign each device's interrupt handler to
a CPU. It does so without knowledge of psets or zones.

The lack of integration between zones and interrupt
handlers leads to situations where heavy CPU utilization in one zone
can lead to performance or performance-related problems in other
zones. For example, the interrupt handler for a network interface may
be assigned to a CPU that is later also assigned to a zone which
doesn't use that NIC. This can cause dropped network packets that are
to/from zones which are not using that CPU.

These problems violate the main goal of zones: workload isolation.
-

The first-order solution is to simply disable
interrupts on the non-default psets to which zones are assigned. This is
often effective, but in practice requires custom scripts. Management
of those scripts across a data center can be burdensome or even
overwhelming.

In addition, that solution may not meet the goal of workload
isolation. A system could be configured with multiple zones that have
separate (exclusive) NICs  and CPUs. Disabling interrupts on the
zones' psets will move all interrupt handling into the default pset.
Solaris might assign all of the NIC interrupt handlers to one of those
CPUs. Network activity generated by one zone could interfere with the
ability to quickly handle network traffic associated with a different
zone.

Therefore, it might be desirable to configure an exclusive-IP zone so
that the interrupt handlers for its NIC(s) are assigned to CPUs in that
zone's pset.

Here are some possible solutions:
1. Add to zonecfg a property which requires that a zone's CPUs not
handle interrupts. The syntax could be simple:

zonecfg -z myzone
set interrupts=disabled
exit

If the zone is configured to run in the default pset, 'verify' should
fail, and the zone should refuse to boot. It's not clear what should
happen if the zone is booting into a shared pset that already has
zones *and* interrupt handlers.

2. Place an interrupt property in the dedicated-cpu feature.

zonecfg -z myzone
add dedicated-cpu
 set ncpus=4
 set interrupts=disabled
end
exit

That syntax doesn't handle zones which use set pool=.

3. Associate an interrupt property with the exclusive-ip feature to
allow the user to specify that all non-network interrupt handlers
should be moved to the default pset, and interrupts for this zone's
NIC should be handled by this zone's pset.

zonecfg -z myzone
set ip-type=exclusive
add net
  set physical=e1000g0
  set interrupts=enabled
  end
exit

Another NIC in that zone would have a separate 'interrupts' property.
Its interrupts could also be handled by this zone's pset or by the
default pset.

Some questions:
1. Do you use set pool= anymore, now that the dedicated-cpu feature exists?

2. Is it sufficient to simply disable interrupts on a zone's pset?

3. Are there any other devices which (A) can be assigned exclusively
to a zone (via 'set match') and (B) generate enough interrupts to cause
problems?

4. Implementing (1) or (2) should be relatively simple. Choice (3)
might be significantly more effort, and might delay any of this
functionality. Which is better: more granular configuration of
interrupt handling or faster relief? (Either way, I wouldn't expect
Sun to do this during CY2009. However, if you have sufficient interest
and ability... :-) ).

--JeffV

On Tue, Mar 3, 2009 at 11:26 PM, Jeff Victor jeff.j.vic...@gmail.com wrote:
 On Tue, Mar 3, 2009 at 8:39 PM, Gael gael.marti...@gmail.com wrote:

 Many thanks to  Bob Netherton and Jeff for their quick help on that painful 
 issue.
 The solution was to use psrset -f on the heavily used pset.
 It is fully supported and a recommended solution when CPU starvation causes
 interrupts not to be serviced in time and they get lost.   Credit goes to 
 Rickey Weisner for this tip.

 I have monitored that zone today for multiple hours without seeing any
 packet loss while it was cranking up its cpu usage...
 Jeff, following a previous mail today, as a fervent customer ;), I would
 love to see that feature directly accessible thru the zone configuration to
 avoid having to create a script and a dirty workaround to enable that
 feature on boot. Is there an RFE # out there that I can be added to thru Sun
 Support? Got a case opened on that issue.

 Yes, the CR is 6199531 - Device interrupts not bound to cpus
 configured within a nonglobal zone

 Please ask your contact in Sun Service to add an SR for you.

 Will 

Re: [zones-discuss] Zone in a pset with high load generating high packet loss at the frame level

2009-03-03 Thread Jeff Victor
Hello Gael,

On Mon, Mar 2, 2009 at 10:08 PM, Gael gael.marti...@gmail.com wrote:
 Hello

 Got a zone running SAS with cpu capping enabled using a processor set as we
 see a few processes using quite a bit of cpu there too often.

Is that zone assigned to a resource pool, or is it using the
dedicated-cpu feature?

 When the process is running (chewing 100% of its pset), the frame nic (server 
 is a E2900 with a ce interface) is dropping 20-30 % of its packets
 causing a headache.

My first guess is that the NIC's interrupts are going to a CPU that the
zone is using, and the CPU doesn't have enough power to run the zone's
workload *and* be an effective NIC interrupt handler.

Please run the intrstat command as root in the global zone, to
determine which CPU is handling interrupts for that NIC. Also, check
which CPU(s) that zone can use.
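
Those two checks might look like the following minimal sketch. It assumes intrstat(1M) labels each row as driver#instance (for example ce#0) under a header line containing the word "device"; nic_intr_rows is a hypothetical helper that keeps only the header and the rows for one driver.

```shell
#!/bin/sh
# Sketch only: filter intrstat(1M) output read on stdin down to the header
# line and the rows for one driver. nic_intr_rows is a hypothetical name;
# the "driver#instance" row label is an assumption about the output layout.
nic_intr_rows() {
    drv=$1
    sed -n -e '/device/p' -e "/^ *$drv#[0-9]/p"
}

# Example (as root in the global zone): one 5-second sample for the ce NIC,
# then list the processor sets and the CPUs assigned to each:
#   intrstat 5 1 | nic_intr_rows ce
#   psrset -i
```

If the CPU column that shows the NIC's interrupts also appears in the zone's processor set, the two are competing for the same CPU.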

Please let us know what you learn from those.


 Doesn't appear to be a network load issue. Not a lot happening there visibly.

 With Solaris 10 u4 or u6, what elegant way would you recommend to avoid that
 disruption caused by a single zone ?

 Regards

 --
 Gael






-- 
--JeffV


Re: [zones-discuss] Zone in a pset with high load generating high packet loss at the frame level

2009-03-03 Thread Gael
Many thanks to  Bob Netherton and Jeff for their quick help on that painful
issue.
The solution was to use psrset -f on the heavily used pset.

It is fully supported and a recommended solution when CPU starvation causes
interrupts not to be serviced in time, so that they get lost.   Credit goes to
Rickey Weisner for this tip.

I have monitored that zone today for multiple hours without seeing any
packet loss while it was cranking up its cpu usage...

Jeff, following a previous mail today, as a fervent customer ;), I would
love to see that feature directly accessible thru the zone configuration to
avoid having to create a script and a dirty workaround to enable that
feature on boot. Is there an RFE # out there that I can be added to thru Sun
Support? Got a case opened on that issue.

Will continue to monitor the situation for a few days, and if I see anything
wrong, I will update that thread

Again, thanks !

Regards

On Tue, Mar 3, 2009 at 2:19 PM, Jeff Victor jeff.j.vic...@gmail.com wrote:

 Hello Gael,

 On Mon, Mar 2, 2009 at 10:08 PM, Gael gael.marti...@gmail.com wrote:
  Hello
 
  Got a zone running SAS with cpu capping enabled using a processor set as
 we
  see a few processes using quite a bit of cpu there too often.

 Is that zone assigned to a resource pool, or is it using the
 dedicated-cpu feature?

  When the process is running (chewing 100% of its pset), the frame nic
 (server is a E2900 with a ce interface) is dropping 20-30 % of its packets
  causing a headache.

 My first guess is that the NIC's interrupts are going to a CPU that the
 zone is using, and the CPU doesn't have enough power to run the zone's
 workload *and* be an effective NIC interrupt handler.

 Please run the intrstat command as root in the global zone, to
 determine which CPU is handling interrupts for that NIC. Also, check
 which CPU(s) that zone can use.

 Please let us know what you learn from those.


  Doesn't appear to be a network load issue. Not a lot happening there
 visibly.
 
  With Solaris 10 u4 or u6, what elegant way would you recommend to avoid
 that
  disruption caused by a single zone ?
 
  Regards
 
  --
  Gael
 
 
 



 --
 --JeffV




-- 
Gael Martinez

Re: [zones-discuss] Zone in a pset with high load generating high packet loss at the frame level

2009-03-03 Thread Jeff Victor
On Tue, Mar 3, 2009 at 8:39 PM, Gael gael.marti...@gmail.com wrote:

 Many thanks to  Bob Netherton and Jeff for their quick help on that painful 
 issue.
 The solution was to use psrset -f on the heavily used pset.
 It is fully supported and a recommended solution when CPU starvation causes
 interrupts not to be serviced in time and they get lost.   Credit goes to 
 Rickey Weisner for this tip.

 I have monitored that zone today for multiple hours without seeing any
 packet loss while it was cranking up its cpu usage...
 Jeff, following a previous mail today, as a fervent customer ;), I would
 love to see that feature directly accessible thru the zone configuration to
 avoid having to create a script and a dirty workaround to enable that
 feature on boot. Is there an RFE # out there that I can be added to thru Sun
 Support? Got a case opened on that issue.

Yes, the CR is 6199531 - Device interrupts not bound to cpus
configured within a nonglobal zone

Please ask your contact in Sun Service to add an SR for you.

 Will continue to monitor the situation for a few days, and if I see anything 
 wrong, I will update that thread
 Again, thanks !
 Regards

 On Tue, Mar 3, 2009 at 2:19 PM, Jeff Victor jeff.j.vic...@gmail.com wrote:

 Hello Gael,

 On Mon, Mar 2, 2009 at 10:08 PM, Gael gael.marti...@gmail.com wrote:
  Hello
 
  Got a zone running SAS with cpu capping enabled using a processor set as we
  see a few processes using quite a bit of cpu there too often.

 Is that zone assigned to a resource pool, or is it using the
 dedicated-cpu feature?

  When the process is running (chewing 100% of its pset), the frame nic
  (server is a E2900 with a ce interface) is dropping 20-30 % of its packets
  causing a headache.

 My first guess is that the NIC's interrupts are going to a CPU that the
 zone is using, and the CPU doesn't have enough power to run the zone's
 workload *and* be an effective NIC interrupt handler.

 Please run the intrstat command as root in the global zone, to
 determine which CPU is handling interrupts for that NIC. Also, check
 which CPU(s) that zone can use.

 Please let us know what you learn from those.

  Doesn't appear to be a network load issue. Not a lot happening there 
  visibly.
 
  With Solaris 10 u4 or u6, what elegant way would you recommend to avoid 
  that
  disruption caused by a single zone ?



-- 
--JeffV