FYI, I just committed code that passed large-scale, multi-run rinv and
rvitals tests, with and without cache data, for a thousand nodes.

The '1 code' errors may take a minute to quiet down; they occur because the
IMMs have used up all of their session slots on unclosed sessions.
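
For reference, a minimal sketch of how that number can be read, assuming the
"1" in the message is the RMCP+ status code returned in the IPMI 2.0 Open
Session Response. Only a few codes are listed, and the helper name
describe_rmcp_status is made up for illustration; it is not part of xCAT.

```python
# Partial lookup of RMCP+ status codes (IPMI 2.0 Open Session / RAKP responses).
# Assumption: the number printed by xCAT is this status code.
RMCP_PLUS_STATUS = {
    0x00: "No errors",
    0x01: "Insufficient resources to create a session",
    0x02: "Invalid session ID",
}


def describe_rmcp_status(code: int) -> str:
    """Return a readable description for an RMCP+ status code."""
    # Codes missing from the partial table are reported by their hex value.
    return RMCP_PLUS_STATUS.get(code, f"unlisted status code 0x{code:02x}")


print(describe_rmcp_status(1))
```

Under that assumption, code 1 is consistent with a BMC that has no free
session slots left.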



From:   Bruce M Potter/Poughkeepsie/IBM@IBMUS
To:     xCAT Users Mailing list <[email protected]>,
Date:   11/04/2012 10:06 AM
Subject:        Re: [xcat-user] Error: 1 code on opening RMCP+ session (was:
            xCAT 2.7.5 is released)



I know Jarrod has made several changes to the IPMI code since the 2.7.5
build.  For now you could try grabbing the latest version of these 2 files
from the 2.7 branch to see if it helps your problem:

cd /opt/xcat/lib/perl/xCAT
mv IPMI.pm IPMI.pm.orig
wget http://svn.code.sf.net/p/xcat/code/xcat-core/branches/2.7/xCAT-server/lib/perl/xCAT/IPMI.pm


cd /opt/xcat/lib/perl/xCAT_plugin
mv ipmi.pm ipmi.pm.orig
wget http://svn.code.sf.net/p/xcat/code/xcat-core/branches/2.7/xCAT-server/lib/xcat/plugins/ipmi.pm


If you try this, let me know whether it fixes your problem.  Hopefully
Jarrod can look at the error tomorrow.

Bruce Potter        STSM, Linux & AIX Cluster Development, IBM,
Poughkeepsie, NY
Email: [email protected]    Phone:  external: 845-433-7073, internal: TL
293-7073



From: Stuart Barkley <[email protected]>
To: xCAT Users Mailing list <[email protected]>,
Date: 11/03/2012 07:14 PM
Subject: [xcat-user] Error: 1 code on opening RMCP+ session (was: xCAT
2.7.5 is released)



On Mon, 29 Oct 2012 at 14:39 -0000, Lissa Valletta wrote:

> xCAT 2.7.5 release is now available on the download page.

I have installed 2.7.5 on two of our IBM clusters and am seeing a new
problem with the IPMI support.

I'm getting a significant number of new RMCP+ errors and eventually
timeouts.  I haven't done a lot of testing yet, but with 120 to 260
nodes the errors occur on nearly each request.  With a smaller number
of nodes (~20) I don't see these errors.

I note that there are significant changes between 2.7.4 and 2.7.5 in
2.7/xCAT-server/lib/perl/xCAT/IPMI.pm regarding timeouts and IPMI
login state transitions.  I didn't study the changes or the file
revision history closely.

On one column of a dx360 M2 iDataPlex cluster:

   # date; rvitals rack1a led | xcoll
   Sat Nov  3 18:28:55 EDT 2012
   mc036: Error: 1 code on opening RMCP+ session
   mc025: Error: 1 code on opening RMCP+ session
   mc033: Error: 1 code on opening RMCP+ session
   mc025: Error: 1 code on opening RMCP+ session
   mc036: Error: 1 code on opening RMCP+ session
   mc033: Error: 1 code on opening RMCP+ session
   mc025: Error: 1 code on opening RMCP+ session
   mc036: Error: 1 code on opening RMCP+ session
   mc033: Error: 1 code on opening RMCP+ session
   mc033: Error: 1 code on opening RMCP+ session
   mc025: Error: 1 code on opening RMCP+ session
   mc036: Error: 1 code on opening RMCP+ session
   mc025: Error: 1 code on opening RMCP+ session
   mc033: Error: 1 code on opening RMCP+ session
   mc036: Error: 1 code on opening RMCP+ session
   mc025: Error: 1 code on opening RMCP+ session
   mc036: Error: 1 code on opening RMCP+ session
   mc033: Error: 1 code on opening RMCP+ session
   mc033: Error: 1 code on opening RMCP+ session
   mc025: Error: 1 code on opening RMCP+ session
   mc025: Error: timeout
   mc033: Error: timeout
   ====================================

mc024,mc003,mc018,mc002,mc001,mc016,mc036,mc026,mc017,mc039,mc006,mc008,mc032,mc038,mc007,mc037,mc029,mc009,mc012,mc042,mc034,mc023,mc030,mc035,mc020,mc013,mc041,mc005,mc015,mc027,mc010,mc014,mc040,mc011,mc028,mc021,mc031,mc019,mc022,mc004

   ====================================
   No active error LEDs detected

   #

On 110 x3650 M2 servers:

   # date; rvitals bc-compute led | xcoll
   Sat Nov  3 18:31:55 EDT 2012
   bc050: Error: 1 code on opening RMCP+ session
   bc036: Error: 1 code on opening RMCP+ session
   bc050: Error: 1 code on opening RMCP+ session
   bc036: Error: 1 code on opening RMCP+ session
   bc042: Error: 1 code on opening RMCP+ session
   bc042: Error: 1 code on opening RMCP+ session
   bc050: Error: 1 code on opening RMCP+ session
   bc036: Error: 1 code on opening RMCP+ session
   bc042: Error: 1 code on opening RMCP+ session
   bc050: Error: 1 code on opening RMCP+ session
   bc036: Error: 1 code on opening RMCP+ session
   bc042: Error: 1 code on opening RMCP+ session
   bc050: Error: 1 code on opening RMCP+ session
   bc036: Error: 1 code on opening RMCP+ session
   bc042: Error: 1 code on opening RMCP+ session
   bc050: Error: 1 code on opening RMCP+ session
   bc036: Error: 1 code on opening RMCP+ session
   bc042: Error: 1 code on opening RMCP+ session
   bc050: Error: 1 code on opening RMCP+ session
   bc036: Error: 1 code on opening RMCP+ session
   bc042: Error: 1 code on opening RMCP+ session
   bc050: Error: timeout
   bc036: Error: timeout
   bc042: Error: timeout
   ====================================

T06,S11,S10,T10,bc023,bc033,bc021,bc027,bc044,bc022,bc017,bc054,bc057,bc061,bc039,bc063,bc059,bc029,bc052,bc047,bc019,bc048,bc041,bc018,bc026,bc040,bc056,bc053,bc035,bc051,bc032,bc049,bc028,bc058,bc062,bc043,bc030,bc045,bc025,bc024,bc038,bc034,bc046,bc031,bc060,bc055,bc020,bc037

   ====================================
   No active error LEDs detected

   #

Each time, the problems are reported on different nodes.
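
To compare runs, the error lines can be tallied per node. The following is a
rough sketch only, assuming the "node: Error: message" line format shown in
the output above; tally_errors and ERROR_LINE are names invented for this
example, and the sample lines are copied from the first run.

```python
import re
from collections import Counter

# Matches lines such as "mc036: Error: 1 code on opening RMCP+ session".
ERROR_LINE = re.compile(r"^\s*(?P<node>[\w.-]+): Error: (?P<msg>.+?)\s*$")


def tally_errors(lines):
    """Count (node, message) pairs among the error lines of rvitals output."""
    counts = Counter()
    for line in lines:
        match = ERROR_LINE.match(line)
        if match:  # non-error lines (dates, separators, node lists) are skipped
            counts[(match["node"], match["msg"])] += 1
    return counts


sample = """\
   mc036: Error: 1 code on opening RMCP+ session
   mc025: Error: 1 code on opening RMCP+ session
   mc036: Error: 1 code on opening RMCP+ session
   mc025: Error: timeout
   ====================================
   No active error LEDs detected
"""

for (node, msg), n in sorted(tally_errors(sample.splitlines()).items()):
    print(f"{n:3d}  {node}: {msg}")
```

Passing sys.stdin to tally_errors instead of the sample would let the script
read captured rvitals output from a pipe.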

Stuart Barkley
--
I've never been lost; I was once bewildered for three days, but never lost!
                                       --  Daniel Boone

------------------------------------------------------------------------------

LogMeIn Central: Instant, anywhere, Remote PC access and management.
Stay in control, update software, and manage PCs from one command center
Diagnose problems and improve visibility into emerging IT issues
Automate, monitor and manage. Do more in less time with Central
http://p.sf.net/sfu/logmein12331_d2d
_______________________________________________
xCAT-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/xcat-user
