On Mon, 29 Oct 2012 at 14:39 -0000, Lissa Valletta wrote:
> xCAT 2.7.5 release is now available on the download page.
I have installed 2.7.5 on two of our IBM clusters and am seeing a new
problem with the IPMI support.
I'm getting a significant number of new RMCP+ errors and eventually
timeouts. I haven't done a lot of testing yet, but with 120 to 260
nodes the errors occur on nearly every request. With a smaller number
of nodes (~20) I don't see these errors.
I note that there are significant changes between 2.7.4 and 2.7.5 in
2.7/xCAT-server/lib/perl/xCAT/IPMI.pm regarding timeouts and IPMI
login state transitions. I didn't study the changes or the file
revision history closely.
On one column of a dx360 M2 iDataPlex cluster:
# date; rvitals rack1a led | xcoll
Sat Nov 3 18:28:55 EDT 2012
mc036: Error: 1 code on opening RMCP+ session
mc025: Error: 1 code on opening RMCP+ session
mc033: Error: 1 code on opening RMCP+ session
mc025: Error: 1 code on opening RMCP+ session
mc036: Error: 1 code on opening RMCP+ session
mc033: Error: 1 code on opening RMCP+ session
mc025: Error: 1 code on opening RMCP+ session
mc036: Error: 1 code on opening RMCP+ session
mc033: Error: 1 code on opening RMCP+ session
mc033: Error: 1 code on opening RMCP+ session
mc025: Error: 1 code on opening RMCP+ session
mc036: Error: 1 code on opening RMCP+ session
mc025: Error: 1 code on opening RMCP+ session
mc033: Error: 1 code on opening RMCP+ session
mc036: Error: 1 code on opening RMCP+ session
mc025: Error: 1 code on opening RMCP+ session
mc036: Error: 1 code on opening RMCP+ session
mc033: Error: 1 code on opening RMCP+ session
mc033: Error: 1 code on opening RMCP+ session
mc025: Error: 1 code on opening RMCP+ session
mc025: Error: timeout
mc033: Error: timeout
====================================
mc024,mc003,mc018,mc002,mc001,mc016,mc036,mc026,mc017,mc039,mc006,mc008,mc032,mc038,mc007,mc037,mc029,mc009,mc012,mc042,mc034,mc023,mc030,mc035,mc020,mc013,mc041,mc005,mc015,mc027,mc010,mc014,mc040,mc011,mc028,mc021,mc031,mc019,mc022,mc004
====================================
No active error LEDs detected
#
On 110 x3650 M2 servers:
# date; rvitals bc-compute led | xcoll
Sat Nov 3 18:31:55 EDT 2012
bc050: Error: 1 code on opening RMCP+ session
bc036: Error: 1 code on opening RMCP+ session
bc050: Error: 1 code on opening RMCP+ session
bc036: Error: 1 code on opening RMCP+ session
bc042: Error: 1 code on opening RMCP+ session
bc042: Error: 1 code on opening RMCP+ session
bc050: Error: 1 code on opening RMCP+ session
bc036: Error: 1 code on opening RMCP+ session
bc042: Error: 1 code on opening RMCP+ session
bc050: Error: 1 code on opening RMCP+ session
bc036: Error: 1 code on opening RMCP+ session
bc042: Error: 1 code on opening RMCP+ session
bc050: Error: 1 code on opening RMCP+ session
bc036: Error: 1 code on opening RMCP+ session
bc042: Error: 1 code on opening RMCP+ session
bc050: Error: 1 code on opening RMCP+ session
bc036: Error: 1 code on opening RMCP+ session
bc042: Error: 1 code on opening RMCP+ session
bc050: Error: 1 code on opening RMCP+ session
bc036: Error: 1 code on opening RMCP+ session
bc042: Error: 1 code on opening RMCP+ session
bc050: Error: timeout
bc036: Error: timeout
bc042: Error: timeout
====================================
T06,S11,S10,T10,bc023,bc033,bc021,bc027,bc044,bc022,bc017,bc054,bc057,bc061,bc039,bc063,bc059,bc029,bc052,bc047,bc019,bc048,bc041,bc018,bc026,bc040,bc056,bc053,bc035,bc051,bc032,bc049,bc028,bc058,bc062,bc043,bc030,bc045,bc025,bc024,bc038,bc034,bc046,bc031,bc060,bc055,bc020,bc037
====================================
No active error LEDs detected
#
Each time, the problems are reported on different nodes.
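To compare which nodes fail from run to run, the error lines can be
tallied per node. The following is only a rough sketch: the function
name tally_errors and the regular expression are illustrative
assumptions, not part of xCAT, and the sample lines are copied from
the output above.

```python
import re
from collections import Counter

# Matches lines such as "mc036: Error: 1 code on opening RMCP+ session".
# The pattern is an assumption based on the output shown in this message.
ERROR_LINE = re.compile(r"^(?P<node>[^:\s]+):\s+Error:\s+(?P<msg>.+?)\s*$")


def tally_errors(lines):
    """Count (node, message) pairs among lines that look like errors."""
    counts = Counter()
    for line in lines:
        match = ERROR_LINE.match(line)
        if match:
            counts[(match.group("node"), match.group("msg"))] += 1
    return counts


# Sample lines taken from the rvitals output above.
sample = [
    "mc036: Error: 1 code on opening RMCP+ session",
    "mc025: Error: 1 code on opening RMCP+ session",
    "mc036: Error: 1 code on opening RMCP+ session",
    "mc025: Error: timeout",
    "====================================",
    "No active error LEDs detected",
]

for (node, msg), n in sorted(tally_errors(sample).items()):
    print(f"{node}: {n} x {msg}")
```

For a live run, the same function could be fed sys.stdin instead of
the sample list, with the rvitals output piped in before xcoll.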
Stuart Barkley
--
I've never been lost; I was once bewildered for three days, but never lost!
-- Daniel Boone
_______________________________________________
xCAT-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/xcat-user