I know Jarrod has made several changes to the IPMI code since the 2.7.5 build. For now, you could try grabbing the latest versions of these two files from the 2.7 branch to see if that helps your problem:
cd /opt/xcat/lib/perl/xCAT
mv IPMI.pm IPMI.pm.orig
wget http://svn.code.sf.net/p/xcat/code/xcat-core/branches/2.7/xCAT-server/lib/perl/xCAT/IPMI.pm
cd /opt/xcat/lib/perl/xCAT_plugin
mv ipmi.pm ipmi.pm.orig
wget http://svn.code.sf.net/p/xcat/code/xcat-core/branches/2.7/xCAT-server/lib/xcat/plugins/ipmi.pm

If you try this, let me know if it fixes your problem. Hopefully tomorrow Jarrod can look at the error.

Bruce Potter
STSM, Linux & AIX Cluster Development, IBM, Poughkeepsie, NY
Email: [email protected]
Phone: external: 845-433-7073, internal: TL 293-7073

From: Stuart Barkley <[email protected]>
To: xCAT Users Mailing list <[email protected]>,
Date: 11/03/2012 07:14 PM
Subject: [xcat-user] Error: 1 code on opening RMCP+ session (was: xCAT 2.7.5 is released)

On Mon, 29 Oct 2012 at 14:39 -0000, Lissa Valletta wrote:

> xCAT 2.7.5 release is now available on the download page.

I have installed 2.7.5 on two of our IBM clusters and am seeing a new problem with the IPMI support. I'm getting a significant number of new RMCP+ errors and eventually timeouts.

I haven't done a lot of testing yet, but with 120 to 260 nodes the errors occur on nearly each request. With a smaller number of nodes (~20) I don't see these errors.

I note that there are significant changes between 2.7.4 and 2.7.5 in 2.7/xCAT-server/lib/perl/xCAT/IPMI.pm regarding timeouts and IPMI login state transitions. I didn't study the changes or the file revision history closely.
On one column of a dx360 M2 iDataPlex cluster:

# date; rvitals rack1a led | xcoll
Sat Nov 3 18:28:55 EDT 2012
mc036: Error: 1 code on opening RMCP+ session
mc025: Error: 1 code on opening RMCP+ session
mc033: Error: 1 code on opening RMCP+ session
mc025: Error: 1 code on opening RMCP+ session
mc036: Error: 1 code on opening RMCP+ session
mc033: Error: 1 code on opening RMCP+ session
mc025: Error: 1 code on opening RMCP+ session
mc036: Error: 1 code on opening RMCP+ session
mc033: Error: 1 code on opening RMCP+ session
mc033: Error: 1 code on opening RMCP+ session
mc025: Error: 1 code on opening RMCP+ session
mc036: Error: 1 code on opening RMCP+ session
mc025: Error: 1 code on opening RMCP+ session
mc033: Error: 1 code on opening RMCP+ session
mc036: Error: 1 code on opening RMCP+ session
mc025: Error: 1 code on opening RMCP+ session
mc036: Error: 1 code on opening RMCP+ session
mc033: Error: 1 code on opening RMCP+ session
mc033: Error: 1 code on opening RMCP+ session
mc025: Error: 1 code on opening RMCP+ session
mc025: Error: timeout
mc033: Error: timeout
====================================
mc024,mc003,mc018,mc002,mc001,mc016,mc036,mc026,mc017,mc039,mc006,mc008,mc032,mc038,mc007,mc037,mc029,mc009,mc012,mc042,mc034,mc023,mc030,mc035,mc020,mc013,mc041,mc005,mc015,mc027,mc010,mc014,mc040,mc011,mc028,mc021,mc031,mc019,mc022,mc004
====================================
No active error LEDs detected
#

On 110 x3650 M2 servers:

# date; rvitals bc-compute led | xcoll
Sat Nov 3 18:31:55 EDT 2012
bc050: Error: 1 code on opening RMCP+ session
bc036: Error: 1 code on opening RMCP+ session
bc050: Error: 1 code on opening RMCP+ session
bc036: Error: 1 code on opening RMCP+ session
bc042: Error: 1 code on opening RMCP+ session
bc042: Error: 1 code on opening RMCP+ session
bc050: Error: 1 code on opening RMCP+ session
bc036: Error: 1 code on opening RMCP+ session
bc042: Error: 1 code on opening RMCP+ session
bc050: Error: 1 code on opening RMCP+ session
bc036: Error: 1 code on opening RMCP+ session
bc042: Error: 1 code on opening RMCP+ session
bc050: Error: 1 code on opening RMCP+ session
bc036: Error: 1 code on opening RMCP+ session
bc042: Error: 1 code on opening RMCP+ session
bc050: Error: 1 code on opening RMCP+ session
bc036: Error: 1 code on opening RMCP+ session
bc042: Error: 1 code on opening RMCP+ session
bc050: Error: 1 code on opening RMCP+ session
bc036: Error: 1 code on opening RMCP+ session
bc042: Error: 1 code on opening RMCP+ session
bc050: Error: timeout
bc036: Error: timeout
bc042: Error: timeout
====================================
T06,S11,S10,T10,bc023,bc033,bc021,bc027,bc044,bc022,bc017,bc054,bc057,bc061,bc039,bc063,bc059,bc029,bc052,bc047,bc019,bc048,bc041,bc018,bc026,bc040,bc056,bc053,bc035,bc051,bc032,bc049,bc028,bc058,bc062,bc043,bc030,bc045,bc025,bc024,bc038,bc034,bc046,bc031,bc060,bc055,bc020,bc037
====================================
No active error LEDs detected
#

Each time problems are reported on different nodes.

Stuart Barkley
--
I've never been lost; I was once bewildered for three days, but never lost!
-- Daniel Boone

------------------------------------------------------------------------------
LogMeIn Central: Instant, anywhere, Remote PC access and management.
Stay in control, update software, and manage PCs from one command center
Diagnose problems and improve visibility into emerging IT issues
Automate, monitor and manage. Do more in less time with Central
http://p.sf.net/sfu/logmein12331_d2d
_______________________________________________
xCAT-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/xcat-user
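Since the failing nodes differ from run to run, it may help to tally the error lines per node across several runs when comparing the patched and unpatched files. The following is only a rough sketch, assuming the output has the "node: Error: ..." shape shown above; count_node_errors is a made-up helper name, not part of xCAT.

```shell
# Hypothetical helper (not an xCAT command): reads rvitals-style output on
# stdin and prints "<count> <node>" for every node with "Error:" lines,
# highest count first.
count_node_errors() {
    # Split each line on ": " so that $1 is the node name; lines without
    # ": Error: " (date stamp, xcoll separators, node lists) are ignored.
    awk -F': ' '/: Error: / { n[$1]++ } END { for (k in n) print n[k], k }' |
        sort -k1,1nr -k2,2
}
```

One way to use it would be something like `rvitals rack1a led 2>&1 | count_node_errors`, or feeding it the saved output of several runs with `cat run*.log | count_node_errors` (the log file names here are just placeholders).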
