Frankly, I haven’t had the opportunity to test as high as I would like.  So far 
I can only vouch firsthand for 500 nodes with console logging enabled, which is 
about where we felt comfortable with conserver in general.

From: banuchka [mailto:tyrche...@gmail.com]
Sent: Wednesday, May 03, 2017 3:14 PM
To: xCAT Users Mailing list; Jarrod Johnson
Subject: RE: [xcat-user] Confluent as console server. Consoles hangs ~after 24h.

Jarrod, what do you think is the maximum stable number of servers (with full 
IPMI logging) for one Confluent instance?
--
banuchka

On 3 May 2017 at 20:01:41, banuchka (tyrche...@gmail.com) wrote:
Tomorrow I’ll try to add one (or 2, 3, 4) more instances of Confluent and move 
some of the servers there until I see the same behaviour on the new instance(s).


On 3 May 2017 at 19:19:26, Jarrod Johnson (jjohns...@lenovo.com) wrote:
Hmm, and there isn’t anything like conserver or another confluent trying to run 
at the same time to the same node?

From: banuchka [mailto:tyrche...@gmail.com]
Sent: Wednesday, May 03, 2017 2:10 PM
To: xCAT Users Mailing list; Jarrod Johnson
Subject: RE: [xcat-user] Confluent as console server. Consoles hangs ~after 24h.

Hi,

one more strange thing about confluent:

May  3 12:57:28 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 12:57:26 console connected]
May  3 13:02:08 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 13:02:06 console disconnected]
May  3 13:10:32 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 13:10:30 console connected]
May  3 13:12:08 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 13:12:06 console disconnected]
May  3 13:21:08 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 13:21:06 console connected]
May  3 13:22:05 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 13:22:03 console disconnected]
May  3 13:26:02 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 13:26:00 console connected]
May  3 13:32:05 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 13:32:03 console disconnected]
May  3 13:33:17 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 13:33:15 console connected]
May  3 14:22:02 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 14:22:00 console disconnected]
May  3 14:23:11 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 14:23:09 console connected]
May  3 14:32:02 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 14:32:00 console disconnected]
May  3 14:39:44 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 14:39:42 console connected]
May  3 14:52:07 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 14:52:05 console disconnected]
May  3 14:52:17 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 14:52:15 console connected]
May  3 15:02:15 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 15:02:13 console disconnected]
May  3 15:06:40 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 15:06:38 console connected]
May  3 15:12:17 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 15:12:15 console disconnected]
May  3 15:15:30 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 15:15:28 console connected]
May  3 15:22:17 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 15:22:15 console disconnected]
May  3 15:30:28 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 15:30:26 console connected]
May  3 15:32:21 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 15:32:19 console disconnected]
May  3 15:36:42 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 15:36:40 console connected]
May  3 15:41:59 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 15:41:57 console disconnected]
May  3 15:45:17 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 15:45:15 console connected]
May  3 15:51:59 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 15:51:57 console disconnected]
May  3 15:57:05 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 15:57:03 console connected]
May  3 17:22:12 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 17:22:10 console disconnected]
May  3 17:26:38 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 17:26:36 console connected]
May  3 17:32:15 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 17:32:13 console disconnected]
May  3 17:41:26 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 17:41:24 console connected]
May  3 17:42:01 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 17:41:59 console disconnected]
May  3 17:49:32 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 17:49:30 console connected]
May  3 17:52:07 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 17:52:05 console disconnected]
May  3 17:52:42 xcat-sn1.mlan confluent[4102]: audit :May 03 17:52:40 {"operation": "start", "allowed": true, "target": "/nodes/unreg25/console/session", "user": "xcat_console"}
May  3 17:52:42 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 17:52:40 connection by xcat_console]
May  3 17:52:45 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 17:52:43 console disconnected]
May  3 17:56:09 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 17:56:07 console connected]

It isn’t a Dell BMC…

I think I’ve written about this behaviour here before. Anyway, the times here 
are so random that it doesn’t look like a timeout issue somewhere.
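
One way to check whether the timings really are random is to pull the 
connect/disconnect pairs out of the log and look at how long each session 
lasted and where each disconnect falls inside a fixed window. The sketch below 
is only an illustration: the function names, the 10-minute window and the year 
2017 (the log lines carry no year) are assumptions, and the regular expression 
only matches the "console connected" / "console disconnected" lines shown above.

```python
import re
from datetime import datetime

# Matches the bracketed confluent timestamp, e.g. "[05/03 13:02:06 console disconnected]".
# "\s+" tolerates the line wrapping added by the mail archive.
EVENT_RE = re.compile(
    r"\[(\d{2})/(\d{2}) (\d{2}):(\d{2}):(\d{2}) console\s+(connected|disconnected)\]"
)


def parse_events(text, year=2017):
    """Return [(datetime, state), ...] in log order. The year is an assumption."""
    events = []
    for mon, day, hh, mm, ss, state in EVENT_RE.findall(text):
        when = datetime(year, int(mon), int(day), int(hh), int(mm), int(ss))
        events.append((when, state))
    return events


def session_lengths(events):
    """Seconds between each 'connected' and the next 'disconnected'."""
    lengths = []
    started = None
    for when, state in events:
        if state == "connected":
            started = when
        elif started is not None:
            lengths.append((when - started).total_seconds())
            started = None
    return lengths


def disconnect_offsets(events, window=600):
    """Seconds past the start of each `window`-second slot of the hour for every disconnect."""
    return [
        (when.minute * 60 + when.second) % window
        for when, state in events
        if state == "disconnected"
    ]


# First four events from the log above, with the archive's line wrapping kept.
SAMPLE = """\
May  3 12:57:28 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 12:57:26 console
connected]
May  3 13:02:08 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 13:02:06 console
disconnected]
May  3 13:10:32 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 13:10:30 console
connected]
May  3 13:12:08 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 13:12:06 console
disconnected]
"""

if __name__ == "__main__":
    events = parse_events(SAMPLE)
    print("session lengths (s):", session_lengths(events))
    print("disconnect offsets (s):", disconnect_offsets(events))
```

If the offsets cluster around one value, something periodic is probably 
involved; if they are spread evenly over the window, a timer is less likely.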

I need some advice before rolling back :) Thanks


On 14 April 2017 at 20:59:04, Jarrod Johnson (jjohns...@lenovo.com) wrote:
Yeah, there will be a big push in the coming weeks; it will have at least an 
‘events’ log along with a lot more function.

Then some more fleshed out documentation (beyond the preliminary stuff on 
http://hpc.lenovo.com).

Let me know if the firmware exploration works out.  That particular change line 
suggests firmware upgrades, but it is possible they could have some high BMC 
cpu usage that could manifest in such a way.  The ‘works with ipmitool’ though 
has me scratching my head.

From: banuchka [mailto:tyrche...@gmail.com]
Sent: Friday, April 14, 2017 2:54 PM
To: xCAT Users Mailing list; Jarrod Johnson
Subject: RE: [xcat-user] Confluent as console server. Consoles hangs ~after 24h.

The last idea doesn’t fix it for me. The idea itself works great, by the way: 
confluent does disconnect and reconnect at a constant interval. But for now it 
is 100% correct to say that it is a problem with the iDRAC firmware.
From the release notes for the latest firmware:
===
- Fix for occasional iDRAC unresponsiveness caused by upgrades via Firmware 
RACADM or
have an active SOL or SSH sessions while firmware upgrade is in progress.
===
I’m not sure, but maybe it’s something like what I have here. So I did the 
upgrade on a few hosts and will give them plenty of time to show me results.
Thanks for your answers, help and time… it is a very interesting quest :)

A bit more about Confluent:
- Interesting ambitions
- Python vs. Perl, that’s good
- I think log files (not just trace, stderr, stdout) and documentation (the 
source on GitHub is the best doc I know, but…) are things that I would like to 
see in Confluent


On 14 April 2017 at 19:27:20, Jarrod Johnson (jjohns...@lenovo.com) wrote:
Very interested in the outcome.  And thank you for working through it.  Also 
interested what you have liked, would like, and have disliked about confluent.

From: banuchka [mailto:tyrche...@gmail.com]
Sent: Friday, April 14, 2017 12:01 PM
To: xCAT Users Mailing list; Jarrod Johnson
Subject: RE: [xcat-user] Confluent as console server. Consoles hangs ~after 24h.

Thank you Jarrod, I’ll try to apply the patch and let you know afterwards. I 
hope 90 minutes is enough, yes.


On 14 April 2017 at 16:57:24, Jarrod Johnson (jjohns...@lenovo.com) wrote:
Hmm, this is going to be very difficult to root cause (I only have Lenovo 
equipment as one might expect).

I’m loath to do a workaround, but in console.py (find /usr -name console.py), 
it might be interesting to see how a change like the following behaves:
diff --git a/pyghmi/ipmi/console.py b/pyghmi/ipmi/console.py
index 95e8551..a5f6062 100644
--- a/pyghmi/ipmi/console.py
+++ b/pyghmi/ipmi/console.py
@@ -42,6 +42,7 @@ class Console(object):
     def __init__(self, bmc, userid, password,
                  iohandler, port=623,
                  force=False, kg=None):
+        self.keepalivecount = 0
         self.keepaliveid = None
         self.connected = False
         self.broken = False
@@ -70,6 +71,7 @@ class Console(object):
         if 'error' in response:
             self._print_error(response['error'])
             return
+        self.keepalivecount = 0
         #Send activate sol payload directive
         #netfn= 6 (application)
         #command = 0x48 (activate payload)
@@ -150,11 +152,12 @@ class Console(object):
             return
         currowner = struct.unpack(
             "<I", struct.pack('4B', *response['data'][:4]))
-        if currowner[0] != self.ipmi_session.sessionid:
+        if currowner[0] != self.ipmi_session.sessionid or self.keepalivecount > 180:
             # the session is deactivated or active for something else
             self.activated = False
             self._print_error('SOL deactivated')
             return
+        self.keepalivecount += 1
         # ok, still here, that means session is alive, but another
         # common issue is firmware messing with mux on reboot
         # this would be a nice thing to check, but the serial channel

If it pans out, it should cause the console session to disconnect itself 
roughly every 90 minutes and trigger a reconnect (is 90 minutes short enough in 
your case?).  It would require a service confluent restart to see if it had the 
desired effect.
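
The counter logic in the diff above can be sketched in isolation as follows. 
This is an illustration only: `KeepaliveBudget`, `checks_before_drop` and 
`minutes_until_drop` are hypothetical names, not part of pyghmi, and the 
30-second check interval is an assumption inferred from "180 checks is roughly 
90 minutes", not something verified against the library.

```python
class KeepaliveBudget:
    """Hypothetical stand-in for the counter the patch adds; not pyghmi code."""

    def __init__(self, limit=180):
        self.limit = limit
        self.count = 0

    def reset(self):
        # Mirrors `self.keepalivecount = 0` when the SOL payload is (re)activated.
        self.count = 0

    def check(self, owner_matches=True):
        """Return True to keep the session, False to treat SOL as deactivated."""
        if not owner_matches or self.count > self.limit:
            return False
        self.count += 1
        return True


def checks_before_drop(limit=180):
    """Count how many consecutive checks pass before the budget forces a drop."""
    budget = KeepaliveBudget(limit)
    passed = 0
    while budget.check():
        passed += 1
    return passed


def minutes_until_drop(limit=180, interval_s=30):
    """Estimate the session lifetime, assuming one check every `interval_s` seconds."""
    # The drop happens on the first failing check, one interval after the last pass.
    return (checks_before_drop(limit) + 1) * interval_s / 60


if __name__ == "__main__":
    print(checks_before_drop(), "checks pass before the forced disconnect")
    print(minutes_until_drop(), "minutes, under the assumed 30 s interval")
```

Because the comparison is `> 180` and the counter starts at zero, the session 
survives slightly more than 180 checks, which is why the result is "roughly" 
rather than exactly 90 minutes under that assumed interval.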

Sorry I haven’t tested and can’t think of root cause, but going to take some 
time off for the weekend.

I would be curious whether the same ipmitool process is still running a day 
later when you check (e.g. whether ipmitool is exiting and getting restarted).  
I don’t have the time at the moment to see if they do some other interesting 
thing to avoid the behavior.

From: banuchka [mailto:tyrche...@gmail.com]
Sent: Friday, April 14, 2017 11:45 AM
To: xCAT Users Mailing list; Jarrod Johnson
Subject: RE: [xcat-user] Confluent as console server. Consoles hangs ~after 24h.

cloud53.ulan:/home/banuchka # ipmitool sol info 1
Info: SOL parameter 'Payload Channel (7)' not supported - defaulting to 0x01
Set in progress                 : set-complete
Enabled                         : true
Force Encryption                : true
Force Authentication            : false
Privilege Level                 : ADMINISTRATOR
Character Accumulate Level (ms) : 50
Character Send Threshold        : 255
Retry Count