Anything in the /var/log/confluent neighborhood when that restart happens? It doesn't happen if given a noderange?
-----Original Message-----
From: banuchka <tyrche...@gmail.com>
To: xcat-user@lists.sourceforge.net <xcat-user@lists.sourceforge.net>, Jarrod Johnson <jjohns...@lenovo.com>
Subject: RE: [xcat-user] Confluent as console server. Consoles hangs ~after 24h.
Date: Wed, 19 Apr 2017 14:56:12 +0100

Bad news :)

On 19 April 2017 at 14:55:45, Jarrod Johnson (jjohns...@lenovo.com) wrote:
> Confluent shouldn’t shut down or even restart…
>
> From: banuchka [mailto:tyrche...@gmail.com]
> Sent: Wednesday, April 19, 2017 9:51 AM
> To: xcat-user@lists.sourceforge.net; Jarrod Johnson
> Subject: RE: [xcat-user] Confluent as console server. Consoles hangs ~after 24h.
>
> And one more thing about Confluent:
> is it expected behaviour that when I run “makeconfluent” / “makeconfluent -l” (while the confluent service is running) to regenerate nodes or add new nodes, confluent shuts down?
> So for now I wrote a wrapper for that procedure (makeconfluent -d for unneeded nodes, makeconfluent nodelist for new nodes).
>
> On 19 April 2017 at 14:41:30, Jarrod Johnson (jjohns...@lenovo.com) wrote:
> Ok, also were those login/logouts always there, or only after that ‘try to suicide every 90 minutes’ experiment?
>
> From: banuchka [mailto:tyrche...@gmail.com]
> Sent: Wednesday, April 19, 2017 9:38 AM
> To: xcat-user@lists.sourceforge.net; Jarrod Johnson
> Subject: Re: [xcat-user] Confluent as console server. Consoles hangs ~after 24h.
>
> A bit of follow up:
> the experiment with nodehealth + echo > /dev/console + rcons didn’t hang the console… maybe it needs more time. I’ll keep it running inside a tmux session for a bit longer.
>
> On 19 April 2017 at 14:07:58, banuchka (tyrche...@gmail.com) wrote:
> Thanks Jarrod, I already have a few “plugins” for old Sun servers without SOL so it isn’t a big problem to create another one.
> I really appreciate your help.
> As one more thing I’m trying to fix all BaudRates on servers, because as I can see on DRAC there are at least 3 places with that setting (I’m not sure this is a problem, but it’s not good practice to read and write at different speeds).
> I’ll try your advice as well and let you know.
>
> On 19 April 2017 at 13:59:59, Jarrod Johnson (jjohns...@lenovo.com) wrote:
> I appreciate all the patience and help, let me know if you had a request about making a shell plugin. The interface is not exactly fleshed out ('CONFLUENT_NODE' is the only variable that makes it). If the approach helps, I can accelerate a syntax for a shell module to request more variables from the configuration (e.g. CONFLUENT_HARDWAREMANAGEMENT_MANAGER SECRET_HARDWARMANAGEMENTUSER, etc).
>
> In case you have a question, here's one example:
> # cat /opt/confluent.backup/lib/python/confluent/plugins/console/xcatkvm.sh
> #!/bin/bash
> exec /opt/xcat/share/xcat/cons/kvm $CONFLUENT_NODE
>
> As an aside, would you be able to do one more experiment? Start confluent up, verify console is working, then run nodehealth a few times against the node and see if it triggers the bad state? Especially if you have some cron job that involves some node* commands, imitate that. I was trying to think about things that would be different between ipmitool and pyghmi, and the one thing that occurs to me is that in pyghmi we try to multiplex commands and serial over the same session to limit session consumption. In ipmitool, it's just SOL (apart from an occasional 'get device id' for keepalive), so I'm wondering if some timing or large volume of ipmi commands on a session with active sol session could mess up their BMC SOL session.
>
> Unfortunately, I don't have the resources to help chase this since I can't reproduce it on our equipment, so all I can do is guess based on comparative analysis.
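The experiment suggested above (run nodehealth repeatedly against a node whose console is open, imitating a cron job) could be scripted roughly as follows. This is only a sketch: the helper names `build_command` and `hammer`, the default delay, and the assumption that `nodehealth` takes just the node name as its argument are illustrative, not something confirmed in this thread.

```python
import subprocess
import time
from typing import Callable, List


def build_command(node: str) -> List[str]:
    # Assumption: nodehealth is called with only the node (noderange) argument.
    return ["nodehealth", node]


def hammer(node: str, rounds: int, delay: float = 1.0,
           runner: Callable = subprocess.run,
           sleep: Callable[[float], None] = time.sleep) -> List[int]:
    """Run nodehealth `rounds` times against `node`; return the exit codes."""
    codes = []
    for _ in range(rounds):
        # capture_output keeps the command output from cluttering the terminal.
        result = runner(build_command(node), capture_output=True, text=True)
        codes.append(result.returncode)
        sleep(delay)
    return codes
```

Something like `hammer("dbb54", 20, delay=5)` while rcons is attached to the same node would then show whether a burst of IPMI commands on the shared session coincides with the console going bad.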
>
> -----Original Message-----
> From: banuchka <tyrche...@gmail.com>
> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>, Jarrod Johnson <jjohns...@lenovo.com>
> Subject: RE: [xcat-user] Confluent as console server. Consoles hangs ~after 24h.
> Date: Wed, 19 Apr 2017 11:32:58 +0100
>
> Hi,
>
> I’m trying to use a plugin for confluent with a simple "ipmitool sol activate” (placed here /opt/confluent/lib/python/confluent/plugins/console/). It is the last attempt to understand what’s going on here.
> FW upgrade didn’t help me globally.
> With the current setup with pyghmi I see lots of “log on/log off” messages in the BMC’s logs that don’t happen when I’m using ipmitool.
> I’m out of ideas right now...
>
> On 14 April 2017 at 20:59:04, Jarrod Johnson (jjohns...@lenovo.com) wrote:
> > Yeah, there will be a big push in the coming weeks, it will have at least an ‘events’ log along with a lot more function.
> >
> > Then some more fleshed out documentation (beyond the preliminary stuff on hpc.lenovo.com).
> >
> > Let me know if the firmware exploration works out. That particular change line suggests firmware upgrades, but it is possible they could have some high BMC cpu usage that could manifest in such a way. The ‘works with ipmitool’ though has me scratching my head.
> >
> > From: banuchka [mailto:tyrche...@gmail.com]
> > Sent: Friday, April 14, 2017 2:54 PM
> > To: xCAT Users Mailing list; Jarrod Johnson
> > Subject: RE: [xcat-user] Confluent as console server. Consoles hangs ~after 24h.
> >
> > Last idea doesn’t work for me. So by the way idea as is is working great – confluent does disconnect/connect after time in constant. But for now it is 100% correct to say – it is a problem with IDRAC fw.
> > from release notes for last fw:
> > ===
> > - Fix for occasional iDRAC unresponsiveness caused by upgrades via Firmware RACADM or have an active SOL or SSH sessions while firmware upgrade is in progress.
> > ===
> > I’m not sure, but maybe it’s something like what I have here. So I did the upgrade on a few hosts and will give them plenty of time to show me results.
> > Thanks for your answers, help and time… it is a very interesting quest :)
> >
> > Bit more about Confluent:
> > - Interesting ambitions
> > - Python VS Perl, that’s good
> > - I think log files (not just trace, stderr, stdout) and documentation (source on Github is the best doc I know, but…) are things that I would like to be in Confluent
> >
> > On 14 April 2017 at 19:27:20, Jarrod Johnson (jjohns...@lenovo.com) wrote:
> > Very interested in the outcome. And thank you for working through it. Also interested what you have liked, would like, and have disliked about confluent.
> >
> > From: banuchka [mailto:tyrche...@gmail.com]
> > Sent: Friday, April 14, 2017 12:01 PM
> > To: xCAT Users Mailing list; Jarrod Johnson
> > Subject: RE: [xcat-user] Confluent as console server. Consoles hangs ~after 24h.
> >
> > Thank you Jarrod, I’ll try to add the patch and let you know after. Hope 90 minutes is enough, yes.
> >
> > On 14 April 2017 at 16:57:24, Jarrod Johnson (jjohns...@lenovo.com) wrote:
> > Hmm, this is going to be very difficult to root cause (I only have Lenovo equipment as one might expect).
> >
> > I’m loath to do a workaround, but in console.py (find /usr -name console.py), it might be interesting to see how a change like the following behaves:
> > diff --git a/pyghmi/ipmi/console.py b/pyghmi/ipmi/console.py
> > index 95e8551..a5f6062 100644
> > --- a/pyghmi/ipmi/console.py
> > +++ b/pyghmi/ipmi/console.py
> > @@ -42,6 +42,7 @@ class Console(object):
> >      def __init__(self, bmc, userid, password,
> >                   iohandler, port=623,
> >                   force=False, kg=None):
> > +        self.keepalivecount = 0
> >          self.keepaliveid = None
> >          self.connected = False
> >          self.broken = False
> > @@ -70,6 +71,7 @@ class Console(object):
> >          if 'error' in response:
> >              self._print_error(response['error'])
> >              return
> > +        self.keepalivecount = 0
> >          #Send activate sol payload directive
> >          #netfn= 6 (application)
> >          #command = 0x48 (activate payload)
> > @@ -150,11 +152,12 @@ class Console(object):
> >              return
> >          currowner = struct.unpack(
> >              "<I", struct.pack('4B', *response['data'][:4]))
> > -        if currowner[0] != self.ipmi_session.sessionid:
> > +        if currowner[0] != self.ipmi_session.sessionid or self.keepalivecount > 180:
> >              # the session is deactivated or active for something else
> >              self.activated = False
> >              self._print_error('SOL deactivated')
> >              return
> > +        self.keepalivecount += 1
> >          # ok, still here, that means session is alive, but another
> >          # common issue is firmware messing with mux on reboot
> >          # this would be a nice thing to check, but the serial channel
> >
> > If it would pan out, should cause the console session to disconnect itself roughly every 90 minutes and trigger reconnect (is 90 minutes short enough in your case?) Would require a service confluent restart to see if it had the desired effect.
> >
> > Sorry I haven’t tested and can’t think of root cause, but going to take some time off for the weekend.
> >
> > I would be curious if the same ipmitool is running a day later than a check (e.g. if ipmitool is exiting and getting restarted). I don’t have the time at the moment to see if they do some other interesting thing to avoid the behavior.
> >
> > From: banuchka [mailto:tyrche...@gmail.com]
> > Sent: Friday, April 14, 2017 11:45 AM
> > To: xCAT Users Mailing list; Jarrod Johnson
> > Subject: RE: [xcat-user] Confluent as console server. Consoles hangs ~after 24h.
> >
> > cloud53.ulan:/home/banuchka # ipmitool sol info 1
> > Info: SOL parameter 'Payload Channel (7)' not supported - defaulting to 0x01
> > Set in progress                 : set-complete
> > Enabled                         : true
> > Force Encryption                : true
> > Force Authentication            : false
> > Privilege Level                 : ADMINISTRATOR
> > Character Accumulate Level (ms) : 50
> > Character Send Threshold        : 255
> > Retry Count                     : 7
> > Retry Interval (ms)             : 480
> > Volatile Bit Rate (kbps)        : 38.4
> > Non-Volatile Bit Rate (kbps)    : 115.2
> > Payload Channel                 : 1 (0x01)
> > Payload Port                    : 623
> > cloud53.ulan:/home/banuchka # ipmitool sol set volatile-bit-rate 115.2 1
> > cloud53.ulan:/home/banuchka # ipmitool sol info 1
> > Info: SOL parameter 'Payload Channel (7)' not supported - defaulting to 0x01
> > Set in progress                 : set-complete
> > Enabled                         : true
> > Force Encryption                : true
> > Force Authentication            : false
> > Privilege Level                 : ADMINISTRATOR
> > Character Accumulate Level (ms) : 50
> > Character Send Threshold        : 255
> > Retry Count                     : 7
> > Retry Interval (ms)             : 480
> > Volatile Bit Rate (kbps)        : 115.2
> > Non-Volatile Bit Rate (kbps)    : 115.2
> > Payload Channel                 : 1 (0x01)
> > Payload Port                    : 623
> > cloud53.ulan:/home/banuchka # echo 123 > /dev/console
> >
> > and nothing happened
> >
> > in the console’s log
> > ---
> > [04/14 12:49:12 console disconnected]
> > [04/14 12:49:29 console connected]
> > [04/14 13:01:02 console disconnected]
> > [04/14 13:01:02 console connected]
> > [04/14 13:03:54 console disconnected]
> > [04/14 13:04:15 console connected]
> > [04/14 13:38:37 console connected]
> > [04/14 15:31:47 console disconnected]
> > [04/14 15:36:24 console connected]
> > [04/14 15:42:08 connection by xcat_console]
> > ---
> >
> > On 14 April 2017 at 16:39:35, Jarrod Johnson (jjohns...@lenovo.com) wrote:
> > If you do have any in corrupted state, would be interested to see what happens if you do:
> > ipmitool sol set volatile-bit-rate 115.2 1
> >
> > To change the volatile bit rate to match the non-volatile bit rate and see if the corruption goes away.
> >
> > From: banuchka [mailto:tyrche...@gmail.com]
> > Sent: Friday, April 14, 2017 11:36 AM
> > To: xCAT Users Mailing list; Jarrod Johnson
> > Subject: RE: [xcat-user] Confluent as console server. Consoles hangs ~after 24h.
> >
> > 115200
> >
> > idracadm7 get iDRAC.IPMISerial
> > [Key=iDRAC.Embedded.1#IPMISerial.1]
> > BaudRate=115200
> > ChanPrivLimit=4
> > ConnectionMode=Terminal
> > DeleteControl=Disabled
> > EchoControl=Enabled
> > FlowControl=RTS/CTS
> > HandshakeControl=Enabled
> > InputNewLineSeq=1
> > LineEdit=Enabled
> > NewLineSeq=CR-LF
> >
> > that is strange, right
> >
> > On 14 April 2017 at 16:31:27, Jarrod Johnson (jjohns...@lenovo.com) wrote:
> > Hmm, what’s the baud rate the console is actually running at? Odd to see the volatile and non volatile bit rates not be the same.
> >
> > From: banuchka [mailto:tyrche...@gmail.com]
> > Sent: Friday, April 14, 2017 11:28 AM
> > To: xCAT Users Mailing list; Jarrod Johnson
> > Subject: RE: [xcat-user] Confluent as console server. Consoles hangs ~after 24h.
> >
> > On 14 April 2017 at 16:15:16, Jarrod Johnson (jjohns...@lenovo.com) wrote:
> > And to be clear, the corruption only starts after a long period of time of being continuously connected?
> > Yes, that is correct
> >
> > I might be interested in seeing ipmitool sol info 1 output against a system while it is working versus showing corrupted info.
> > corrupted:
> > # ipmitool -I lanplus -H cloud2manage -U root -a sol info 1
> > Password:
> > Info: SOL parameter 'Payload Channel (7)' not supported - defaulting to 0x01
> > Set in progress                 : set-complete
> > Enabled                         : true
> > Force Encryption                : true
> > Force Authentication            : false
> > Privilege Level                 : ADMINISTRATOR
> > Character Accumulate Level (ms) : 50
> > Character Send Threshold        : 255
> > Retry Count                     : 7
> > Retry Interval (ms)             : 480
> > Volatile Bit Rate (kbps)        : 38.4
> > Non-Volatile Bit Rate (kbps)    : 115.2
> > Payload Channel                 : 1 (0x01)
> > Payload Port                    : 623
> >
> > Working:
> > # ipmitool -I lanplus -H cloud2manage -U root -a sol info 1
> > Password:
> > Info: SOL parameter 'Payload Channel (7)' not supported - defaulting to 0x01
> > Set in progress                 : set-complete
> > Enabled                         : true
> > Force Encryption                : true
> > Force Authentication            : false
> > Privilege Level                 : ADMINISTRATOR
> > Character Accumulate Level (ms) : 50
> > Character Send Threshold        : 255
> > Retry Count                     : 7
> > Retry Interval (ms)             : 480
> > Volatile Bit Rate (kbps)        : 38.4
> > Non-Volatile Bit Rate (kbps)    : 115.2
> > Payload Channel                 : 1 (0x01)
> > Payload Port                    : 623
> >
> > From: banuchka [mailto:tyrche...@gmail.com]
> > Sent: Friday, April 14, 2017 11:09 AM
> > To: xCAT Users Mailing list; Jarrod Johnson
> > Subject: RE: [xcat-user] Confluent as console server. Consoles hangs ~after 24h.
> >
> > Yes, reopen causes it to work again, without any garbage… so looks like normal console :)
> > Hit <enter> causes at first garbage output(�� Por�lo) and *normal console* before...
> >
> > On 14 April 2017 at 16:02:09, Jarrod Johnson (jjohns...@lenovo.com) wrote:
> > So reopen causes it to work again, and before, it’s not *hung*, but erratic with garbage characters and occasional blips of sanity?
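Both `sol info` dumps above show a volatile bit rate (38.4) that differs from the non-volatile one (115.2). A small sketch for spotting that mismatch across many BMCs could parse the text output like this; the function names `parse_sol_info` and `bit_rates_match` are made up for illustration, and the parsing assumes the "Key : value" layout shown in the thread.

```python
from typing import Dict


def parse_sol_info(text: str) -> Dict[str, str]:
    """Turn the 'Key : value' lines of `ipmitool sol info` output into a dict."""
    fields = {}
    for line in text.splitlines():
        if line.startswith("Info:"):
            continue  # informational notice, not a SOL parameter
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def bit_rates_match(fields: Dict[str, str]) -> bool:
    """True when the volatile and non-volatile bit rates are identical."""
    return fields["Volatile Bit Rate (kbps)"] == fields["Non-Volatile Bit Rate (kbps)"]
```

Feeding it the captured output of `ipmitool ... sol info 1` for each node would list the hosts where the two rates disagree. Whether that mismatch actually causes the corruption is still an open question in this thread.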
> >
> > From: banuchka [mailto:tyrche...@gmail.com]
> > Sent: Friday, April 14, 2017 11:00 AM
> > To: xCAT Users Mailing list; Jarrod Johnson
> > Subject: RE: [xcat-user] Confluent as console server. Consoles hangs ~after 24h.
> >
> > Reopen console did the trick as well...
> >
> > On 14 April 2017 at 15:54:03, Jarrod Johnson (jjohns...@lenovo.com) wrote:
> > ‘ctrl-e, then c, then o’ to reconnect.
> >
> > Was conserver ondemand or full logging?
> >
> > From: banuchka [mailto:tyrche...@gmail.com]
> > Sent: Friday, April 14, 2017 10:52 AM
> > To: xCAT Users Mailing list; Jarrod Johnson
> > Subject: RE: [xcat-user] Confluent as console server. Consoles hangs ~after 24h.
> >
> > Console starts showing garbage after <enter> inside rcons.
> > What do you mean when said “restarting console”?
> > Console continue its work after:
> > - <enter> inside rcons/confetty
> > - bmc reset (console disconnected/console connected)
> >
> > You’re absolutely right with ipmitool and conserver with the same servers we were out of such troubles.
> > On 14 April 2017 at 15:47:14, Jarrod Johnson (jjohns...@lenovo.com) wrote:
> > So the console starts showing garbage? Restarting the console causes the garbage to go away?
> >
> > You said that ipmitool with a certain configuration did not trigger this?
> >
> > From: banuchka [mailto:tyrche...@gmail.com]
> > Sent: Friday, April 14, 2017 9:29 AM
> > To: xCAT Users Mailing list; Jarrod Johnson
> > Subject: Re: [xcat-user] Confluent as console server. Consoles hangs ~after 24h.
> >
> > I’m out of ideas, let me show you all i see.
> >
> > Inside rcons i see:
> >
> > MONITORING_TEST dbb54 1492160401 <= last message i’ve sent from OS (more complex log below)
> >
> > tcpdump(keepalive?):
> >
> > 13:23:42.342886 IP (tos 0x0, ttl 64, id 16448, offset 0, flags [DF], proto UDP (17), length 92)
> >     10.10.114.30.36790 > 10.10.106.155.623: [udp sum ok] UDP, length 64
> > 13:23:42.345504 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto UDP (17), length 108)
> >     10.10.106.155.623 > 10.10.114.30.36790: [udp sum ok] UDP, length 80
> >
> > …
> >
> > 13:24:09.422491 IP (tos 0x0, ttl 64, id 17060, offset 0, flags [DF], proto UDP (17), length 92)
> >     10.10.114.30.36790 > 10.10.106.155.623: [udp sum ok] UDP, length 64
> > 13:24:09.425045 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto UDP (17), length 108)
> >     10.10.106.155.623 > 10.10.114.30.36790: [udp sum ok] UDP, length 80
> >
> > Hit <enter> in rcons:
> > ---
> > MONITORING_TEST dbb54 1492160401
> >
> > ��
> > Por�
> > ---
> >
> > tcpdump:
> > 13:24:35.727671 IP (tos 0x0, ttl 64, id 19582, offset 0, flags [DF], proto UDP (17), length 92)
> >     10.10.114.30.36790 > 10.10.106.155.623: [udp sum ok] UDP, length 64
> > 13:24:35.731533 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto UDP (17), length 108)
> >     10.10.106.155.623 > 10.10.114.30.36790: [udp sum ok] UDP, length 80
> > 13:24:47.390367 IP (tos 0x0, ttl 64, id 20347, offset 0, flags [DF], proto UDP (17), length 92)
> >     10.10.114.30.36790 > 10.10.106.155.623: [udp sum ok] UDP, length 64
> > 13:24:47.392799 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto UDP (17), length 92)
> >     10.10.106.155.623 > 10.10.114.30.36790: [udp sum ok] UDP, length 64
> > 13:24:47.408312 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto UDP (17), length 108)
> >     10.10.106.155.623 > 10.10.114.30.36790: [udp sum ok] UDP, length 80
> > 13:24:47.409797 IP (tos 0x0, ttl 64, id 20349, offset 0, flags [DF], proto UDP (17), length 92)
> >     10.10.114.30.36790 > 10.10.106.155.623: [udp sum ok] UDP, length 64
> > 13:25:03.127774 IP (tos 0x0, ttl 64, id 21818, offset 0, flags [DF], proto UDP (17), length 92)
> >     10.10.114.30.36790 > 10.10.106.155.623: [udp sum ok] UDP, length 64
> > 13:25:03.131561 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto UDP (17), length 108)
> >     10.10.106.155.623 > 10.10.114.30.36790: [udp sum ok] UDP, length 80
> > 13:25:27.269696 IP (tos 0x0, ttl 64, id 26284, offset 0, flags [DF], proto UDP (17), length 92)
> >     10.10.114.30.36790 > 10.10.106.155.623: [udp sum ok] UDP, length 64
> > 13:25:27.272204 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto UDP (17), length 108)
> >     10.10.106.155.623 > 10.10.114.30.36790: [udp sum ok] UDP, length 80
> > 13:25:47.410313 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto UDP (17), length 92)
> >     10.10.106.155.623 > 10.10.114.30.36790: [udp sum ok] UDP, length 64
> > 13:25:47.413754 IP (tos 0x0, ttl 64, id 28210, offset 0, flags [DF], proto UDP (17), length 92)
> >     10.10.114.30.36790 > 10.10.106.155.623: [udp sum ok] UDP, length 64
> > 13:25:48.709947 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto UDP (17), length 204)
> >     10.10.106.155.623 > 10.10.114.30.36790: [udp sum ok] UDP, length 176
> > 13:25:48.712033 IP (tos 0x0, ttl 64, id 28355, offset 0, flags [DF], proto UDP (17), length 92)
> >     10.10.114.30.36790 > 10.10.106.155.623: [udp sum ok] UDP, length 64
> > 13:25:52.564080 IP (tos 0x0, ttl 64, id 29103, offset 0, flags [DF], proto UDP (17), length 92)
> >     10.10.114.30.36790 > 10.10.106.155.623: [udp sum ok] UDP, length 64
> > 13:25:52.566810 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto UDP (17), length 108)
> >     10.10.106.155.623 > 10.10.114.30.36790: [udp sum ok] UDP, length 80
> >
> > and Magic, rcons:
> > ---
> > Por�lo]0;console: dbb54 [13:25]
> >
> >
> > dbb54 login:
> > ---
> >
> > On 14 April 2017 at 12:42:03, Jarrod Johnson (jjohns...@lenovo.com) wrote:
> > If you ctrl-e, c, o, does it restore the console after the time?
> >
> > Can you tell that it goes after exactly 24 hours on the dot?
> >
> > When console hung, does ‘ipmitool sol activate’ say ‘session already active’?
> > Yes,
> > # ipmitool -I lanplus -H 10.10.106.155 -U root -a sol activate
> > Password:
> > Info: SOL payload already active on another session
> >
> > Does /var/log/confluent/consoles/<nodename> have any interesting events crop up?
> > [04/13 15:17:21 console connected]
> > … many of our own messages
> > ^MMONITORING_TEST dbb54 1492160401 | <== This is the last message from OS/ # date -d@1492160401 (Fri Apr 14 09:00:01 UTC 2017)
> > ^M
> > [04/14 09:05:13 console connected]
> > [04/14 09:11:59 console connected]
> > [04/14 09:13:38 console disconnected]
> > [04/14 09:14:54 console connected]
> > [04/14 10:15:13 connection by xcat_console]
> > [04/14 10:15:14 disconnection by xcat_console]
> > [04/14 13:14:30 connection by xcat_console]
> >
> > Pyghmi will do keepalive as well, and if that’s the problem, it should be much shorter than 24 hours. In fact, it should be checking if the SOL payload is active and owned by confluent specifically every couple of minutes.
> > yes, that’s correct
> >
> > From: banuchka [mailto:tyrche...@gmail.com]
> > Sent: Friday, April 14, 2017 5:55 AM
> > To: xcat-user@lists.sourceforge.net
> > Subject: Re: [xcat-user] Confluent as console server. Consoles hangs ~after 24h.
> >
> > My last reply was incorrect. The problems are still here. I’m trying to find something useful in between confluent/pyghmi...
> > A confluent restart solves the hangs and reopens all connections.
> > I think it isn’t the best option to restart confluent 1 or 2 times in 24h.
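The bracketed markers in the console log quoted above ("console connected", "console disconnected") can be pulled out mechanically to see when confluent re-established a session. The following is a rough sketch only: the marker format is inferred from the excerpts in this thread, the year is not in the log and has to be supplied, and treating a "console connected" with no preceding "console disconnected" as noteworthy is an assumption rather than documented confluent behaviour.

```python
import re
from datetime import datetime
from typing import List, Tuple

# Matches markers such as "[04/14 09:05:13 console connected]".
EVENT_RE = re.compile(r"\[(\d{2}/\d{2} \d{2}:\d{2}:\d{2}) ([^\]]+)\]")


def parse_events(text: str, year: int = 2017) -> List[Tuple[datetime, str]]:
    """Extract (timestamp, event) pairs from a console log excerpt."""
    events = []
    for stamp, what in EVENT_RE.findall(text):
        when = datetime.strptime(f"{year}/{stamp}", "%Y/%m/%d %H:%M:%S")
        events.append((when, what))
    return events


def repeated_connects(events: List[Tuple[datetime, str]]) -> List[datetime]:
    """Timestamps of 'console connected' events not preceded by a disconnect."""
    repeats = []
    connected = False
    for when, what in events:
        if what == "console connected":
            if connected:
                repeats.append(when)
            connected = True
        elif what == "console disconnected":
            connected = False
    return repeats
```

Run against /var/log/confluent/consoles/<nodename>, this would give a quick list of reconnects to line up against the timestamps of the last message seen from the OS.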
> >
> > --
> > banuchka
> > On 13 April 2017 at 17:03:19, banuchka (tyrche...@gmail.com) wrote:
> > It is a Dell-related problem, not 100% but…
> > Confluent from current master is doing things well :)
> > Thanks for the pretty nice tool "confluentdbutil".
> >
> > On 13 April 2017 at 11:30:14, banuchka (tyrche...@gmail.com) wrote:
> > Looks like that problem was there before… The fix was to use ipmitool with keepalive (the one from xcat repos).
> > Here pyghmi is used, maybe that’s the reason?
> >
> > On 13 April 2017 at 08:22:28, banuchka (tyrche...@gmail.com) wrote:
> > Hi,
> >
> > I’m trying to completely migrate from conserver to confluent, but I’ve hit strange behaviour.
> > Some of my consoles hang after ~24h, so there are no new messages in their logs or in rcons.
> > I send messages with a timestamp from the OS to /dev/console every 30-60 min and look at them for monitoring purposes (console availability monitoring).
> > I can open rcons and hit enter, and after a few secs the console wakes up (strange). I didn’t see it happen with conserver, or maybe I’m wrong...
> > Some details:
> > - as far as I can see, the biggest part of the consoles that hang are Dell iDRAC. It doesn’t matter which type of RacSerial or IPMISerial is in use.
> > - racreset hard/ipmitool bmc reset didn’t help
> > - hitting enter in the console wakes it up (for example with expect I can send \r\n\f, but it looks bad)
> > - I didn’t try to clean confluent's conf and restart it. Not sure it would help.
> > - HP consoles work well, same ipmi
> > - a few consoles with a custom plugin work well too
> >
> > So maybe my question is not about confluent, but if some of you have some knowledge about the same problems please share it!
;)
> >
> > --
> > banuchka
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user