Jarrod, what do you think is the maximum number of servers (with full IPMI
logging) that a single Confluent instance can handle stably?

-- 
banuchka

On 3 May 2017 at 20:01:41, banuchka (tyrche...@gmail.com) wrote:

> Tomorrow I’ll try to bring up one (or 2, 3, 4) more instances of Confluent and
> move part of the servers there, until the same behaviour shows up on the new
> instance(s).
>
> On 3 May 2017 at 19:19:26, Jarrod Johnson (jjohns...@lenovo.com) wrote:
>
> Hmm, and there isn’t anything like conserver or another confluent trying
> to run at the same time to the same node?
>
>
>
> *From:* banuchka [mailto:tyrche...@gmail.com]
> *Sent:* Wednesday, May 03, 2017 2:10 PM
> *To:* xCAT Users Mailing list; Jarrod Johnson
> *Subject:* RE: [xcat-user] Confluent as console server. Consoles hangs
> ~after 24h.
>
>
>
> Hi,
>
>
>
> one more strange thing about confluent:
>
>
>
> May  3 12:57:28 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 12:57:26 console connected]
> May  3 13:02:08 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 13:02:06 console disconnected]
> May  3 13:10:32 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 13:10:30 console connected]
> May  3 13:12:08 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 13:12:06 console disconnected]
> May  3 13:21:08 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 13:21:06 console connected]
> May  3 13:22:05 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 13:22:03 console disconnected]
> May  3 13:26:02 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 13:26:00 console connected]
> May  3 13:32:05 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 13:32:03 console disconnected]
> May  3 13:33:17 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 13:33:15 console connected]
> May  3 14:22:02 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 14:22:00 console disconnected]
> May  3 14:23:11 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 14:23:09 console connected]
> May  3 14:32:02 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 14:32:00 console disconnected]
> May  3 14:39:44 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 14:39:42 console connected]
> May  3 14:52:07 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 14:52:05 console disconnected]
> May  3 14:52:17 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 14:52:15 console connected]
> May  3 15:02:15 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 15:02:13 console disconnected]
> May  3 15:06:40 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 15:06:38 console connected]
> May  3 15:12:17 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 15:12:15 console disconnected]
> May  3 15:15:30 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 15:15:28 console connected]
> May  3 15:22:17 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 15:22:15 console disconnected]
> May  3 15:30:28 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 15:30:26 console connected]
> May  3 15:32:21 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 15:32:19 console disconnected]
> May  3 15:36:42 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 15:36:40 console connected]
> May  3 15:41:59 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 15:41:57 console disconnected]
> May  3 15:45:17 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 15:45:15 console connected]
> May  3 15:51:59 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 15:51:57 console disconnected]
> May  3 15:57:05 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 15:57:03 console connected]
> May  3 17:22:12 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 17:22:10 console disconnected]
> May  3 17:26:38 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 17:26:36 console connected]
> May  3 17:32:15 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 17:32:13 console disconnected]
> May  3 17:41:26 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 17:41:24 console connected]
> May  3 17:42:01 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 17:41:59 console disconnected]
> May  3 17:49:32 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 17:49:30 console connected]
> May  3 17:52:07 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 17:52:05 console disconnected]
> May  3 17:52:42 xcat-sn1.mlan confluent[4102]: audit :May 03 17:52:40 {"operation": "start", "allowed": true, "target": "/nodes/unreg25/console/session", "user": "xcat_console"}
> May  3 17:52:42 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 17:52:40 connection by xcat_console]
> May  3 17:52:45 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 17:52:43 console disconnected]
> May  3 17:56:09 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 17:56:07 console connected]
>
>
>
> It isn’t a Dell BMC…
>
>
>
> I think I’ve written about this behaviour here before. The times are so
> random that it doesn’t look like a timeout issue somewhere.
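To judge whether the connect/disconnect times really are random, it may help to tabulate how long each session lasted. The following is a minimal sketch, not part of confluent: the function name `session_lengths`, the `SAMPLE` excerpt, and the assumed year (2017, since syslog lines carry no year) are all illustrative. It tolerates the mail-client line wrapping and `>` quote markers seen in the log above.

```python
import re
from datetime import datetime

# Matches the bracketed confluent event, e.g. "[05/03 12:57:26 console connected]".
EVENT_RE = re.compile(
    r"\[(\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s+console (connected|disconnected)\]"
)


def session_lengths(log_text, year=2017):
    """Return (connected_at, disconnected_at, seconds) for each completed session.

    The year is an assumption; the log lines themselves do not include one.
    """
    # Undo mail wrapping: strip quote markers and join everything into one string.
    flat = " ".join(line.lstrip("> ").strip() for line in log_text.splitlines())
    sessions = []
    connected_at = None
    for stamp, state in EVENT_RE.findall(flat):
        when = datetime.strptime(f"{year}/{stamp}", "%Y/%m/%d %H:%M:%S")
        if state == "connected":
            connected_at = when
        elif connected_at is not None:
            seconds = int((when - connected_at).total_seconds())
            sessions.append((connected_at, when, seconds))
            connected_at = None
    return sessions


# First four events from the log above, still wrapped and quoted as in the mail.
SAMPLE = """\
> May  3 12:57:28 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 12:57:26
> console connected]
> May  3 13:02:08 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 13:02:06
> console disconnected]
> May  3 13:10:32 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 13:10:30
> console connected]
> May  3 13:12:08 xcat-sn1.mlan confluent[4102]: unreg25 :[05/03 13:12:06
> console disconnected]
"""

if __name__ == "__main__":
    for start, end, seconds in session_lengths(SAMPLE):
        print(f"{start:%H:%M:%S} -> {end:%H:%M:%S}  {seconds:5d} s")
```

Feeding it the full excerpt instead of `SAMPLE` gives one row per session, which makes it easier to compare session lengths and disconnect times side by side.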
>
>
>
> I need some advice before rolling back :) Thanks
>
>
>
> On 14 April 2017 at 20:59:04, Jarrod Johnson (jjohns...@lenovo.com) wrote:
>
> Yeah, there will be a big push in the coming weeks; it will have at least
> an ‘events’ log along with a lot more function.
>
>
>
> Then some more fleshed out documentation (beyond the preliminary stuff on
> hpc.lenovo.com).
>
>
>
> Let me know if the firmware exploration works out.  That particular change
> line suggests firmware upgrades, but it is possible they could have some
> high BMC cpu usage that could manifest in such a way.  The ‘works with
> ipmitool’ though has me scratching my head.
>
>
>
> *From:* banuchka [mailto:tyrche...@gmail.com <tyrche...@gmail.com>]
> *Sent:* Friday, April 14, 2017 2:54 PM
> *To:* xCAT Users Mailing list; Jarrod Johnson
> *Subject:* RE: [xcat-user] Confluent as console server. Consoles hangs
> ~after 24h.
>
>
>
> The last idea doesn’t work for me. The idea itself works great, by the way:
> confluent does disconnect/reconnect at a constant interval. But for now it is
> 100% correct to say that it is a problem with the iDRAC firmware.
>
> From the release notes for the latest firmware:
>
> ===
>
> - Fix for occasional iDRAC unresponsiveness caused by upgrades via
> Firmware RACADM or
> have an active SOL or SSH sessions while firmware upgrade is in progress.
>
> ===
>
> I’m not sure, but maybe it’s something like what I have here. So I did the
> upgrade on a few hosts and will give them plenty of time to show me results.
>
> Thanks for your answers, help and time… it is a very interesting quest :)
>
>
>
> A bit more about Confluent:
>
> - Interesting ambitions
>
> - Python vs. Perl, that’s good
>
> - I think log files (not just trace, stderr, stdout) and
> documentation (the source on GitHub is the best doc I know, but…) are things
> that I would like to see in Confluent
>
>
>
> On 14 April 2017 at 19:27:20, Jarrod Johnson (jjohns...@lenovo.com) wrote:
>
> Very interested in the outcome.  And thank you for working through it.
> Also interested what you have liked, would like, and have disliked about
> confluent.
>
>
>
> *From:* banuchka [mailto:tyrche...@gmail.com <tyrche...@gmail.com>]
> *Sent:* Friday, April 14, 2017 12:01 PM
> *To:* xCAT Users Mailing list; Jarrod Johnson
> *Subject:* RE: [xcat-user] Confluent as console server. Consoles hangs
> ~after 24h.
>
>
>
> Thank you Jarrod, I’ll try applying the patch and let you know afterwards. I
> hope 90 minutes is short enough, yes.
>
>
>
> On 14 April 2017 at 16:57:24, Jarrod Johnson (jjohns...@lenovo.com) wrote:
>
> Hmm, this is going to be very difficult to root cause (I only have Lenovo
> equipment as one might expect).
>
>
>
> I’m loath to do a workaround, but in console.py (find /usr -name
> console.py), it might be interesting to see how a change like the following
> behaves:
>
> diff --git a/pyghmi/ipmi/console.py b/pyghmi/ipmi/console.py
> index 95e8551..a5f6062 100644
> --- a/pyghmi/ipmi/console.py
> +++ b/pyghmi/ipmi/console.py
> @@ -42,6 +42,7 @@ class Console(object):
>      def __init__(self, bmc, userid, password,
>                   iohandler, port=623,
>                   force=False, kg=None):
> +        self.keepalivecount = 0
>          self.keepaliveid = None
>          self.connected = False
>          self.broken = False
> @@ -70,6 +71,7 @@ class Console(object):
>          if 'error' in response:
>              self._print_error(response['error'])
>              return
> +        self.keepalivecount = 0
>          #Send activate sol payload directive
>          #netfn= 6 (application)
>          #command = 0x48 (activate payload)
> @@ -150,11 +152,12 @@ class Console(object):
>              return
>          currowner = struct.unpack(
>              "<I", struct.pack('4B', *response['data'][:4]))
> -        if currowner[0] != self.ipmi_session.sessionid:
> +        if currowner[0] != self.ipmi_session.sessionid or self.keepalivecount > 180:
>              # the session is deactivated or active for something else
>              self.activated = False
>              self._print_error('SOL deactivated')
>              return
> +        self.keepalivecount += 1
>          # ok, still here, that means session is alive, but another
>          # common issue is firmware messing with mux on reboot
>          # this would be a nice thing to check, but the serial channel
>
>
>
> If it pans out, it should cause the console session to disconnect itself
> roughly every 90 minutes and trigger a reconnect (is 90 minutes short enough
> in your case?). It would require a service confluent restart to see if it had
> the desired effect.
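The counter logic in that patch can be exercised on its own to see where the "roughly every 90 minutes" figure comes from. This is a minimal standalone sketch, not pyghmi code: the class `KeepaliveBudget` and the helper `keepalives_before_drop` are hypothetical names, and the 30-second keepalive interval is an assumption inferred from 180 keepalives matching about 90 minutes; the thread does not state it.

```python
KEEPALIVE_INTERVAL_S = 30  # assumption: inferred from 180 keepalives ~ 90 minutes
MAX_KEEPALIVES = 180       # threshold used in the proposed patch


class KeepaliveBudget:
    """Hypothetical stand-in for the counter the patch adds to Console."""

    def __init__(self, limit=MAX_KEEPALIVES):
        self.limit = limit
        self.count = 0

    def reset(self):
        # Mirrors the patch resetting the counter when SOL is (re)activated.
        self.count = 0

    def tick(self):
        """Run one keepalive check; return False once the session should drop."""
        if self.count > self.limit:
            return False
        self.count += 1
        return True


def keepalives_before_drop(limit=MAX_KEEPALIVES):
    """Count how many keepalive checks pass before the forced disconnect."""
    budget = KeepaliveBudget(limit)
    passed = 0
    while budget.tick():
        passed += 1
    return passed


if __name__ == "__main__":
    passed = keepalives_before_drop()
    minutes = passed * KEEPALIVE_INTERVAL_S / 60
    print(f"{passed} keepalives pass, about {minutes:.1f} minutes per session")
```

Because the comparison is `>` rather than `>=`, one more check passes than the threshold value, which only shifts the forced reconnect by a single keepalive interval.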
>
>
>
> Sorry I haven’t tested this and can’t think of a root cause, but I’m going to
> take some time off for the weekend.
>
>
>
> I would be curious whether the same ipmitool process is still running when
> you check a day later (e.g. whether ipmitool is exiting and getting
> restarted). I don’t have the time at the moment to see if they do some other
> interesting thing to avoid the behavior.
>
>
>
> *From:* banuchka [mailto:tyrche...@gmail.com <tyrche...@gmail.com>]
> *Sent:* Friday, April 14, 2017 11:45 AM
> *To:* xCAT Users Mailing list; Jarrod Johnson
> *Subject:* RE: [xcat-user] Confluent as console server. Consoles hangs
> ~after 24h.
>
>
>
> cloud53.ulan:/home/banuchka # ipmitool sol info 1
> Info: SOL parameter 'Payload Channel (7)' not supported - defaulting to 0x01
> Set in progress                 : set-complete
> Enabled                         : true
> Force Encryption                : true
> Force Authentication            : false
> Privilege Level                 : ADMINISTRATOR
> Character Accumulate Level (ms) : 50
> Character Send Threshold        : 255
> Retry Count
>
>
_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user