On 14 April 2017 at 16:15:16, Jarrod Johnson (jjohns...@lenovo.com) wrote:

And to be clear, the corruption only starts after a long period of time of 
being continuously connected?

Yes, that is correct


 

I might be interested in seeing ipmitool sol info 1 output against a system 
while it is working versus showing corrupted info.

Corrupted:

# ipmitool -I lanplus -H cloud2manage -U root -a sol info 1
Password:
Info: SOL parameter 'Payload Channel (7)' not supported - defaulting to 0x01
Set in progress                 : set-complete
Enabled                         : true
Force Encryption                : true
Force Authentication            : false
Privilege Level                 : ADMINISTRATOR
Character Accumulate Level (ms) : 50
Character Send Threshold        : 255
Retry Count                     : 7
Retry Interval (ms)             : 480
Volatile Bit Rate (kbps)        : 38.4
Non-Volatile Bit Rate (kbps)    : 115.2
Payload Channel                 : 1 (0x01)
Payload Port                    : 623



Working:

# ipmitool -I lanplus -H cloud2manage -U root -a sol info 1
Password:
Info: SOL parameter 'Payload Channel (7)' not supported - defaulting to 0x01
Set in progress                 : set-complete
Enabled                         : true
Force Encryption                : true
Force Authentication            : false
Privilege Level                 : ADMINISTRATOR
Character Accumulate Level (ms) : 50
Character Send Threshold        : 255
Retry Count                     : 7
Retry Interval (ms)             : 480
Volatile Bit Rate (kbps)        : 38.4
Non-Volatile Bit Rate (kbps)    : 115.2
Payload Channel                 : 1 (0x01)
Payload Port                    : 623
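
The two outputs above look identical by eye. A minimal sketch for checking that mechanically, assuming both captures were saved as text: the function names (parse_sol_info, diff_sol_info) are made up for illustration, and the sample strings are shortened copies of the output above.

```python
def parse_sol_info(text):
    """Parse `ipmitool sol info` output into a {parameter: value} dict."""
    params = {}
    for line in text.splitlines():
        # Skip blanks, the password prompt, and the "Info:" notice.
        if " : " not in line or line.startswith("Info:"):
            continue
        key, _, value = line.partition(" : ")
        params[key.strip()] = value.strip()
    return params


def diff_sol_info(a, b):
    """Return {parameter: (value_a, value_b)} for parameters that differ."""
    pa, pb = parse_sol_info(a), parse_sol_info(b)
    return {k: (pa.get(k), pb.get(k))
            for k in sorted(set(pa) | set(pb))
            if pa.get(k) != pb.get(k)}


# Shortened sample taken from the output in this message.
corrupted = """\
Password:
Info: SOL parameter 'Payload Channel (7)' not supported - defaulting to 0x01
Set in progress                 : set-complete
Volatile Bit Rate (kbps)        : 38.4
Non-Volatile Bit Rate (kbps)    : 115.2
Payload Port                    : 623
"""
working = corrupted  # the same lines appear in both captures

print(diff_sol_info(corrupted, working))
```

An empty result would suggest that the SOL configuration itself does not change between the working and the corrupted state, so the difference has to be looked for elsewhere.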


 

From: banuchka [mailto:tyrche...@gmail.com] 
Sent: Friday, April 14, 2017 11:09 AM
To: xCAT Users Mailing list; Jarrod Johnson
Subject: RE: [xcat-user] Confluent as console server. Consoles hangs ~after 24h.

 

Yes, reopening makes it work again, without any garbage… so it looks like a normal 
console :)

Hitting <enter> first produces garbage output (�� Por�lo) and then the *normal 
console* as before...

 

On 14 April 2017 at 16:02:09, Jarrod Johnson (jjohns...@lenovo.com) wrote:

So reopen causes it to work again, and before, it’s not *hung*, but erratic 
with garbage characters and occasional blips of sanity?

 

From: banuchka [mailto:tyrche...@gmail.com] 
Sent: Friday, April 14, 2017 11:00 AM
To: xCAT Users Mailing list; Jarrod Johnson
Subject: RE: [xcat-user] Confluent as console server. Consoles hangs ~after 24h.

 

Reopening the console did the trick as well...

 

On 14 April 2017 at 15:54:03, Jarrod Johnson (jjohns...@lenovo.com) wrote:

‘ctrl-e, then c, then o’ to reconnect.

 

Was conserver ondemand or full logging?

 

From: banuchka [mailto:tyrche...@gmail.com] 
Sent: Friday, April 14, 2017 10:52 AM
To: xCAT Users Mailing list; Jarrod Johnson
Subject: RE: [xcat-user] Confluent as console server. Consoles hangs ~after 24h.

 

The console starts showing garbage after <enter> inside rcons.

What do you mean by “restarting console”?

The console resumes working after:

- <enter> inside rcons/confetty

- bmc reset (console disconnected/console connected)

 

You’re absolutely right: with ipmitool and conserver on the same servers we 
had no such troubles.

On 14 April 2017 at 15:47:14, Jarrod Johnson (jjohns...@lenovo.com) wrote:

So the console starts showing garbage?  Restarting the console causes the 
garbage to go away?

 

You said that ipmitool with a certain configuration did not trigger this?

 

From: banuchka [mailto:tyrche...@gmail.com] 
Sent: Friday, April 14, 2017 9:29 AM
To: xCAT Users Mailing list; Jarrod Johnson
Subject: Re: [xcat-user] Confluent as console server. Consoles hangs ~after 24h.

 

I’m out of ideas, so let me show you everything I see.

 

Inside rcons I see:

 

MONITORING_TEST dbb54 1492160401 <= last message I sent from the OS (fuller 
log below)

 

tcpdump(keepalive?):

 

13:23:42.342886 IP (tos 0x0, ttl 64, id 16448, offset 0, flags [DF], proto UDP (17), length 92)
    10.10.114.30.36790 > 10.10.106.155.623: [udp sum ok] UDP, length 64
13:23:42.345504 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto UDP (17), length 108)
    10.10.106.155.623 > 10.10.114.30.36790: [udp sum ok] UDP, length 80

…

13:24:09.422491 IP (tos 0x0, ttl 64, id 17060, offset 0, flags [DF], proto UDP (17), length 92)
    10.10.114.30.36790 > 10.10.106.155.623: [udp sum ok] UDP, length 64
13:24:09.425045 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto UDP (17), length 108)
    10.10.106.155.623 > 10.10.114.30.36790: [udp sum ok] UDP, length 80

 

Hit <enter> in rcons:

---

MONITORING_TEST dbb54 1492160401

 

��

  Por�

---

 

tcpdump:

13:24:35.727671 IP (tos 0x0, ttl 64, id 19582, offset 0, flags [DF], proto UDP (17), length 92)
    10.10.114.30.36790 > 10.10.106.155.623: [udp sum ok] UDP, length 64
13:24:35.731533 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto UDP (17), length 108)
    10.10.106.155.623 > 10.10.114.30.36790: [udp sum ok] UDP, length 80
13:24:47.390367 IP (tos 0x0, ttl 64, id 20347, offset 0, flags [DF], proto UDP (17), length 92)
    10.10.114.30.36790 > 10.10.106.155.623: [udp sum ok] UDP, length 64
13:24:47.392799 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto UDP (17), length 92)
    10.10.106.155.623 > 10.10.114.30.36790: [udp sum ok] UDP, length 64
13:24:47.408312 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto UDP (17), length 108)
    10.10.106.155.623 > 10.10.114.30.36790: [udp sum ok] UDP, length 80
13:24:47.409797 IP (tos 0x0, ttl 64, id 20349, offset 0, flags [DF], proto UDP (17), length 92)
    10.10.114.30.36790 > 10.10.106.155.623: [udp sum ok] UDP, length 64
13:25:03.127774 IP (tos 0x0, ttl 64, id 21818, offset 0, flags [DF], proto UDP (17), length 92)
    10.10.114.30.36790 > 10.10.106.155.623: [udp sum ok] UDP, length 64
13:25:03.131561 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto UDP (17), length 108)
    10.10.106.155.623 > 10.10.114.30.36790: [udp sum ok] UDP, length 80
13:25:27.269696 IP (tos 0x0, ttl 64, id 26284, offset 0, flags [DF], proto UDP (17), length 92)
    10.10.114.30.36790 > 10.10.106.155.623: [udp sum ok] UDP, length 64
13:25:27.272204 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto UDP (17), length 108)
    10.10.106.155.623 > 10.10.114.30.36790: [udp sum ok] UDP, length 80
13:25:47.410313 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto UDP (17), length 92)
    10.10.106.155.623 > 10.10.114.30.36790: [udp sum ok] UDP, length 64
13:25:47.413754 IP (tos 0x0, ttl 64, id 28210, offset 0, flags [DF], proto UDP (17), length 92)
    10.10.114.30.36790 > 10.10.106.155.623: [udp sum ok] UDP, length 64
13:25:48.709947 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto UDP (17), length 204)
    10.10.106.155.623 > 10.10.114.30.36790: [udp sum ok] UDP, length 176
13:25:48.712033 IP (tos 0x0, ttl 64, id 28355, offset 0, flags [DF], proto UDP (17), length 92)
    10.10.114.30.36790 > 10.10.106.155.623: [udp sum ok] UDP, length 64
13:25:52.564080 IP (tos 0x0, ttl 64, id 29103, offset 0, flags [DF], proto UDP (17), length 92)
    10.10.114.30.36790 > 10.10.106.155.623: [udp sum ok] UDP, length 64
13:25:52.566810 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto UDP (17), length 108)
    10.10.106.155.623 > 10.10.114.30.36790: [udp sum ok] UDP, length 80

 

and Magic, rcons:

---

  Por�lo]0;console: dbb54 [13:25]

 

 

dbb54 login:

---
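
To pick out the interesting packets from a capture like the tcpdump output above, something along these lines might help. It is only a sketch: it assumes the 64/80-byte exchanges are keepalives (which is a guess, as noted above) and treats anything larger coming from port 623 as possible SOL data. The names parse_capture and bmc_data_packets, and the 80-byte threshold, are illustrative choices, not part of any tool.

```python
import re
from datetime import datetime

# Header line, e.g. "13:25:48.709947 IP (tos 0x0, ...". Only the timestamp is used.
HEADER = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{6}) IP ")
# Detail line, e.g. "    10.10.106.155.623 > 10.10.114.30.36790: ... UDP, length 176"
DETAIL = re.compile(r"^\s+(\S+)\.(\d+) > (\S+)\.(\d+): .*UDP, length (\d+)")


def parse_capture(text):
    """Return a list of (time, src_ip, src_port, dst_ip, dst_port, udp_len)."""
    packets, stamp = [], None
    for line in text.splitlines():
        m = HEADER.match(line)
        if m:
            stamp = datetime.strptime(m.group(1), "%H:%M:%S.%f").time()
            continue
        m = DETAIL.match(line)
        if m and stamp is not None:
            src, sport, dst, dport, length = m.groups()
            packets.append((stamp, src, int(sport), dst, int(dport), int(length)))
            stamp = None
    return packets


def bmc_data_packets(packets, bmc_port=623, keepalive_max=80):
    """Packets sent from the BMC port whose UDP payload exceeds keepalive_max."""
    return [p for p in packets if p[2] == bmc_port and p[5] > keepalive_max]


# Three packets copied from the capture in this message.
SAMPLE = """\
13:25:47.413754 IP (tos 0x0, ttl 64, id 28210, offset 0, flags [DF], proto UDP (17), length 92)
    10.10.114.30.36790 > 10.10.106.155.623: [udp sum ok] UDP, length 64
13:25:48.709947 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto UDP (17), length 204)
    10.10.106.155.623 > 10.10.114.30.36790: [udp sum ok] UDP, length 176
13:25:52.566810 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto UDP (17), length 108)
    10.10.106.155.623 > 10.10.114.30.36790: [udp sum ok] UDP, length 80
"""

for pkt in bmc_data_packets(parse_capture(SAMPLE)):
    print(pkt[0], pkt[1], pkt[5])
```

On the sample it should single out the 176-byte packet from the BMC at 13:25:48, which lines up with the moment the login prompt showed up in rcons.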

 

On 14 April 2017 at 12:42:03, Jarrod Johnson (jjohns...@lenovo.com) wrote:

If you ctrl-e, c, o, does it restore the console after the time?

 

Can you tell that it goes after exactly 24hours on the dot?

 

When the console is hung, does ‘ipmitool sol activate’ say ‘session already active’?

Yes, 

# ipmitool -I lanplus -H 10.10.106.155 -U root -a sol activate

Password:

Info: SOL payload already active on another session


 

Does /var/log/confluent/consoles/<nodename> have any interesting events crop up?

[04/13 15:17:21 console connected]

… many of our own messages

^MMONITORING_TEST dbb54 1492160401 | <== This is the last message from the OS 
(# date -d@1492160401 gives Fri Apr 14 09:00:01 UTC 2017)

^M

[04/14 09:05:13 console connected]

[04/14 09:11:59 console connected]

[04/14 09:13:38 console disconnected]

[04/14 09:14:54 console connected]

[04/14 10:15:13 connection by xcat_console]

[04/14 10:15:14 disconnection by xcat_console]

[04/14 13:14:30 connection by xcat_console]


 

Pyghmi will do keepalive as well, and if that’s the problem, it should be much 
shorter than 24 hours.  In fact, it should be checking if the SOL payload is 
active and owned by confluent specifically every couple of minutes.

Yes, that's correct.


 

From: banuchka [mailto:tyrche...@gmail.com] 
Sent: Friday, April 14, 2017 5:55 AM
To: xcat-user@lists.sourceforge.net
Subject: Re: [xcat-user] Confluent as console server. Consoles hangs ~after 24h.

 

My last reply was incorrect. The problem is still here. I'm trying to find something 
useful in between confluent and pyghmi...

Restarting confluent clears the hangs and reopens all connections.

I don't think restarting confluent once or twice every 24h is the best option.

-- 
banuchka

On 13 April 2017 at 17:03:19, banuchka (tyrche...@gmail.com) wrote:

It is a Dell-related problem, not 100% sure but…

Confluent from current master is doing things well :) 

Thanks for the pretty nice tool “confluentdbutil”.

 

On 13 April 2017 at 11:30:14, banuchka (tyrche...@gmail.com) wrote:

Looks like this problem existed before… The fix was to use ipmitool with 
keepalive (the one from the xCAT repos).

Here pyghmi is used; maybe that's the reason?

 

On 13 April 2017 at 08:22:28, banuchka (tyrche...@gmail.com) wrote:

Hi, 

 

I'm trying to migrate completely from conserver to confluent, but I've run into 
some strange behaviour.

Some of my consoles hang after ~24h, so no new messages appear in their logs or 
in rcons.

I send timestamped messages from the OS to /dev/console every 30-60 min and 
check for them for monitoring purposes (console availability monitoring).
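
A rough sketch of that kind of check, assuming the messages have the form "MONITORING_TEST <node> <epoch>" as in the log excerpt quoted elsewhere in this thread. The function names and the 2-hour staleness threshold are made up for illustration; the log text would come from wherever the console log is kept.

```python
import re
from datetime import datetime, timezone

# Assumed message format: MONITORING_TEST <node> <unix epoch seconds>
MARKER = re.compile(r"MONITORING_TEST (\S+) (\d+)")


def last_heartbeat(log_text):
    """Return (node, datetime) of the newest MONITORING_TEST line, or None."""
    newest = None
    for node, epoch in MARKER.findall(log_text):
        stamp = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
        if newest is None or stamp > newest[1]:
            newest = (node, stamp)
    return newest


def is_stale(log_text, now, max_age_s=2 * 3600):
    """True if no heartbeat was logged within max_age_s seconds before now."""
    hb = last_heartbeat(log_text)
    return hb is None or (now - hb[1]).total_seconds() > max_age_s


# Example with the last message seen in the console log.
log = "^MMONITORING_TEST dbb54 1492160401\n^M\n"
print(last_heartbeat(log))
print(is_stale(log, datetime(2017, 4, 14, 13, 0, 0, tzinfo=timezone.utc)))
```

With messages sent every 30-60 min, a threshold of about two missed intervals seems a reasonable starting point, but that is a guess and would need tuning.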

I can open rcons and hit enter, and after a few seconds the console wakes up 
(strange). I didn't see this happen with conserver, or maybe I'm wrong...

Some details:

- as far as I can see, most of the consoles that hang are Dell iDRAC. 
It doesn't matter whether RacSerial or IPMISerial is in use.

- racreset hard/ipmitool bmc reset didn't help

- hitting enter in the console wakes it up (for example, with expect I can send 
\r\n\f, but that looks bad)

- I didn't try cleaning confluent's conf and restarting it. Not sure it would help.

- HP consoles work well, same ipmi

- a few consoles with a custom plugin work well too

 

So maybe my question is not about confluent, but if any of you know about 
similar problems, please share! ;)

 

-- 
banuchka

------------------------------------------------------------------------------ 
Check out the vibrant tech community on one of the world's most 
engaging tech sites, Slashdot.org! 
http://sdm.link/slashdot
_______________________________________________
xCAT-user mailing list 
xCAT-user@lists.sourceforge.net 
https://lists.sourceforge.net/lists/listinfo/xcat-user 

 

 

 
