Also, the reporter can no longer reproduce the problem once the priority
is bumped.
On 06/07/17 15:59, Gary Lee wrote:
Hi Praveen / Hans
There are no errors in the MDS log. The messages are not lost; they are
just buffered between the MDS thread and the main thread, for up to
around 15 seconds!
I suspect the main thread has a large queue of events to process, but
it's processing them very slowly due to load or other circumstances on
this
system. As far as I can tell, this system is not running in a
container, but under KVM.
As you can see below, the node ups (rcv_evt 1) are 'not seen' by the
main thread between 17:20:27 and 17:20:46.
During this period, node ups are received every 1.1s from PLs in the
MDS thread.
The main thread is still processing other events. Perhaps there is some
thread starvation, but it is still making progress, just slowly ...
Given the time-criticality of the node ups with respect to the cluster
startup timer, it seems reasonable to give them a higher priority than
other messages. A minimal sketch of the idea follows.
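(To illustrate what I mean, here is a minimal sketch of a multi-priority
mailbox. The names are hypothetical and this is not the actual NCS IPC
implementation; the point is just that the receiver drains the highest
non-empty band first, so an event sent at the top band bypasses any
backlog queued at a lower band.)

// Minimal sketch of a multi-priority mailbox (hypothetical types; the
// real NCS IPC internals differ). The receiver drains higher-priority
// bands first, so an event sent at kVeryHigh bypasses a backlog at kHigh.
#include <array>
#include <cstdio>
#include <deque>
#include <optional>
#include <string>
#include <utility>

enum Priority { kVeryHigh = 0, kHigh, kNormal, kLow, kNumPriorities };

class PriorityMailbox {
 public:
  void Send(Priority prio, std::string evt) {
    queues_[prio].push_back(std::move(evt));
  }
  // Always dequeue from the highest non-empty priority band.
  std::optional<std::string> Receive() {
    for (auto& q : queues_) {
      if (!q.empty()) {
        std::string evt = std::move(q.front());
        q.pop_front();
        return evt;
      }
    }
    return std::nullopt;  // mailbox empty
  }

 private:
  std::array<std::deque<std::string>, kNumPriorities> queues_;
};

int main() {
  PriorityMailbox mbx;
  // A backlog of other events queued at HIGH...
  for (int i = 0; i < 3; ++i) mbx.Send(kHigh, "rcv_evt 8");
  // ...then a node up arrives at VERY_HIGH and jumps the queue.
  mbx.Send(kVeryHigh, "rcv_evt 1 (node up)");
  // Prints the node up first, then the three buffered events.
  while (auto evt = mbx.Receive()) std::printf("%s\n", evt->c_str());
}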
Jun 20 17:20:27.000252 osafamfd [10514:10514:src/amf/amfd/main.cc:0770] >> process_event: evt->rcv_evt 1
Jun 20 17:20:27.000422 osafamfd [10514:10514:src/amf/amfd/main.cc:0818] << process_event
Jun 20 17:20:27.000880 osafamfd [10514:10514:src/amf/amfd/main.cc:0770] >> process_event: evt->rcv_evt 8
Jun 20 17:20:27.001077 osafamfd [10514:10514:src/amf/amfd/main.cc:0818] << process_event
Jun 20 17:20:27.001510 osafamfd [10514:10514:src/amf/amfd/main.cc:0770] >> process_event: evt->rcv_evt 8
Jun 20 17:20:27.001697 osafamfd [10514:10514:src/amf/amfd/main.cc:0818] << process_event
Jun 20 17:20:27.002119 osafamfd [10514:10514:src/amf/amfd/main.cc:0770] >> process_event: evt->rcv_evt 8
Jun 20 17:20:27.002545 osafamfd [10514:10514:src/amf/amfd/main.cc:0818] << process_event
Jun 20 17:20:27.002998 osafamfd [10514:10514:src/amf/amfd/main.cc:0770] >> process_event: evt->rcv_evt 4
Jun 20 17:20:27.007094 osafamfd [10514:10514:src/amf/amfd/main.cc:0818] << process_event
Jun 20 17:20:27.014762 osafamfd [10514:10514:src/amf/amfd/main.cc:0770] >> process_event: evt->rcv_evt 5
Jun 20 17:20:27.016745 osafamfd [10514:10514:src/amf/amfd/main.cc:0818] << process_event
Jun 20 17:20:27.036527 osafamfd [10514:10514:src/amf/amfd/main.cc:0770] >> process_event: evt->rcv_evt 8
Jun 20 17:20:27.036805 osafamfd [10514:10514:src/amf/amfd/main.cc:0818] << process_event
Jun 20 17:20:27.038630 osafamfd [10514:10514:src/amf/amfd/main.cc:0770] >> process_event: evt->rcv_evt 8
Jun 20 17:20:27.038773 osafamfd [10514:10514:src/amf/amfd/main.cc:0818] << process_event
Jun 20 17:20:27.039313 osafamfd [10514:10514:src/amf/amfd/main.cc:0770] >> process_event: evt->rcv_evt 8
Jun 20 17:20:27.039485 osafamfd [10514:10514:src/amf/amfd/main.cc:0818] << process_event
Jun 20 17:20:27.087184 osafamfd [10514:10514:src/amf/amfd/main.cc:0770] >> process_event: evt->rcv_evt 8
Jun 20 17:20:27.087312 osafamfd [10514:10514:src/amf/amfd/main.cc:0818] << process_event
Jun 20 17:20:27.088138 osafamfd [10514:10514:src/amf/amfd/main.cc:0770] >> process_event: evt->rcv_evt 8
Jun 20 17:20:27.088299 osafamfd [10514:10514:src/amf/amfd/main.cc:0818] << process_event
Jun 20 17:20:27.088785 osafamfd [10514:10514:src/amf/amfd/main.cc:0770] >> process_event: evt->rcv_evt 8
Jun 20 17:20:27.088948 osafamfd [10514:10514:src/amf/amfd/main.cc:0818] << process_event
... << similar patterns snipped >> ...
Jun 20 17:20:43.774085 osafamfd [10514:10514:src/amf/amfd/main.cc:0770] >> process_event: evt->rcv_evt 4
Jun 20 17:20:43.776366 osafamfd [10514:10514:src/amf/amfd/main.cc:0818] << process_event
Jun 20 17:20:44.846180 osafamfd [10514:10514:src/amf/amfd/main.cc:0770] >> process_event: evt->rcv_evt 8
Jun 20 17:20:44.846437 osafamfd [10514:10514:src/amf/amfd/main.cc:0818] << process_event
Jun 20 17:20:44.847261 osafamfd [10514:10514:src/amf/amfd/main.cc:0770] >> process_event: evt->rcv_evt 8
Jun 20 17:20:44.847451 osafamfd [10514:10514:src/amf/amfd/main.cc:0818] << process_event
Jun 20 17:20:45.258138 osafamfd [10514:10514:src/amf/amfd/main.cc:0770] >> process_event: evt->rcv_evt 8
Jun 20 17:20:45.258613 osafamfd [10514:10514:src/amf/amfd/main.cc:0818] << process_event
Jun 20 17:20:45.259306 osafamfd [10514:10514:src/amf/amfd/main.cc:0770] >> process_event: evt->rcv_evt 4
Jun 20 17:20:45.259979 osafamfd [10514:10514:src/amf/amfd/main.cc:0818] << process_event
Jun 20 17:20:45.260284 osafamfd [10514:10514:src/amf/amfd/main.cc:0770] >> process_event: evt->rcv_evt 8
Jun 20 17:20:45.260428 osafamfd [10514:10514:src/amf/amfd/main.cc:0818] << process_event
Jun 20 17:20:45.260914 osafamfd [10514:10514:src/amf/amfd/main.cc:0770] >> process_event: evt->rcv_evt 8
Jun 20 17:20:45.261047 osafamfd [10514:10514:src/amf/amfd/main.cc:0818] << process_event
Jun 20 17:20:45.580850 osafamfd [10514:10514:src/amf/amfd/main.cc:0770] >> process_event: evt->rcv_evt 8
Jun 20 17:20:45.581504 osafamfd [10514:10514:src/amf/amfd/main.cc:0818] << process_event
Jun 20 17:20:45.582168 osafamfd [10514:10514:src/amf/amfd/main.cc:0770] >> process_event: evt->rcv_evt 4
Jun 20 17:20:45.584366 osafamfd [10514:10514:src/amf/amfd/main.cc:0818] << process_event
Jun 20 17:20:45.585550 osafamfd [10514:10514:src/amf/amfd/main.cc:0770] >> process_event: evt->rcv_evt 8
Jun 20 17:20:45.585754 osafamfd [10514:10514:src/amf/amfd/main.cc:0818] << process_event
Jun 20 17:20:45.586324 osafamfd [10514:10514:src/amf/amfd/main.cc:0770] >> process_event: evt->rcv_evt 8
Jun 20 17:20:45.586531 osafamfd [10514:10514:src/amf/amfd/main.cc:0818] << process_event
Jun 20 17:20:45.587070 osafamfd [10514:10514:src/amf/amfd/main.cc:0770] >> process_event: evt->rcv_evt 8
Jun 20 17:20:45.587742 osafamfd [10514:10514:src/amf/amfd/main.cc:0818] << process_event
Jun 20 17:20:46.048978 osafamfd [10514:10514:src/amf/amfd/main.cc:0770] >> process_event: evt->rcv_evt 4
Jun 20 17:20:46.052609 osafamfd [10514:10514:src/amf/amfd/main.cc:0818] << process_event
Jun 20 17:20:46.054629 osafamfd [10514:10514:src/amf/amfd/main.cc:0770] >> process_event: evt->rcv_evt 1
Jun 20 17:20:46.054834 osafamfd [10514:10514:src/amf/amfd/main.cc:0818] << process_event
I will send you the full trace file privately.
Thanks
Gary
On 06/07/17 14:42, praveen malviya wrote:
Hi Gary,
When the SC joins back after the SC absence state, there should not be
many messages in the queue to be processed, even in a large cluster.
Also, there is no error such as an IPC send failure.
Is AMFD stuck somewhere, so that it could not take any message from the queue?
I think raising the priority will not solve the problem in such a situation.
Thanks
Praveen
On 23-Jun-17 7:18 AM, Gary Lee wrote:
---
src/amf/amfd/ndmsg.cc | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/src/amf/amfd/ndmsg.cc b/src/amf/amfd/ndmsg.cc
index 11bc8ac35..9bfab6fd8 100644
--- a/src/amf/amfd/ndmsg.cc
+++ b/src/amf/amfd/ndmsg.cc
@@ -371,7 +371,12 @@ uint32_t avd_n2d_msg_rcv(AVD_DND_MSG *rcv_msg, NODE_ID node_id,
 
   evt->info.avnd_msg = rcv_msg;
 
-  if (m_NCS_IPC_SEND(&cb->avd_mbx, evt, NCS_IPC_PRIORITY_HIGH) !=
+  NCS_IPC_PRIORITY priority = NCS_IPC_PRIORITY_HIGH;
+  if (evt->rcv_evt == AVD_EVT_NODE_UP_MSG) {
+    priority = NCS_IPC_PRIORITY_VERY_HIGH;
+  }
+
+  if (m_NCS_IPC_SEND(&cb->avd_mbx, evt, priority) !=
       NCSCC_RC_SUCCESS) {
     LOG_ER("%s: ncs_ipc_send failed", __FUNCTION__);
     avsv_dnd_msg_free(rcv_msg);
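If it helps review, the new branch is easy to sanity-check in isolation.
Here is a self-contained restatement; note the enum definitions below are
illustrative stand-ins rather than the real OpenSAF headers, and the
event-type values follow the trace above (rcv_evt 1 is a node up,
rcv_evt 8 is one of the other event types).

#include <cassert>

// Hypothetical stand-ins for the OpenSAF enums referenced by the patch.
enum NCS_IPC_PRIORITY {
  NCS_IPC_PRIORITY_LOW,
  NCS_IPC_PRIORITY_NORMAL,
  NCS_IPC_PRIORITY_HIGH,
  NCS_IPC_PRIORITY_VERY_HIGH
};
enum AVD_EVT_TYPE { AVD_EVT_NODE_UP_MSG = 1, AVD_EVT_OTHER_MSG = 8 };

// The patched branch restated as a pure function: only node ups are
// promoted to the top priority band; everything else stays at HIGH.
static NCS_IPC_PRIORITY PriorityFor(AVD_EVT_TYPE rcv_evt) {
  return rcv_evt == AVD_EVT_NODE_UP_MSG ? NCS_IPC_PRIORITY_VERY_HIGH
                                        : NCS_IPC_PRIORITY_HIGH;
}

int main() {
  assert(PriorityFor(AVD_EVT_NODE_UP_MSG) == NCS_IPC_PRIORITY_VERY_HIGH);
  assert(PriorityFor(AVD_EVT_OTHER_MSG) == NCS_IPC_PRIORITY_HIGH);
  return 0;
}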