- **status**: assigned --> review


---

** [tickets:#3322] log: log agent in main process is disabled after child 
process exits**

**Status:** review
**Milestone:** 5.22.11
**Created:** Wed Oct 05, 2022 09:36 AM UTC by Hieu Hong Hoang
**Last Updated:** Wed Oct 05, 2022 09:36 AM UTC
**Owner:** Hieu Hong Hoang
**Attachments:**

- [loga](https://sourceforge.net/p/opensaf/tickets/3322/attachment/loga) (28.8 
kB; application/octet-stream)
- 
[osaflogd](https://sourceforge.net/p/opensaf/tickets/3322/attachment/osaflogd) 
(527.4 kB; application/octet-stream)


While using log agent, if the new process was created by forking current 
process, both processes have a same mds address. If the child process exits, 
the destructor of log agent is invoked. It will unregister with mds so all 
other subscribing services will detect this log agent is down. However the main 
process is still using that mds address, all requests from main process become 
invalid because logd thinks this log agent has already been down.

Log analysis:

* Main process initializes the mds address:
~~~
<143>1 2022-10-05T10:41:05.537677+02:00 SC-2 logtest 763 loga [meta 
sequenceId="89"] 763:log/agent/lga_mds.cc:1287 >> lga_mds_init 
<143>1 2022-10-05T10:41:05.537782+02:00 SC-2 logtest 763 loga [meta 
sequenceId="90"] 763:log/agent/lga_mds.cc:1334 << lga_mds_init 
~~~
* Duplicate the main process by using 
https://man7.org/linux/man-pages/man2/fork.2.html . Then the duplicated process 
exits and the destructor of log agent is invoked:
~~~
<143>1 2022-10-05T10:41:05.541101+02:00 SC-2 logtest 763 loga [meta 
sequenceId="156"] 772:log/agent/lga_agent.cc:167 >> ~LogAgent 
<143>1 2022-10-05T10:41:05.54126+02:00 SC-2 logtest 763 loga [meta 
sequenceId="157"] 772:log/agent/lga_state.cc:160 >> stop_recovery2_thread 
<143>1 2022-10-05T10:41:05.541297+02:00 SC-2 logtest 763 loga [meta 
sequenceId="158"] 772:log/agent/lga_state.cc:166 TR stop_recovery2_thread 
RecoveryState::kNormal no thread to stop
<143>1 2022-10-05T10:41:05.541315+02:00 SC-2 logtest 763 loga [meta 
sequenceId="159"] 772:log/agent/lga_state.cc:183 << stop_recovery2_thread 
<143>1 2022-10-05T10:41:05.541322+02:00 SC-2 logtest 763 loga [meta 
sequenceId="160"] 772:log/agent/lga_util.cc:125 >> lga_shutdown 
<143>1 2022-10-05T10:41:05.541329+02:00 SC-2 logtest 763 loga [meta 
sequenceId="161"] 772:log/agent/lga_mds.cc:1351 >> lga_mds_deinit 
<143>1 2022-10-05T10:41:05.541573+02:00 SC-2 logtest 763 loga [meta 
sequenceId="162"] 772:log/agent/lga_mds.cc:1362 << lga_mds_deinit 
~~~
* Logd detects this log agent is down and delete all clients of this log agent:
~~~
<143>1 2022-10-05T10:41:05.541593+02:00 SC-1 osaflogd 447 osaflogd [meta 
sequenceId="3386"] 452:log/logd/lgs_mds.cc:1230 T8 MDS DOWN dest: 
2020f3aafb5a7, node ID: 2020f, svc_id: 21
<143>1 2022-10-05T10:41:05.541636+02:00 SC-1 osaflogd 447 osaflogd [meta 
sequenceId="3387"] 447:log/logd/lgs_evt.cc:415 >> proc_lga_updn_mds_msg 
<143>1 2022-10-05T10:41:05.541648+02:00 SC-1 osaflogd 447 osaflogd [meta 
sequenceId="3388"] 447:log/logd/lgs_evt.cc:436 TR proc_lga_updn_mds_msg: 
LGSV_LGS_EVT_LGA_DOWN mds_dest = 2020f3aafb5a7
<143>1 2022-10-05T10:41:05.541656+02:00 SC-1 osaflogd 447 osaflogd [meta 
sequenceId="3389"] 447:log/logd/lgs_evt.cc:338 >> 
lgs_client_delete_by_mds_dest: mds_dest 2020f3aafb5a7
<143>1 2022-10-05T10:41:05.541663+02:00 SC-1 osaflogd 447 osaflogd [meta 
sequenceId="3390"] 447:log/logd/lgs_evt.cc:191 >> lgs_client_delete: client_id 9
<143>1 2022-10-05T10:41:05.541678+02:00 SC-1 osaflogd 447 osaflogd [meta 
sequenceId="3391"] 447:log/logd/lgs_evt.cc:213 T4 client_id: 9, REMOVE stream 
id: 2
<143>1 2022-10-05T10:41:05.541686+02:00 SC-1 osaflogd 447 osaflogd [meta 
sequenceId="3392"] 447:log/logd/lgs_stream.cc:856 >> log_stream_close: 
safLgStrCfg=saLogSystem,safApp=safLogService
<143>1 2022-10-05T10:41:05.541713+02:00 SC-1 osaflogd 447 osaflogd [meta 
sequenceId="3393"] 447:log/logd/lgs_stream.cc:922 << log_stream_close: rc=0, 
numOpeners=7
<143>1 2022-10-05T10:41:05.541737+02:00 SC-1 osaflogd 447 osaflogd [meta 
sequenceId="3394"] 447:log/logd/lgs_evt.cc:239 << lgs_client_delete 
<143>1 2022-10-05T10:41:05.541744+02:00 SC-1 osaflogd 447 osaflogd [meta 
sequenceId="3395"] 447:log/logd/lgs_evt.cc:348 << lgs_client_delete_by_mds_dest 
~~~
* The main process send a writting request to logd:
~~~
<143>1 2022-10-05T10:41:07.541457+02:00 SC-2 logtest 763 loga [meta 
sequenceId="182"] 763:log/agent/lga_mds.cc:1439 >> lga_mds_msg_async_send 
<143>1 2022-10-05T10:41:07.541488+02:00 SC-2 logtest 763 loga [meta 
sequenceId="183"] 763:log/agent/lga_mds.cc:789 >> lga_mds_enc 
<143>1 2022-10-05T10:41:07.541516+02:00 SC-2 logtest 763 loga [meta 
sequenceId="184"] 763:log/agent/lga_mds.cc:820 T2 msgtype: 0
<143>1 2022-10-05T10:41:07.541524+02:00 SC-2 logtest 763 loga [meta 
sequenceId="185"] 763:log/agent/lga_mds.cc:834 T2 api_info.type: 4
<143>1 2022-10-05T10:41:07.541533+02:00 SC-2 logtest 763 loga [meta 
sequenceId="186"] 763:log/agent/lga_mds.cc:862 << lga_mds_enc: total_bytes = 108
<143>1 2022-10-05T10:41:07.541581+02:00 SC-2 logtest 763 loga [meta 
sequenceId="187"] 763:log/agent/lga_mds.cc:1462 << lga_mds_msg_async_send 
~~~
* Logd skips the request due to invalid client id:
~~~
<143>1 2022-10-05T10:41:07.541929+02:00 SC-1 osaflogd 447 osaflogd [meta 
sequenceId="3420"] 447:log/logd/lgs_evt.cc:1313 >> proc_write_log_async_msg: 
client_id 9, stream ID 2, node_name = SC-2
<143>1 2022-10-05T10:41:07.541941+02:00 SC-1 osaflogd 447 osaflogd [meta 
sequenceId="3421"] 447:log/logd/lgs_evt.cc:109 TR clm_node_id delete  failed : 9
<143>1 2022-10-05T10:41:07.541948+02:00 SC-1 osaflogd 447 osaflogd [meta 
sequenceId="3422"] 447:log/logd/lgs_evt.cc:114 TR client_id: 9 lookup failed
<143>1 2022-10-05T10:41:07.541954+02:00 SC-1 osaflogd 447 osaflogd [meta 
sequenceId="3423"] 447:log/logd/lgs_evt.cc:1323 TR Bad client ID: 9
<143>1 2022-10-05T10:41:07.541962+02:00 SC-1 osaflogd 447 osaflogd [meta 
sequenceId="3424"] 447:log/logd/lgs_util.cc:387 >> lgs_send_write_log_ack 
<141>1 2022-10-05T10:41:07.542052+02:00 SC-1 osaflogd 447 osaflogd [meta 
sequenceId="3425"] 447:log/logd/lgs_util.cc:406 NO Failed (2) to send of WRITE 
ack to: 2020f3aafb5a7
<143>1 2022-10-05T10:41:07.542073+02:00 SC-1 osaflogd 447 osaflogd [meta 
sequenceId="3426"] 447:log/logd/lgs_util.cc:408 << lgs_send_write_log_ack 
<143>1 2022-10-05T10:41:07.54208+02:00 SC-1 osaflogd 447 osaflogd [meta 
sequenceId="3427"] 447:log/logd/lgs_util.cc:416 >> lgs_free_write_log 
<143>1 2022-10-05T10:41:07.542087+02:00 SC-1 osaflogd 447 osaflogd [meta 
sequenceId="3428"] 447:log/logd/lgs_util.cc:438 << lgs_free_write_log 
<143>1 2022-10-05T10:41:07.542093+02:00 SC-1 osaflogd 447 osaflogd [meta 
sequenceId="3429"] 447:log/logd/lgs_evt.cc:0 << proc_write_log_async_msg
~~~


---

Sent from sourceforge.net because [email protected] is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.
_______________________________________________
Opensaf-tickets mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets

Reply via email to