[tickets] [opensaf:tickets] #2030 dtm: "Node already exit in the cluster with smiler configuration"

2017-08-10 Thread Alex Jones via Opensaf-tickets
- **status**: review --> fixed
- **Comment**:

commit 4ca20e2caf15e22754af01ddecd01c1ea7413ccf
Author: Alex Jones 
Date:   Thu Aug 10 12:45:24 2017 -0400




---

**[tickets:#2030] dtm: "Node already exit in the cluster with smiler configuration"**

**Status:** fixed
**Milestone:** 5.17.10
**Created:** Tue Sep 13, 2016 12:10 PM UTC by Anders Widell
**Last Updated:** Tue Aug 08, 2017 06:13 PM UTC
**Owner:** Alex Jones


osafdtm does not handle rapid consecutive node reboots properly. I got the 
following errors in syslog:

~~~
Sep 13 14:00:52 SC-2 local0.err osafdtmd[378]: ER DTM:  Node already exit in the cluster with smiler configuration , correct the other joining Node configuration
Sep 13 14:01:02 SC-2 local0.err osafdtmd[378]: ER DTM: dtm_node_add failed .node_ip: 192.168.0.1, node_id: 0
Sep 13 14:01:06 SC-2 local0.err osafdtmd[378]: ER DTM: dtm_node_add failed .node_ip: 192.168.0.1, node_id: 0
~~~

Here are the steps to reproduce this problem in UML:

./opensaf start
(wait until the cluster comes up)
./opensaf nodestop 2
(wait a few seconds)
./opensaf nodestart 2
./opensaf nodestart 2

The last two commands should be executed in quick succession, perhaps with a
one-second delay between them.

It seems that osafdtmd asserts and dies when this happens. Here is the result 
from a second run of the above test:

~~~
Sep 13 14:25:58 SC-2 local0.err osafdtmd[378]: ER DTM:  Node already exit in the cluster with smiler configuration , correct the other joining Node configuration
Sep 13 14:25:58 SC-2 local0.err osafdtmd[378]: dtm_node.c:109: dtm_process_node_info: Assertion '0' failed.
Sep 13 14:25:58 SC-2 local0.err osafamfd[478]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.err osafclmna[468]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.err osafclmd[458]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.err osafntfd[448]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.err osaflogd[437]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.err osafimmnd[426]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.err osafimmd[415]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.err osaffmd[405]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.err osafrded[392]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.notice osafdtmd[378]: NO Established contact with 'SC-1'
Sep 13 14:25:58 SC-2 local0.notice osafdtmd[378]: NO Established contact with 'PL-4'
Sep 13 14:25:58 SC-2 local0.notice osafdtmd[378]: NO Established contact with 'PL-5'
Sep 13 14:25:58 SC-2 local0.notice osafdtmd[378]: NO Established contact with 'PL-3'
Sep 13 14:25:59 SC-2 user.notice osafdtmd: osafdtmd Process down, Rebooting the node
Sep 13 14:25:59 SC-2 user.notice opensaf_reboot: Rebooting local node; timeout=60

~~~

Update: it seems I forgot to do "./opensaf nodestop" between the two "./opensaf
nodestart" commands above. Thus, there were probably two SC-2 nodes at the same
time, and the error message "Node already exit in the cluster with smiler
configuration" should be interpreted as "duplicate node detected in the
network". Reducing the priority of this defect to "minor". Two problems still
ought to be fixed: the error message should be changed so that its meaning is
clear, and osafdtmd should not assert (it could call opensaf_reboot() if there
is a configuration problem, but an assertion indicates a software problem).
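
A minimal C sketch of the two fixes requested above, under stated assumptions: the node table, `node_add()`, `MAX_NODES` and the `node_add_result` codes are hypothetical and are neither the real osafdtmd data structures nor the change made in commit 4ca20e2caf15e22754af01ddecd01c1ea7413ccf. It detects a node ID that is already present, logs a message that plainly says "duplicate node detected in the network", and returns an error code instead of asserting.

~~~c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified node table; not the real osafdtmd structures. */
#define MAX_NODES 16
#define NODE_IP_LEN 46 /* room for an IPv6 address string */

typedef struct {
  uint32_t node_id;
  char node_ip[NODE_IP_LEN];
  bool in_use;
} node_entry;

/* Hypothetical result codes, returned to the caller instead of asserting. */
typedef enum {
  NODE_ADD_OK = 0,
  NODE_ADD_DUPLICATE, /* a node with this node_id is already in the cluster */
  NODE_ADD_TABLE_FULL
} node_add_result;

static node_entry node_table[MAX_NODES];

/* Adds a node, or reports a duplicate node_id with a clear error message.
 * A duplicate is a configuration problem, so it is returned to the caller
 * rather than treated as a software bug via assert(). */
static node_add_result node_add(uint32_t node_id, const char *node_ip) {
  node_entry *free_slot = NULL;
  for (size_t i = 0; i < MAX_NODES; ++i) {
    if (node_table[i].in_use) {
      if (node_table[i].node_id == node_id) {
        fprintf(stderr,
                "ER DTM: Duplicate node detected in the network: node_id 0x%x "
                "(IP %s) is already in the cluster with IP %s\n",
                (unsigned)node_id, node_ip, node_table[i].node_ip);
        return NODE_ADD_DUPLICATE;
      }
    } else if (free_slot == NULL) {
      free_slot = &node_table[i]; /* remember the first unused slot */
    }
  }
  if (free_slot == NULL) {
    fprintf(stderr, "ER DTM: node table full, cannot add node_id 0x%x\n",
            (unsigned)node_id);
    return NODE_ADD_TABLE_FULL;
  }
  free_slot->node_id = node_id;
  snprintf(free_slot->node_ip, sizeof free_slot->node_ip, "%s", node_ip);
  free_slot->in_use = true;
  return NODE_ADD_OK;
}
~~~

A caller could then handle `NODE_ADD_DUPLICATE` as a configuration error, for instance by logging it and rebooting the node with opensaf_reboot() as suggested above, so that an assertion is reserved for genuine software bugs.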




---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets


[tickets] [opensaf:tickets] #2030 dtm: "Node already exit in the cluster with smiler configuration"

2017-08-08 Thread Alex Jones via Opensaf-tickets
- **status**: assigned --> review
- **assigned_to**: A V Mahesh (AVM) --> Alex Jones
- **Part**: - --> d
- **Blocker**:  --> False
- **Milestone**: future --> 5.17.10



---

**[tickets:#2030] dtm: "Node already exit in the cluster with smiler configuration"**

**Status:** review
**Milestone:** 5.17.10
**Created:** Tue Sep 13, 2016 12:10 PM UTC by Anders Widell
**Last Updated:** Tue Aug 08, 2017 04:18 PM UTC
**Owner:** Alex Jones




[tickets] [opensaf:tickets] #2030 dtm: "Node already exit in the cluster with smiler configuration"

2017-08-08 Thread Alex Jones via Opensaf-tickets
I can reproduce this assertion as outlined in ticket 2545.


---

**[tickets:#2030] dtm: "Node already exit in the cluster with smiler configuration"**

**Status:** assigned
**Milestone:** future
**Created:** Tue Sep 13, 2016 12:10 PM UTC by Anders Widell
**Last Updated:** Thu Mar 16, 2017 04:23 AM UTC
**Owner:** A V Mahesh (AVM)




[tickets] [opensaf:tickets] #2030 dtm: "Node already exit in the cluster with smiler configuration"

2017-03-15 Thread A V Mahesh (AVM)
Under normal conditions we are not able to reproduce the problem by running
`/etc/init.d/opensafd restart`, so could you please provide the following
information to help us reproduce it:

1) Can you please explain what the "./opensaf nodestop" and "./opensaf nodestart"
   scripts do apart from `/etc/init.d/opensafd stop` and `/etc/init.d/opensafd restart`?

2) Is there any other non-OpenSAF application using the MDS/TCP library?
   If so, is it stopped cleanly before `/etc/init.d/opensafd stop`?





---

**[tickets:#2030] dtm: "Node already exit in the cluster with smiler configuration"**

**Status:** assigned
**Milestone:** 5.0.2
**Created:** Tue Sep 13, 2016 12:10 PM UTC by Anders Widell
**Last Updated:** Mon Sep 26, 2016 02:26 PM UTC
**Owner:** A V Mahesh (AVM)



[tickets] [opensaf:tickets] #2030 dtm: "Node already exit in the cluster with smiler configuration"

2016-09-13 Thread A V Mahesh (AVM)
- **status**: unassigned --> assigned
- **assigned_to**: A V Mahesh (AVM)



---

**[tickets:#2030] dtm: "Node already exit in the cluster with smiler configuration"**

**Status:** assigned
**Milestone:** 4.7.2
**Created:** Tue Sep 13, 2016 12:10 PM UTC by Anders Widell
**Last Updated:** Tue Sep 13, 2016 01:01 PM UTC
**Owner:** A V Mahesh (AVM)




[tickets] [opensaf:tickets] #2030 dtm: "Node already exit in the cluster with smiler configuration"

2016-09-13 Thread Anders Widell
- Description has changed:

Diff:



--- old
+++ new
@@ -39,3 +39,6 @@
 Sep 13 14:25:59 SC-2 user.notice opensaf_reboot: Rebooting local node; timeout=60
 
 ~~~
+
+Update: it seems I forgot to do "./opensaf nodestop" between the two "./opensaf nodestart" above. Thus, there are probably two SC-2 nodes at the same time, and the error message "Node already exit in the cluster with smiler configuration" should be interpreted as "duplicate node detected in the network". Reducing the priority of this defect to "minor". Still two problems ought to be fixed: the error message should be changed so that it is clear what it means, and osafdtmd should not assert (it could call opensaf_reboot() if a there is a configuration problem, but asserting idicates a software problem).
+



- **Priority**: major --> minor



---

**[tickets:#2030] dtm: "Node already exit in the cluster with smiler configuration"**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Sep 13, 2016 12:10 PM UTC by Anders Widell
**Last Updated:** Tue Sep 13, 2016 12:30 PM UTC
**Owner:** nobody



[tickets] [opensaf:tickets] #2030 dtm: "Node already exit in the cluster with smiler configuration"

2016-09-13 Thread Anders Widell
- Description has changed:

Diff:



--- old
+++ new
@@ -1,9 +1,9 @@
 osafdtm does not handle rapid consecutive node reboots properly. I got the following errors in syslog:
 
 ~~~
-var/SC-2/log/messages:Sep 13 14:00:52 SC-2 local0.err osafdtmd[378]: ER DTM:  Node already exit in the cluster with smiler configuration , correct the other joining Node configuration
-var/SC-2/log/messages:Sep 13 14:01:02 SC-2 local0.err osafdtmd[378]: ER DTM: dtm_node_add failed .node_ip: 192.168.0.1, node_id: 0
-var/SC-2/log/messages:Sep 13 14:01:06 SC-2 local0.err osafdtmd[378]: ER DTM: dtm_node_add failed .node_ip: 192.168.0.1, node_id: 0
+Sep 13 14:00:52 SC-2 local0.err osafdtmd[378]: ER DTM:  Node already exit in the cluster with smiler configuration , correct the other joining Node configuration
+Sep 13 14:01:02 SC-2 local0.err osafdtmd[378]: ER DTM: dtm_node_add failed .node_ip: 192.168.0.1, node_id: 0
+Sep 13 14:01:06 SC-2 local0.err osafdtmd[378]: ER DTM: dtm_node_add failed .node_ip: 192.168.0.1, node_id: 0
 ~~~
 
 Here are the steps to reproduce this problem in UML:
@@ -16,3 +16,26 @@
 ./opensaf nodestart 2
 
 The last two commands should be execute quickly after each other, maybe with one second delay in between them.
+
+It seems that osafdtmd asserts and dies when this happens. Here is the result from a second run of the above test:
+
+~~~
+Sep 13 14:25:58 SC-2 local0.err osafdtmd[378]: ER DTM:  Node already exit in the cluster with smiler configuration , correct the other joining Node configuration
+Sep 13 14:25:58 SC-2 local0.err osafdtmd[378]: dtm_node.c:109: dtm_process_node_info: Assertion '0' failed.
+Sep 13 14:25:58 SC-2 local0.err osafamfd[478]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success
+Sep 13 14:25:58 SC-2 local0.err osafclmna[468]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success
+Sep 13 14:25:58 SC-2 local0.err osafclmd[458]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success
+Sep 13 14:25:58 SC-2 local0.err osafntfd[448]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success
+Sep 13 14:25:58 SC-2 local0.err osaflogd[437]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success
+Sep 13 14:25:58 SC-2 local0.err osafimmnd[426]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success
+Sep 13 14:25:58 SC-2 local0.err osafimmd[415]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success
+Sep 13 14:25:58 SC-2 local0.err osaffmd[405]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success
+Sep 13 14:25:58 SC-2 local0.err osafrded[392]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success
+Sep 13 14:25:58 SC-2 local0.notice osafdtmd[378]: NO Established contact with 'SC-1'
+Sep 13 14:25:58 SC-2 local0.notice osafdtmd[378]: NO Established contact with 'PL-4'
+Sep 13 14:25:58 SC-2 local0.notice osafdtmd[378]: NO Established contact with 'PL-5'
+Sep 13 14:25:58 SC-2 local0.notice osafdtmd[378]: NO Established contact with 'PL-3'
+Sep 13 14:25:59 SC-2 user.notice osafdtmd: osafdtmd Process down, Rebooting the node
+Sep 13 14:25:59 SC-2 user.notice opensaf_reboot: Rebooting local node; timeout=60
+
+~~~






---

** [tickets:#2030] dtm: "Node already exit in the cluster with smiler 
configuration"**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Sep 13, 2016 12:10 PM UTC by Anders Widell
**Last Updated:** Tue Sep 13, 2016 12:17 PM UTC
**Owner:** nobody


osafdtm does not handle rapid consecutive node reboots properly. I got the 
following errors in syslog:

~~~
Sep 13 14:00:52 SC-2 local0.err osafdtmd[378]: ER DTM:  Node already exit in the cluster with smiler configuration , correct the other joining Node configuration
Sep 13 14:01:02 SC-2 local0.err osafdtmd[378]: ER DTM: dtm_node_add failed .node_ip: 192.168.0.1, node_id: 0
Sep 13 14:01:06 SC-2 local0.err osafdtmd[378]: ER DTM: dtm_node_add failed .node_ip: 192.168.0.1, node_id: 0
~~~

Here are the steps to reproduce this problem in UML:

~~~
./opensaf start
(wait until the cluster comes up)
./opensaf nodestop 2
(wait a few seconds)
./opensaf nodestart 2
./opensaf nodestart 2
~~~

The last two commands should be executed in quick succession, perhaps with a one-second delay between them.

It seems that osafdtmd asserts and dies when this happens. Here is the result 
from a second run of the above test:

~~~
Sep 13 14:25:58 SC-2 local0.err osafdtmd[378]: ER DTM:  Node already exit in the cluster with smiler configuration , correct the other joining Node configuration
Sep 13 14:25:58 SC-2 local0.err osafdtmd[378]: dtm_node.c:109: dtm_process_node_info: Assertion '0' failed.
Sep 13 14:25:58 SC-2 local0.err osafamfd[478]: MDTM:SOCKET recd_bytes :0, conn lost with dh server, exiting library err :Success
Sep 13 14:25:58 SC-2 local0.err 
~~~

[tickets] [opensaf:tickets] #2030 dtm: "Node already exit in the cluster with smiler configuration"

2016-09-13 Thread Anders Widell
Needless to say, the error message itself is also faulty here. I suppose "exit" 
should be "exists", and "smiler" should be "similar"? I am just guessing... :-)


---

** [tickets:#2030] dtm: "Node already exit in the cluster with smiler 
configuration"**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Sep 13, 2016 12:10 PM UTC by Anders Widell
**Last Updated:** Tue Sep 13, 2016 12:10 PM UTC
**Owner:** nobody


osafdtm does not handle rapid consecutive node reboots properly. I got the 
following errors in syslog:

~~~
var/SC-2/log/messages:Sep 13 14:00:52 SC-2 local0.err osafdtmd[378]: ER DTM:  Node already exit in the cluster with smiler configuration , correct the other joining Node configuration
var/SC-2/log/messages:Sep 13 14:01:02 SC-2 local0.err osafdtmd[378]: ER DTM: dtm_node_add failed .node_ip: 192.168.0.1, node_id: 0
var/SC-2/log/messages:Sep 13 14:01:06 SC-2 local0.err osafdtmd[378]: ER DTM: dtm_node_add failed .node_ip: 192.168.0.1, node_id: 0
~~~

Here are the steps to reproduce this problem in UML:

~~~
./opensaf start
(wait until the cluster comes up)
./opensaf nodestop 2
(wait a few seconds)
./opensaf nodestart 2
./opensaf nodestart 2
~~~

The last two commands should be executed in quick succession, perhaps with a one-second delay between them.


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.--
___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets

