Hey Ted,

Set it on all the nodes in the cluster: both controllers as well as all payloads.
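For example, a minimal sketch of applying it on each node, assuming a standard Linux sysctl setup (the value 3 below is just the example from earlier in this thread; tune it for your own network):

    # /etc/sysctl.conf  -- persists across reboots
    net.ipv4.tcp_retries2 = 3

    # apply it to the running kernel without a reboot
    sysctl -w net.ipv4.tcp_retries2=3    # or: sysctl -p after editing the file

    # verify the value currently in effect
    cat /proc/sys/net/ipv4/tcp_retries2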
Thanks,
Nivrutti

On Sat, Dec 20, 2014 at 5:27 PM, Yao Cheng LIANG <[email protected]> wrote:

> Dear Nivrutti,
>
> I have been using TCP, and my net.ipv4.tcp_retries2 = 15. Where should I
> set this value: on the controllers only, or do I need to set it on the
> payloads as well?
>
> Thanks.
>
> Ted
>
> Sent from Windows Mail
>
> From: Nivrutti Kale <[email protected]>
> Sent: Saturday, December 20, 2014 7:29 PM
> To: Nagendra Kumar <[email protected]>
> Cc: Yao Cheng LIANG <[email protected]>, piyush jaiswal
> <[email protected]>, [email protected]
>
> Hi Ted,
>
> Which transport are you using?
> If you are using TCP, you need to adjust the tcp_retries2 parameter of the
> system. By default, tcp_retries2 = 15.
>
> Add net.ipv4.tcp_retries2=3 (3 works for me; you can try other values) to
> /etc/sysctl.conf to persist the change across reboots.
>
> Let me know if this helps.
>
> Thanks,
> Nivrutti
>
> On Fri, Dec 19, 2014 at 10:57 AM, Nagendra Kumar <[email protected]> wrote:
>
>> Hi Ted,
>>
>> I was kind of guessing that. Please share the snaps of the syslog and
>> saflog of the nodes.
>>
>> Thanks
>> -Nagu
>>
>> From: Yao Cheng LIANG [mailto:[email protected]]
>> Sent: 19 December 2014 10:47
>> To: Nagendra Kumar; Nivrutti Kale
>> Cc: piyush jaiswal; [email protected]; Yao Cheng LIANG
>> Subject: RE: [users] multiple-node simultaneous failure handling
>>
>> Dear Nagu,
>>
>> Thanks. This is different from the OpenSAF "lock" operation; it is an
>> operation more like a "reboot".
>>
>> Ted
>>
>> From: Nagendra Kumar [mailto:[email protected]]
>> Sent: Friday, December 19, 2014 1:21 PM
>> To: Yao Cheng LIANG; Nivrutti Kale
>> Cc: piyush jaiswal; [email protected]
>> Subject: RE: [users] multiple-node simultaneous failure handling
>>
>> Hi Ted,
>>
>> Can you please clarify how you locked "physical node 1", or what you mean
>> by locking it? In OpenSAF, you can lock a node such as sc-1, sc-2, pl-3,
>> etc., one at a time.
>>
>> Thanks
>> -Nagu
>>
>> From: Yao Cheng LIANG [mailto:[email protected]]
>> Sent: 18 December 2014 19:54
>> To: Nivrutti Kale; Nagendra Kumar
>> Cc: piyush jaiswal; [email protected]; Yao Cheng LIANG
>> Subject: Re: [users] multiple-node simultaneous failure handling
>>
>> Dear all,
>>
>> Today I did more tests in the virtualized environment, by "locking" one of
>> the "compute" nodes, where one "active" controller and one "active"
>> payload reside. The "lock" operation terminates all the virtual machines
>> running on that physical node. I expected the "active" role to switch to
>> the peer VM running on the other compute node, since I have configured a
>> "1+1" protection relationship.
>>
>> What surprised me is that the "controller" VM switched over very quickly,
>> but the payload VM did not switch (the "standby" VM stayed "standby" even
>> though the "active" VM had been terminated). I captured packets on the now
>> "active" controller and noticed that it had not received any packets from
>> 211.7 (the former "active" payload, terminated long before), but kept
>> sending ARP requests asking "who has 211.7".
>>
>> Please see the attached file for the packets I captured.
>> Note:
>>
>>                 physical node 1           physical node 2
>> before lock:    sc-1 (211.2) - active     sc-2 (211.3) - standby
>>                 pl-3 (211.7) - active     pl-4 (211.7) - standby
>>
>> after lock:     sc-1 terminated           sc-2 (211.3) became active
>>                 pl-3 terminated           pl-4 (211.7) stayed "standby"
>>
>> The packets were captured on 211.3.
>>
>> Thanks.
>>
>> Ted
>>
>> Sent from Windows Mail
>>
>> From: Nivrutti Kale <[email protected]>
>> Sent: Wednesday, December 10, 2014 2:16 PM
>> To: Nagendra Kumar <[email protected]>
>> Cc: Yao Cheng LIANG <[email protected]>, piyush jaiswal
>> <[email protected]>, [email protected]
>>
>> Hi Ted,
>>
>> I am using OpenSAF in a "virtualized environment" and I haven't seen any
>> issues with OpenSAF so far.
>>
>> Regarding multiple fail-overs, we tested the fail-over of a blade on which
>> 6 VMs (1 active controller and 5 payloads) were placed. OpenSAF works like
>> a charm here.
>>
>> First the controller fails over, then the notifications for the other
>> payload nodes are received by the new active controller, so multiple
>> fail-overs in the correct sequence work very well with OpenSAF. I am using
>> OpenSAF 4.2.0 and TCP as the OpenSAF transport.
>>
>> Thanks,
>> Nivrutti
>>
>> On Wed, Dec 10, 2014 at 11:23 AM, Nagendra Kumar <[email protected]> wrote:
>>
>> Hi Ted,
>>
>>> In my case, all these VMs work as payloads.
>> Then you should have no problem.
>>
>>> Have you tested how many of these concurrent failures OpenSAF can
>>> support? I am using 4.4.0.
>> OpenSAF can handle any number of concurrent failures.
>>
>> I haven't joined OP-NFV.
>> If the "virtualized environment" is the only requirement, then OpenSAF can
>> run without any problems. But I guess there may be more requirements than
>> that. We are working on cloud requirements for OpenSAF and a few tickets
>> have been raised.
>>
>> Thanks
>> -Nagu
>>
>> > -----Original Message-----
>> > From: Yao Cheng LIANG [mailto:[email protected]]
>> > Sent: 10 December 2014 08:54
>> > To: Nagendra Kumar; piyush jaiswal; [email protected]
>> > Cc: Yao Cheng LIANG
>> > Subject: RE: [users] multiple-node simultaneous failure handling
>> >
>> > Dear Nagu,
>> >
>> > Thanks for the clarification. In my case, all these VMs work as payloads.
>> > Have you tested how many of these concurrent failures OpenSAF can
>> > support? I am using 4.4.0.
>> >
>> > By the way, I am working in OP-NFV on the HA proposal. Have you joined
>> > the same work-force, and is there any issue applying OpenSAF to these
>> > virtualized environments?
>> >
>> > Thanks.
>> >
>> > Ted
>> >
>> > -----Original Message-----
>> > From: Nagendra Kumar [mailto:[email protected]]
>> > Sent: Tuesday, December 09, 2014 8:53 PM
>> > To: Yao Cheng LIANG; piyush jaiswal; [email protected]
>> > Subject: RE: [users] multiple-node simultaneous failure handling
>> >
>> > Hi Yao,
>> > If one controller remains available on a separate node, then the given
>> > scenario will work fine.
>> >
>> > In detail:
>> > 1. If Node 1 and Node 2 are controllers and Node 1 reboots, the scenario
>> >    works fine.
>> > 2. If Node 1 and Node 2 are payloads (and, of course, there is one
>> >    controller in the cluster at Node X), then the scenario works fine.
>> > 3. If Node 1 is a payload and Node 2 is a controller and Node 1 reboots,
>> >    then the scenario works fine.
>> > 4. If Node 1 is a controller and Node 2 is a payload and Node 1 reboots
>> >    (and there is another controller in the cluster), then the scenario
>> >    works fine.
>> > 5. If Node 1 is a controller and Node 2 is a payload and Node 1 reboots
>> >    (and there is no other controller in the cluster), then the scenario
>> >    will not work, as an OpenSAF cluster requires at least one controller.
>> >
>> > Thanks
>> > -Nagu
>> >
>> > > -----Original Message-----
>> > > From: Yao Cheng LIANG [mailto:[email protected]]
>> > > Sent: 09 December 2014 17:37
>> > > To: piyush jaiswal; [email protected]
>> > > Subject: [users] multiple-node simultaneous failure handling
>> > >
>> > > Dear all,
>> > >
>> > > I am now applying OpenSAF to a cloud environment. I have two physical
>> > > nodes, and on each node there are a few virtual machines. Please see
>> > > the diagram below:
>> > >
>> > > vm name   physical node     1+1 protected by vm   physical node
>> > > ----------------------------------------------------------------
>> > > vm1       physical node 1   vm2                   physical node 2
>> > > vm3       physical node 1   vm4                   physical node 2
>> > > vm5       physical node 1   vm6                   physical node 2
>> > > vm7       physical node 1   vm8                   physical node 2
>> > >
>> > > So app1 on vm1 is protected by the same app on vm2, app3 on vm3 is
>> > > protected by the same app on vm4, and so on.
>> > >
>> > > My question is: when I reboot physical node 1, can OpenSAF handle the
>> > > simultaneous failure of vm1/3/5/7 and fail over to vm2/4/6/8?
>> > >
>> > > Thanks.
>> > >
>> > > Ted
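As a footnote to the "lock" terminology in the thread above: the OpenSAF node lock that Nagendra refers to is an AMF administrative operation on the node object, quite different from terminating the VM at the hypervisor level as Ted's compute-node "lock" does. A minimal sketch, assuming the amf-adm tool shipped with OpenSAF and the default demo naming (safAmfCluster=myAmfCluster) rather than your actual object names:

    # administratively lock the AMF node hosted on payload pl-3
    amf-adm lock safAmfNode=PL-3,safAmfCluster=myAmfCluster

    # and unlock it again afterwards
    amf-adm unlock safAmfNode=PL-3,safAmfCluster=myAmfCluster

An admin lock moves the assignments off that node in a controlled way; killing the VM instead relies on failure detection, which with the TCP transport is why Nivrutti suggests lowering tcp_retries2 at the top of this thread.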
