Hey Ted,

Set it on all the nodes in the cluster: both controllers as well as all payloads.
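For example, a minimal sketch of applying it on each node, assuming a standard Linux sysctl setup (the value 3 below is just the example from earlier in this thread; tune it for your own network):

    # /etc/sysctl.conf  -- persists across reboots
    net.ipv4.tcp_retries2 = 3

    # apply it to the running kernel without a reboot
    sysctl -w net.ipv4.tcp_retries2=3    # or: sysctl -p after editing the file

    # verify the value currently in effect
    cat /proc/sys/net/ipv4/tcp_retries2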
Thanks,
Nivrutti

On Sat, Dec 20, 2014 at 5:27 PM, Yao Cheng LIANG <[email protected]> wrote:

> Dear Nivrutti,
>
> I have been using TCP, and my net.ipv4.tcp_retries2 = 15. Where should I
> set this value: on the controllers only, or do I need to set it on the
> payloads as well?
>
> Thanks.
>
> Ted
>
> Sent from Windows Mail
>
> From: Nivrutti Kale <[email protected]>
> Sent: Saturday, December 20, 2014 7:29 PM
> To: Nagendra Kumar <[email protected]>
> Cc: Yao Cheng LIANG <[email protected]>, piyush jaiswal
> <[email protected]>, [email protected]
>
> Hi Ted,
>
> Which transport are you using?
> If you are using TCP, you need to adjust the tcp_retries2 parameter of the
> system. By default, tcp_retries2 = 15.
>
> Add net.ipv4.tcp_retries2=3 (3 works for me; you can try other values) to
> /etc/sysctl.conf to persist the change across reboots.
>
> Let me know if this helps.
>
> Thanks,
> Nivrutti
>
> On Fri, Dec 19, 2014 at 10:57 AM, Nagendra Kumar <[email protected]> wrote:
>
>> Hi Ted,
>>
>> I was kind of guessing that. Please share the snaps of the syslog and
>> saflog of the nodes.
>>
>> Thanks
>> -Nagu
>>
>> From: Yao Cheng LIANG [mailto:[email protected]]
>> Sent: 19 December 2014 10:47
>> To: Nagendra Kumar; Nivrutti Kale
>> Cc: piyush jaiswal; [email protected]; Yao Cheng LIANG
>> Subject: RE: [users] multiple-node simultaneous failure handling
>>
>> Dear Nagu,
>>
>> Thanks. This is different from the OpenSAF "lock" operation; it is an
>> operation more like a "reboot".
>>
>> Ted
>>
>> From: Nagendra Kumar [mailto:[email protected]]
>> Sent: Friday, December 19, 2014 1:21 PM
>> To: Yao Cheng LIANG; Nivrutti Kale
>> Cc: piyush jaiswal; [email protected]
>> Subject: RE: [users] multiple-node simultaneous failure handling
>>
>> Hi Ted,
>>
>> Can you please clarify how you locked "physical node 1", or what you mean
>> by locking it? In OpenSAF, you can lock a node such as sc-1, sc-2, pl-3,
>> etc., one at a time.
>>
>> Thanks
>> -Nagu
>>
>> From: Yao Cheng LIANG [mailto:[email protected]]
>> Sent: 18 December 2014 19:54
>> To: Nivrutti Kale; Nagendra Kumar
>> Cc: piyush jaiswal; [email protected]; Yao Cheng LIANG
>> Subject: Re: [users] multiple-node simultaneous failure handling
>>
>> Dear all,
>>
>> Today I did more tests in the virtualized environment, by "locking" one of
>> the "compute" nodes, where one "active" controller and one "active"
>> payload reside. The "lock" operation terminates all the virtual machines
>> running on that physical node. I expected the "active" role to switch to
>> the peer VM running on the other compute node, since I have configured a
>> "1+1" protection relationship.
>>
>> What surprised me is that the "controller" VM switched over very quickly,
>> but the payload VM did not switch (the "standby" VM stayed "standby" even
>> though the "active" VM had been terminated). I captured packets on the now
>> "active" controller and noticed that it had not received any packets from
>> 211.7 (the former "active" payload, terminated long before), but kept
>> sending ARP requests asking "who has 211.7".
>>
>> Please see the attached file for the packets I captured.
>> Note:
>>
>>                 physical node 1           physical node 2
>> before lock:    sc-1 (211.2) - active     sc-2 (211.3) - standby
>>                 pl-3 (211.7) - active     pl-4 (211.7) - standby
>>
>> after lock:     sc-1 terminated           sc-2 (211.3) became active
>>                 pl-3 terminated           pl-4 (211.7) stayed "standby"
>>
>> The packets were captured on 211.3.
>>
>> Thanks.
>>
>> Ted
>>
>> Sent from Windows Mail
>>
>> From: Nivrutti Kale <[email protected]>
>> Sent: Wednesday, December 10, 2014 2:16 PM
>> To: Nagendra Kumar <[email protected]>
>> Cc: Yao Cheng LIANG <[email protected]>, piyush jaiswal
>> <[email protected]>, [email protected]
>>
>> Hi Ted,
>>
>> I am using OpenSAF in a "virtualized environment" and I haven't seen any
>> issues with OpenSAF so far.
>>
>> Regarding multiple fail-overs, we tested the fail-over of a blade on which
>> 6 VMs (1 active controller and 5 payloads) were placed. OpenSAF works like
>> a charm here.
>>
>> First the controller fails over, then the notifications for the other
>> payload nodes are received by the new active controller, so multiple
>> fail-overs in the correct sequence work very well with OpenSAF. I am using
>> OpenSAF 4.2.0 and TCP as the OpenSAF transport.
>>
>> Thanks,
>> Nivrutti
>>
>> On Wed, Dec 10, 2014 at 11:23 AM, Nagendra Kumar <[email protected]> wrote:
>>
>> Hi Ted,
>>
>>> In my case, all these VMs work as payloads.
>> Then you should have no problem.
>>
>>> Have you tested how many of these concurrent failures OpenSAF can
>>> support? I am using 4.4.0.
>> OpenSAF can handle any number of concurrent failures.
>>
>> I haven't joined OP-NFV.
>> If the "virtualized environment" is the only requirement, then OpenSAF can
>> run without any problems. But I guess there may be more requirements than
>> that. We are working on cloud requirements for OpenSAF and a few tickets
>> have been raised.
>>
>> Thanks
>> -Nagu
>>
>> > -----Original Message-----
>> > From: Yao Cheng LIANG [mailto:[email protected]]
>> > Sent: 10 December 2014 08:54
>> > To: Nagendra Kumar; piyush jaiswal; [email protected]
>> > Cc: Yao Cheng LIANG
>> > Subject: RE: [users] multiple-node simultaneous failure handling
>> >
>> > Dear Nagu,
>> >
>> > Thanks for the clarification. In my case, all these VMs work as payloads.
>> > Have you tested how many of these concurrent failures OpenSAF can
>> > support? I am using 4.4.0.
>> >
>> > By the way, I am working in OP-NFV on the HA proposal. Have you joined
>> > the same work-force, and is there any issue applying OpenSAF to these
>> > virtualized environments?
>> >
>> > Thanks.
>> >
>> > Ted
>> >
>> > -----Original Message-----
>> > From: Nagendra Kumar [mailto:[email protected]]
>> > Sent: Tuesday, December 09, 2014 8:53 PM
>> > To: Yao Cheng LIANG; piyush jaiswal; [email protected]
>> > Subject: RE: [users] multiple-node simultaneous failure handling
>> >
>> > Hi Yao,
>> > If one controller remains available on a separate node, then the given
>> > scenario will work fine.
>> >
>> > In detail:
>> > 1. If Node 1 and Node 2 are controllers and Node 1 reboots, the scenario
>> >    works fine.
>> > 2. If Node 1 and Node 2 are payloads (and, of course, there is one
>> >    controller in the cluster at Node X), then the scenario works fine.
>> > 3. If Node 1 is a payload and Node 2 is a controller and Node 1 reboots,
>> >    then the scenario works fine.
>> > 4. If Node 1 is a controller and Node 2 is a payload and Node 1 reboots
>> >    (and there is another controller in the cluster), then the scenario
>> >    works fine.
>> > 5. If Node 1 is a controller and Node 2 is a payload and Node 1 reboots
>> >    (and there is no other controller in the cluster), then the scenario
>> >    will not work, as an OpenSAF cluster requires at least one controller.
>> >
>> > Thanks
>> > -Nagu
>> >
>> > > -----Original Message-----
>> > > From: Yao Cheng LIANG [mailto:[email protected]]
>> > > Sent: 09 December 2014 17:37
>> > > To: piyush jaiswal; [email protected]
>> > > Subject: [users] multiple-node simultaneous failure handling
>> > >
>> > > Dear all,
>> > >
>> > > I am now applying OpenSAF to a cloud environment. I have two physical
>> > > nodes, and on each node there are a few virtual machines. Please see
>> > > the diagram below:
>> > >
>> > > vm name   physical node     1+1 protected by vm   physical node
>> > > ----------------------------------------------------------------
>> > > vm1       physical node 1   vm2                   physical node 2
>> > > vm3       physical node 1   vm4                   physical node 2
>> > > vm5       physical node 1   vm6                   physical node 2
>> > > vm7       physical node 1   vm8                   physical node 2
>> > >
>> > > So app1 on vm1 is protected by the same app on vm2, app3 on vm3 is
>> > > protected by the same app on vm4, and so on.
>> > >
>> > > My question is: when I reboot physical node 1, can OpenSAF handle the
>> > > simultaneous failure of vm1/3/5/7 and fail over to vm2/4/6/8?
>> > >
>> > > Thanks.
>> > >
>> > > Ted
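As a footnote to the "lock" terminology in the thread above: the OpenSAF node lock that Nagendra refers to is an AMF administrative operation on the node object, quite different from terminating the VM at the hypervisor level as Ted's compute-node "lock" does. A minimal sketch, assuming the amf-adm tool shipped with OpenSAF and the default demo naming (safAmfCluster=myAmfCluster) rather than your actual object names:

    # administratively lock the AMF node hosted on payload pl-3
    amf-adm lock safAmfNode=PL-3,safAmfCluster=myAmfCluster

    # and unlock it again afterwards
    amf-adm unlock safAmfNode=PL-3,safAmfCluster=myAmfCluster

An admin lock moves the assignments off that node in a controlled way; killing the VM instead relies on failure detection, which with the TCP transport is why Nivrutti suggests lowering tcp_retries2 at the top of this thread.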
