I've been able to trace the problem to saCkptCheckpointRead. If I do block reads of 1000, I see this failure. I have to bring the block reads all the way down to 4 in order for it to succeed.
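For illustration only, the batching trade-off described above can be sketched in C without the checkpoint API. `read_in_batches`, `batch_read_fn` and `count_calls` are hypothetical names, not part of OpenSAF; in a real application the callback would presumably fill an ioVector with `count` elements and pass it to saCkptCheckpointRead. The sketch also assumes that "block reads of 1000" means 1000 sections per read call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: reads sections [first, first + count) in one call.
 * Returns 0 on success, non-zero on failure (for example a timeout). */
typedef int (*batch_read_fn)(size_t first, size_t count, void *ctx);

/* Walk `total` sections in batches of at most `batch` sections and stop at
 * the first failing batch. Returns the number of sections read successfully. */
static size_t read_in_batches(size_t total, size_t batch,
                              batch_read_fn read_batch, void *ctx)
{
    assert(batch > 0 && read_batch != NULL);

    size_t done = 0;
    while (done < total) {
        size_t remaining = total - done;
        size_t count = remaining < batch ? remaining : batch;

        if (read_batch(done, count, ctx) != 0)
            break;          /* caller can retry from `done` with a smaller batch */
        done += count;
    }
    return done;
}

/* Demo stub: counts calls instead of touching the checkpoint service. */
static int count_calls(size_t first, size_t count, void *ctx)
{
    (void)first;
    (void)count;
    ++*(size_t *)ctx;
    return 0;
}
```

With 40k sections per checkpoint, a batch of 1000 means 40 read calls, while a batch of 4 means 10,000 calls, so shrinking the batch presumably trades a few large replies for many small ones.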
What I'm doing on the standby blade is opening each of the 5 active checkpoints, reading them, and at the same time registering the hot-standby callback to get real-time changes. With the MDS patch provided, saCkptCheckpointRead times out, so the standby cannot read all the data and doesn't get a complete picture.

If I back out the MDS patch and use my own fix instead, changing the checkpoint service #define (MAX_SYNC_TRANSFER_SIZE) from 30M to 3M, everything works fine.

If you guys really believe that this problem should be fixed in MDS, I am happy to test more patches. But this one-line change in the ckpt service makes everything work.

Alex

On 01/10/2014 01:15 PM, Alex Jones wrote:
> Hi Guys,
>
> After doing some more testing I'm still seeing some problems.
>
> The patch worked fine for a 2N model, but our real requirements are a little different.
>
> Here's the setup: 5+1 redundancy, with 5 active blades and 1 standby blade protecting all the other blades. I am creating a checkpoint on each active blade, and the standby is opening all 5 checkpoints to do the backup.
>
> 40k sections on each checkpoint, and 1k of data in each section.
>
> Every so often I am still seeing MDS problems, but they are different now with the patch.
>
> Jan 10 16:20:35.776939 <3680821261> ERR |MDS_SND_RCV: Timeout or Error occured
> Jan 10 16:20:35.777031 <3680821261> ERR |MDS_SND_RCV: Timeout occured on sndrsp message
> Jan 10 16:20:35.777062 <3680821261> ERR |MDS_SND_RCV: Adest=<0x0002040f,3798024214>
>
> Jan 10 16:20:50.098279 <3680821261> ERR |LEN-MISMATCH:recvd_on_sock=8034, size_in_mds_hdr=65034, TIPC-ID=0x010010056ab7600b, ADEST=<0002050f,1790402571>
> Jan 10 16:20:50.098326 <3680821261> ERR |DUMP:Changing dump-extent:buff=0x998fa300:max=100, len=8034
> Jan 10 16:20:50.098348 <3680821261> ERR |DUMP:buff=0x998fa300:offset= 0 to 7:Bytes = 0xfe 0x0a 0x00 0x00 : 0x0f 0x1e 0x80 0x01
>
> Alex
>
> On 01/09/2014 04:43 AM, A V Mahesh wrote:
>> Hi Alex,
>>
>> Use the patch below as a workaround so you can proceed with your testing. It just increases the MDS internal fragmentation value to roughly TIPC_MAX_USER_MSG_SIZE, which is defined in tipc.h.
>>
>> I will work with Hans on a final patch that considers both the TIPC and TCP transports, and the testing involved, as part of ticket `#654 MDS improvements` (https://sourceforge.net/p/opensaf/tickets/654/).
>>
>> I tested this patch with a 10K-section checkpoint; memory used was 10136000 on the TIPC transport.
>>
>> ==================================================================================
>>
>> diff --git a/osaf/libs/core/mds/include/mds_dt.h b/osaf/libs/core/mds/include/mds_dt.h
>> --- a/osaf/libs/core/mds/include/mds_dt.h
>> +++ b/osaf/libs/core/mds/include/mds_dt.h
>> @@ -32,6 +32,7 @@
>> #include "ncs_main_papi.h"
>> #include "ncssysf_mem.h"
>> #include "ncspatricia.h"
>> +#include <linux/tipc.h>
>>
>>
>> /* This file is private to the MDTM layer. */
>> @@ -109,7 +110,7 @@ typedef struct mdtm_reassembly_queue {
>>
>> #define MDTM_MAX_DIRECT_BUFF_SIZE MDTM_MAX_SEGMENT_SIZE
>>
>> -#define MDTM_NORMAL_MSG_FRAG_SIZE 1400
>> +#define MDTM_NORMAL_MSG_FRAG_SIZE (TIPC_MAX_USER_MSG_SIZE-1000) /* TIPC_MAX_USER_MSG_SIZE = 66000 define <linux/tipc.h> */
>>
>> #define MDTM_RECV_BUFFER_SIZE ((MDS_DIRECT_BUF_MAXSIZE>MDTM_NORMAL_MSG_FRAG_SIZE)? \
>> (MDS_DIRECT_BUF_MAXSIZE+SUM_MDS_HDR_PLUS_MDTM_HDR_PLUS_LEN):(MDTM_NORMAL_MSG_FRAG_SIZE+SUM_MDS_HDR_PLUS_MDTM_HDR_PLUS_LEN))
>>
>> ==================================================================================
>>
>> -AVM
>>
>> On 1/8/2014 10:42 PM, Alex Jones wrote:
>>> Hi Hans,
>>>
>>> Changing rmem_default and rmem_max has no effect on the problem. I even tried up to 2M to no avail.
>>>
>>> However, after looking at the cpnd_transfer_replica function in cpnd_evt.c, I found the following in cpsv_evt.h, which controls how large the packets sent through MDS are:
>>>
>>> #define MAX_SYNC_TRANSFER_SIZE (30 * 1024 * 1024)
>>>
>>> 30M? What is the rationale for this number? This seems way too high. When I change it to (4*1024*1024) (4M) it solves my problem, and doesn't appear to affect performance.
>>>
>>> Alex
>>>
>>> On 01/08/2014 08:30 AM, Hans Feldt wrote:
>>>> sysctl -a | grep rmem
>>>>
>>>> set rmem_default to 256K or so
>>>>
>>>> /Hans
>>>>
>>>>> -----Original Message-----
>>>>> From: Hans Feldt [mailto:hans.fe...@ericsson.com]
>>>>> Sent: 8 January 2014 14:01
>>>>> To: A V Mahesh; Alex Jones
>>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>>> Subject: Re: [devel] checkpoint problems
>>>>>
>>>>> The socket receive buffer size used is the system default. It can be too small, pump it up.
>>>>> I plan to do some change in MDS for this (and other stuff).
>>>>> /Hans
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
>>>>>> Sent: 8 January 2014 11:29
>>>>>> To: Alex Jones
>>>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>>>> Subject: Re: [devel] checkpoint problems
>>>>>>
>>>>>> Hi Alex,
>>>>>>
>>>>>> I suggest you increase the following TIPC value (in the tipc code), rebuild `tipc.ko`, and try again:
>>>>>>
>>>>>> net/tipc/tipc_socket.c:#define OVERLOAD_LIMIT_BASE 5000
>>>>>>
>>>>>> You can increase it to 50000 and try again.
>>>>>>
>>>>>> - AVM.
>>>>>>
>>>>>> On 1/8/2014 4:16 AM, Alex Jones wrote:
>>>>>>> After doing some deep debugging I am seeing the following in the MDS log on node B. This is when the CPND_EVT_ND2ND_CKPT_ACTIVE_SYNC is sent from the active replica on node A to the replica on node B. The sync message never gets up to the CPND layer on node B because it is dropped.
>>>>>>>
>>>>>>> This is with 10k sections, each section 1k.
>>>>>>>
>>>>>>> Jan 7 21:32:32.772347 <1789648919> ERR |MDTM: Frag recd is not next frag so dropping adest=<0x010010023922604c>
>>>>>>> Jan 7 21:32:32.772399 <1789648919> ERR |MDTM: Message is dropped as msg is out of seq TRANSPOR-ID=<0x010010023922604c>
>>>>>>>
>>>>>>> I've turned on MDS debug on node B, and the packet being sent over is gigantic. It starts failing at fragment number 2703. The next fragment that comes in is 2707, then 2722. The last fragment that comes in is 7444.
>>>>>>>
>>>>>>> I've done a cursory look at the hardware stats, and nothing is being rate-limited or dropped.
>>>>>>>
>>>>>>> I'm going to take a deeper look at this, but I'm mentioning it in case it rings any bells. I am using TIPC as the transport.
>>>>>>>
>>>>>>> Alex
>>>>>>>
>>>>>>> On 01/07/2014 07:24 AM, Alex Jones wrote:
>>>>>>>> AVM,
>>>>>>>>
>>>>>>>> I get SA_AIS_ERR_TIMEOUT even when I pass SA_TIME_END as the timeout value. Is this not a bug? The synchronous CheckpointOpen call doesn't work at all in this scenario. It never succeeds.
>>>>>>>>
>>>>>>>> I can reproduce the problem with sectionCreationAttributes.expirationTime set to SA_TIME_ONE_DAY.
>>>>>>>>
>>>>>>>> You should be able to reproduce the problem with the code I sent in the last e-mail.
>>>>>>>>
>>>>>>>> Alex
>>>>>>>>
>>>>>>>> On 01/06/2014 10:31 PM, A V Mahesh wrote:
>>>>>>>>> Hi Alex,
>>>>>>>>>
>>>>>>>>> The CheckpointOpen call failing with SA_AIS_ERR_TIMEOUT is NOT a bug; it is expected if you pass a small timeout value (`timeout = 1000000000`) to the saCkptCheckpointOpen(....,timeout ...) call when the ckpt has very large data/sections. Just increasing the timeout will avoid the SA_AIS_ERR_TIMEOUT.
>>>>>>>>>
>>>>>>>>> Let us focus on your original issue/scenario: are you able to reproduce the problem with sectionCreationAttributes.expirationTime set to SA_TIME_ONE_DAY?
>>>>>>>>>
>>>>>>>>> -AVM
>>>>>>>>>
>>>>>>>>> On 1/7/2014 1:17 AM, Alex Jones wrote:
>>>>>>>>>> AVM,
>>>>>>>>>>
>>>>>>>>>> I've been playing around with your test program, and have gotten it to fail.
>>>>>>>>>>
>>>>>>>>>> I made the following changes:
>>>>>>>>>>
>>>>>>>>>> 1. Change init_dataX to be 1024k bytes, so that you are initializing the section to be 1024k.
>>>>>>>>>> 2. Also, don't start the program on node B until A has finished writing/creating all the sections.
>>>>>>>>>> 3. Before hitting the enter key on node B, wait for the OpenAsync call to finish.
>>>>>>>>>>
>>>>>>>>>> You might notice the CheckpointOpen call failing now with SA_AIS_ERR_TIMEOUT. I had to turn this into OpenAsync, and add a thread to process CkptDispatch messages. This uncovers another bug in OpenAsync. I've attached the mods to your program here.
>>>>>>>>>>
>>>>>>>>>> The OpenAsync callback will be called twice, both times with error == SA_AIS_ERR_TIMEOUT. If I call OpenAsync again when I get this error, the next callback returns success, but the callback gets called twice with success and with two different checkpoint handles!
>>>>>>>>>>
>>>>>>>>>> Alex
>>>>>>>>>>
>>>>>>>>>> On 01/06/2014 06:18 AM, A V Mahesh wrote:
>>>>>>>>>>> Hi Alex,
>>>>>>>>>>>
>>>>>>>>>>> I have created 10K sections (please find the attached test applications `Alex_test_node_A_app.c` & `Alex_test_node_B_app.c`) with your specified scenario & configuration, and I haven't observed any issue with sections on another node.
>>>>>>>>>>>
>>>>>>>>>>> Try to reproduce the problem on your setup & let me know the result.
>>>>>>>>>>>
>>>>>>>>>>> One more important point: what did you configure for `sectionCreationAttributes.expirationTime`? I configured SA_TIME_ONE_DAY.
>>>>>>>>>>>
>>>>>>>>>>> Steps to run the application:
>>>>>>>>>>>
>>>>>>>>>>> ======================================================================
>>>>>>>>>>> Compile :
>>>>>>>>>>>
>>>>>>>>>>> NODE-A# gcc Alex_test_node_A_app.c -o checkpoint_A -lSaCkpt
>>>>>>>>>>> NODE-A# gcc Alex_test_node_B_app.c -o checkpoint_B -lSaCkpt
>>>>>>>>>>>
>>>>>>>>>>> Run :
>>>>>>>>>>>
>>>>>>>>>>> 1) saCkptCheckpointOpen On node A
>>>>>>>>>>>
>>>>>>>>>>> NODE-A# ./checkpoint_A
>>>>>>>>>>>
>>>>>>>>>>> CPSV:CPA:ONsaCkptSectionCreate Waiting to Create Sections
>>>>>>>>>>> safCkpt=test_checkpoint_name1,safApp=safCkptService....
>>>>>>>>>>> saCkptSectionCreate Press <Enter> key to continue...
>>>>>>>>>>>
>>>>>>>>>>> 2) saCkptCheckpointOpen() same ckpt On node B
>>>>>>>>>>>
>>>>>>>>>>> NODE-B# ./checkpoint_B
>>>>>>>>>>>
>>>>>>>>>>> CPSV:CPA:ONsaCkptSectionIterationInitialize Waiting to read Sections
>>>>>>>>>>> safCkpt=test_checkpoint_name1,safApp=safCkptService....
>>>>>>>>>>> saCkptActiveReplicaSet saCkptSectionIterationInitialize Press <Enter> key to continue...
>>>>>>>>>>>
>>>>>>>>>>> 3) saCkptSectionCreate() On node A and read saCkptCheckpointStatusGet()
>>>>>>>>>>>
>>>>>>>>>>> NODE-A#
>>>>>>>>>>> checkpointStatus.numberOfSections : 10000
>>>>>>>>>>> checkpointStatus.memoryUsed :756000
>>>>>>>>>>> checkpointCreationAttributes.creationFlags;10
>>>>>>>>>>> checkpointCreationAttributes.checkpointSize;10240000
>>>>>>>>>>> checkpointCreationAttributes.retentionDuration;60000000000
>>>>>>>>>>> checkpointCreationAttributes.maxSections;10000
>>>>>>>>>>> checkpointCreationAttributes.maxSectionSize;1024
>>>>>>>>>>> checkpointCreationAttributes.maxSectionIdSize;64
>>>>>>>>>>> ================================
>>>>>>>>>>> saCkptCheckpointUnlink / saCkptCheckpointClose / saCkptFinalize Press <Enter> key to continue...
>>>>>>>>>>> saCkptCheckpoint Press <Enter> key to continue...
>>>>>>>>>>>
>>>>>>>>>>> 4) saCkptActiveReplicaSet() on node B and read saCkptCheckpointStatusGet()
>>>>>>>>>>>
>>>>>>>>>>> NODE-B#
>>>>>>>>>>> checkpointStatus.numberOfSections : 10000
>>>>>>>>>>> checkpointStatus.memoryUsed :756000
>>>>>>>>>>> checkpointCreationAttributes.creationFlags;10
>>>>>>>>>>> checkpointCreationAttributes.checkpointSize;10240000
>>>>>>>>>>> checkpointCreationAttributes.retentionDuration;60000000000
>>>>>>>>>>> checkpointCreationAttributes.maxSections;10000
>>>>>>>>>>> checkpointCreationAttributes.maxSectionSize;1024
>>>>>>>>>>> checkpointCreationAttributes.maxSectionIdSize;64
>>>>>>>>>>>
>>>>>>>>>>> saCkptCheckpointUnlink / saCkptCheckpointClose / saCkptFinalize Press <Enter> key to continue...
>>>>>>>>>>> saCkptCheckpoint Press <Enter> key to continue..
>>>>>>>>>>>
>>>>>>>>>>> ======================================================================
>>>>>>>>>>> -AVM
>>>>>>>>>>>
>>>>>>>>>>> On 1/6/2014 12:32 PM, A V Mahesh wrote:
>>>>>>>>>>>> Hi Alex,
>>>>>>>>>>>>
>>>>>>>>>>>> We never tested 7500 sections; we will test and let you know. Can you please share your test application? That will allow us to respond quickly.
>>>>>>>>>>>>
>>>>>>>>>>>> -AVM
>>>>>>>>>>>>
>>>>>>>>>>>> On 1/3/2014 8:23 PM, Alex Jones wrote:
>>>>>>>>>>>>> Hello All,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm experimenting with the checkpoint service, and some things don't appear to work.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The saCkptActiveReplicaSet and saCkptCheckpointSynchronize[Async] calls don't appear to work when the checkpoint has more than around 5500 sections.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I've created a checkpoint with 7500 sections, each section being 1024 bytes.
>>>>>>>>>>>>> The checkpoint is co-located and the "active replica" bit is set.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I can create and write all the sections. And from another node I run saCkptCheckpointStatusGet, and the information all looks good. Everything is there. I see no errors from any CKPT API calls.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The problem comes when I call saCkptActiveReplicaSet from this other node. After I do this, saCkptCheckpointStatusGet now returns all the same information except the number of sections is no longer 7500 but 0. If I do this test with 50,000 sections only about 3,000 entries get synced. And iterating through the sections shows that there are only 3,000 sections.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Calling saCkptCheckpointSynchronize[Async] in this situation has no effect, either.
>>>>>>>>>>>>>
>>>>>>>>>>>>> After looking through the code I see a comment in cpnd_evt_proc_ckpt_arep_set that says "/* ###TBD sync up is missing with old active if now this fellow is becoming active. */" So, it doesn't appear that syncing is being done in the saCkptActiveReplicaSet, which it should be.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can someone comment?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm going to fix this and post a patch unless someone else is already working on it, but I didn't see a bug for it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Alex
_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
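As a rough sanity check on the fragment numbers quoted in this thread, the arithmetic can be sketched in C. This is only an estimate under stated assumptions: it ignores MDS/MDTM headers and section metadata, and it assumes that one sync message carries at most MAX_SYNC_TRANSFER_SIZE bytes. The helper names `ceil_div`, `fragments_for` and `sync_messages_for` are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Ceiling division for unsigned integers; the divisor must be positive. */
static uint64_t ceil_div(uint64_t a, uint64_t b)
{
    assert(b > 0);
    return (a + b - 1) / b;
}

/* Fragments needed to carry `bytes` of payload when each fragment holds
 * `frag_size` bytes (headers are ignored in this estimate). */
static uint64_t fragments_for(uint64_t bytes, uint64_t frag_size)
{
    return ceil_div(bytes, frag_size);
}

/* Sync messages needed when one message carries at most `max_transfer`
 * bytes. That this is how MAX_SYNC_TRANSFER_SIZE is applied is an
 * assumption based on the description in the thread. */
static uint64_t sync_messages_for(uint64_t bytes, uint64_t max_transfer)
{
    return ceil_div(bytes, max_transfer);
}
```

For 10k sections of 1k each (10,240,000 bytes), `fragments_for(10240000, 1400)` gives 7315 fragments with the original MDTM_NORMAL_MSG_FRAG_SIZE, which is in the same range as the last fragment number 7444 seen in the MDS log, and `fragments_for(10240000, 65000)` gives 158 with the patched value (66000 - 1000). With a 30M limit the whole checkpoint fits in one sync message; with a 4M limit, `sync_messages_for(10240000, 4 * 1024 * 1024)` gives 3 messages of at most 2996 fragments each at the 1400-byte fragment size.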