Re: [devel] checkpoint problems

2014-01-16 Thread A V Mahesh
ans -Original Message- From: Hans Feldt [mailto:[email protected]] Sent: 8 January 2014 14:01 To: A V Mahesh; Alex Jones Cc: [email protected] Subject: Re: [devel] checkpoint problems The socket receive buffer size used is the system default. It can be too sm

Re: [devel] checkpoint problems

2014-01-14 Thread Alex Jones
sent through MDS: >>> >>> #define MAX_SYNC_TRANSFER_SIZE (30 * 1024 * 1024) >>> >>> 30M? What is the rationale for this number? This seems way too >>> high. When I change it to (4*1024*1024) (4M) it solves my problem,
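A rough way to see what the change from 30M to 4M means in practice is to count how many transfers a given amount of checkpoint data would be split into. This is only a sketch: it assumes that MAX_SYNC_TRANSFER_SIZE caps the bytes carried per sync message, which the thread suggests but does not spell out. `sync_message_count` and the `SYNC_SIZE_*` names are hypothetical and are not part of the checkpoint service code.

```c
#include <assert.h>
#include <stddef.h>

/* The two values discussed in the thread, in bytes. */
#define SYNC_SIZE_30M ((size_t)30 * 1024 * 1024) /* original #define */
#define SYNC_SIZE_4M  ((size_t)4 * 1024 * 1024)  /* value Alex tried */

/*
 * Hypothetical helper: number of messages needed to move total_bytes
 * when each message carries at most max_chunk bytes (ceiling division).
 * ASSUMPTION: the real sync code chunks data this way.
 */
static size_t sync_message_count(size_t total_bytes, size_t max_chunk)
{
    assert(max_chunk > 0);
    return total_bytes / max_chunk + (total_bytes % max_chunk != 0);
}
```

Under that assumption, 30 MiB of replica data goes out as one message with the original limit and as eight smaller messages with the 4M limit, each of which is far easier for the receiving socket to buffer.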

Re: [devel] checkpoint problems

2014-01-10 Thread Alex Jones
>> 30M? What is the rationale for this number? This seems way too >> high. When I change it to (4*1024*1024) (4M) it solves my problem, >> and doesn't appear to affect performance. >> >> Alex >> >> On 01/08/2014 08:30 AM, Hans Feldt wrote:

Re: [devel] checkpoint problems

2014-01-09 Thread Alex Jones
>> >> Alex >> >> On 01/08/2014 08:30 AM, Hans Feldt wrote: >>> sysctl -a | grep rmem >>> >>> set rmem_default to 256K or so >>> >>> /Hans >>> >>>> -Original Message- >>>> From: Hans Fe

Re: [devel] checkpoint problems

2014-01-09 Thread A V Mahesh
l Message- >>> From: Hans Feldt [mailto:[email protected]] >>> Sent: 8 January 2014 14:01 >>> To: A V Mahesh; Alex Jones >>> Cc: [email protected] >>> Subject: Re: [devel] checkpoint problems >>> >>> The socket

Re: [devel] checkpoint problems

2014-01-08 Thread Alex Jones
ginal Message- >>> From: A V Mahesh [mailto:[email protected]] >>> Sent: 8 January 2014 11:29 >>> To: Alex Jones >>> Cc: [email protected] >>> Subject: Re: [devel] checkpoint problems >>> >>> Hi Ale

Re: [devel] checkpoint problems

2014-01-08 Thread Hans Feldt
sysctl -a | grep rmem set rmem_default to 256K or so /Hans > -Original Message- > From: Hans Feldt [mailto:[email protected]] > Sent: 8 January 2014 14:01 > To: A V Mahesh; Alex Jones > Cc: [email protected] > Subject: Re: [devel] checkpoint
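The same check can be done from a small C program instead of `sysctl -a | grep rmem`. The sketch below reads the Linux procfs entry behind `net.core.rmem_default` and compares it with the "256K or so" figure suggested above. The helper names and the `SUGGESTED_RMEM` constant are hypothetical, and whether this default actually governs the sockets used by MDS on a given system is an assumption taken from the thread.

```c
#include <assert.h>
#include <stdio.h>

/* procfs file backing the net.core.rmem_default sysctl on Linux. */
#define RMEM_DEFAULT_PATH "/proc/sys/net/core/rmem_default"
/* "256K or so", as suggested in the thread (hypothetical constant). */
#define SUGGESTED_RMEM (256L * 1024L)

/* Read a single integer from a file; returns -1 if it cannot be read. */
static long read_long_from_file(const char *path)
{
    FILE *fp;
    long value = -1;

    assert(path != NULL);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    if (fscanf(fp, "%ld", &value) != 1)
        value = -1;
    fclose(fp);
    return value;
}

/* Returns 1 if a known buffer size is below the suggested value. */
static int rmem_below_suggestion(long rmem_bytes)
{
    return rmem_bytes >= 0 && rmem_bytes < SUGGESTED_RMEM;
}
```

A caller would pass `read_long_from_file(RMEM_DEFAULT_PATH)` to `rmem_below_suggestion()` and, if it returns 1, raise the value with `sysctl -w net.core.rmem_default=262144` as root.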

Re: [devel] checkpoint problems

2014-01-08 Thread Hans Feldt
x Jones > Cc: [email protected] > Subject: Re: [devel] checkpoint problems > > Hi Alex, > > I suggest you increase and try the following TIPC values ( tipc code ) > and rebuild `tipc.ko`: > > net/tipc/tipc_socket.c:#define OVERLOAD_LIMIT_BASE 5000

Re: [devel] checkpoint problems

2014-01-08 Thread A V Mahesh
Hi Alex, I suggest you increase and try the following TIPC values ( tipc code ) and rebuild `tipc.ko`: net/tipc/tipc_socket.c:#define OVERLOAD_LIMIT_BASE 5000 You can increase it to 5 and try again. - AVM. On 1/8/2014 4:16 AM, Alex Jones wrote: > After doing some deep debugging I am

Re: [devel] checkpoint problems

2014-01-07 Thread A V Mahesh
Hi Alex, On 1/8/2014 4:16 AM, Alex Jones wrote: > I can create and write all the sections. And from another node > I run saCkptCheckpointStatusGet, and the information all looks good. > Everything is there. I see no errors from any CKPT API calls. > > The problem comes when I call sa

Re: [devel] checkpoint problems

2014-01-07 Thread Alex Jones
After doing some deep debugging I am seeing the following in the MDS log on node B. This is when the CPND_EVT_ND2ND_CKPT_ACTIVE_SYNC is sent from the active replica on node A to the replica on node B. The sync message never gets up to the CPND layer on node B because it is dropped. This is wit

Re: [devel] checkpoint problems

2014-01-07 Thread Alex Jones
AVM, I get SA_AIS_ERR_TIMEOUT even when I pass SA_TIME_END as the timeout value. Is this not a bug? The synchronous CheckpointOpen call doesn't work at all in this scenario. It never succeeds. I can reproduce the problem with sectionCreationAttributes.expirationTime set to SA_TIME

Re: [devel] checkpoint problems

2014-01-06 Thread A V Mahesh
Hi Alex, the CheckpointOpen call failing with SA_AIS_ERR_TIMEOUT is NOT a bug; it is expected if you pass a small timeout value such as `timeout = 10` to the saCkptCheckpointOpen(..., timeout, ...) call when the ckpt has very large data/sections. Just increasing the timeout will avoid the SA_AIS_ERR_TIMEOUT. Le
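For context on why `timeout = 10` is so small: in the SA Forum AIS headers, SaTimeT values are expressed in nanoseconds, so a literal 10 means 10 ns rather than 10 seconds. The sketch below uses local stand-ins (the `example_` and `EXAMPLE_` names are hypothetical) that mirror the values of SA_TIME_ONE_MILLISECOND and SA_TIME_ONE_SECOND as I understand them from saAis.h; real code should include saAis.h and use the real constants.

```c
#include <assert.h>
#include <limits.h>

/* Local stand-ins so the sketch compiles without the AIS headers. */
typedef long long example_sa_time_t;            /* stands in for SaTimeT */
#define EXAMPLE_SA_TIME_ONE_MILLISECOND 1000000LL
#define EXAMPLE_SA_TIME_ONE_SECOND      1000000000LL

/* Hypothetical helper: convert whole seconds to a nanosecond timeout. */
static example_sa_time_t seconds_to_sa_time(long long seconds)
{
    /* Guard against negative input and multiplication overflow. */
    assert(seconds >= 0 &&
           seconds <= LLONG_MAX / EXAMPLE_SA_TIME_ONE_SECOND);
    return seconds * EXAMPLE_SA_TIME_ONE_SECOND;
}
```

With the real headers, a ten second timeout would be written as `10 * SA_TIME_ONE_SECOND` and passed as the timeout argument of saCkptCheckpointOpen.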

Re: [devel] checkpoint problems

2014-01-06 Thread Alex Jones
AVM, I've been playing around with your test program, and have gotten it to fail. I made the following changes: 1. Change init_dataX to be 1024k bytes, so that you are initializing the section to be 1024k. 2. Also, don't start the program on node B until A has finished writing/cr

Re: [devel] checkpoint problems

2014-01-06 Thread A V Mahesh
Hi Alex, I have created 10K sections (please find the attached test application `Alex_test_node_A_app.c` & `Alex_test_node_B_app.c`) with your specified scenario & configuration and I haven't observed any issue with sections on another node. Try to reproduce the problem on your setup

Re: [devel] checkpoint problems

2014-01-05 Thread A V Mahesh
Hi Alex, when was the ckpt opened on the other node? Was it before the sections were created on the first node? -AVM On 1/6/2014 12:32 PM, A V Mahesh wrote: > Hi Alex, > > We never tested the 7500 sections, will test and let you know, can > you please share your test application, > that all

Re: [devel] checkpoint problems

2014-01-05 Thread A V Mahesh
Hi Alex, We never tested the 7500 sections; we will test and let you know. Can you please share your test application? That would allow us to respond quickly. -AVM On 1/3/2014 8:23 PM, Alex Jones wrote: > Hello All, > > I'm experimenting with the checkpoint service, and some things don't > a

[devel] checkpoint problems

2014-01-03 Thread Alex Jones
Hello All, I'm experimenting with the checkpoint service, and some things don't appear to work. The saCkptActiveReplicaSet and saCkptCheckpointSynchronize[Async] don't appear to work when the checkpoint has section numbers greater than around 5500. I've created a checkpoint with 75