Re: [devel] [PATCH 0 of 6] Review Request for : pre-review of 2PBE transaction payload handling (#21)

Anders Bjornerstedt Tue, 08 Oct 2013 02:07:30 -0700

About this:
>> The slave PBE can not be able to do classimplementerset
>>
>> Oct  4 15:05:58 Slot-4 osafimmnd[3039]: NO ERR_TRY_AGAIN: ccb 9657 is 
>> active on object cscfRdn=75387 of class neNumber. Can not add class 
>> applier
>> Oct  4 15:06:04 Slot-4 last message repeated 10 times
>> Oct  4 15:06:04 Slot-4 osafimmpbed: ER saImmOiClassImplementerSet for 
>> neNumber failed 6
>> Oct  4 15:06:04 Slot-4 osafimmnd[3039]: NO Implementer locally 
>> disconnected. Marking it as doomed 131 <429, 2020f> (@OpenSafImmPBE)
>> Oct  4 15:06:04 Slot-4 osafimmnd[3039]: NO Implementer locally 
>> disconnected. Marking it as doomed 132 <430, 2020f> (OsafImmPbeRt_B)
>> Oct  4 15:06:04 Slot-4 osafimmnd[3039]: NO Implementer disconnected 
>> 131 <429, 2020f> (@OpenSafImmPBE)
>> Oct  4 15:06:04 Slot-4 osafimmnd[3039]: NO Implementer disconnected 
>> 132 <430, 2020f> (OsafImmPbeRt_B)
>> Oct  4 15:06:04 Slot-4 osafimmnd[3039]: WA SLAVE PBE process has 
>> apparently died at non coord
>> Oct  4 15:06:04 Slot-4 osafimmnd[3039]: NO STARTING SLAVE PBE process.
>>     
> Not a serious problem, assuming it does not happen often.
> That is, this is a performance problem.
> The slave will restart and should hopefully succeed in initializing the
> next time.
> New CCBs will not be generated when the imm is not persistent writable,
> and the imm is not persistent writable in 2PBE when not both PBEs are
> available.
> So this problem should disappear once ccb 9657 has been aborted.
>   
Just realized that something similar to the above could probably cause
problems.
In the above scenario I assume you had some kind of loop generating ccbs
repeatedly (in sequence), where each one is applied. The above ccb would
then get aborted inside its attempt to apply, and the next ccb would be
rejected at the operation level (before apply).


But suppose you simply had a lingering CCB, not being applied, just
lingering. Think of an operator starting something and then going for
coffee. That *would* currently prevent the slave from rejoining.

I need to add a mechanism similar to the one in imm-sync, where on-going
(non critical) ccbs are given a period of grace and then aborted from
below by the imm. No period of grace would be involved in this new case,
since all such non critical ccbs are doomed anyway.

Another possible solution, and probably a simpler one, is to allow the
2PBE-applier to attach even when there is an on-going ccb, i.e. relax the
above guard for only this 2PBE applier. That of course means that the
slave PBE would risk having missed some operations included in that
on-going non critical ccb. But this would be caught in the apply of that
ccb. The prepare protocol between the PBEs would time out and the ccb
would get aborted, because the operation count would never be complete
at the slave.
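
To make the second option concrete, here is a minimal sketch in Python.
It is not OpenSAF code: the Ccb class, class_applier_set(),
slave_prepare_ok() and the relax_for parameter are all hypothetical names
invented for illustration. Only the applier name, the class name and the
ccb id are taken from the log above. The timeout in the prepare protocol
is modelled simply as an operation-count mismatch.

```python
from dataclasses import dataclass

OK = "OK"
TRY_AGAIN = "ERR_TRY_AGAIN"


@dataclass
class Ccb:
    """Hypothetical stand-in for an on-going, non critical ccb."""
    ccb_id: int
    class_names: set   # classes of the objects this ccb has touched
    op_count: int = 0  # operations accepted into the ccb so far


def class_applier_set(applier, class_name, active_ccbs, relax_for=()):
    """Model of the guard: reject attaching an applier to a class while a
    ccb is active on an object of that class, unless the applier is
    explicitly exempted through relax_for (an assumption of this sketch)."""
    for ccb in active_ccbs:
        if class_name in ccb.class_names and applier not in relax_for:
            return TRY_AGAIN
    return OK


def slave_prepare_ok(expected_ops, ops_seen_by_slave):
    """Model of the check at apply time: the slave can only complete
    prepare if it has seen every operation of the ccb. A mismatch stands
    in for the prepare timeout followed by abort of the ccb."""
    return ops_seen_by_slave == expected_ops


lingering = Ccb(ccb_id=9657, class_names={"neNumber"}, op_count=3)

# Current behaviour: the lingering ccb blocks the slave PBE applier.
blocked = class_applier_set("@OpenSafImmPBE", "neNumber", [lingering])

# Relaxed guard: only the 2PBE applier is let through.
relaxed = class_applier_set("@OpenSafImmPBE", "neNumber", [lingering],
                            relax_for={"@OpenSafImmPBE"})

# The slave attached late and saw only 1 of the 3 operations, so the
# ccb cannot pass prepare and would be aborted.
can_commit = slave_prepare_ok(lingering.op_count, ops_seen_by_slave=1)
```

With the relaxed guard the slave rejoins immediately, and the lingering
ccb is rejected later, at apply, instead of blocking the rejoin.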

/AndersBj
