There are at least three points to make in response to the claim that this has
to be a defect.
1) We have not seen this problem earlier. So obviously the testing is
different this time, i.e. this is a new way of testing that was not performed
when testing the earlier releases.
2) Saying that this fix is "the only way for this campaign to succeed" is not
true unless you show that the problem is not performance related. I am
convinced that the root cause is very much performance related. So the very
same campaign most likely succeeds, and probably has succeeded in earlier
releases, simply because the platform it was tested on had a more reasonable
load/capacity ratio.
3) I have noticed that there is lately a tendency to stress test OpenSAF more
often with a higher load/capacity ratio, at least here at Ericsson, for
various reasons. Probably it is related to the more volatile capacity of
virtualized and/or "cloud" based platforms, in particular when they are being
reconfigured.
What I am basically saying is that it is always possible to increase the
load/capacity ratio until you do see a resource related problem occur in the
system. It is a bit unfair to then declare that problem a defect, particularly
when the effect is benign. In this case an SMF campaign gets aborted, but in a
controlled way.
OpenSAF has no load regulation, so OpenSAF is currently vulnerable to getting
stuck in resource problems. OpenSAF does have partial overload protection in
the IMM service and this is what is getting triggered here (max outstanding
fevs messages at the local IMMND, a type of flow control).
On the other hand, if this is really a practical and real problem also for
deployments on old OpenSAF releases being used in new ways in *production*,
i.e. there is a plan to regularly run with overloaded capacity in production,
then one could declare this as a defect, even if it is a bit "unfair".
---
** [tickets:#1448] smf: Make campaigns less fragile by retrying on
ERR_NO_RESOURCES**
**Status:** unassigned
**Milestone:** future
**Created:** Fri Aug 14, 2015 07:09 AM UTC by Anders Bjornerstedt
**Last Updated:** Tue Aug 25, 2015 09:28 AM UTC
**Owner:** nobody
The SMF service is a heavy user of the IMM service.
The IMM has an established client pattern for ERR_TRY_AGAIN which allows an
application real-time control over how long it is prepared to wait for a
transient inability of the IMM service to fulfill a request. Each response of
TRY_AGAIN should in itself be fast, so the application needs a delay in its
retry loop.
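As an illustration only, the TRY_AGAIN client pattern described above could be
sketched roughly as follows. This is not IMM or SMF code: the string constants
stand in for the real return codes, and `call_with_try_again`, `request`,
`max_wait_s` and `delay_s` are hypothetical names chosen for the sketch.

```python
import time

# Hypothetical stand-ins for the real return codes (assumption, not the IMM API).
OK = "OK"
TRY_AGAIN = "ERR_TRY_AGAIN"


def call_with_try_again(request, max_wait_s, delay_s=0.1,
                        clock=time.monotonic, sleep=time.sleep):
    """Call request() until it stops answering TRY_AGAIN or the wait budget ends.

    The caller picks max_wait_s, which is what gives it control over how
    long it is prepared to wait. Each TRY_AGAIN answer is assumed to come
    back fast, so the loop sleeps delay_s between attempts.
    """
    deadline = clock() + max_wait_s
    while True:
        rc = request()
        if rc != TRY_AGAIN:
            return rc  # success or a non-transient error: hand it to the caller
        if clock() + delay_s > deadline:
            return TRY_AGAIN  # budget spent: report the last answer
        sleep(delay_s)
```

The clock and sleep functions are passed in only so that the loop can be
exercised without real waiting.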
There is also the very similar error code ERR_NO_RESOURCES. Logically that
error code is identical to TRY_AGAIN in that the request could not be accepted
due to no fault of the client but due to some more or less temporary problem
in the IMM service. The difference is that NO_RESOURCES has no real-time
ambitions. Typically this error code is used by the IMM when the IMM cannot
fulfill a request for reasons that are outside of the IMM service's control.
Also, the time from request to a response of ERR_NO_RESOURCES may be long.
The SMF service in general has no real-time requirements. The main goal for
the SMF service is to successfully complete correctly formulated campaigns.
This means that the SMF service should be programmed to avoid unnecessary
fragility related to temporary problems, even if the temporary problem could
linger for seconds or minutes.
The alternative of aborting the campaign will itself discard the potentially
large amount of execution time already completed. It may sometimes even result
in a system restore.
This means that SMF campaigns should have a "retry loop" that handles not just
TRY_AGAIN, but also ERR_NO_RESOURCES where this return code is relevant (can
be returned according to the API spec). The error code ERR_BUSY also exists
and is for all practical purposes identical to ERR_NO_RESOURCES in semantics,
both logical and timing.
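A minimal sketch of such a loop, under stated assumptions, might look like the
following. It is not taken from the SMF source. The string constants stand in
for the real return codes, `run_campaign_step` and `step` are hypothetical
names, and the delay values in `RETRY_DELAY_S` are invented for illustration
(short for TRY_AGAIN, longer for the two codes that have no real-time
ambitions).

```python
import time

# Hypothetical stand-ins for the return codes named in this ticket.
OK = "OK"
TRY_AGAIN = "ERR_TRY_AGAIN"
NO_RESOURCES = "ERR_NO_RESOURCES"
BUSY = "ERR_BUSY"
BAD_OPERATION = "ERR_BAD_OPERATION"  # example of a code that is not retried

# Assumed delays in seconds; the values are illustrative only.
RETRY_DELAY_S = {TRY_AGAIN: 0.2, NO_RESOURCES: 2.0, BUSY: 2.0}


def run_campaign_step(step, budget_s, clock=time.monotonic, sleep=time.sleep):
    """Retry step() on transient codes until the time budget is spent.

    Returns (last_return_code, attempts). Codes missing from RETRY_DELAY_S
    are treated as final and returned immediately.
    """
    deadline = clock() + budget_s
    attempts = 0
    while True:
        rc = step()
        attempts += 1
        delay = RETRY_DELAY_S.get(rc)
        if delay is None:
            return rc, attempts  # OK or a non-transient error
        if clock() + delay > deadline:
            return rc, attempts  # budget spent: let the caller decide on abort
        sleep(delay)
```

Since a campaign can tolerate problems that linger for seconds or minutes,
`budget_s` would presumably be set generously; the right value is a design
decision this sketch does not try to settle.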