What do you have defined in your ducc.properties for
ducc.rm.node.stability and ducc.agent.node.metrics.publish.rate? The
Web Server considers a node down according to the following
calculation:

    private long getAgentMillisMIA() {
        String location = "getAgentMillisMIA";
        long secondsMIA = DOWN_AFTER_SECONDS*SECONDS_PER_MILLI;
        Properties properties = DuccWebProperties.get();
        String s_tolerance = properties.getProperty("ducc.rm.node.stability");
        String s_rate = properties.getProperty("ducc.agent.node.metrics.publish.rate");
        try {
            long tolerance = Long.parseLong(s_tolerance.trim());
            long rate = Long.parseLong(s_rate.trim());
            secondsMIA = (tolerance * rate) / 1000;
        }
        catch(Throwable t) {
            logger.warn(location, jobid, t);
        }
        return secondsMIA;
    }

The default is 65 seconds. Note that the Web Server has no effect on
actual operations in this case. It is just a reporter of information.
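
As a rough illustration of that calculation, here is a standalone sketch.
The class name, the method name, and the values 5 and 60000 are made up for
the example; they are not DUCC code and not necessarily your settings or
the shipped defaults. It assumes the publish rate is given in milliseconds,
which is what the division by 1000 in the snippet above suggests.

```java
import java.util.Properties;

// Hypothetical standalone sketch; not part of DUCC.
public class NodeDownThreshold {

    // Fallback mirroring the 65-second default mentioned above.
    static final long DEFAULT_SECONDS = 65;

    // Seconds of silence after which a node would be reported down.
    static long secondsUntilDown(Properties props) {
        try {
            long tolerance = Long.parseLong(
                props.getProperty("ducc.rm.node.stability").trim());
            long rate = Long.parseLong(
                props.getProperty("ducc.agent.node.metrics.publish.rate").trim());
            // rate is assumed to be in milliseconds, so divide to get seconds.
            return (tolerance * rate) / 1000;
        } catch (RuntimeException e) {
            // Missing or non-numeric property: fall back to the default.
            return DEFAULT_SECONDS;
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("ducc.rm.node.stability", "5");                   // example value
        props.setProperty("ducc.agent.node.metrics.publish.rate", "60000"); // example value
        System.out.println(secondsUntilDown(props) + " seconds");
    }
}
```

With those example values a node would be shown as down after 5 missed
publications of 60 seconds each, i.e. 300 seconds. If either property is
missing or not a number, the sketch falls back to 65 seconds.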
Lou.
On Wed, Nov 12, 2014 at 12:45 AM, reshu.agarwal
<[email protected]> wrote:
>
> Hi,
>
> When I tried DUCC 1.1.0 on multiple machines, I ran into an up/down
> status problem with the machines. I have configured two machines, and they
> go down one after the other. This disables the DUCC services and causes
> jobs to be initialized again and again.
>
> DUCC 1.0.0 was working fine on the same machines.
>
> How can I fix this problem? I have also compared the ducc.properties files
> for both versions. Both use the same configuration to check heartbeats.
>
> Re-initialization of jobs is increasing the processing time. Can I change
> or reconfigure this process?
>
> Services are getting disabled automatically and show an excessive
> initialization error status when I mouse over the disabled status, but the
> logs do not show any error.
>
> I have to use DUCC 1.0.0 instead of DUCC 1.1.0.
>
> Thanks in Advance.
>
> --
> *Reshu Agarwal*
>