Re: [boinc_dev] Validator problem

2013-09-12 Thread Radim Vančo
No. While the validator works well, everything is OK. Once it starts to mark 
results as inconclusive, the log shows this:


2013-09-11 21:01:28.6961  [WU#6111094 ps_130910_20263_277] handle_wu(): 
No canonical result yet
2013-09-11 21:01:28.7004 [debug]   [WU#6111094 ps_130910_20263_277] 
Found 4 viable results
2013-09-11 21:01:28.7005 [debug]   [WU#6111094 ps_130910_20263_277] 
Enough for quorum, checking set.
2013-09-11 21:01:28.7006 [CRITICAL]   check_set: 
init_result([RESULT#14692440 ps_130910_20263_277_0]) transient failure
2013-09-11 21:01:28.7007 [CRITICAL]   check_set: 
init_result([RESULT#14692441 ps_130910_20263_277_1]) transient failure
2013-09-11 21:01:28.7008 [CRITICAL]   check_set: 
init_result([RESULT#15596859 ps_130910_20263_277_2]) transient failure
2013-09-11 21:01:28.7009 [CRITICAL]   check_set: 
init_result([RESULT#15616637 ps_130910_20263_277_3]) transient failure
2013-09-11 21:01:28.7009[HOST#29989 AV#43] [outlier=0] Updating HAV 
in db.  pfc.n=335.00-335.00
2013-09-11 21:01:28.7010[HOST#33276 AV#43] [outlier=0] Updating HAV 
in db.  pfc.n=183.00-183.00
2013-09-11 21:01:28.7011[HOST#20844 AV#43] [outlier=0] Updating HAV 
in db.  pfc.n=0.00-0.00
2013-09-11 21:01:28.7011[RESULT#15616637 ps_130910_20263_277_3] 
Inconclusive [HOST#13946]
2013-09-11 21:01:28.7012[HOST#13946 AV#43] [outlier=0] Updating HAV 
in db.  pfc.n=12.00-12.00
2013-09-11 21:01:28.8081  [WU#6118169 ps_130910_20288_242] handle_wu(): 
No canonical result yet
2013-09-11 21:01:28.9177 [debug]   [WU#6118169 ps_130910_20288_242] 
Found 3 viable results
2013-09-11 21:01:28.9179 [debug]   [WU#6118169 ps_130910_20288_242] 
Enough for quorum, checking set.
2013-09-11 21:01:28.9180 [CRITICAL]   check_set: 
init_result([RESULT#14706758 ps_130910_20288_242_0]) transient failure
2013-09-11 21:01:28.9181 [CRITICAL]   check_set: 
init_result([RESULT#14706759 ps_130910_20288_242_1]) transient failure
2013-09-11 21:01:28.9182 [CRITICAL]   check_set: 
init_result([RESULT#15605864 ps_130910_20288_242_2]) transient failure
2013-09-11 21:01:28.9182[HOST#14290 AV#43] [outlier=0] Updating HAV 
in db.  pfc.n=46.00-46.00
2013-09-11 21:01:28.9183[HOST#13719 AV#39] [outlier=0] Updating HAV 
in db.  pfc.n=9.00-9.00
2013-09-11 21:01:28.9183[RESULT#15605864 ps_130910_20288_242_2] 
Inconclusive [HOST#13946]
2013-09-11 21:01:28.9184[HOST#13946 AV#39] [outlier=0] Updating HAV 
in db.  pfc.n=1.00-1.00
2013-09-11 21:01:28.9215  [WU#6124461 ps_130910_20312_62] handle_wu(): 
No canonical result yet
2013-09-11 21:01:28.9247 [debug]   [WU#6124461 ps_130910_20312_62] Found 
2 viable results
2013-09-11 21:01:28.9249 [debug]   [WU#6124461 ps_130910_20312_62] 
Enough for quorum, checking set.
2013-09-11 21:01:28.9251 [CRITICAL]   check_set: 
init_result([RESULT#14719522 ps_130910_20312_62_0]) transient failure
2013-09-11 21:01:28.9252 [CRITICAL]   check_set: 
init_result([RESULT#14719523 ps_130910_20312_62_1]) transient failure
2013-09-11 21:01:28.9253[RESULT#14719522 ps_130910_20312_62_0] 
Inconclusive [HOST#45463]
2013-09-11 21:01:28.9253[HOST#45463 AV#43] [outlier=0] Updating HAV 
in db.  pfc.n=131.00-131.00
2013-09-11 21:01:28.9263[RESULT#14719523 ps_130910_20312_62_1] 
Inconclusive [HOST#9190]
2013-09-11 21:01:28.9265[HOST#9190 AV#43] [outlier=0] Updating HAV 
in db.  pfc.n=502.00-502.00


It looks like a memory leak or something similar, but I haven't figured 
it out yet.




On 12.9.2013 07:46, David Anderson wrote:

Are there any error messages in the validator log file?
-- David



___
boinc_dev mailing list
boinc_dev@ssl.berkeley.edu
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.



Re: [boinc_dev] Validator problem

2013-09-12 Thread David Anderson

"Transient failure" means that the validator couldn't open
a directory containing an output file
(I'll change the message to make this clear).

This can happen if you're using NFS and the mount failed.

-- David
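
A minimal sketch of the distinction described above, assuming a POSIX system: when opening an output file fails, probing the parent directory separates "the file is unavailable" from "the directory is unavailable". The names `OpenStatus` and `open_and_classify` are hypothetical and are not BOINC's; BOINC defines its own error codes in error_numbers.h. Note that `opendir()` can also fail when the process has no free file descriptors, so a descriptor leak in a long-running process could in principle look the same as a missing mount.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <dirent.h>   // POSIX opendir/closedir

// Hypothetical status codes for this sketch only.
enum class OpenStatus { Ok, FileUnavailable, DirUnavailable };

// Try to open `path` for reading. On failure, probe the parent directory
// to decide whether the file or the directory is the problem.
OpenStatus open_and_classify(const std::string& path, FILE*& f) {
    assert(!path.empty());
    f = std::fopen(path.c_str(), "r");
    if (f) return OpenStatus::Ok;

    // Derive the parent directory; use "." for a bare file name.
    std::string::size_type slash = path.rfind('/');
    std::string dir = (slash == std::string::npos) ? "." : path.substr(0, slash);
    if (dir.empty()) dir = "/";   // path was of the form "/name"

    DIR* d = opendir(dir.c_str());
    if (!d) {
        // Directory cannot be opened: missing mount, bad permissions,
        // or no file descriptors left in this process.
        return OpenStatus::DirUnavailable;
    }
    closedir(d);
    return OpenStatus::FileUnavailable;
}
```

A caller could retry later on `DirUnavailable` and treat `FileUnavailable` as a permanent error for that result.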


[boinc_dev] Validator problem

2013-09-11 Thread Radim Vančo
I am still trying to solve a problem with my validator. It works well at 
first, but after a few days it starts marking all results as 
inconclusive. If I restart the validator, it validates correctly again for 
a few days. I am attaching the source code of the validator in case anyone 
can see where the problem is.


Thanks
#include <string>
#include <vector>
#include <math.h>
#include "error_numbers.h"
#include "boinc_db.h"
#include "sched_util.h"
#include "validate_util.h"
#include "validate_util2.h"
#include "validator.h"

using std::string;
using std::vector;

struct DATA {
    int nlines;
    double per[10];
    double rms[10];
    double chisq[10];
};

int init_result(RESULT& result, void*& data) {
    FILE* f;
    OUTPUT_FILE_INFO fi;
    int n, retval, nlines;
    double per[10], rms[10], chisq[10], dark, lambda, beta;

    retval = get_output_file_path(result, fi.path);
    if (retval) return retval;
    retval = try_fopen(fi.path.c_str(), f, "r");
    if (retval) return retval;

    DATA* dp = new DATA;

    nlines = 0;
    while (feof(f) == 0)
    {
        n = fscanf(f, "%lf %lf %lf %lf %lf %lf", &per[nlines], &rms[nlines], &chisq[nlines], &dark, &lambda, &beta);
        if (n != 6 && n != -1) return ERR_XML_PARSE;

        dp->per[nlines] = per[nlines];
        dp->rms[nlines] = rms[nlines];
        dp->chisq[nlines] = chisq[nlines];
        nlines++;
    }
    dp->nlines = nlines;
    fclose(f);

    data = (void*) dp;
    return 0;
}

int compare_results(RESULT& r1, void* _data1, RESULT const& r2, void* _data2, bool& match) {

    int i;
    double tol_per = 0.1, tol_rms = 0.1, tol_chisq = 0.5;

    DATA* data1 = (DATA*)_data1;
    DATA* data2 = (DATA*)_data2;
    match = true;

    for (i = 0; i < data1->nlines; i++)
    {
        if (fabs((data1->per[i] - data2->per[i]) / (data1->per[i] + data2->per[i])) / 2 > tol_per) match = false;
        if (fabs((data1->rms[i] - data2->rms[i]) / (data1->rms[i] + data2->rms[i])) / 2 > tol_rms) match = false;
        if (fabs((data1->chisq[i] - data2->chisq[i]) / (data1->chisq[i] + data2->chisq[i])) / 2 > tol_chisq) match = false;
    }
    return 0;
}

int cleanup_result(RESULT const& r, void* data) {
    if (data) delete (DATA*) data;
    return 0;
}
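
One thing that stands out in init_result() above: the early `return ERR_XML_PARSE` skips `fclose(f)` and never frees `dp`, nothing stops `nlines` from passing 10, and a `-1` from `fscanf` at end of file still gets counted as a line. If malformed output files turn up regularly, a validator that runs for days could slowly use up its file descriptors, which might fit the "works for a few days, then fails until restarted" pattern. That is a guess, not a confirmed diagnosis. The following is a minimal stand-alone sketch of a parse loop that releases the file and the buffer on every return path. `Data`, `FileCloser`, `parse_output`, `kOpenError` and `kParseError` are hypothetical names for illustration; a real validator would keep BOINC's `get_output_file_path()`, `try_fopen()` and error codes.

```cpp
#include <cassert>
#include <cstdio>
#include <memory>
#include <utility>

const int kMaxLines = 10;      // matches the fixed array size in the posted code
const int kOpenError = -1;     // hypothetical return codes for this sketch
const int kParseError = -2;

struct Data {
    int nlines = 0;
    double per[kMaxLines];
    double rms[kMaxLines];
    double chisq[kMaxLines];
};

// Deleter so that the FILE* is closed on every return path.
struct FileCloser {
    void operator()(FILE* f) const { if (f) std::fclose(f); }
};

// Reads up to kMaxLines lines of six numbers each and keeps the first three.
// Returns 0 on success; `out` is only set on success.
int parse_output(const char* path, std::unique_ptr<Data>& out) {
    assert(path != nullptr);
    std::unique_ptr<FILE, FileCloser> f(std::fopen(path, "r"));
    if (!f) return kOpenError;

    std::unique_ptr<Data> dp(new Data());
    double per, rms, chisq, dark, lambda, beta;
    for (;;) {
        int n = std::fscanf(f.get(), "%lf %lf %lf %lf %lf %lf",
                            &per, &rms, &chisq, &dark, &lambda, &beta);
        if (n == EOF) break;                            // clean end of file
        if (n != 6) return kParseError;                 // malformed line
        if (dp->nlines >= kMaxLines) return kParseError; // too many lines
        dp->per[dp->nlines] = per;
        dp->rms[dp->nlines] = rms;
        dp->chisq[dp->nlines] = chisq;
        dp->nlines++;
    }
    out = std::move(dp);
    return 0;
}
```

In the posted validator, the same effect could be had without smart pointers by calling `fclose(f)` and `delete dp` before each early return.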


Re: [boinc_dev] Validator problem

2013-06-19 Thread McLeod, John
Different processors can come up with slightly different results for each step 
in a long calculation.  If you allow processors of different types to crunch 
the same work unit, then you will have to write some fuzziness into your 
validator.
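
A minimal sketch of the kind of fuzziness described above: two values count as equal when their difference is small relative to the larger of the two magnitudes. The function names and the choice of scaling are assumptions for illustration, not the poster's formula and not a BOINC API. Scaling by the larger magnitude avoids a division by zero when the two values sum to zero.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Difference between a and b relative to the larger magnitude.
// Returns 0 when both values are exactly 0.
double relative_diff(double a, double b) {
    double scale = std::max(std::fabs(a), std::fabs(b));
    if (scale == 0.0) return 0.0;
    return std::fabs(a - b) / scale;
}

// True when a and b agree to within rel_tol (e.g. 0.01 for 1%).
// NaN never matches anything.
bool fuzzy_equal(double a, double b, double rel_tol) {
    assert(rel_tol >= 0.0);
    if (std::isnan(a) || std::isnan(b)) return false;
    return relative_diff(a, b) <= rel_tol;
}
```

A comparison function would call `fuzzy_equal()` once per field with a tolerance chosen for that field, and report a mismatch as soon as one field fails.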






[boinc_dev] Validator problem

2013-06-04 Thread radim.vanco
Hi,
I have a problem with this custom validator. It is based on the custom validator 
on the wiki; it checks the structure of the output and then compares three 
decimal numbers. It works fine, but after some time (two to four days) it starts 
to mark all results as inconclusive. If I test it on only a few WUs it works 
exactly as I want, but when there are many results, after a few days it starts 
to mark everything as invalid. Does anyone know what could cause the problem? I 
have attached the source code of the validator.

Radim



#include <string>
#include <vector>
#include <math.h>
#include "error_numbers.h"
#include "boinc_db.h"
#include "sched_util.h"
#include "validate_util.h"
#include "validate_util2.h"
#include "validator.h"

using std::string;
using std::vector;

struct DATA {
    int nlines;
    double per[10];
    double rms[10];
    double chisq[10];
};

//extern int init_result(RESULT const& result, void*& data) {
int init_result(RESULT& result, void*& data) {
    FILE* f;
    OUTPUT_FILE_INFO fi;
    int n, retval, nlines;
    double per[10], rms[10], chisq[10], dark, lambda, beta;

    retval = get_output_file_path(result, fi.path);
    if (retval) return retval;
    retval = try_fopen(fi.path.c_str(), f, "r");
    if (retval) return retval;

    DATA* dp = new DATA;

    nlines = 0;
    while (feof(f) == 0)
    {
        n = fscanf(f, "%lf %lf %lf %lf %lf %lf", &per[nlines], &rms[nlines], &chisq[nlines], &dark, &lambda, &beta);
        if (n != 6 && n != -1) return ERR_XML_PARSE;

        dp->per[nlines] = per[nlines];
        dp->rms[nlines] = rms[nlines];
        dp->chisq[nlines] = chisq[nlines];
        nlines++;
        //printf ("Output1: %lf %lf %lf %lf %lf %lf\n", per[nlines], rms[nlines], chisq[nlines], dark, lambda, beta);
        //printf ("Number of lines: %d\n", n);
        //printf ("Current line: %d\n", nlines);
    }
    dp->nlines = nlines;
    fclose(f);

    data = (void*) dp;
    return 0;
}

int compare_results(RESULT& r1, void* _data1, RESULT const& r2, void* _data2, bool& match) {

    int i;
    double tol_per = 0.1, tol_rms = 0.1, tol_chisq = 0.5;

    DATA* data1 = (DATA*)_data1;
    DATA* data2 = (DATA*)_data2;
    match = true;

    for (i = 0; i < data1->nlines; i++)
    {
        //if (fabs(data1->per[i] - data2->per[i]) > tol_per) match = false;
        //if (fabs(data1->rms[i] - data2->rms[i]) > tol_rms) match = false;
        //if (fabs(data1->chisq[i] - data2->chisq[i]) > tol_chisq) match = false;
        if (fabs((data1->per[i] - data2->per[i]) / (data1->per[i] + data2->per[i])) / 2 > tol_per) match = false;
        if (fabs((data1->rms[i] - data2->rms[i]) / (data1->rms[i] + data2->rms[i])) / 2 > tol_rms) match = false;
        if (fabs((data1->chisq[i] - data2->chisq[i]) / (data1->chisq[i] + data2->chisq[i])) / 2 > tol_chisq) match = false;
        //printf ("Output: %lf %lf %lf \n", data1->per[i], data1->rms[i], data1->chisq[i]);
    }
    return 0;
}


int cleanup_result(RESULT const& r, void* data) {
    if (data) delete (DATA*) data;
    return 0;
}

