Sorry for replying to myself, but I wanted to leave the solution to this problem archived in the mailing list.

The "problem", so to speak, is that there is a function in the dakota
source code, in the file Dakota-5.1/src/regexp.c, that is (well,
obviously :) also named regcomp, and it ended up being called instead
of the one used by the slurm library. (I don't know where the latter is
supposed to come from.)

I just renamed all the functions in this file (just to be safe :) and
in another file that calls them, CtelRegExp.C, and everything worked as
expected.

With hindsight everything is ridiculously simple, but this was actually
*REALLY HARD* for me to find :)

Thanks for slurm anyway. I was a user of PBS and relatives, but now I
think slurm is much better all around.

Ramiro

2011/5/19 Ramiro Willmersdorf <[email protected]>:
> Hello,
>
> I've been trying for a couple of weeks now (not doing this
> exclusively, of course) to get DAKOTA (http://dakota.sandia.gov)
> running in parallel, with MPI, on a cluster that runs BULL's
> XBAS-5v3.1. Bull provides its own version of MPI and also intel MPI.
> (These are actually a bit of a problem since dakota automatically
> detects neither, but that's another problem...)
>
> After compiling the system, when submitted to slurm, I get these
> messages (from all processors, I edited the output to shorten it)
>
> cpu_bind=MASK - super34, task 0 0 [6100]: mask 0x1 set
> dakota: error: keyvalue regex compilation failed
> dakota: error: Parsing error at unrecognized key:
> dakota: error: Parse error in file /etc/slurm/slurm.conf line 1: "ClusterName=super"
> dakota: error: Parsing error at unrecognized key:
> dakota: error: Parse error in file /etc/slurm/slurm.conf line 2: "ControlMachine=super0"
> dakota: error: Parsing error at unrecognized key:
> dakota: error: Parse error in file /etc/slurm/slurm.conf line 3: "Licenses=imex*100,stars*100,gem*100,br*100"
> dakota: error: Parsing error at unrecognized key:
> dakota: error: Parse error in file /etc/slurm/slurm.conf line 8: "SlurmUser=slurm"
>
> and on and on and on until the end of slurm.conf.
>
> Due to an incredible bout of serial stupidity, it took me quite a
> while to locate the source of the messages, and I blindly tried all
> possible combinations of compilers and mpi libraries in the system.
> I must have compiled dakota about 30 times :)
>
> Finally I realized that these messages are being printed by libslurm
> and come, as far as I can tell, from this code snippet in the file
> ./src/common/parse_config.c
>
> static void _keyvalue_regex_init(void)
> {
>     if (!keyvalue_initialized) {
>         if (regcomp(&keyvalue_re, keyvalue_pattern,
>                     REG_EXTENDED) != 0) {
>             /* FIXME - should be fatal? */
>             error("keyvalue regex compilation failed\n");
>         }
>         keyvalue_initialized = true;
>     }
> }
>
> *Really*, for the life of me, I can't see what could possibly fail
> here, and not fail in the many, many other mpi programs that run fine
> in the cluster.
>
> ( I took the above code from the stock source distribution of
> slurm-2.0.5, which is the version BULL uses in this system, (which I
> think is the same as in RHEL5.3), but I can't swear they didn't
> modify it. )
>
> The only thing I can think of (I don't know if this is even possible,
> and at first I would think not at all) is that when linked to dakota,
> the slurm library ends up using a different regex library than the
> one it's expecting. The regex patterns are constant, so how can the
> compilation fail in this case?
>
> It's kind of hard to debug this because it's a production cluster,
> and I'm somewhat reluctant to install a different resource manager,
> recompiled with debugging information on. That said, it now occurred
> to me that I don't need to install the version with debugging
> symbols, just link this one executable to the library with debugging
> symbols on, no? Of course, probably everything will work fine with
> the new library :) Also, debugging this is a royal PITA, because it's
> a parallel program, but it's doable.
>
> Well, I'm off to try this now, but if anyone has any recommendation
> on anything else I could try, I'd be more than grateful.
>
> Thank you for your attention,
>
> Ramiro.
>
>
> --
> Ramiro Brito Willmersdorf [email protected]
> Departamento de Eng. Mecânica UFPE
> tel: +81 2126-8231r239 fax: +81 2126-8232
>

--
Ramiro Brito Willmersdorf [email protected]
Departamento de Eng. Mecânica UFPE
tel: +81 2126-8231r239 fax: +81 2126-8232
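
P.S. For the archive, a minimal C sketch of the kind of rename described
above. It rests on assumptions that are not taken from either code base:
dak_regcomp and posix_keyvalue_matches are hypothetical names, the toy
body of dak_regcomp is not dakota's code, and the key=value pattern is
only illustrative, not slurm's actual keyvalue_pattern. The further
assumption is that the regcomp slurm expects is the POSIX one declared
in <regex.h>, and that a global function of the same name in the
executable gets picked first when symbols are resolved. Giving the
application's routine a prefixed (and here static) name leaves the POSIX
function alone.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical stand-in for the application's own routine that used to
 * be a global function called regcomp. With a project prefix and
 * `static` it can no longer take the place of the C library's regcomp
 * when the program is linked. The body is a toy, not dakota's code. */
static int dak_regcomp(const char *pattern)
{
    assert(pattern != NULL);
    return pattern[0] == '\0' ? -1 : 0;
}

/* Calls the POSIX regcomp the way a library would, using an
 * illustrative key=value pattern (NOT slurm's real keyvalue_pattern).
 * Returns 1 on a match, 0 on no match, -1 if compilation fails. */
static int posix_keyvalue_matches(const char *line)
{
    regex_t re;
    int rc;

    if (regcomp(&re, "^[[:alnum:]_]+=.+$", REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, line, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With the names kept apart, posix_keyvalue_matches("ClusterName=super")
returns 1, while dak_regcomp can keep whatever semantics the
application needs.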
