We’ve had this discussion before and wasted too much time on it. On the BG 
just don’t allow the damn loading of options from files for large runs, say 
greater than 128 nodes: if the user asks for loading from a file or node 0 
finds [.]petscrc files, then generate a useful error message and stop. Yes, 
this check would be specific to one class of machines.
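
Roughly the kind of guard meant above, as a sketch only: the __bgq__ macro, 
the use of rank count as a stand-in for the 128-node limit, and the helper 
name are assumptions, not the actual PETSc code.

#include <mpi.h>
#include <stdio.h>

#define LARGE_RUN_RANK_LIMIT 128      /* stand-in for "greater than 128 nodes" */

/* Sketch of the proposed check: on Blue Gene, refuse to read an options file
   for large runs and fail with a useful message instead of hanging later in
   the collective broadcast of the file contents. */
static int CheckOptionsFileAllowed(MPI_Comm comm, const char *optionsfile)
{
#if defined(__bgq__)                  /* predefined by Blue Gene/Q compilers (assumption) */
  int size, rank;
  MPI_Comm_size(comm, &size);
  if (optionsfile && size > LARGE_RUN_RANK_LIMIT) {
    MPI_Comm_rank(comm, &rank);
    if (!rank)
      fprintf(stderr,
              "Reading options from '%s' is disabled on Blue Gene for runs of more\n"
              "than %d ranks; remove the file or run with -skip_petscrc.\n",
              optionsfile, LARGE_RUN_RANK_LIMIT);
    MPI_Abort(comm, 1);
  }
#else
  (void)comm; (void)optionsfile;      /* check applies only to this class of machines */
#endif
  return 0;
}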

  We can’t deliver a product that simply doesn’t work on IBM and then blame 
IBM for being buggy.  The other choice is to have configure detect BG and then 
stop and refuse to configure at all for the “buggy machine”, but that is a 
silly emotional reaction.

   We sure as hell shouldn’t have a product that requires each new user on BG 
to try to use PETSc, have it fail, and debug the problem, like it is now!

    So just fix it like this and let it go,

   Barry


On Dec 18, 2013, at 2:14 PM, Jed Brown <[email protected]> wrote:

> Satish Balay <[email protected]> writes:
>> I had a chat with Derek this morning. The error case was with 512
>> nodes [same as above] with --ranks-per-node 4 or 8. And this was on
>> ceatus.
> 
> It is spelled "cetus".
> 
>> The hang was confirmed to be in PetscInitialize [via the debugger] and
>> -skip_petscrc went past the hang.
> 
> That is where the particular sequence of collectives (MPI_Bcast) gets
> called.  Getting past that part does not rule out the same problem
> occurring later, perhaps with lower probability.
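
For illustration, the shape of the options-file handling Jed is pointing at, 
reduced to a sketch (the helper name and buffer handling here are made up; 
only the rank-0-reads-then-MPI_Bcast pattern matters). If any rank fails to 
reach the same broadcasts, the whole job hangs, which is why -skip_petscrc 
gets past PetscInitialize without ruling out the same failure later.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Rank 0 reads the options file and broadcasts its contents; every other
   rank must participate in the same MPI_Bcast calls. */
static void BcastOptionsFile(MPI_Comm comm, const char *path)
{
  int   rank, len = 0;
  char *buf = NULL;

  MPI_Comm_rank(comm, &rank);
  if (!rank) {
    FILE *fp = fopen(path, "r");
    if (fp) {
      fseek(fp, 0, SEEK_END);
      len = (int)ftell(fp);
      rewind(fp);
      buf = (char *)malloc((size_t)len + 1);
      if (fread(buf, 1, (size_t)len, fp) != (size_t)len) len = 0;
      buf[len] = '\0';
      fclose(fp);
    }
  }
  MPI_Bcast(&len, 1, MPI_INT, 0, comm);          /* collective: all ranks must get here */
  if (len > 0) {
    if (rank) buf = (char *)malloc((size_t)len + 1);
    MPI_Bcast(buf, len + 1, MPI_CHAR, 0, comm);  /* collective: and here */
    /* ...parse the options out of buf... */
  }
  free(buf);
}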
