On Dec 18, 2013, at 3:17 PM, Jed Brown <[email protected]> wrote:
> Barry Smith <[email protected]> writes:
>
>> We’ve had this discussion before and wasted too much time on it. On
>> the BG just don’t allow the damn loading of options for files for
>> large runs, say greater than 128 nodes
>
> If we're going to ignore the files, we should generate a loud warning at
> configure time and then *always* ignore the files. Ignoring only on
> large runs sets people up to do a small trial run and then launch their
> expensive job, only to find that it ignored their options.

I never said “ignore” the files. I said error out if the files exist or someone passes a file name down. Yes, erroring out is a bit annoying, but much better than hangs.

I also question why some system calls are made on all processes but only used on one, for example

  if (!flag) {
    ierr = PetscGetHomeDirectory(pfile,PETSC_MAX_PATH_LEN-16);CHKERRQ(ierr); /* PetscOptionsInsertFile() does a fopen() on rank 0 only - so only rank 0's HomeDir value is relevant */

I think a good policy is to minimize system calls to exactly where and when they are needed.

  Barry

>
> And what about -options_file? It also uses MPI_Bcast and is vulnerable
> to the same bugs. What about DMDAGetLogicalCoordinate (an insane
> function that should be deleted) or DMPlexDistribute, which also call
> MPI_Bcast?
>
> Alternative is to have configure detect BG/Q and warn loudly that the
> user should set PAMID_COLLECTIVES=0 due to known bugs (all I really want
> for Christmas is for IBM to have a public bug tracker we can reference)
> and then have PetscInitialize check and warn again in case it is not
> set.
>
>> We sure as hell shouldn’t have a product that for each new user on
>> BG requires them to try to use PETSc, have it fail, debug the
>> problem, like it is now!
>
> Agreed, this is unacceptable.
