This is currently a mess. 

   Say one process calls PetscFunctionListAdd() with a function pointer, but 
another calls it with the string name of the function. Now both processes call 
PetscFunctionListFind() with a common comm. The process with the function 
pointer will return immediately with the answer. The one without the function 
pointer will start mucking around with dynamic libraries which "sometimes" 
could be collective on the comm so it would block? 

  These sets of routines evolved organically over time. We need to refactor the 
whole hierarchy of these routines and figure out what collectivity is needed.  
There are too many potential comms since they were kind of shoved in over time. 

  It may be simplest if we treat accessing the dynamic libraries as completely 
non-collective. This means removing things like PetscDLLibraryRetrieve() which, 
while a way cool concept, has never proven to be practical during its 15 years 
of existence.

   So are we able to treat accessing dynamic libraries as completely 
non-collective? Will this lose a valuable feature?

   Barry


On Feb 4, 2013, at 9:22 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:

> On Sat, Feb 2, 2013 at 3:30 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> 
>    Yeah I noticed this problem but didn't want to deal with it when I changed 
> the code.
> 
> So if we believe the documentation of PetscFunctionListAdd, 
> XXInitializePackage() is effectively collective on COMM_WORLD (though not 
> documented as such). This means that if 
> !defined(PETSC_USE_DYNAMIC_LIBRARIES), the following could deadlock:
> 
> if (!rank) {
>   VecCreate(PETSC_COMM_SELF,....);
> }
> 
> which would be awfully bad behavior. In reality, PetscFunctionListAdd() does 
> not reference comm at all. Why did you add the comm argument? "Consistency"?
> 
> Whatever the "next" documentation system is, it should be taught to trace the 
> "collective" attribute and complain if a "Not Collective" function calls a 
> Collective function with an argument other than COMM_SELF.
> 
> 
>     Yes we should remove the "Formally Collective", I was drinking that week 
> :-)
> 
>    Barry
> 
> On Feb 2, 2013, at 2:54 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
> 
> > In [1], PetscFunctionListAdd became implicitly collective on COMM_WORLD, 
> > but all the XXRegisterDynamic() routines say "Not collective". These all have to 
> > be updated if this is the case, but I'm not sure it's even a good thing. 
> > What if we have a big multi-domain simulation in which we initialize each 
> > of the components on their own subcomm? Those sub-components would not be 
> > allowed to register methods (or load plugins) that they might use because 
> > registration was implicitly more global.
> >
> > The comm is used by PetscLs and others. This is important because file 
> > systems are terrible at independent access. (Same for loading shared 
> > libraries; it's potentially much easier to do it by broadcasting the 
> > library, though portability is tricky.)
> >
> > Anyway, it would be really bad to PetscDLLibraryAppend() on a subcomm and 
> > have the registration function in the shared lib call PCRegisterDynamic() 
> > that promotes itself to COMM_WORLD.
> >
> > Maybe we need to pass an explicit comm to all the registration functions.
> >
> > [1] 
> > https://bitbucket.org/petsc/petsc-dev/commits/07f9e01e040feeb4162253a60ca63556436f4135
> >
> > What does "Formally collective" mean anyway? Either it's always safe to 
> > call independently, it's "Logically collective" so that there is no 
> > performance impact, but it still needs to be collective to have consistent 
> > state, or it's Collective. This falls under Collective because it 
> > can deadlock if you call it independently.
> 
> 
