On Thu, Dec 20, 2012 at 3:39 PM, Thomas Witkowski <Thomas.Witkowski at tu-dresden.de> wrote:
> I cannot use the information from log_summary, as I have three different LU
> factorizations and solves (local matrices and two hierarchies of coarse
> grids). Therefore, I use the following workaround to get the timing of the
> solve I'm interested in:
You misunderstand how to use logging. You just put these things in separate
stages. Stages represent parts of the code over which events are aggregated.

   Matt

> MPI::COMM_WORLD.Barrier();
> wtime = MPI::Wtime();
> KSPSolve(*(data->ksp_schur_primal_local), tmp_primal, tmp_primal);
> FetiTimings::fetiSolve03 += (MPI::Wtime() - wtime);
>
> The factorization is done explicitly before with "KSPSetUp", so I can
> measure the time for the LU factorization. It also does not scale! For 64
> cores it takes 0.05 seconds, for 1024 cores 1.2 seconds. In all calculations,
> the local coarse space matrices defined on four cores have exactly the same
> number of rows and exactly the same number of non-zero entries. So, from my
> point of view, the time should be absolutely constant.
>
> Thomas
>
> Quoting Barry Smith <bsmith at mcs.anl.gov>:
>
>>
>>   Are you timing ONLY the time to factor and solve the subproblems? Or
>> also the time to get the data to the collection of 4 cores at a time?
>>
>>   If you are only using LU for these problems and not elsewhere in the
>> code, you can get the factorization and solve time from MatLUFactor() and
>> MatSolve(), or you can use stages to put this calculation in its own stage
>> and use the MatLUFactor() and MatSolve() time from that stage.
>>   Also look at the load balancing column for the factorization and solve
>> stage; is it well balanced?
>>
>>   Barry
>>
>> On Dec 20, 2012, at 2:16 PM, Thomas Witkowski
>> <thomas.witkowski at tu-dresden.de> wrote:
>>
>>> In my multilevel FETI-DP code, I have localized coarse matrices, which
>>> are defined on only a subset of all MPI tasks, typically between 4 and 64
>>> tasks. The MatAIJ and the KSP objects are both defined on an MPI
>>> communicator which is a subset of MPI::COMM_WORLD. The LU factorization
>>> of the matrices is computed with either MUMPS or superlu_dist, but both
>>> show a scaling behavior I really wonder about: when the overall problem
>>> size is increased, the solve with the LU factorization of the local
>>> matrices does not scale! But why not? I just increase the number of local
>>> matrices, and all of them are independent of each other. An example: I use
>>> 64 cores, and each coarse matrix is spanned by 4 cores, so there are 16 MPI
>>> communicators with 16 coarse space matrices. The problem requires 192
>>> solves with the coarse space systems, and together these take 0.09
>>> seconds. Now I increase the number of cores to 256, but let the local
>>> coarse space again be defined on only 4 cores. Again, 192 solves with
>>> these coarse spaces are required, but now they take 0.24 seconds. The same
>>> for 1024 cores, and we are at 1.7 seconds for the local coarse space
>>> solves!
>>>
>>> For me, this is a total mystery! Any idea how to explain, debug and
>>> eventually resolve this problem?
>>>
>>> Thomas
>>
>>

-- 
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
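
A minimal sketch of the staged logging Matt describes, using the PETSc C API. The
stage name, the two wrapper functions, and the ksp_coarse_local/rhs/x arguments are
illustrative assumptions modeled on Thomas's snippet, not names taken from his code
or provided by PETSc:

#include <petscksp.h>

static PetscLogStage stage_coarse_solve;  /* registered once, reused for every coarse solve */

/* Register the stage once, e.g. right after PetscInitialize(). */
PetscErrorCode RegisterCoarseSolveStage(void)
{
  PetscErrorCode ierr;
  ierr = PetscLogStageRegister("CoarseLocalSolve", &stage_coarse_solve);CHKERRQ(ierr);
  return 0;
}

/* Wrap each local coarse-grid solve in the stage. Every event fired between
   Push and Pop (KSPSolve, MatSolve, the MatLUFactor* events from MUMPS or
   superlu_dist, ...) is aggregated into this stage's section of -log_summary,
   separately from the other factorizations and solves in the code. */
PetscErrorCode SolveCoarseLocal(KSP ksp_coarse_local, Vec rhs, Vec x)
{
  PetscErrorCode ierr;
  ierr = PetscLogStagePush(stage_coarse_solve);CHKERRQ(ierr);
  ierr = KSPSolve(ksp_coarse_local, rhs, x);CHKERRQ(ierr);
  ierr = PetscLogStagePop();CHKERRQ(ierr);
  return 0;
}

Running with -log_summary then reports, for that stage alone, the time, flop rate,
and max/min ratio of each event, which also gives the load-balance check Barry
suggests without the hand-rolled MPI::Wtime() timers.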
