Mark,

I just tried the following options on Kraken on 1200 cores:

-pres_ksp_type bcgsl
-pres_pc_type gamg
-pres_pc_gamg_type agg
-pres_pc_gamg_agg_nsmooths 1
-pres_pc_gamg_threshold 0.05
-pres_mg_levels_ksp_type richardson
-pres_mg_levels_pc_type sor
-pres_mg_coarse_ksp_type richardson
-pres_mg_coarse_pc_type sor
-pres_mg_coarse_pc_sor_its 4

It hung at

[0]PCSetData_AGG bs=1 MM=10672

for nearly 15 minutes. I take it this is not normal.

John

On Mon, Jul 9, 2012 at 2:41 PM, John Mousel <john.mousel at gmail.com> wrote:

> Can you clarify what you mean by null-space cleaning? I just run SOR on
> the coarse grid.
>
>
> On Mon, Jul 9, 2012 at 11:52 AM, Mark F. Adams <mark.adams at columbia.edu> wrote:
>
>>
>> On Jul 9, 2012, at 12:39 PM, John Mousel wrote:
>>
>> Mark,
>>
>> The problem is indeed non-symmetric. We went back and forth in March
>> about this problem. I think we ended up concluding that the coarse size
>> couldn't get too small or the null space presented problems.
>>
>>
>> Oh, it's singular. I forget what the issues were, but an iterative
>> coarse-grid solver should be fine for singular problems, perhaps with
>> null-space cleaning if the kernel is sneaking in. Actually, there is an
>> SVD coarse-grid solver:
>>
>> -mg_coarse_pc_type svd
>>
>> That is the most robust.
>>
>> When I did get it to work, I tried to scale it up, and on my local
>> university cluster it seemed to just hang when the core count got above
>> about 16. I don't really trust that machine, though.
>>
>>
>> That's the machine. GAMG does have some issues, but I've not seen it hang.
>>
>> It's new and has been plagued by hardware incompatibility issues since
>> day 1. I could re-examine this on Kraken. Also, what option are you talking
>> about with ML? I thought I had tried all the -pc_ml_CoarsenScheme options,
>> but I could be wrong.
>>
>>
>> This sounds like the right one.
>> I try to be careful in my solvers to be
>> invariant to subdomain shapes and sizes, and I think Ray Tuminaro (ML
>> developer) at least has options that should be careful about this also.
>> But I don't know much about what they are deploying these days.
>>
>> Mark
>>
>>
>> John
>>
>>
>>
>> On Mon, Jul 9, 2012 at 11:30 AM, Mark F. Adams <mark.adams at columbia.edu> wrote:
>>
>>> What problems are you having again with GAMG? Are your problems
>>> unsymmetric?
>>>
>>> ML has several coarsening strategies available, and I think the default
>>> does aggregation locally and does not aggregate across processor
>>> subdomains. If you have poorly shaped domains, then you want to use a
>>> global coarsening method (these are not expensive).
>>>
>>> Mark
>>>
>>> On Jul 9, 2012, at 12:17 PM, John Mousel wrote:
>>>
>>> Mark,
>>>
>>> I still haven't had much luck getting GAMG to work consistently for my
>>> Poisson problem. ML seems to work nicely on low core counts, but at high
>>> core counts I can get long, thin portions of the grid on some processors
>>> instead of nice block-like chunks, which gives ML a pretty tough time.
>>>
>>> John
>>>
>>> On Mon, Jul 9, 2012 at 10:58 AM, John Mousel <john.mousel at gmail.com> wrote:
>>>
>>>> Getting rid of the Hypre option seemed to be the trick.
>>>>
>>>> On Mon, Jul 9, 2012 at 10:40 AM, Mark F. Adams <mark.adams at columbia.edu> wrote:
>>>>
>>>>> Google PTL_NO_SPACE and you will find some NERSC presentations on how
>>>>> to go about fixing this. (I ran into these problems years ago but
>>>>> forget the issues.)
>>>>>
>>>>> Also, I would try running with a Jacobi solver to see if that fixes
>>>>> the problem. If so, then you might try
>>>>>
>>>>> -pc_type gamg
>>>>> -pc_gamg_agg_nsmooths 1
>>>>> -pc_gamg_type agg
>>>>>
>>>>> This is a built-in AMG solver, so perhaps it plays nicer with resources ...
>>>>>
>>>>> Mark
>>>>>
>>>>> On Jul 9, 2012, at 10:57 AM, John Mousel wrote:
>>>>>
>>>>> > I'm running on Kraken and am currently working with 4320 cores. I
>>>>> > get the following error in KSPSolve.
>>>>> >
>>>>> > [2711]: (/ptmp/ulib/mpt/nightly/5.3/120211/mpich2/src/mpid/cray/src/adi/ptldev.c:2046) PtlMEInsert failed with error : PTL_NO_SPACE
>>>>> > MHV_exe: /ptmp/ulib/mpt/nightly/5.3/120211/mpich2/src/mpid/cray/src/adi/ptldev.c:2046: MPIDI_CRAY_ptldev_desc_pkt: Assertion `0' failed.
>>>>> > forrtl: error (76): Abort trap signal
>>>>> > Image              PC                Routine            Line        Source
>>>>> > MHV_exe            00000000014758CB  Unknown               Unknown  Unknown
>>>>> > MHV_exe            000000000182ED43  Unknown               Unknown  Unknown
>>>>> > MHV_exe            0000000001829460  Unknown               Unknown  Unknown
>>>>> > MHV_exe            00000000017EDE3E  Unknown               Unknown  Unknown
>>>>> > MHV_exe            00000000017B3FE6  Unknown               Unknown  Unknown
>>>>> > MHV_exe            00000000017B3738  Unknown               Unknown  Unknown
>>>>> > MHV_exe            00000000017B2B12  Unknown               Unknown  Unknown
>>>>> > MHV_exe            00000000017B428F  Unknown               Unknown  Unknown
>>>>> > MHV_exe            000000000177FCE1  Unknown               Unknown  Unknown
>>>>> > MHV_exe            0000000001590A43  Unknown               Unknown  Unknown
>>>>> > MHV_exe            00000000014F909B  Unknown               Unknown  Unknown
>>>>> > MHV_exe            00000000014FF53B  Unknown               Unknown  Unknown
>>>>> > MHV_exe            00000000014A4E25  Unknown               Unknown  Unknown
>>>>> > MHV_exe            0000000001487D57  Unknown               Unknown  Unknown
>>>>> > MHV_exe            000000000147F726  Unknown               Unknown  Unknown
>>>>> > MHV_exe            000000000137A8D3  Unknown               Unknown  Unknown
>>>>> > MHV_exe            0000000000E97BF2  Unknown               Unknown  Unknown
>>>>> > MHV_exe            000000000098EAF1  Unknown               Unknown  Unknown
>>>>> > MHV_exe            0000000000989C20  Unknown               Unknown  Unknown
>>>>> > MHV_exe            000000000097A9C2  Unknown               Unknown  Unknown
>>>>> > MHV_exe            000000000082FF2D  axbsolve_                 539  PetscObjectsOperations.F90
>>>>> >
>>>>> > This is somewhere in KSPSolve.
>>>>> > Is there an MPICH environment variable that needs tweaking? I couldn't
>>>>> > really find much on this particular error.
>>>>> > The solver is BiCGStab with Hypre as a preconditioner.
>>>>> >
>>>>> > -ksp_type bcgsl -pc_type hypre -pc_hypre_type boomeramg -ksp_monitor
>>>>> >
>>>>> > Thanks,
>>>>> >
>>>>> > John
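Mark's suggestion of "null-space cleaning", which John asks about at the top of the thread, can be illustrated on a toy problem. The following is a hypothetical, stdlib-only Python sketch, not PETSc code, and every function name in it is illustrative. It assumes a symmetric 1D Laplacian with pure Neumann ends, whose null space and left null space are both the constant vector, and uses SOR as the solver, as John does on his coarse grid. "Cleaning" here means projecting the constant component out of the right-hand side once, so the singular system becomes consistent, and out of the iterate after every sweep. For a non-symmetric operator like John's, the left null vector is generally not the constant vector, so the right-hand-side projection would need that vector instead. PETSc's own mechanism for this is a MatNullSpace object attached to the operator; the sketch does not use it.

```python
"""Hypothetical sketch of null-space cleaning for a singular (pure Neumann)
Poisson problem. Assumes a symmetric 1D operator (n >= 2) whose null space
and left null space are the constant vector. Not PETSc; names are illustrative."""


def remove_mean(v):
    """Project out the constant null-space component: v - mean(v) * ones."""
    m = sum(v) / len(v)
    return [x - m for x in v]


def apply_neumann_laplacian(u):
    """1D Laplacian with homogeneous Neumann ends. Every row sums to zero,
    so A @ ones == 0 and A is singular."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else u[i]
        right = u[i + 1] if i < n - 1 else u[i]
        out.append(2.0 * u[i] - left - right)
    return out


def sor_sweep(u, b, omega):
    """One forward SOR sweep on A u = b, updating u in place."""
    n = len(u)
    for i in range(n):
        diag = 1.0 if i in (0, n - 1) else 2.0  # end rows are [1, -1]
        neighbors = 0.0
        if i > 0:
            neighbors += u[i - 1]
        if i < n - 1:
            neighbors += u[i + 1]
        gauss_seidel = (b[i] + neighbors) / diag
        u[i] = (1.0 - omega) * u[i] + omega * gauss_seidel


def sor_solve(b, sweeps=2000, omega=1.5, clean=True):
    """Run SOR sweeps. With clean=True, make b consistent (orthogonal to the
    constants) and strip the constant component from u after every sweep."""
    if clean:
        b = remove_mean(b)
    u = [0.0] * len(b)
    for _ in range(sweeps):
        sor_sweep(u, b, omega)
        if clean:
            u = remove_mean(u)
    return u


if __name__ == "__main__":
    b = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # sum(b) != 0: inconsistent
    cleaned = sor_solve(b, clean=True)
    raw = sor_solve(b, clean=False)
    b_proj = remove_mean(b)
    residual = max(abs(r - f) for r, f in zip(apply_neumann_laplacian(cleaned), b_proj))
    print("cleaned: residual", residual, "mean", sum(cleaned) / len(cleaned))
    print("raw:     mean drifted to", sum(raw) / len(raw))
```

With cleaning, the residual against the projected right-hand side and the mean of the solution both go to roughly zero. Without it, the right-hand side is inconsistent (its entries do not sum to zero), so SOR cannot converge and the constant component of the iterate keeps growing with each sweep, which is one way "the kernel sneaking in" can show up.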
