On Tue, Sep 18, 2012 at 8:24 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> On Sep 18, 2012, at 8:09 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
>
> > We should make it brain-dead simple for KSP to reorder internally and
> > run the solve in a low-bandwidth ordering.
>
>    I've always had very difficult philosophical issues with doing this.
> It messes up the layering of KSP outside of the PC/matrix; I have the
> same issues with having KSP diagonally scale the system before solving
> it. We have this really ugly chunk of code in KSPSolve() to do the
> diagonal scaling:
>
>   /* diagonal scale RHS if called for */
>   if (ksp->dscale) {
>     ierr = VecPointwiseMult(ksp->vec_rhs,ksp->vec_rhs,ksp->diagonal);CHKERRQ(ierr);
>     /* second time in, but matrix was scaled back to original */
>     if (ksp->dscalefix && ksp->dscalefix2) {
>       Mat mat,pmat;
>
>       ierr = PCGetOperators(ksp->pc,&mat,&pmat,PETSC_NULL);CHKERRQ(ierr);
>       ierr = MatDiagonalScale(pmat,ksp->diagonal,ksp->diagonal);CHKERRQ(ierr);
>       if (mat != pmat) {ierr = MatDiagonalScale(mat,ksp->diagonal,ksp->diagonal);CHKERRQ(ierr);}
>     }
>
>     /* scale initial guess */
>     if (!ksp->guess_zero) {
>       if (!ksp->truediagonal) {
>         ierr = VecDuplicate(ksp->diagonal,&ksp->truediagonal);CHKERRQ(ierr);
>         ierr = VecCopy(ksp->diagonal,ksp->truediagonal);CHKERRQ(ierr);
>         ierr = VecReciprocal(ksp->truediagonal);CHKERRQ(ierr);
>       }
>       ierr = VecPointwiseMult(ksp->vec_sol,ksp->vec_sol,ksp->truediagonal);CHKERRQ(ierr);
>     }
>   }
>
> But it is nasty, because it changes the convergence tests and the monitor
> routines (they report the residual norms in the scaled system, not the
> original). Also, does it unscale the matrix after the solve, or leave it
> scaled when it is never going to be used again? The scaling can screw up
> algebraic multigrid methods. Does the scaling affect Eisenstat-Walker
> type convergence tests for Newton's method? It is nasty code, hard to
> follow and hard for users to fully appreciate.
>
> We could do the same ugly hacked-up thing for reordering: use a
> partitioner between processes and a low-bandwidth ordering within each
> process, inside KSPSolve()/KSPSetUp().
>
> It would be nice to have a cleaner/clearer abstract model, in terms of
> the software layering, to handle this. For example, I played with the
> idea of a "diagonal scaling" PC and a "reordering" PC that does the
> change and then has its own KSP inside for the solve. Thus you'd run with
> -ksp_type preonly -pc_type reorder -reorder_ksp_type gmres
> -reorder_pc_type ilu etc. But that seems a bit pedantic, and it is
> annoying that you have to give all your "true" solver options with a
> prefix.
>
> Jed, what is your solution?

Why not make it part of the matrix? For the minute, assume we are using a
DM. Then the matrix has the nonzero pattern already. We can use an option
to compute a fill-reducing ordering and either permute the matrix directly,
or just apply the permutations on the way in and out. This insulates it
from the solver completely.

   Matt

>   Barry
>
> > The Matrix Market orderings are often so contrived that performance
> > numbers are nearly meaningless.
>
> > On Tue, Sep 18, 2012 at 8:05 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >
> > >    Good paper, worth reading:
> > >    http://www.epcc.ed.ac.uk/wp-content/uploads/2011/11/PramodKumbhar.pdf
> > >
> > > On Sep 18, 2012, at 7:46 PM, C. Bergström <cbergstrom at pathscale.com> wrote:
> > >
> > > > Hi
> > > >
> > > > I'm hoping someone with some spare cycles and patience is willing
> > > > to help test a nightly ENZO build with petsc.
> > > > Here's the nightly, which won't require a key (it will ask, but
> > > > it's optional):
> > > > http://c591116.r16.cf2.rackcdn.com/enzo/nightly/Linux/enzo-2012-09-18-installer.run
> > > >
> > > > For BLAS we're testing against this (and in the future we will
> > > > ship our own built version):
> > > > https://github.com/xianyi/OpenBLAS/
> > > > ----------
> > > > I'm specifically looking for feedback on the GPGPU side of this
> > > > and on performance. The reason anyone would care: we've put a lot
> > > > of work into performance for memory-bound kernels, predictable
> > > > latency, and lowest latency. (We don't generate any PTX; we go
> > > > directly to bare-metal code generation tied to our own very small
> > > > runtime. We officially support only Tesla 2050/2070 cards at this
> > > > time, but ping me if you have another card you can test with.)
> > > >
> > > > You can replace nvcc with pathcu (we don't support the nvcc flags):
> > > >
> > > > pathcu -c foo.cu      # CUDA (bugs found should be fixed quickly, but expect
> > > >                       # bugs - Thrust and CuSP testing also in progress)
> > > > pathcc/f90 -hmpp      # OpenHMPP
> > > > pathcc/f90 -openacc   # OpenACC (the flag will be changed to -acc soon)
> > > >
> > > > For more details, documentation, and/or bug reports, please email
> > > > me directly.
> > > >
> > > > Cheers,
> > > >
> > > > Christopher

--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which
their experiments lead.
   -- Norbert Wiener
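
(A rough sketch of what the matrix-side approach Matt describes could look
like with the existing MatGetOrdering()/MatPermute()/VecPermute() calls:
compute a low-bandwidth ordering from the assembled matrix, permute on the
way in, solve in the permuted ordering, and permute back on the way out.
This is only an illustration, not code from PETSc or from this thread:
SolveReordered is a made-up helper, the choice of RCM is an assumption, the
direction of the VecPermute calls would need checking, and the
four-argument KSPSetOperators matches the petsc-3.3-era API used in the
code quoted above.)

  #include <petscksp.h>

  /* Sketch only: solve A x = b in a low-bandwidth (RCM) ordering.
     Hypothetical helper, not part of PETSc. */
  PetscErrorCode SolveReordered(Mat A, Vec b, Vec x)
  {
    PetscErrorCode ierr;
    IS             rperm, cperm;
    Mat            Aperm;
    Vec            bperm, xperm;
    KSP            ksp;

    /* fill-reducing / low-bandwidth ordering computed from the nonzero pattern */
    ierr = MatGetOrdering(A, MATORDERINGRCM, &rperm, &cperm);CHKERRQ(ierr);
    ierr = MatPermute(A, rperm, cperm, &Aperm);CHKERRQ(ierr);

    /* permute the right-hand side on the way in */
    ierr = VecDuplicate(b, &bperm);CHKERRQ(ierr);
    ierr = VecCopy(b, bperm);CHKERRQ(ierr);
    ierr = VecPermute(bperm, rperm, PETSC_FALSE);CHKERRQ(ierr);
    ierr = VecDuplicate(x, &xperm);CHKERRQ(ierr);

    /* solve in the permuted ordering; command-line options apply as usual */
    ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
    ierr = KSPSetOperators(ksp, Aperm, Aperm, SAME_NONZERO_PATTERN);CHKERRQ(ierr);
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
    ierr = KSPSolve(ksp, bperm, xperm);CHKERRQ(ierr);

    /* undo the permutation on the way out */
    ierr = VecCopy(xperm, x);CHKERRQ(ierr);
    ierr = VecPermute(x, rperm, PETSC_TRUE);CHKERRQ(ierr);

    ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
    ierr = MatDestroy(&Aperm);CHKERRQ(ierr);
    ierr = VecDestroy(&bperm);CHKERRQ(ierr);
    ierr = VecDestroy(&xperm);CHKERRQ(ierr);
    ierr = ISDestroy(&rperm);CHKERRQ(ierr);
    ierr = ISDestroy(&cperm);CHKERRQ(ierr);
    return 0;
  }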