On Oct 22, 2011, at 11:05 AM, Dominik Szczerba wrote:

> After upgrading to 3.2, the mentioned valgrind issues were still there
> (except for the first one, related to partitioning). However, I seem
> to be able to find the cause for them, which is NOT updating the ghost
> values in x BEFORE kspsolve, only AFTER. That way my coefficient
> matrix, depending on 'x', and obviously assembled before kspsolve,
> contained uninitialized values. When fixing the issue, bcgs solver
> behaves as expected, and as the other solvers. I am relieved the issue
> was with me.
>
> However, bcgs and only bcgs will occasionally break down (inconsistent
> state, division by zero), if the initial solution is exactly zero.
> Pre-filling it with something small (compared to the expected
> solution) fixes the breakdown. This small issue, however, still
> bothers me a bit: what's so special about zero?
>

   We'd need an example code that reproduces this. You could use -ksp_view_binary to generate a binary output file and send it to petsc-maint at mcs.anl.gov

   Barry

> Many thanks for your valuable support,
> Dominik
>
> On Fri, Oct 21, 2011 at 7:57 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>
>> On Oct 21, 2011, at 11:57 AM, Dominik Szczerba wrote:
>>
>>> On Fri, Oct 21, 2011 at 6:29 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>>>
>>>> On Oct 21, 2011, at 9:29 AM, Dominik Szczerba wrote:
>>>>
>>>>> I am doing a transient computation, solving one linear problem per
>>>>> timestep, so naturally I want to exploit 'x' from the previous time step
>>>>> to be the initial value for the next solve (KSPSetInitialGuessNonzero).
>>>>> For the longest time, however, I was getting wrong results, unless I was
>>>>> resetting 'x' each time step (to some constant value, pure zero caused
>>>>> bcgs to break down).
>>>>
>>>> What happened if you did not set it to some constant (that is, kept the
>>>> old solution)? Did you get KSP_DIVERGED_BREAKDOWN? It would be very odd
>>>> that starting with a good initial guess would lead to breakdown but that
>>>> cannot be completely ruled out.
>>>>
>>>
>>> There was no error, the iterations reportedly converged. Only the
>>> results were wrong, sort of strong random spikes.
>>>
>>>>
>>>> I would also check with valgrind
>>>> http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#valgrind
>>>>
>>>
>>> There are 3 issues with valgrind:
>>>
>>> 1) Syscall param writev(vector[...]) points to uninitialised byte(s)
>>> -> triggered by MatPartitioningApply, then leading deep into ParMetis
>>>
>>> 2) Conditional jump or move depends on uninitialised value(s) ->
>>> many times, in VecMin and VecMax and KSPSolve_BCGSL
>>>
>>> and
>>>
>>> 3) Syscall param writev(vector[...]) points to uninitialised byte(s)
>>> -> just once, in VecScatterBegin triggered by VecCreateGhost on the
>>> 'x' vector, which is ghosted.
>>
>> These are very bad things and should not happen at all. They must be
>> tracked down before anything can be trusted.
>> Start by sending the full valgrind output from a PETSc 3.2 run to
>> petsc-maint at mcs.anl.gov
>>
>>
>>    Barry
>>
>>>
>>> Do they pose any serious threats?
>>>
>>>>
>>>> Have you tried KSPBCGSL? This is "enhanced Bi-CG-stab" algorithm that
>>>> is designed to handle certain situations that may cause grief for regular
>>>> Bi-CG-stab I guess.
>>>>
>>>
>>> Thanks for the hint on bcgsl - it works as expected.
>>>
>>> So, do I have a problem in the code or bcgs is unreliable? If the
>>> latter: as a method or as this specific implementation?
>>>
>>> Thanks for any comments,
>>> Dominik
>>>
>>>
>>>>
>>>> Barry
>>>>
>>>>
>>>>
>>>>> After hours of debugging I was unable to find any errors in my
>>>>> coefficients, I experimentally found out, however, that changing the
>>>>> solver from bcgs to gmres or fgmres removes the problem: I no longer need
>>>>> to clear the solution vector.
>>>>> Now I am a bit worried, if this is still some time bomb in my code or is
>>>>> a known phenomenon. Thanks for any hints.
>>>>>
>>>>> Regards, Dominik
>>>>
>>
>>
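
On the question of why the starting vector can matter for bcgs at all: the following is a minimal sketch of textbook, unpreconditioned BiCGStab in plain Python. It is not PETSc's KSPBCGS code. It only shows where the method divides by inner products built from the initial residual r0 = b - A x0, so a different x0 means different denominators. The 2x2 matrix, the right-hand side, and the names bicgstab, Breakdown, and axpy are made up for illustration, and this toy case is not a diagnosis of the breakdown reported in this thread.

```python
import math


class Breakdown(Exception):
    """Raised when one of the BiCGStab denominators is exactly zero."""


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def axpy(a, x, y):
    """Return a*x + y as a new list."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def norm(u):
    return math.sqrt(dot(u, u))


def bicgstab(A, b, x0, tol=1e-10, maxit=50):
    """Textbook BiCGStab without preconditioning (illustrative sketch only)."""
    x = list(x0)
    r = axpy(-1.0, matvec(A, x), b)      # r0 = b - A x0 depends on the guess
    if norm(r) < tol:
        return x, 0
    r_hat = list(r)                      # shadow residual, fixed to r0
    p = list(r)
    rho = dot(r_hat, r)
    for it in range(1, maxit + 1):
        if rho == 0.0:
            raise Breakdown("(r_hat, r) is zero")
        v = matvec(A, p)
        denom = dot(r_hat, v)
        if denom == 0.0:
            raise Breakdown("(r_hat, A p) is zero")
        alpha = rho / denom
        s = axpy(-alpha, v, r)           # residual after the BiCG half step
        if norm(s) < tol:
            return axpy(alpha, p, x), it
        t = matvec(A, s)
        tt = dot(t, t)
        if tt == 0.0:
            raise Breakdown("(A s, A s) is zero")
        omega = dot(t, s) / tt
        if omega == 0.0:
            raise Breakdown("omega is zero")
        x = axpy(omega, s, axpy(alpha, p, x))
        r = axpy(-omega, t, s)
        if norm(r) < tol:
            return x, it
        rho_new = dot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        p = axpy(beta, axpy(-omega, v, p), r)
        rho = rho_new
    return x, maxit


if __name__ == "__main__":
    # Hypothetical toy system: symmetric indefinite, nonsingular.
    A = [[0.0, 1.0], [1.0, 0.0]]
    b = [1.0, 0.0]
    for x0 in ([0.0, 0.0], [0.1, 0.1]):
        try:
            print(x0, "->", bicgstab(A, b, x0))
        except Breakdown as err:
            print(x0, "-> breakdown:", err)
```

In this toy system the zero guess gives r0 = b, and b happens to be orthogonal to A b, so the very first denominator (r_hat, A p) vanishes. Any guess that changes r0 changes that inner product. Whether something similar happens in a real application depends entirely on the matrix and right-hand side, which is why a reproducing example is needed.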
