On Tue, Sep 24, 2013 at 10:39 AM, Analabha Roy <[email protected]> wrote:
> Hi,
>
> On Tue, Sep 24, 2013 at 9:33 PM, Matthew Knepley <[email protected]> wrote:
>
>> On Tue, Sep 24, 2013 at 8:35 AM, Analabha Roy <[email protected]> wrote:
>>
>>> On Tue, Sep 24, 2013 at 8:41 PM, Matthew Knepley <[email protected]> wrote:
>>>
>>>> On Tue, Sep 24, 2013 at 8:08 AM, Analabha Roy <[email protected]> wrote:
>>>>
>>>>> On Tue, Sep 24, 2013 at 1:42 PM, Jed Brown <[email protected]> wrote:
>>>>>
>>>>>> Analabha Roy <[email protected]> writes:
>>>>>>
>>>>>> > Hi all,
>>>>>> >
>>>>>> > Compiling and running this code
>>>>>> > <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c>
>>>>>> > that builds a PETSc matrix gives different results when run with
>>>>>> > different numbers of processors.
>>>>>>
>>>>> Thanks for the reply.
>>>>>
>>>>>> Uh, if you call rand() on different processors, why would you expect it
>>>>>> to give the same results?
>>>>>>
>>>>> Right, I get that. The rand() was a placeholder.
>>>>>
>>>>> This original much larger code
>>>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c>
>>>>> replicates the same loop structure and runs the same PETSc subroutines, but
>>>>> running it by
>>>>>
>>>>> mpirun -np $N ./eth -lattice_size 5 -vector_size 1 -repulsion 0.0 -draw_out -draw_pause -1
>>>>>
>>>>> with N=1,2,3,4 gives different results for the matrix dumped out by
>>>>> lines 514-519
>>>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#514>.
>>>>> The matrix itself is evaluated in parallel, created in lines 263-275
>>>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#263>
>>>>> and evaluated in lines 294-356
>>>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#294>
>>>>>
>>>>> (you can click on the line numbers above to navigate directly to them)
>>>>>
>>>>> Here is a sample <http://i43.tinypic.com/zyhf2f.jpg> of the output of
>>>>> lines 514-519
>>>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#514>
>>>>> for N=1,2,3,4 procs left to right.
>>>>>
>>>>> They're different for different procs. They should be the same, since
>>>>> none of my input parameters are numprocs dependent, and I don't explicitly
>>>>> use the size or rank anywhere in the code.
>>>>>
>>>>
>>>> You are likely not dividing the rows you loop over so you are
>>>> redundantly computing.
>>>>
>>>
>>> Thanks for the reply.
>>>
>>> Line 274
>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#274>
>>> gets the local row indices of PETSc Matrix AVG_BDIBJ
>>>
>>> Line 295
>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#295>
>>> iterates over the local rows and the lines below get the column
>>> elements. For each row, the column elements are assigned by the lines up
>>> to Line 344
>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#344>
>>> and stored locally in colvalues[]. Dunno if the details are relevant.
>>>
>>> Line 347
>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#347>
>>> inserts the sitestride1^th row into the matrix
>>>
>>> Line 353+
>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#353>
>>> does the mat assembly
>>>
>>> Then, after a lot of currently irrelevant code,
>>>
>>> Line 514+
>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#514>
>>> dumps the mat plot to graphics
>>>
>>> Different numprocs give different matrices.
>>>
>>> Can somebody suggest what I did wrong (or didn't do)?
>>>
>>
>> Different values are being given to MatSetValues() for different numbers
>> of processes. So
>>
>> 1) Reduce this to the smallest problem size possible
>>
>> 2) Print out all rows/cols/values for each call
>>
>> 3) Compare 2 procs to the serial case
>>
>
> Thanks for your excellent suggestion.
>
> I modified my code
> <https://code.google.com/p/daneelrepo/source/diff?spec=svn1435&r=1435&format=side&path=/eth_question/eth.c>
> to dump the matrix in binary
>
> Then I used this Python script I had
> <https://code.google.com/p/daneelrepo/source/browse/eth_question/mat_bin2ascii.py>
> to convert to ascii
>

Do not print the matrix; print the data you are passing to MatSetValues().
MatSetValues() is not likely to be broken. Every PETSc code in the world
calls this many times on every simulation.

   Matt

> Here are the values of AVG_BDIBJ <http://pastebin.ca/2457842>,
> a 9x9 matrix (the smallest possible problem size) run with the exact same
> input parameters with 1, 2, 3 and 4 procs
>
> As you can see, the 1 and 2 procs match up, but the 3 and 4 procs do not.
>
> Serious weirdness.
>
>> Matt
>>
>>>
>>>> Matt
>>>>
>>>>>
>>>>>> for (sitestride1 = Istart; sitestride1 < Iend; sitestride1++)
>>>>>>   {
>>>>>>     for (sitestride2 = 0; sitestride2 < matsize; sitestride2++)
>>>>>>       {
>>>>>>         for (alpha = 0; alpha < dim; alpha++)
>>>>>>           {
>>>>>>             for (mu = 0; mu < dim; mu++)
>>>>>>               for (lambda = 0; lambda < dim; lambda++)
>>>>>>                 {
>>>>>>                   vecval = rand () / rand ();
>>>>>>                 }
>>>>>>
>>>>>>             VecSetValue (BDB_AA, alpha, vecval, INSERT_VALUES);
>>>>>>
>>>>>>           }
>>>>>>         VecAssemblyBegin (BDB_AA);
>>>>>>         VecAssemblyEnd (BDB_AA);
>>>>>>         VecSum (BDB_AA, &element);
>>>>>>         colvalues[sitestride2] = element;
>>>>>>
>>>>>>       }
>>>>>>     //Insert the array of colvalues to the sitestride1^th row of H
>>>>>>     MatSetValues (AVG_BDIBJ, 1, &sitestride1, matsize, idx,
>>>>>>                   colvalues, INSERT_VALUES);
>>>>>>
>>>>>>   }
>>>>>>
>>>>>> > The code is large and complex, so I have created a smaller program
>>>>>> > with the same loop structure here. <http://pastebin.ca/2457643>
>>>>>> >
>>>>>> > Compiling it and running it with "mpirun -np $N ./test -draw_pause -1"
>>>>>> > gives different results for different values of N even though it's not
>>>>>> > supposed to.
>>>>>>
>>>>>> What do you expect to see?
>>>>>>
>>>>>> > Here is a sample output <http://i42.tinypic.com/2s16ccw.jpg> for
>>>>>> > N=1,2,3,4 from left to right.
>>>>>> >
>>>>>> > Can anyone guide me as to what I'm doing wrong? Are any of the PETSc
>>>>>> > routines used not parallelizable?
>>>>>> >
>>>>>> > Thanks in advance,
>>>>>> >
>>>>>> > Regards.
>>>>>> >
>>>>>> > --
>>>>>> > ---
>>>>>> > *Analabha Roy*
>>>>>> > C.S.I.R <http://www.csir.res.in> Senior Research Associate
>>>>>> > <http://csirhrdg.res.in/poolsra.htm>
>>>>>> > Saha Institute of Nuclear Physics <http://www.saha.ac.in>
>>>>>> > Section 1, Block AF
>>>>>> > Bidhannagar, Calcutta 700064
>>>>>> > India
>>>>>> > *Emails*: [email protected], [email protected]
>>>>>> > *Webpage*: http://www.ph.utexas.edu/~daneel/

--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener
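
The comparison step suggested in this thread (print what goes into
MatSetValues(), then compare 2 procs to the serial case) could be sketched
roughly as follows. This is only an illustration under assumptions: it
supposes each rank prints one "row col value" line for every entry it passes
to MatSetValues(), and that the output of each run has been collected as
text. The log format and the names parse_log and diff_entries are made up
for this example; none of it comes from eth.c or from PETSc.

```python
# Sketch: compare the (row, col, value) triples logged just before each
# MatSetValues() call in a serial run against those from a parallel run.
# The "row col value" log format is an assumption made for this example.


def parse_log(text):
    """Map (row, col) -> value from lines of the form 'row col value'.

    If the same (row, col) is logged more than once, the last value wins.
    """
    entries = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank lines or unrelated output
        row, col, val = int(parts[0]), int(parts[1]), float(parts[2])
        entries[(row, col)] = val
    return entries


def diff_entries(serial, parallel, tol=1e-12):
    """Return (row, col, serial_val, parallel_val) tuples where the runs disagree.

    An entry present in only one run is reported with None on the other side.
    """
    mismatches = []
    for key in sorted(set(serial) | set(parallel)):
        a, b = serial.get(key), parallel.get(key)
        if a is None or b is None or abs(a - b) > tol:
            mismatches.append((key[0], key[1], a, b))
    return mismatches


if __name__ == "__main__":
    # Made-up sample data, not output from eth.c.
    serial_log = "0 0 1.0\n0 1 0.5\n1 0 0.5\n1 1 2.0\n"
    parallel_log = "1 0 0.5\n1 1 2.5\n0 0 1.0\n0 1 0.5\n"
    for row, col, a, b in diff_entries(parse_log(serial_log),
                                       parse_log(parallel_log)):
        print(f"row {row} col {col}: serial={a} parallel={b}")
```

Entries are keyed by (row, col), so the order in which the ranks happen to
print does not matter. Any entry listed by diff_entries points at a row whose
inputs to MatSetValues() already differ between the two runs.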
