Hi,

Sorry for the misunderstanding.
I modified my source (<http://pastebin.ca/2457850>) so that the rows/cols/values for each call are printed before being passed to MatSetValues().

I then ran it with 1 and 2 processors. Here are the outputs: <http://pastebin.ca/2457852>

Strange! Running it with 2 procs, only half the values show up, and even those do not match!

On Tue, Sep 24, 2013 at 11:12 PM, Matthew Knepley <[email protected]> wrote:

> On Tue, Sep 24, 2013 at 10:39 AM, Analabha Roy <[email protected]> wrote:
>
>> Hi,
>>
>> On Tue, Sep 24, 2013 at 9:33 PM, Matthew Knepley <[email protected]> wrote:
>>
>>> On Tue, Sep 24, 2013 at 8:35 AM, Analabha Roy <[email protected]> wrote:
>>>
>>>> On Tue, Sep 24, 2013 at 8:41 PM, Matthew Knepley <[email protected]> wrote:
>>>>
>>>>> On Tue, Sep 24, 2013 at 8:08 AM, Analabha Roy <[email protected]> wrote:
>>>>>
>>>>>> On Tue, Sep 24, 2013 at 1:42 PM, Jed Brown <[email protected]> wrote:
>>>>>>
>>>>>>> Analabha Roy <[email protected]> writes:
>>>>>>>
>>>>>>> > Hi all,
>>>>>>> >
>>>>>>> > Compiling and running this code
>>>>>>> > <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c>
>>>>>>> > that builds a petsc matrix gives different results when run with
>>>>>>> > different number of processors.
>>>>>>>
>>>>>> Thanks for the reply.
>>>>>>
>>>>>>> Uh, if you call rand() on different processors, why would you
>>>>>>> expect it to give the same results?
>>>>>>>
>>>>>> Right, I get that. The rand() was a placeholder.
>>>>>>
>>>>>> This original much larger code
>>>>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c>
>>>>>> replicates the same loop structure and runs the same Petsc subroutines,
>>>>>> but running it by
>>>>>>
>>>>>> mpirun -np $N ./eth -lattice_size 5 -vector_size 1 -repulsion 0.0 -draw_out -draw_pause -1
>>>>>>
>>>>>> with N=1,2,3,4 gives different results for the matrix dumped out by
>>>>>> lines 514-519
>>>>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#514>.
>>>>>> The matrix itself is evaluated in parallel, created in lines 263-275
>>>>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#263>
>>>>>> and evaluated in lines 294-356
>>>>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#294>
>>>>>>
>>>>>> (you can click on the line numbers above to navigate directly to them)
>>>>>>
>>>>>> Here is a sample <http://i43.tinypic.com/zyhf2f.jpg> of the output of
>>>>>> lines 514-519
>>>>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#514>
>>>>>> for N=1,2,3,4 procs left to right.
>>>>>>
>>>>>> They're different for different procs. They should be the same, since
>>>>>> none of my input parameters are numprocs dependent, and I don't
>>>>>> explicitly use the size or rank anywhere in the code.
>>>>>>
>>>>>
>>>>> You are likely not dividing the rows you loop over so you are
>>>>> redundantly computing.
>>>>>
>>>>
>>>> Thanks for the reply.
>>>>
>>>> Line 274
>>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#274>
>>>> gets the local row indices of Petsc Matrix AVG_BDIBJ
>>>>
>>>> Line 295
>>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#295>
>>>> iterates over the local rows and the lines below get the column
>>>> elements.
>>>> For each row, the column elements are assigned by the lines up to
>>>> Line 344
>>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#344>
>>>> and stored locally in colvalues[]. Dunno if the details are relevant.
>>>>
>>>> Line 347
>>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#347>
>>>> inserts the sitestride1^th row into the matrix
>>>>
>>>> Line 353+
>>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#353>
>>>> does the mat assembly
>>>>
>>>> Then, after a lot of currently irrelevant code,
>>>>
>>>> Line 514+
>>>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/eth.c#514>
>>>> dumps the mat plot to graphics
>>>>
>>>> Different numprocs give different matrices.
>>>>
>>>> Can somebody suggest what I did wrong (or didn't do)?
>>>>
>>>
>>> Different values are being given to MatSetValues() for different numbers
>>> of processes. So
>>>
>>> 1) Reduce this to the smallest problem size possible
>>>
>>> 2) Print out all rows/cols/values for each call
>>>
>>> 3) Compare 2 procs to the serial case
>>>
>>
>> Thanks for your excellent suggestion.
>>
>> I modified my code
>> <https://code.google.com/p/daneelrepo/source/diff?spec=svn1435&r=1435&format=side&path=/eth_question/eth.c>
>> to dump the matrix in binary
>>
>> Then I used this python script I had
>> <https://code.google.com/p/daneelrepo/source/browse/eth_question/mat_bin2ascii.py>
>> to convert to ascii
>>
>
> Do not print the matrix, print the data you are passing to MatSetValues().
>
> MatSetValues() is not likely to be broken. Every PETSc code in the world
> calls this many times on every simulation.
>
> Matt
>
>>
>> Here are the values of AVG_BDIBJ <http://pastebin.ca/2457842>,
>> a 9X9 matrix (the smallest possible problem size) run with the exact same
>> input parameters with 1,2,3 and 4 procs
>>
>> As you can see, the 1 and 2 procs match up, but the 3 and 4 procs do not.
>>
>> Serious weirdness.
>>
>>> Matt
>>>
>>>>> Matt
>>>>>
>>>>>>> for (sitestride1 = Istart; sitestride1 < Iend; sitestride1++)
>>>>>>>   {
>>>>>>>     for (sitestride2 = 0; sitestride2 < matsize; sitestride2++)
>>>>>>>       {
>>>>>>>         for (alpha = 0; alpha < dim; alpha++)
>>>>>>>           {
>>>>>>>             for (mu = 0; mu < dim; mu++)
>>>>>>>               for (lambda = 0; lambda < dim; lambda++)
>>>>>>>                 {
>>>>>>>                   vecval = rand () / rand ();
>>>>>>>                 }
>>>>>>>
>>>>>>>             VecSetValue (BDB_AA, alpha, vecval, INSERT_VALUES);
>>>>>>>
>>>>>>>           }
>>>>>>>         VecAssemblyBegin (BDB_AA);
>>>>>>>         VecAssemblyEnd (BDB_AA);
>>>>>>>         VecSum (BDB_AA, &element);
>>>>>>>         colvalues[sitestride2] = element;
>>>>>>>
>>>>>>>       }
>>>>>>>     //Insert the array of colvalues to the sitestride1^th row of H
>>>>>>>     MatSetValues (AVG_BDIBJ, 1, &sitestride1, matsize, idx,
>>>>>>>                   colvalues, INSERT_VALUES);
>>>>>>>
>>>>>>>   }
>>>>>>>
>>>>>>> > The code is large and complex, so I have created a smaller program
>>>>>>> > with the same loop structure here. <http://pastebin.ca/2457643>
>>>>>>> >
>>>>>>> > Compiling it and running it with "mpirun -np $N ./test -draw_pause -1"
>>>>>>> > gives different results for different values of N even though it's
>>>>>>> > not supposed to.
>>>>>>>
>>>>>>> What do you expect to see?
>>>>>>>
>>>>>>> > Here is a sample output <http://i42.tinypic.com/2s16ccw.jpg> for
>>>>>>> > N=1,2,3,4 from left to right.
>>>>>>> >
>>>>>>> > Can anyone guide me as to what I'm doing wrong? Are any of the petsc
>>>>>>> > routines used not parallelizable?
>>>>>>> >
>>>>>>> > Thanks in advance,
>>>>>>> >
>>>>>>> > Regards.
>>>>>>> >
>>>>>>> > --
>>>>>>> > ---
>>>>>>> > *Analabha Roy*
>>>>>>> > C.S.I.R <http://www.csir.res.in> Senior Research Associate <http://csirhrdg.res.in/poolsra.htm>
>>>>>>> > Saha Institute of Nuclear Physics <http://www.saha.ac.in>
>>>>>>> > Section 1, Block AF
>>>>>>> > Bidhannagar, Calcutta 700064
>>>>>>> > India
>>>>>>> > *Emails*: [email protected], [email protected]
>>>>>>> > *Webpage*: http://www.ph.utexas.edu/~daneel/
>>>>>>>
>>>>>
>>>>> --
>>>>> What most experimenters take for granted before they begin their
>>>>> experiments is infinitely more interesting than any results to which their
>>>>> experiments lead.
>>>>> -- Norbert Wiener

--
---
*Analabha Roy*
C.S.I.R <http://www.csir.res.in> Senior Research Associate <http://csirhrdg.res.in/poolsra.htm>
Saha Institute of Nuclear Physics <http://www.saha.ac.in>
Section 1, Block AF
Bidhannagar, Calcutta 700064
India
*Emails*: [email protected], [email protected]
*Webpage*: http://www.ph.utexas.edu/~daneel/
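
P.S. For reference, the check suggested in the quoted replies (print the rows/cols/values handed to MatSetValues() and compare the parallel run against the serial one) can be sketched in plain C without PETSc. This is only an illustration under stated assumptions: entry() is a hypothetical stand-in for the real matrix element, ownership_range() is an assumed contiguous row split (first ranks get the extra rows), and the "ranks" are simulated in a loop rather than run under MPI. The point it shows is that when each value depends only on the global (row, col) pair, the per-rank dumps concatenated in rank order should be identical to the serial dump.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the real matrix element. It depends only on
   the global (row, col) pair, never on the rank or on call order. */
static double entry(int row, int col)
{
    return 10.0 * row + col;
}

/* Assumed contiguous row split of n rows over nprocs ranks: the first
   (n % nprocs) ranks own one extra row. Not taken from the original code. */
static void ownership_range(int n, int nprocs, int rank, int *start, int *end)
{
    assert(nprocs > 0 && rank >= 0 && rank < nprocs);
    int base = n / nprocs, extra = n % nprocs;
    *start = rank * base + (rank < extra ? rank : extra);
    *end = *start + base + (rank < extra ? 1 : 0);
}

/* Write one "row col value" line per entry of the rows owned by `rank`,
   i.e. the data that would be passed to MatSetValues() for those rows.
   Returns the number of characters written. */
static size_t dump_local_rows(char *buf, size_t cap, int n, int nprocs, int rank)
{
    int start, end;
    size_t used = 0;
    ownership_range(n, nprocs, rank, &start, &end);
    for (int row = start; row < end; row++) {
        for (int col = 0; col < n; col++) {
            int w = snprintf(buf + used, cap - used, "%d %d %.6f\n",
                             row, col, entry(row, col));
            assert(w > 0 && (size_t)w < cap - used);
            used += (size_t)w;
        }
    }
    return used;
}

/* Concatenate the dumps of all simulated ranks in rank order. */
static size_t dump_all_ranks(char *buf, size_t cap, int n, int nprocs)
{
    size_t used = 0;
    assert(cap > 0);
    buf[0] = '\0';
    for (int rank = 0; rank < nprocs; rank++)
        used += dump_local_rows(buf + used, cap - used, n, nprocs, rank);
    return used;
}
```

Calling dump_all_ranks() with n = 9 for nprocs = 1 and nprocs = 2 and comparing the two buffers with strcmp() gives the "compare 2 procs to the serial case" step. In a real MPI run each rank would print its own lines, and the outputs would need to be sorted by row before diffing, since the order in which ranks print is not guaranteed.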
