This is a very interesting thread, because the use of block matrices improves the performance of AMG a lot. In my case it is an elasticity problem.
One more question I'd like to ask, which is more about the performance of the solver: if I have a coupled problem, say the point block is [u_x u_y u_z p], in which the entries of the p block in the stiffness matrix are on a much smaller scale than those of u (p ~ 1e-6, u ~ 1e+8), will AMG with hypre in PETSc still scale? Also, is there a utility in PETSc that does automatic scaling of the variables?

Giang

On Thu, Jan 14, 2016 at 7:13 AM, Justin Chang <[email protected]> wrote:

> Okay, that makes sense, thanks.
>
> On Wed, Jan 13, 2016 at 10:12 PM, Barry Smith <[email protected]> wrote:
>
>> > On Jan 13, 2016, at 10:24 PM, Justin Chang <[email protected]> wrote:
>> >
>> > Thanks Barry,
>> >
>> > 1) So for block matrices, the ja array is smaller. But what's the
>> > "hardware" explanation for this performance improvement? Does it have
>> > to do with spatial locality, where you are more likely to reuse data in
>> > that ja array, or does it have to do with the fact that loading/storing
>> > smaller arrays is less likely to invoke a cache miss, thus reducing the
>> > amount of bandwidth used?
>>
>>    There are two distinct reasons for the improvement:
>>
>>    1) For 5 by 5 blocks the ja array is 1/25th the size. The "hardware"
>> saving is that you have to load something that is much smaller than
>> before. Cache/spatial locality has nothing to do with this particular
>> improvement.
>>
>>    2) The other improvement comes from the reuse of each x[j] value,
>> which is multiplied by 5 values (a column) of the little block. The
>> hardware explanation is that x[j] can be reused in a register for the 5
>> multiplies (whereas otherwise it would have to come from cache to
>> register 5 times, and sometimes might even have been flushed from the
>> cache, so it would have to come from memory). This is why we have code
>> like
>>
>>    for (j=0; j<n; j++) {
>>      xb    = x + 5*(*idx++);
>>      x1    = xb[0]; x2 = xb[1]; x3 = xb[2]; x4 = xb[3]; x5 = xb[4];
>>      sum1 += v[0]*x1 + v[5]*x2 + v[10]*x3 + v[15]*x4 + v[20]*x5;
>>      sum2 += v[1]*x1 + v[6]*x2 + v[11]*x3 + v[16]*x4 + v[21]*x5;
>>      sum3 += v[2]*x1 + v[7]*x2 + v[12]*x3 + v[17]*x4 + v[22]*x5;
>>      sum4 += v[3]*x1 + v[8]*x2 + v[13]*x3 + v[18]*x4 + v[23]*x5;
>>      sum5 += v[4]*x1 + v[9]*x2 + v[14]*x3 + v[19]*x4 + v[24]*x5;
>>      v    += 25;
>>    }
>>
>> to do the block multiply.
>>
>> > 2) So if one wants to assemble a monolithic matrix (i.e., an
>> > aggregation of more than one dof per point), then using the BAIJ format
>> > is highly advisable. But if I want to form a nested matrix, say when I
>> > am solving the Stokes equations, then each "submatrix" is of AIJ
>> > format? Can these submatrices also be BAIJ?
>>
>>    Sure, but if you have separated all the variables of pressure,
>> velocity_x, velocity_y, etc. into their own regions of the vector, then
>> the block size for the submatrices would be 1, so BAIJ does not help.
>>
>>    There are Stokes solvers that use Vanka smoothing that keep the
>> variables interlaced and hence would use BAIJ and NOT use fieldsplit.
>>
>> > Thanks,
>> > Justin
>> >
>> > On Wed, Jan 13, 2016 at 9:12 PM, Barry Smith <[email protected]>
>> > wrote:
>> >
>> > > On Jan 13, 2016, at 9:57 PM, Justin Chang <[email protected]>
>> > > wrote:
>> > >
>> > > Hi all,
>> > >
>> > > 1) I am guessing MATMPIBAIJ could theoretically have better
>> > > performance than simply using MATMPIAIJ. Why is that? Is it similar
>> > > to the reasoning that block (dense) matrix-vector multiply is
>> > > "faster" than simple matrix-vector multiply?
>> >
>> >    See, for example, Table 1 in
>> > http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.38.7668&rep=rep1&type=pdf
>> >
>> > > 2) I am looking through the manual and online documentation and it
>> > > seems the term "block" is used everywhere. In the section on "block
>> > > matrices" (3.1.3 of the manual), it refers to field splitting, where
>> > > you could either have a monolithic matrix or a nested matrix. Does
>> > > that concept have anything to do with MATMPIBAIJ?
>> >
>> >    Unfortunately the numerical analysis literature uses the term
>> > "block" in multiple ways: for small blocks, sometimes called
>> > "point-block", which is what BAIJ stores, and for very large blocks
>> > (where the blocks are themselves sparse). I used fieldsplit for the big
>> > sparse blocks to try to avoid confusion in PETSc.
>> >
>> > > It makes sense to me that one could create a BAIJ where, if you have
>> > > 5 dofs of the same type of physics (e.g., five different primary
>> > > species of a geochemical reaction) per grid point, you could create a
>> > > block size of 5. And if you have different physics (e.g., velocity
>> > > and pressure) you would ideally want to separate them out (i.e.,
>> > > nested matrices) for better preconditioning.
>> >
>> >    Sometimes you put them together with BAIJ and sometimes you keep
>> > them separate with nested matrices.
>> >
>> > > Thanks,
>> > > Justin
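
To make the interlaced BAIJ-plus-AMG setup discussed in the thread concrete, here is a minimal sketch of creating a 3-dof-per-node (elasticity-style) MATMPIBAIJ operator and handing it to hypre BoomerAMG through KSP. It is a sketch under stated assumptions, not code from the thread: the block size, local sizes, and the preallocation counts (27 diagonal / 20 off-diagonal blocks per block row) are placeholders, and the actual element assembly is elided.

    #include <petscksp.h>

    int main(int argc, char **argv)
    {
      Mat            A;
      Vec            x, b;
      KSP            ksp;
      PC             pc;
      PetscInt       bs = 3, nnodes = 1000;  /* placeholder: 3 dofs per node, 1000 nodes per process */
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;

      /* BAIJ stores one column index per bs x bs block instead of one per scalar entry */
      ierr = MatCreateBAIJ(PETSC_COMM_WORLD, bs, bs*nnodes, bs*nnodes,
                           PETSC_DETERMINE, PETSC_DETERMINE,
                           27, NULL, 20, NULL, &A);CHKERRQ(ierr);
      /* ... insert element stiffness contributions with MatSetValuesBlocked(),
             one dense 3x3 block per node pair ... */
      ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

      ierr = MatCreateVecs(A, &x, &b);CHKERRQ(ierr);

      /* hypre as preconditioner; tune from the command line, e.g.
         -pc_hypre_type boomeramg -pc_hypre_boomeramg_* options */
      ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
      ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
      ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
      ierr = PCSetType(pc, PCHYPRE);CHKERRQ(ierr);
      ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
      /* ... once the real operator and right-hand side are assembled:
             KSPSolve(ksp, b, x); ... */

      ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
      ierr = VecDestroy(&x);CHKERRQ(ierr);
      ierr = VecDestroy(&b);CHKERRQ(ierr);
      ierr = MatDestroy(&A);CHKERRQ(ierr);
      ierr = PetscFinalize();
      return ierr;
    }

With the interlaced layout, MatSetValuesBlocked() inserts one dense bs x bs block at a time, which is what lets the BAIJ multiply kernel reuse each x[j] value across a whole block column, as in Barry's loop above.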

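On the scaling question at the top of the thread (p ~ 1e-6 versus u ~ 1e+8): one manual option, sketched here under my own assumptions rather than anything recommended by the posters, is a symmetric Jacobi scaling of the assembled system so that all rows end up with diagonal entries of comparable magnitude. PETSc's KSPSetDiagonalScale() (option -ksp_diagonal_scale) applies this kind of symmetric diagonal scaling for you; the helper below, whose name ScaleSystemJacobi is made up for this example, just spells out what it amounts to.

    #include <petscmat.h>

    /* Sketch of symmetric Jacobi scaling:  A := D^{-1/2} A D^{-1/2},
       b := D^{-1/2} b, with D = |diag(A)|.  The scaling vector is returned
       so the caller can recover the unscaled solution afterwards as
       x = D^{-1/2} y, where y solves the scaled system. */
    PetscErrorCode ScaleSystemJacobi(Mat A, Vec b, Vec *dscale)
    {
      Vec            d;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = MatCreateVecs(A, &d, NULL);CHKERRQ(ierr);
      ierr = MatGetDiagonal(A, d);CHKERRQ(ierr);
      ierr = VecSqrtAbs(d);CHKERRQ(ierr);              /* d_i = sqrt(|A_ii|)     */
      ierr = VecReciprocal(d);CHKERRQ(ierr);           /* d_i = 1/sqrt(|A_ii|)   */
      ierr = MatDiagonalScale(A, d, d);CHKERRQ(ierr);  /* A := diag(d) A diag(d) */
      ierr = VecPointwiseMult(b, b, d);CHKERRQ(ierr);  /* b := diag(d) b         */
      *dscale = d;
      PetscFunctionReturn(0);
    }

After KSPSolve() on the scaled system, VecPointwiseMult(x, y, dscale) recovers the solution in the original variables; whether BoomerAMG then scales well for the interlaced [u_x u_y u_z p] layout would still have to be tested.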