OK thanks Barry!  I've attached source for a minimal example which has the
same structure as my problem.

On Thu, Nov 2, 2017 at 5:36 PM, Smith, Barry F. <[email protected]> wrote:

>
>   Please send your "simple" code that attempts to do the appropriate
> preallocation and we'll look at why it is failing.
>
>   Barry
>
>
> > On Nov 2, 2017, at 4:03 PM, zakaryah . <[email protected]> wrote:
> >
> > I ran MatView with the smallest possible grid size.  The elements which
> are nonzero after the first assembly but not after the matrix is created
> are exactly the couplings, i.e. row and column 0 off the diagonal, which
> I'm trying to preallocate, and eventually place in the second call to
> FormCoupleLocations, although right now it's only being called once - not
> sure why prealloc_only is set - maybe because I have
> MAT_NEW_NONZERO_ALLOCATION_ERR set to false?  I'm not sure why the Jbh
> terms result in so few additional mallocs - are columns allocated in chunks
> on the fly?  For example, when my DMDA size is 375, adding values to Jhb
> results in 375 additional mallocs, but adding values to Jbh only adds 25
> mallocs.
> >
> > I guess my FormCoupleLocations is pretty incomplete, because
> incrementing dnz and onz doesn't seem to have any effect.  I also don't
> know how this function should behave - as I said before, writing to dnz[i]
> and onz[i] where i goes from rstart to rstart+nrows-1 causes a segfault on
> processors >0, and if I inspect those values of dnz[i] and onz[i] by
> writing them to a file, they are nonsense when processor >0.  Are dnz and
> onz local to the processor, i.e. should my iterations go from 0 to
> nrows-1?  What is the general idea behind setting the structure from within
> FormCoupleLocations?  I'm currently not doing that at all.
> >
> > Thanks for all the help!
> >
> > On Thu, Nov 2, 2017 at 12:31 PM, Smith, Barry F. <[email protected]>
> wrote:
> >
> >
> > > On Nov 1, 2017, at 10:32 PM, zakaryah . <[email protected]> wrote:
> > >
> > > I worked on the assumptions in my previous email and I at least
> partially implemented the function to assign the couplings.  For row 0,
> which is the redundant field, I set dnz[0] to end-start, and onz[0] to the
> size of the matrix minus dnz[0].  For all other rows, I just increment the
> existing values of dnz[i] and onz[i], since the coupling to the redundant
> field adds one extra element beyond what's allocated for the DMDA stencil.
> >
> >   Sounds reasonable.
> > >
> > > I see in the source that the FormCoupleLocations function is called
> once if the DM has PREALLOC_ONLY set to true, but twice otherwise.  I
> assume that the second call is for setting the nonzero structure.
> >
> >   Yes
> > > Do I need to do this?
> >
> >    You probably should.
> >
> >    I would do a MatView() small DMDA on the matrix you obtain and then
> again after you add the numerical values. This will show you what values
> are not being properly allocated/put in when the matrix is created by the
> DM.
> > >
> > > In any case, something is still not right.  Even with the extra
> elements preallocated, the first assembly of the matrix is very slow.  I
> ran a test problem on a single process with -info, and got this:
> > >
> > > 0] MatAssemblyEnd_SeqAIJ(): Matrix size: 1 X 1; storage space: 0
> unneeded,1 used
> > >
> > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
> is 0
> > >
> > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
> > >
> > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
> 0)/(num_localrows 1) < 0.6. Do not use CompressedRow routines.
> > >
> > > [0] MatSeqAIJCheckInode(): Found 1 nodes out of 1 rows. Not using
> Inode routines
> > >
> > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9009 X 9009; storage space:
> 0 unneeded,629703 used
> > >
> > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
> is 0
> > >
> > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
> > >
> > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
> 0)/(num_localrows 9009) < 0.6. Do not use CompressedRow routines.
> > >
> > > [0] MatSeqAIJCheckInode(): Found 3003 nodes of 9009. Limit used: 5.
> Using Inode routines
> > >
> > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 1 X 1; storage space: 0
> unneeded,1 used
> > >
> > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
> is 0
> > >
> > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
> > >
> > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
> 0)/(num_localrows 1) < 0.6. Do not use CompressedRow routines.
> > >
> > > [0] MatSeqAIJCheckInode(): Found 1 nodes out of 1 rows. Not using
> Inode routines
> > >
> > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9009 X 9009; storage space:
> 0 unneeded,629703 used
> > >
> > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
> is 0
> > >
> > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
> > >
> > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
> 0)/(num_localrows 9009) < 0.6. Do not use CompressedRow routines.
> > >
> > > [0] MatSeqAIJCheckInode(): Found 3003 nodes of 9009. Limit used: 5.
> Using Inode routines
> > >
> > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space:
> 18018 unneeded,629704 used
> >
> >   Yes, really bad.
> > >
> > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
> is 0
> > >
> > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
> > >
> > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
> 0)/(num_localrows 9010) < 0.6. Do not use CompressedRow routines.
> > >
> > > [0] MatSeqAIJCheckInode(): Found 3004 nodes of 9010. Limit used: 5.
> Using Inode routines
> > >
> > > [0] PetscCommDuplicate(): Using internal PETSc communicator
> 139995446774208 30818112
> > >
> > > [0] DMGetDMSNES(): Creating new DMSNES
> > >
> > > [0] DMGetDMKSP(): Creating new DMKSP
> > >
> > > [0] PetscCommDuplicate(): Using internal PETSc communicator
> 139995446773056 30194064
> > >
> > > [0] PetscCommDuplicate(): Using internal PETSc communicator
> 139995446773056 30194064
> > >
> > > [0] PetscCommDuplicate(): Using internal PETSc communicator
> 139995446773056 30194064
> > >
> > > [0] PetscCommDuplicate(): Using internal PETSc communicator
> 139995446773056 30194064
> > >
> > >   0 SNES Function norm 2.302381528359e+00
> > >
> > > [0] PetscCommDuplicate(): Using internal PETSc communicator
> 139995446773056 30194064
> > >
> > > [0] PetscCommDuplicate(): Using internal PETSc communicator
> 139995446773056 30194064
> > >
> > > [0] PetscCommDuplicate(): Using internal PETSc communicator
> 139995446773056 30194064
> > >
> > > [0] PetscCommDuplicate(): Using internal PETSc communicator
> 139995446773056 30194064
> > >
> > > [0] PetscCommDuplicate(): Using internal PETSc communicator
> 139995446773056 30194064
> > >
> > > [0] PetscCommDuplicate(): Using internal PETSc communicator
> 139995446773056 30194064
> > >
> > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space:
> 126132 unneeded,647722 used
> > >
> > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
> is 9610
> > >
> > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 9010
> > >
> > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
> 0)/(num_localrows 9010) < 0.6. Do not use CompressedRow routines.
> > >
> > > [0] MatSeqAIJCheckInode(): Found 3004 nodes of 9010. Limit used: 5.
> Using Inode routines
> > >
> > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space:
> 0 unneeded,647722 used
> > >
> > > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues()
> is 0
> > >
> > > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 9010
> > >
> > > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
> 0)/(num_localrows 9010) < 0.6. Do not use CompressedRow routines.
> > >
> > >
> > >
> > > The 9610 mallocs during MatSetValues seem suspicious and are probably
> what's taking so long with larger problems.  601 of them are apparently in
> the call to set Jbh, and 9009 are in the call to set Jhb (b is the
> redundant field, h is the DMDA field).  If I run with more than one
> process, I get a segfault when a process which has rank greater than 0 sets
> dnz or onz in the FormCoupleLocations call.
> > >
> > >
> > > On Tue, Oct 31, 2017 at 10:40 PM, zakaryah . <[email protected]>
> wrote:
> > > Thanks Barry, that looks like exactly what I need.  I'm looking at
> pack.c and packm.c and I want to check my understanding of what my coupling
> function should do.  The relevant line in DMCreateMatrix_Composite_AIJ
> seems to be:
> > >
> > > (*com->FormCoupleLocations)(dm,NULL,dnz,onz,__rstart,__
> nrows,__start,__end);
> > >
> > > and I infer that dnz and onz are the number of nonzero elements in the
> diagonal and off-diagonal submatrices, for each row of the DMComposite
> matrix.  I suppose I can just set each of these in a for loop, but can I
> use the arguments to FormCoupleLocations as the range for the loop?  Which
> ones - __rstart to __rstart+__nrows?  How can I determine the number of
> rows on each processor from within the function that I pass?  From the
> preallocation macros it looks like __start to __end describe the range of
> the columns of the diagonal submatrix - is that right?  It looks like the
> ranges will be specific to each processor.  Do I just set the values in dnz
> and onz, or do I need to reduce them?
> > >
> > > Thanks for all the help!  Maybe if I get things working I can carve
> out the core of the code to make an example program for
> DMRedundant/Composite.
> > >
> >
> >
>
>
#include "main.h"

static inline PetscInt DMDALocalIndex3D(DMDALocalInfo *info, PetscInt i, PetscInt j, PetscInt k, PetscInt c)
{return ((((k-info->gzs)*info->gym + (j-info->gys))*info->gxm + (i-info->gxs))*info->dof+c);}

PetscErrorCode FormInitialGuess(prms *prm, Vec X) {
  PetscInt ix, iy, iz;
  Pt3 ***h;
  Vec vbeta, vh;
  PetscScalar *beta;
  DMDALocalInfo info;

  DMCompositeGetAccess(prm->packer_da,X,&vbeta,&vh);
  VecGetArray(vbeta,&beta);
  DMDAVecGetArray(prm->dmda,vh,&h);
  DMDAGetLocalInfo(prm->dmda,&info);
  
  for (iz=info.zs;iz<info.zs+info.zm;iz++) {
    for (iy=info.ys;iy<info.ys+info.ym;iy++) {
      for (ix=info.xs;ix<info.xs+info.xm;ix++) {
	h[iz][iy][ix].x = 1;
	h[iz][iy][ix].y = 1;
	h[iz][iy][ix].z = 1;
      }
    }
  }

  if (prm->MPI_rank == 0) {
    beta[0] = 1;
  }
  
  VecRestoreArray(vbeta,&beta);
  DMDAVecRestoreArray(prm->dmda,vh,&h);
  DMCompositeRestoreAccess(prm->packer_da,X,&vbeta,&vh);

  return 0;
}

PetscErrorCode FormFunction(SNES snes, Vec X, Vec F, void *aux) {
  Pt3 ***h, ***Fh;
  Vec vbeta, vh, vFbeta, vFh;
  prms *prm;
  DMDALocalInfo info;
  PetscScalar *beta, *Fbeta, Fbeta_local;
  int ix, iy, iz;
  PetscErrorCode ierr;

  prm = (prms*)aux;

  DMCompositeGetLocalVectors(prm->packer_da,&vbeta,&vh);
  DMCompositeGetLocalVectors(prm->packer_da,&vFbeta,&vFh);
  DMCompositeScatter(prm->packer_da,X,vbeta,vh);
  VecGetArray(vbeta,&beta);
  VecGetArray(vFbeta,&Fbeta);
  DMDAVecGetArray(prm->dmda,vh,&h);
  DMDAVecGetArray(prm->dmda,vFh,&Fh);
  DMDAGetLocalInfo(prm->dmda,&info);

  Fbeta_local = 0;

  for (iz = info.zs; iz < info.zs+info.zm; iz++) {
    for (iy = info.ys; iy < info.ys+info.ym; iy++) {
      for (ix = info.xs; ix < info.xs+info.xm; ix++) {
	Fh[iz][iy][ix].x = h[iz][iy][ix].x + beta[0];
	Fh[iz][iy][ix].y = h[iz][iy][ix].y + beta[0];
	Fh[iz][iy][ix].z = h[iz][iy][ix].z + beta[0];

	Fbeta_local += h[iz][iy][ix].x + h[iz][iy][ix].y + h[iz][iy][ix].z;
      }
    }
  }

  ierr = MPI_Reduce(&Fbeta_local,Fbeta,1,MPI_DOUBLE,MPI_SUM,0,PETSC_COMM_WORLD);CHKERRQ(ierr);

  if (prm->MPI_rank==0) {
    Fbeta[0] += beta[0];
  }
  VecRestoreArray(vbeta,&beta);
  VecRestoreArray(vFbeta,&Fbeta);
  DMDAVecRestoreArray(prm->dmda,vh,&h);
  DMDAVecRestoreArray(prm->dmda,vFh,&Fh);
  DMCompositeGather(prm->packer_da,INSERT_VALUES,F,NULL,vFh);
  DMCompositeRestoreLocalVectors(prm->packer_da,&vbeta,&vh);
  DMCompositeRestoreLocalVectors(prm->packer_da,&vFbeta,&vFh);
  return 0;
}

PetscErrorCode FormJacobian(SNES snes, Vec X, Mat J, Mat jac, void *aux) {
  PetscScalar *beta, Jhh_vals[57],Jbb_val;
  PetscInt row, Jhh_cols[57];
  int ix, iy, iz, cur_col, cur_Jcol, cur_Jrow;
  Pt3 ***h;
  prms *prm;
  Vec vbeta, vh;
  Mat Jbb, Jbh, Jhb, Jhh;
  IS *is;
  DMDALocalInfo info;

  prm = (prms*)aux;

  DMCompositeGetLocalVectors(prm->packer_da,&vbeta,&vh);
  DMCompositeScatter(prm->packer_da, X, vbeta, vh);
  VecGetArray(vbeta,&beta);
  DMDAVecGetArray(prm->dmda,vh,&h);
  MatZeroEntries(jac);
  DMCompositeGetLocalISs(prm->packer_da,&is);
  MatGetLocalSubMatrix(jac,is[0],is[0],&Jbb);
  MatGetLocalSubMatrix(jac,is[0],is[1],&Jbh);
  MatGetLocalSubMatrix(jac,is[1],is[0],&Jhb);
  MatGetLocalSubMatrix(jac,is[1],is[1],&Jhh);

  DMDAGetLocalInfo(prm->dmda,&info);

  //first, calculate Jbb term on processor 0
  if (prm->MPI_rank == 0) {
    row = 0;
    Jbb_val = 1;
    MatSetValuesLocal(Jbb, 1, &row, 1, &row, &Jbb_val, INSERT_VALUES);
  }

  cur_Jcol = 0;
  cur_Jrow = 0;

  for (iz = info.zs; iz < info.zs+info.zm; iz++) {
    for (iy = info.ys; iy < info.ys+info.ym; iy++) {
      for (ix = info.xs; ix < info.xs+info.xm; ix++) {
	prm->Jcol[cur_Jcol] = DMDALocalIndex3D(&info, ix, iy, iz, 0);
	prm->Jcol_vals[cur_Jcol] = 1;
	cur_Jcol++;
	prm->Jcol[cur_Jcol] = DMDALocalIndex3D(&info, ix, iy, iz, 1);
	prm->Jcol_vals[cur_Jcol] = 1;
	cur_Jcol++;
	prm->Jcol[cur_Jcol] = DMDALocalIndex3D(&info, ix, iy, iz, 2);
	prm->Jcol_vals[cur_Jcol] = 1;
	cur_Jcol++;
	
	prm->Jrow[cur_Jrow] = DMDALocalIndex3D(&info, ix, iy, iz, 0);
	prm->Jrow_vals[cur_Jrow] = 1;
	cur_Jrow++;
	prm->Jrow[cur_Jrow] = DMDALocalIndex3D(&info, ix, iy, iz, 1);
	prm->Jrow_vals[cur_Jrow] = 1;
	cur_Jrow++;
	prm->Jrow[cur_Jrow] = DMDALocalIndex3D(&info, ix, iy, iz, 2);
	prm->Jrow_vals[cur_Jrow] = 1;
	cur_Jrow++;

	//Jhh terms
	//hx terms start here
	row = DMDALocalIndex3D(&info, ix, iy, iz, 0);

	cur_col = 0;
	Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy, iz, 0);

	Jhh_vals[cur_col] = 1;
	cur_col++;

	Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy, iz, 1);
	Jhh_vals[cur_col] = 0;
	cur_col++;
	
	Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy, iz, 2);
	Jhh_vals[cur_col] = 0;
	cur_col++;

	if (ix > 0) {
	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy, iz, 0);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;

	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy, iz, 1);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;

	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy, iz, 2);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;

	  if (iy > 0) {
	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy-1, iz, 0);
	    Jhh_vals[cur_col] =0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy-1, iz, 1);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy-1, iz, 2);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;
	  }
	  if (iy < prm->H-1) {
	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy+1, iz, 0);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy+1, iz, 1);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy+1, iz, 2);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;
	  }
	  if (iz > 0) {
	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy, iz-1, 0);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy, iz-1, 1);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy, iz-1, 2);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;
	  }
	  if (iz < prm->D-1) {
	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy, iz+1, 0);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy, iz+1, 1);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy, iz+1, 2);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;
	  }
	}
	if (ix < prm->W-1) {
	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy, iz, 0);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;

	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy, iz, 1);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;

	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy, iz, 2);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;

	  if (iy > 0) {
	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy-1, iz, 0);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy-1, iz, 1);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy-1, iz, 2);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;
	  }
	  if (iy < prm->H-1) {
	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy+1, iz, 0);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy+1, iz, 1);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy+1, iz, 2);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;
	  }
	  
	  if (iz > 0) {
	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy, iz-1, 0);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy, iz-1, 1);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy, iz-1, 2);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;
	  }
	  if (iz < prm->D-1) {
	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy, iz+1, 0);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy, iz+1, 1);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy, iz+1, 2);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;
	  }	  
	}
	if (iy > 0) {
	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy-1, iz, 0);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;

	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy-1, iz, 1);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;

	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy-1, iz, 2);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;
	  if (iz > 0) {
	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy-1, iz-1, 0);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy-1, iz-1, 1);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy-1, iz-1, 2);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;
	  }
	  if (iz < prm->D-1) {
	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy-1, iz+1, 0);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    
	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy-1, iz+1, 1);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy-1, iz+1, 2);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;
	  }
	}
	if (iy < prm->H-1) {
	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy+1, iz, 0);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;
	  
	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy+1, iz, 1);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;

	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy+1, iz, 2);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;
	  if (iz > 0) {
	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy+1, iz-1, 0);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy+1, iz-1, 1);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy+1, iz-1, 2);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;
	  }
	  if (iz < prm->D-1) {
	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy+1, iz+1, 0);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy+1, iz+1, 1);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy+1, iz+1, 2);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;
	  }
	}
	if (iz > 0) {
	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy, iz-1, 0);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;

	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy, iz-1, 1);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;

	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy, iz-1, 2);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;
	}
	if (iz < prm->D-1) {
	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy, iz+1, 0);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;

	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy, iz+1, 1);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;

	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy, iz+1, 2);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;
	}
	MatSetValuesLocal(Jhh, 1, &row, cur_col, Jhh_cols, Jhh_vals, INSERT_VALUES);

	//hy terms start here
	row = DMDALocalIndex3D(&info, ix, iy, iz, 1);

	cur_col = 0;

	Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy, iz, 1);
        Jhh_vals[cur_col] = 1;
	cur_col++;

	Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy, iz, 0);
	Jhh_vals[cur_col] = 0;
	cur_col++;
	
	Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy, iz, 2);
	Jhh_vals[cur_col] = 0;
	cur_col++;

	if (iy > 0) {
	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy-1, iz, 1);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;

	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy-1, iz, 0);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;

	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy-1, iz, 2);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;

	  if (ix > 0) {
	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy-1, iz, 1);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy-1, iz, 0);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy-1, iz, 2);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;
	  }
	  if (ix < prm->W-1) {
	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy-1, iz, 1);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy-1, iz, 0);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy-1, iz, 2);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;
	  }
	  if (iz > 0) {
	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy-1, iz-1, 1);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy-1, iz-1, 0);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy-1, iz-1, 2);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;
	  }
	  if (iz < prm->D-1) {
	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy-1, iz+1, 1);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy-1, iz+1, 0);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy-1, iz+1, 2);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;
	  }
	}
	if (iy < prm->H-1) {
	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy+1, iz, 1);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;

	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy+1, iz, 0);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;

	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy+1, iz, 2);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;
	  
	  if (ix > 0) {
	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy+1, iz, 1);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy+1, iz, 0);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy+1, iz, 2);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;
	  }
	  if (ix < prm->W-1) {
	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy+1, iz, 1);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy+1, iz, 0);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy+1, iz, 2);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;
	  }
	  if (iz > 0) {
	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy+1, iz-1, 1);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy+1, iz-1, 0);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy+1, iz-1, 2);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;
	  }
	  if (iz < prm->D-1) {
	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy+1, iz+1, 1);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy+1, iz+1, 0);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy+1, iz+1, 2);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;
	  }
	}
	if (ix > 0) {
	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy, iz, 1);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;

	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy, iz, 0);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;

	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy, iz, 2);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;
	  if (iz > 0) {
	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy, iz-1, 1);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy, iz-1, 0);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy, iz-1, 2);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;
	  }
	  if (iz < prm->D-1) {
	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy, iz+1, 1);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy, iz+1, 0);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy, iz+1, 2);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;
	  }
	}
	if (ix < prm->W-1) {
	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy, iz, 1);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;
	  
	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy, iz, 0);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;

	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy, iz, 2);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;
	  if (iz > 0) {
	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy, iz-1, 1);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy, iz-1, 0);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy, iz-1, 2);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;
	  }
	  if (iz < prm->D-1) {
	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy, iz+1, 1);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy, iz+1, 0);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy, iz+1, 2);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;
	  }
	}
	if (iz > 0) {
	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy, iz-1, 1);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;

	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy, iz-1, 0);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;

	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy, iz-1, 2);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;
	}
	if (iz < prm->D-1) {
	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy, iz+1, 1);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;
	  
	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy, iz+1, 0);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;

	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy, iz+1, 2);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;
	}
	MatSetValuesLocal(Jhh, 1, &row, cur_col, Jhh_cols, Jhh_vals, INSERT_VALUES);

	//hz terms start here
	row = DMDALocalIndex3D(&info, ix, iy, iz, 2);

	cur_col = 0;

	Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy, iz, 2);
	Jhh_vals[cur_col] = 1;
	cur_col++;

	Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy, iz, 1);
	Jhh_vals[cur_col] = 0;
	cur_col++;
	
	Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy, iz, 0);
	Jhh_vals[cur_col] = 0;
	cur_col++;

	if (iz > 0) {
	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy, iz-1, 2);

	  Jhh_vals[cur_col] = 0;
	  cur_col++;

	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy, iz-1, 1);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;

	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy, iz-1, 0);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;

	  if (iy > 0) {
	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy-1, iz-1, 2);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy-1, iz-1, 1);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy-1, iz-1, 0);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;
	  }
	  if (iy < prm->H-1) {
	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy+1, iz-1, 2);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy+1, iz-1, 1);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy+1, iz-1, 0);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;
	  }
	  if (ix > 0) {
	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy, iz-1, 2);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy, iz-1, 1);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy, iz-1, 0);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;
	  }
	  if (ix < prm->W-1) {
	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy, iz-1, 2);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy, iz-1, 1);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy, iz-1, 0);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;
	  }
	}
	if (iz < prm->D-1) {
	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy, iz+1, 2);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;

	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy, iz+1, 1);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;

	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy, iz+1, 0);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;

	  if (iy > 0) {
	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy-1, iz+1, 2);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy-1, iz+1, 1);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy-1, iz+1, 0);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;
	  }
	  if (iy < prm->H-1) {
	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy+1, iz+1, 2);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy+1, iz+1, 1);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy+1, iz+1, 0);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;
	  }
	  if (ix > 0) {
	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy, iz+1, 2);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy, iz+1, 1);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy, iz+1, 0);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;
	  }
	  if (ix < prm->W-1) {
	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy, iz+1, 2);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy, iz+1, 1);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy, iz+1, 0);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;
	  }
	}
	if (iy > 0) {
	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy-1, iz, 2);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;

	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy-1, iz, 1);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;

	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy-1, iz, 0);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;

	  if (ix > 0) {
	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy-1, iz, 2);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy-1, iz, 1);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy-1, iz, 0);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;
	  }
	  if (ix < prm->W-1) {
	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy-1, iz, 2);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy-1, iz, 1);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy-1, iz, 0);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;
	  }
	}
	if (iy < prm->H-1) {
	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy+1, iz, 2);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;
	  
	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy+1, iz, 1);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;

	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix, iy+1, iz, 0);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;
	  if (ix > 0) {
	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy+1, iz, 2);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy+1, iz, 1);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy+1, iz, 0);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;
	  }
	  if (ix < prm->W-1) {
	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy+1, iz, 2);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy+1, iz, 1);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;

	    Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy+1, iz, 0);
	    Jhh_vals[cur_col] = 0;
	    cur_col++;
	  }
	}
	if (ix > 0) {
	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy, iz, 2);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;

	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy, iz, 1);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;

	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix-1, iy, iz, 0);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;
	}
	if (ix < prm->W-1) {
	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy, iz, 2);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;

	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy, iz, 1);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;

	  Jhh_cols[cur_col] = DMDALocalIndex3D(&info, ix+1, iy, iz, 0);
	  Jhh_vals[cur_col] = 0;
	  cur_col++;
	}
	MatSetValuesLocal(Jhh, 1, &row, cur_col, Jhh_cols, Jhh_vals, INSERT_VALUES);
      }
    }
  }
  row = 0;
  MatSetValuesLocal(Jhb, cur_Jrow, prm->Jrow, 1, &row, prm->Jrow_vals, INSERT_VALUES);
  MatSetValuesLocal(Jbh, 1, &row, cur_Jcol, prm->Jcol, prm->Jcol_vals, INSERT_VALUES);

  MatRestoreLocalSubMatrix(jac,is[0],is[0],&Jbb);
  MatRestoreLocalSubMatrix(jac,is[0],is[1],&Jbh);
  MatRestoreLocalSubMatrix(jac,is[1],is[0],&Jhb);
  MatRestoreLocalSubMatrix(jac,is[1],is[1],&Jhh);
  ISDestroy(&is[0]);
  ISDestroy(&is[1]);
  PetscFree(is);
  VecRestoreArray(vbeta,&beta);
  DMDAVecRestoreArray(prm->dmda,vh,&h);
  DMCompositeRestoreLocalVectors(prm->packer_da,&vbeta,&vh);
  MatAssemblyBegin(jac,MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(jac,MAT_FINAL_ASSEMBLY);
  MatAssemblyBegin(J,MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(J,MAT_FINAL_ASSEMBLY);

  return 0;
}

PetscErrorCode FormCoupleLocations(DM dm, Mat mat, PetscInt *dnz, PetscInt *onz, PetscInt rstart, PetscInt nrows, PetscInt start, PetscInt end) {
  int row;
  prms *prm=NULL;

  DMGetApplicationContext(dm,&prm);

  if (mat) {
    //need to set nonzero structure here

  } else {    
    for (row = 0; row < nrows; row++) {
      if ((row == 0) && (rstart == 0)) {
	dnz[row] = end-start;
	onz[row] = 1+3*prm->W*prm->H*prm->D-dnz[row];
      } else {
	if (rstart == 0)
	  dnz[row]++;
	else
	  onz[row]++;
      }
    }
  }
  return 0;
}

static char help[] = "Simple example attempting to set coupling between redundant field and DMDA variables.\n\n";

int main(int argc, char **argv) {
  PetscErrorCode ierr;
  prms prm;

  PetscInitialize(&argc,&argv,NULL,help);

  prm.W = 3;
  prm.H = 3;
  prm.D = 3;

  //allocate the global vectors for column 0 of Jacobian
  ierr = PetscCalloc1(3*prm.W*prm.H*prm.D*sizeof(PetscInt),&(prm.Jcol));CHKERRQ(ierr);
  ierr = PetscCalloc1(3*prm.W*prm.H*prm.D*sizeof(PetscInt),&(prm.Jrow));CHKERRQ(ierr);
  ierr = PetscCalloc1(3*prm.W*prm.H*prm.D*sizeof(PetscScalar),&(prm.Jcol_vals));CHKERRQ(ierr);
  ierr = PetscCalloc1(3*prm.W*prm.H*prm.D*sizeof(PetscScalar),&(prm.Jrow_vals));CHKERRQ(ierr);
  
  //make the packer (i.e. the DM composite)
  ierr = DMCompositeCreate(PETSC_COMM_WORLD,&(prm.packer_da));CHKERRQ(ierr);
  ierr = DMSetApplicationContext(prm.packer_da,(void*)&prm);
  ierr = DMRedundantCreate(PETSC_COMM_WORLD,0,1,&(prm.red_da));CHKERRQ(ierr);
  ierr = DMCompositeAddDM(prm.packer_da,prm.red_da);CHKERRQ(ierr);
  
  ierr = DMDACreate3d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DMDA_STENCIL_BOX, prm.W, prm.H, prm.D,
		      PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE, 3, 1, NULL, NULL, NULL, &(prm.dmda));CHKERRQ(ierr);
  ierr = DMSetUp(prm.dmda);CHKERRQ(ierr);
  ierr = DMCompositeAddDM(prm.packer_da,prm.dmda);
  
  ierr = DMCreateGlobalVector(prm.packer_da,&(prm.X));CHKERRQ(ierr);
  ierr = VecDuplicate(prm.X,&(prm.R));CHKERRQ(ierr);

  //set the routine to define the extra non-zero elements of the Jacobian arising from the coupling between the redundant field and the DMDA variables
  ierr = DMCompositeSetCoupling(prm.packer_da,FormCoupleLocations);

  ierr = DMCreateMatrix(prm.packer_da,&(prm.J));CHKERRQ(ierr);

  //It's still necessary to turn off new nonzero allocation errors because my coupling routine isn't working
  ierr = MatSetOption(prm.J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE);CHKERRQ(ierr);

  ierr = SNESCreate(PETSC_COMM_WORLD,&(prm.snes));CHKERRQ(ierr);

  ierr = SNESSetFunction(prm.snes,prm.R,FormFunction,(void*)&prm);CHKERRQ(ierr);
  ierr = SNESSetJacobian(prm.snes,prm.J,prm.J,FormJacobian,(void*)&prm);CHKERRQ(ierr);

  ierr = SNESSetDM(prm.snes,prm.packer_da);CHKERRQ(ierr);

  ierr = SNESSetFromOptions(prm.snes);CHKERRQ(ierr);

  ierr = VecSet(prm.X,(PetscScalar)0);CHKERRQ(ierr);
  ierr = FormInitialGuess(&prm,prm.X);CHKERRQ(ierr);
  ierr = SNESSolve(prm.snes,NULL,prm.X);CHKERRQ(ierr);

  ierr = SNESDestroy(&(prm.snes));
  ierr = MatDestroy(&(prm.J));
  ierr = VecDestroy(&(prm.R));
  ierr = VecDestroy(&(prm.X));
  ierr = DMDestroy(&(prm.dmda));
  ierr = DMDestroy(&(prm.red_da));
  ierr = DMDestroy(&(prm.packer_da));

  PetscFinalize();
 
  return 0;
}
#include <petscdm.h>
#include <petscdmda.h>
#include <petscsys.h>
#include <petscsnes.h>
#include <petscdmredundant.h>
#include <petscdmcomposite.h>

typedef struct {
  PetscScalar x, y, z;
} Pt3;

typedef struct {
  int W, H, D, MPI_rank;
  PetscInt *Jcol, *Jrow;
  PetscScalar beta, *Jcol_vals, *Jrow_vals;
  Vec X, R;
  Mat J;
  DM packer_da, red_da, dmda;
  SNES snes;
} prms;

PetscErrorCode FormInitialGuess(prms *prm, Vec X);
PetscErrorCode FormFunction(SNES snes, Vec X, Vec F, void *aux);
PetscErrorCode FormJacobian(SNES snes, Vec X, Mat J, Mat jac, void *aux);
PetscErrorCode FormCoupleLocations(DM dm, Mat mat, PetscInt *dnz, PetscInt *onz, PetscInt rstart, PetscInt nrows, PetscInt start, PetscInt end);

Reply via email to