git branch --contains barry/2019-09-01/robustify-version-check
balay/jed-gitlab-ci
master
Make a new branch from your current branch, add something like -feature-sf-on-gpu to
the end of the name, and merge in jczhang/feature-sf-on-gpu.
Configure and test with that.
Barry
Junchao and Barry,
I am using mark/fix-cuda-with-gamg-pintocpu, which is built on Barry's
robustify branch. Is this in master yet? If so, I'd like to get my branch
merged to master, then merge Junchao's branch. Then use it.
I think we were waiting for some refactoring from Karl to proceed.
On Sat, Aug 31, 2019 at 8:04 PM Mark Adams wrote:
On Sat, Aug 31, 2019 at 4:28 PM Smith, Barry F. wrote:
Any explanation for why the scaling is much better for CPUs than GPUs? Is
it the "extra" time needed for communication from the GPUs?
On Sat, Aug 31, 2019 at 4:28 PM Smith, Barry F. wrote:
>
> Any explanation for why the scaling is much better for CPUs than
> GPUs? Is it the "extra" time needed for communication from the GPUs?
>
The GPU work is well load balanced so it weak scales perfectly. When you
put that work in
Any explanation for why the scaling is much better for CPUs than GPUs? Is
it the "extra" time needed for communication from the GPUs?
Perhaps you could try the GPU version with Junchao's new CUDA-aware MPI
branch (in the GitLab merge requests) that can speed up the communication
Ahh, PGI compiler, that explains it :-)
Ok, thanks. Don't worry about the runs right now. We'll figure out the fix.
The code is just
*a = (PetscReal)strtod(name,endptr);
could be a compiler bug.
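For reference, a minimal standalone sketch of that conversion (not the PETSc source; the single-precision typedef is an assumption): in a --with-precision=single build PetscReal is float, so the double returned by strtod() is narrowed, which is where a compiler bug could plausibly bite.

#include <stdlib.h>
#include <stdio.h>

typedef float PetscReal;  /* assumption: a --with-precision=single build */

/* convert a string to a PetscReal the same way the line quoted above does */
static void StringToReal(const char *name, char **endptr, PetscReal *a)
{
  *a = (PetscReal)strtod(name, endptr);  /* double -> float narrowing happens here */
}

int main(void)
{
  char      *end;
  PetscReal  v;
  StringToReal("1.0e-40", &end, &v);     /* underflows the float range, a case worth checking */
  printf("%g\n", (double)v);
  return 0;
}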
> On Aug 14, 2019, at 9:23 PM, Mark Adams wrote:
>
> I am getting this error with single:
I am getting this error with single:
22:21 /gpfs/alpine/geo127/scratch/adams$ jsrun -n 1 -a 1 -c 1 -g 1
./ex56_single -cells 2,2,2 -ex56_dm_vec_type cuda -ex56_dm_mat_type
aijcusparse -fp_trap
[0] 81 global equations, 27 vertices
[0]PETSC ERROR: *** unknown floating point error occurred ***
Oh, doesn't even have to be that large. We just need to be able to look at
the flop rates (as a surrogate for run times) and compare with the previous
runs. So long as the size per process is pretty much the same that is good
enough.
Barry
> On Aug 14, 2019, at 8:45 PM, Mark Adams wrote:
I can run single, I just can't scale up. But I can use like 1500 processors.
On Wed, Aug 14, 2019 at 9:31 PM Smith, Barry F. wrote:
>
> Oh, are all your integers 8 bytes? Even on one node?
>
> Once Karl's new middleware is in place we should see about reducing to 4
> bytes on the GPU.
>
>
Oh, are all your integers 8 bytes? Even on one node?
Once Karl's new middleware is in place we should see about reducing to 4
bytes on the GPU.
Barry
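As a rough, self-contained sketch (not PETSc and not Karl's middleware; all names here are made up) of what reducing to 4 bytes on the GPU could look like: keep the 8-byte indices on the host and make narrowed 32-bit copies of the CSR index arrays for the device, failing if any index would overflow.

#include <stdint.h>
#include <stdlib.h>

typedef int64_t PetscInt64;  /* host index type in a --with-64-bit-indices build */

/* Make a 32-bit copy of a 64-bit index array for the device; fails if any
   index would not fit, i.e. the local matrix is too large for 32-bit ints. */
static int NarrowIndicesForDevice(size_t len, const PetscInt64 *in, int32_t **out)
{
  int32_t *a = (int32_t*)malloc(len ? len*sizeof(*a) : 1);
  if (!a) return -1;
  for (size_t i = 0; i < len; i++) {
    if (in[i] < 0 || in[i] > INT32_MAX) { free(a); return -2; }  /* would overflow on the device */
    a[i] = (int32_t)in[i];
  }
  *out = a;
  return 0;
}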
> On Aug 14, 2019, at 7:44 PM, Mark Adams wrote:
>
> OK, I'll run single. It's a bit perverse to run with 4 byte floats and 8 byte
OK, I'll run single. It's a bit perverse to run with 4 byte floats and 8 byte
integers ... I could use 32 bit ints and just not scale out.
On Wed, Aug 14, 2019 at 6:48 PM Smith, Barry F. wrote:
>
> Mark,
>
>Oh, I don't even care if it converges, just put in a fixed number of
> iterations.
FYI, this test has a smooth (polynomial) body force and it runs a
convergence study.
On Wed, Aug 14, 2019 at 6:15 PM Brad Aagaard via petsc-dev <petsc-dev@mcs.anl.gov> wrote:
> Q2 is often useful in problems with body forces (such as gravitational
> body forces), which tend to have linear variations in stress.
"Smith, Barry F." writes:
>> On Aug 14, 2019, at 5:58 PM, Jed Brown wrote:
>>
>> "Smith, Barry F." writes:
>>
On Aug 14, 2019, at 2:37 PM, Jed Brown wrote:
Mark Adams via petsc-dev writes:
> On Wed, Aug 14, 2019 at 2:35 PM Smith, Barry F.
> wrote:
>
> On Aug 14, 2019, at 5:58 PM, Jed Brown wrote:
>
> "Smith, Barry F." writes:
>
>>> On Aug 14, 2019, at 2:37 PM, Jed Brown wrote:
>>>
>>> Mark Adams via petsc-dev writes:
>>>
On Wed, Aug 14, 2019 at 2:35 PM Smith, Barry F. wrote:
>
> Mark,
>
> Would you be
> On Aug 14, 2019, at 3:36 PM, Mark Adams wrote:
>
>
>
> On Wed, Aug 14, 2019 at 3:37 PM Jed Brown wrote:
> Mark Adams via petsc-dev writes:
>
> > On Wed, Aug 14, 2019 at 2:35 PM Smith, Barry F. wrote:
> >
> >>
> >> Mark,
> >>
> >>Would you be able to make one run using single
"Smith, Barry F." writes:
>> On Aug 14, 2019, at 2:37 PM, Jed Brown wrote:
>>
>> Mark Adams via petsc-dev writes:
>>
>>> On Wed, Aug 14, 2019 at 2:35 PM Smith, Barry F. wrote:
>>>
Mark,
Would you be able to make one run using single precision? Just single
> On Aug 14, 2019, at 2:37 PM, Jed Brown wrote:
>
> Mark Adams via petsc-dev writes:
>
>> On Wed, Aug 14, 2019 at 2:35 PM Smith, Barry F. wrote:
>>
>>>
>>> Mark,
>>>
>>> Would you be able to make one run using single precision? Just single
>>> everywhere since that is all we support
Mark,
Oh, I don't even care if it converges, just put in a fixed number of
iterations. The idea is to just get a baseline of the possible improvement.
ECP is literally dropping millions into research on "multi precision"
computations on GPUs; we need to have some actual numbers for
Here are the times for KSPSolve on one node with 2,280,285 equations. These
nodes seem to have 42 cores. There are 6 "devices" (GPUs) and 7 cores
attached to each device. The anomalous 28-core result could be from only
using 4 "devices". I figure I will use 36 cores for now. I should really
do this
Brad Aagaard via petsc-dev writes:
> Q2 is often useful in problems with body forces (such as gravitational
> body forces), which tend to have linear variations in stress.
It's similar on the free-surface Stokes side, where pressure has a
linear gradient and must be paired with a stable
Q2 is often useful in problems with body forces (such as gravitational
body forces), which tend to have linear variations in stress.
On 8/14/19 2:51 PM, Mark Adams via petsc-dev wrote:
Do you have any applications that specifically want Q2 (versus Q1)
elasticity or have some test
>
>
>
> Do you have any applications that specifically want Q2 (versus Q1)
> elasticity or have some test problems that would benefit?
>
>
No, I'm just trying to push things.
Mark Adams writes:
> On Wed, Aug 14, 2019 at 3:37 PM Jed Brown wrote:
>
>> Mark Adams via petsc-dev writes:
>>
>> > On Wed, Aug 14, 2019 at 2:35 PM Smith, Barry F.
>> wrote:
>> >
>> >>
>> >> Mark,
>> >>
>> >>Would you be able to make one run using single precision? Just single
>> >>
On Wed, Aug 14, 2019 at 2:19 PM Smith, Barry F. wrote:
>
> Mark,
>
> This is great, we can study these for months.
>
> 1) At the top of the plots you say SNES but that can't be right; there is
> no way it is getting such speedups for the entire SNES solve, since the
> Jacobians are on the CPUs
On Wed, Aug 14, 2019 at 3:37 PM Jed Brown wrote:
> Mark Adams via petsc-dev writes:
>
> > On Wed, Aug 14, 2019 at 2:35 PM Smith, Barry F.
> wrote:
> >
> >>
> >> Mark,
> >>
> >>Would you be able to make one run using single precision? Just single
> >> everywhere since that is all we
Mark Adams via petsc-dev writes:
> On Wed, Aug 14, 2019 at 2:35 PM Smith, Barry F. wrote:
>
>>
>> Mark,
>>
>>Would you be able to make one run using single precision? Just single
>> everywhere since that is all we support currently?
>>
>>
> Experience in engineering at least is single
On Wed, Aug 14, 2019 at 2:35 PM Smith, Barry F. wrote:
>
> Mark,
>
>Would you be able to make one run using single precision? Just single
> everywhere since that is all we support currently?
>
>
Experience in engineering, at least, is that single does not work for FE
elasticity. I have tried it
Mark,
Would you be able to make one run using single precision? Just single
everywhere since that is all we support currently?
The results will give us motivation (or anti-motivation) to have support for
running KSP (or PC (or Mat) in single precision while the simulation is
Mark,
This is great, we can study these for months.
1) At the top of the plots you say SNES but that can't be right; there is no
way it is getting such speedups for the entire SNES solve, since the Jacobians
are on the CPUs and take much of the time. Do you mean the KSP part of the SNES
>
>
> 3) Is comparison between pointers appropriate? For example, if (dptr !=
> zarray) { is scary: if some arrays are zero length, how do we know what the
> pointer value will be?
>
>
Yes, you need to consider these cases, which is kind of error prone.
Also, I think merging transpose, and not, is a
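On the zero-length concern above, a small self-contained illustration (hypothetical, host-side, not the CUDA code) of why a test like if (dptr != zarray) is fragile: the pointer returned for a zero-length allocation is unspecified, so the comparison tells you nothing about the data.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
  size_t  n = 0;                                /* a zero-length (sub)vector on this rank */
  double *dptr   = malloc(n*sizeof(double));    /* may be NULL or a unique pointer */
  double *zarray = malloc(n*sizeof(double));    /* likewise */

  /* With n == 0 this branch depends on what malloc happened to return,
     not on any property of the data, which is the concern raised above. */
  if (dptr != zarray) printf("treated as different buffers\n");
  else                printf("treated as the same buffer\n");

  free(dptr);
  free(zarray);
  return 0;
}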
My concern is
1) is it actually optimally efficient for all cases? This kind of stuff, IMHO
if (yy) {
  if (dptr != zarray) {
    ierr = VecCopy_SeqCUDA(yy,zz);CHKERRQ(ierr);
  } else if (zz != yy) {
    ierr = VecAXPY_SeqCUDA(zz,1.0,yy);CHKERRQ(ierr);
  }
} else
Yea, I agree. Once this is working, I'll go back and split MatMultAdd, etc.
On Wed, Jul 10, 2019 at 11:16 AM Smith, Barry F. wrote:
>
>In the long run I would like to see smaller specialized chunks of code
> (with a bit of duplication between them) instead of highly overloaded
> routines
In the long run I would like to see smaller specialized chunks of code (with
a bit of duplication between them) instead of highly overloaded routines like
MatMultAdd_AIJCUSPARSE. Better 3 routines: one for multiply alone, one for multiply-add
alone, and one for multiply-add with sparse format. Trying
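To make that concrete, a self-contained host-side CSR sketch (not the actual AIJCUSPARSE code; the names are made up) of the split: one routine for multiply alone and one for multiply-add alone, each with a straight-line body and a bit of duplication.

typedef struct {
  int           n;       /* number of local rows        */
  const int    *rowptr;  /* CSR row offsets, length n+1 */
  const int    *col;     /* CSR column indices          */
  const double *val;     /* CSR values                  */
} CSRMat;

/* z = A*x : multiply alone */
static void CSRMult(const CSRMat *A, const double *x, double *z)
{
  for (int i = 0; i < A->n; i++) {
    double sum = 0.0;
    for (int j = A->rowptr[i]; j < A->rowptr[i+1]; j++) sum += A->val[j]*x[A->col[j]];
    z[i] = sum;
  }
}

/* z = y + A*x : multiply-add alone, no aliasing or yy==NULL special cases */
static void CSRMultAdd(const CSRMat *A, const double *x, const double *y, double *z)
{
  for (int i = 0; i < A->n; i++) {
    double sum = y[i];
    for (int j = A->rowptr[i]; j < A->rowptr[i+1]; j++) sum += A->val[j]*x[A->col[j]];
    z[i] = sum;
  }
}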
Thanks, you made several changes here, including switches on the
workvector size. I guess I should import this logic to the transpose
method(s), except for the yy==NULL branches ...
MatMult_ calls MatMultAdd with yy=0, but the transpose versions have their
own code.
On Wed, Jul 10, 2019 at 1:13 AM Smith, Barry F. wrote:
>
> ierr = VecGetLocalSize(xx,&nt);CHKERRQ(ierr);
> if (nt != A->rmap->n) SETERRQ2(PETSC_COMM_SELF,PETSC_ERR_ARG_SIZ,"Incompatible partition of A (%D) and xx (%D)",A->rmap->n,nt);
> ierr =
ierr = VecGetLocalSize(xx,&nt);CHKERRQ(ierr);
if (nt != A->rmap->n) SETERRQ2(PETSC_COMM_SELF,PETSC_ERR_ARG_SIZ,"Incompatible partition of A (%D) and xx (%D)",A->rmap->n,nt);
ierr = VecScatterInitializeForGPU(a->Mvctx,xx);CHKERRQ(ierr);
ierr =
I am stumped by this GPU bug (or bugs). Maybe someone has an idea.
I did find a bug in the cuda transpose mat-vec that cuda-memcheck detected,
but I still have differences between the GPU and CPU transpose mat-vec.
I've got it down to a very simple test: bicg/none on a tiny mesh with two
processors.