Re: [petsc-dev] Apply for Google Summer of Code 2023?

2023-02-22 Thread Karl Rupp

Dear all,

unfortunately our application for the Google Summer of Code 2023 got 
rejected. I haven't received any feedback on the reasons yet; however, 
looking at our GSoC ideas list I can see that we haven't done a good 
enough job of describing our GSoC projects.


Well, we can take this as input for a better application next year :-)

Best regards,
Karli


On 2/7/23 18:37, Karl Rupp wrote:

Dear all,

thanks for all the input and help. Our application has been submitted, 
let's keep our fingers crossed.


Also, this is a friendly reminder to fill out the details on the 
GSoC-topics:

  https://gitlab.com/petsc/petsc/-/issues/?search=GSoC
Part of the evaluation is whether our ideas are properly communicated. :-)

Thanks and best regards,
Karli



On 2/6/23 20:24, Karl Rupp wrote:

Hello all,

thanks for proposing projects. I've created the suggestions so far as 
'issues' in the issue tracker on Gitlab, prefixed by 'GSoC:'. Please 
add a better description to your suggestions so that applicants get a 
better idea of what that project is all about and how to get started. :-)


Also, Satish, Junchao, Jed, and Matt should have received invitations 
to join the PETSc org for GSoC 2023. Please join today, as we need to 
apply by tomorrow (Tuesday) 18:00 UTC.


I've got one question regarding payment processing; since that is a 
bit sensitive, I'll send it to the private list petsc-maint.


Thanks and best regards,
Karli



On 2/4/23 20:46, Matthew Knepley wrote:
On Fri, Feb 3, 2023 at 6:28 PM Jed Brown <j...@jedbrown.org> wrote:


    Thanks for proposing this. Some ideas:

    * DMPlex+libCEED automation
    * Pipelined Krylov methods using Rust async
    * Differentiable programming using Enzyme with PETSc


I like all those.

   Matt

    Karl Rupp <r...@iue.tuwien.ac.at> writes:


 > Dear PETSc developers,
 >
 > in order to attract students to PETSc development, I'm thinking
    about a
 > PETSc application for Google Summer of Code (GSoC) 2023:
 > https://summerofcode.withgoogle.com/programs/2023
    <https://summerofcode.withgoogle.com/programs/2023>
 >
 > The org application deadline is February 7, i.e. in 4 days. This
 > application is - roughly speaking - a form with a state of intent
    and a
 > justification why the project is a good fit for GSoC. I've done
    this in
 > the past (~2010-12) and can do the paperwork again this year.
 >
 > What is required:
 >   - PETSc developers, who are willing to act as mentors
    throughout the
 > program.
 >   - A few good project ideas (e.g. MATDENSE for GPUs) for
 > contributors/students to work on
 >
 > It used to be that new organizations would get at most 2 contributor
 > slots assigned. That's fair, because one must not underestimate the
 > effort that goes into mentoring.
 >
 > Thoughts? Shall we apply (yes/no)? If yes, are you willing to be
    mentor?
 > The more mentors, the better; it underlines the importance of the
 > project and indicates that contributors will find a good environment.

 >
 > Thanks and best regards,
 > Karli



--
What most experimenters take for granted before they begin their 
experiments is infinitely more interesting than any results to which 
their experiments lead.

-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ 
<http://www.cse.buffalo.edu/~knepley/>


Re: [petsc-dev] Apply for Google Summer of Code 2023?

2023-02-07 Thread Karl Rupp

Dear all,

thanks for all the input and help. Our application has been submitted, 
let's keep our fingers crossed.


Also, this is a friendly reminder to fill out the details on the 
GSoC-topics:

 https://gitlab.com/petsc/petsc/-/issues/?search=GSoC
Part of the evaluation is whether our ideas are properly communicated. :-)

Thanks and best regards,
Karli



On 2/6/23 20:24, Karl Rupp wrote:

Hello all,

thanks for proposing projects. I've created the suggestions so far as 
'issues' in the issue tracker on Gitlab, prefixed by 'GSoC:'. Please add 
a better description to your suggestions so that applicants get a better 
idea of what that project is all about and how to get started. :-)


Also, Satish, Junchao, Jed, and Matt should have received invitations to 
join the PETSc org for GSoC 2023. Please join today, as we need to apply 
by tomorrow (Tuesday) 18:00 UTC.


I've got one question regarding payment processing; since that is a bit 
sensitive, I'll send it to the private list petsc-maint.


Thanks and best regards,
Karli



On 2/4/23 20:46, Matthew Knepley wrote:
On Fri, Feb 3, 2023 at 6:28 PM Jed Brown <j...@jedbrown.org> wrote:


    Thanks for proposing this. Some ideas:

    * DMPlex+libCEED automation
    * Pipelined Krylov methods using Rust async
    * Differentiable programming using Enzyme with PETSc


I like all those.

   Matt

    Karl Rupp <r...@iue.tuwien.ac.at> writes:


 > Dear PETSc developers,
 >
 > in order to attract students to PETSc development, I'm thinking
    about a
 > PETSc application for Google Summer of Code (GSoC) 2023:
 > https://summerofcode.withgoogle.com/programs/2023
    <https://summerofcode.withgoogle.com/programs/2023>
 >
 > The org application deadline is February 7, i.e. in 4 days. This
 > application is - roughly speaking - a form with a state of intent
    and a
 > justification why the project is a good fit for GSoC. I've done
    this in
 > the past (~2010-12) and can do the paperwork again this year.
 >
 > What is required:
 >   - PETSc developers, who are willing to act as mentors
    throughout the
 > program.
 >   - A few good project ideas (e.g. MATDENSE for GPUs) for
 > contributors/students to work on
 >
 > It used to be that new organizations would get at most 2 contributor
 > slots assigned. That's fair, because one must not underestimate the
 > effort that goes into mentoring.
 >
 > Thoughts? Shall we apply (yes/no)? If yes, are you willing to be
    mentor?
 > The more mentors, the better; it underlines the importance of the
 > project and indicates that contributors will find a good environment.

 >
 > Thanks and best regards,
 > Karli



--
What most experimenters take for granted before they begin their 
experiments is infinitely more interesting than any results to which 
their experiments lead.

-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ 
<http://www.cse.buffalo.edu/~knepley/>


Re: [petsc-dev] Apply for Google Summer of Code 2023?

2023-02-06 Thread Karl Rupp

Hello all,

thanks for proposing projects. I've created the suggestions so far as 
'issues' in the issue tracker on Gitlab, prefixed by 'GSoC:'. Please add 
a better description to your suggestions so that applicants get a better 
idea of what that project is all about and how to get started. :-)


Also, Satish, Junchao, Jed, and Matt should have received invitations to 
join the PETSc org for GSoC 2023. Please join today, as we need to apply 
by tomorrow (Tuesday) 18:00 UTC.


I've got one question regarding payment processing; since that is a bit 
sensitive, I'll send it to the private list petsc-maint.


Thanks and best regards,
Karli



On 2/4/23 20:46, Matthew Knepley wrote:
On Fri, Feb 3, 2023 at 6:28 PM Jed Brown <j...@jedbrown.org> wrote:


Thanks for proposing this. Some ideas:

* DMPlex+libCEED automation
* Pipelined Krylov methods using Rust async
* Differentiable programming using Enzyme with PETSc


I like all those.

   Matt

Karl Rupp <r...@iue.tuwien.ac.at> writes:

 > Dear PETSc developers,
 >
 > in order to attract students to PETSc development, I'm thinking
about a
 > PETSc application for Google Summer of Code (GSoC) 2023:
 > https://summerofcode.withgoogle.com/programs/2023
<https://summerofcode.withgoogle.com/programs/2023>
 >
 > The org application deadline is February 7, i.e. in 4 days. This
 > application is - roughly speaking - a form with a state of intent
and a
 > justification why the project is a good fit for GSoC. I've done
this in
 > the past (~2010-12) and can do the paperwork again this year.
 >
 > What is required:
 >   - PETSc developers, who are willing to act as mentors
throughout the
 > program.
 >   - A few good project ideas (e.g. MATDENSE for GPUs) for
 > contributors/students to work on
 >
 > It used to be that new organizations would get at most 2 contributor
 > slots assigned. That's fair, because one must not underestimate the
 > effort that goes into mentoring.
 >
 > Thoughts? Shall we apply (yes/no)? If yes, are you willing to be
mentor?
 > The more mentors, the better; it underlines the importance of the
 > project and indicates that contributors will find a good environment.
 >
 > Thanks and best regards,
 > Karli



--
What most experimenters take for granted before they begin their 
experiments is infinitely more interesting than any results to which 
their experiments lead.

-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>


[petsc-dev] Apply for Google Summer of Code 2023?

2023-02-03 Thread Karl Rupp

Dear PETSc developers,

in order to attract students to PETSc development, I'm thinking about a 
PETSc application for Google Summer of Code (GSoC) 2023:

 https://summerofcode.withgoogle.com/programs/2023

The org application deadline is February 7, i.e. in 4 days. This 
application is - roughly speaking - a form with a state of intent and a 
justification why the project is a good fit for GSoC. I've done this in 
the past (~2010-12) and can do the paperwork again this year.


What is required:
 - PETSc developers, who are willing to act as mentors throughout the 
program.
 - A few good project ideas (e.g. MATDENSE for GPUs) for 
contributors/students to work on


It used to be that new organizations would get at most 2 contributor 
slots assigned. That's fair, because one must not underestimate the 
effort that goes into mentoring.


Thoughts? Shall we apply (yes/no)? If yes, are you willing to be mentor? 
The more mentors, the better; it underlines the importance of the 
project and indicates that contributors will find a good environment.


Thanks and best regards,
Karli


Re: [petsc-dev] empty space on left side of website pages

2021-04-27 Thread Karl Rupp

Hi,

such adjustments should not need a direct modification of the theme. One 
can just override the CSS settings in custom CSS files instead:

 https://docs.readthedocs.io/en/stable/guides/adding-custom-css.html

The benefit of such an approach is that all future updates of the theme 
will continue to work. Plus, one has all CSS-tweaks neatly collected in 
a small file (ideally just a few lines).


Best regards,
Karli

On 4/26/21 8:58 AM, Patrick Sanan wrote:
As far as I know (which isn't very far, with web stuff), changing things 
on that level requires somehow getting into CSS.


For instance, you can see what it looks like with other widths directly 
from Firefox (fun, didn't know you could do this):

- go to the page
- hit F12
- click around on the left to find the element that corresponds to the 
part you care about
- look in the middle column to find the piece of CSS that's controlling 
things (here, something called .col-md-3)
- edit the CSS - in attached screenshot I change the max width of that 
sidebar to 5%.


But, I want to avoid having to do things on the level of CSS and HTML - 
I think that should be done as a collective effort in maintaining the 
theme (and Sphinx itself).
If we really care enough about the width of that sidebar, we'll create a 
fork of the theme, add a setting for it, and try to get it merged to the 
theme's release branch.


On 23.04.2021 at 23:12, Barry Smith wrote:



   Thanks. Even if we just leave it, is there a way to make it a little 
"skinnier"? It seems very wide in my default browser.




On Apr 23, 2021, at 1:08 PM, Patrick Sanan wrote:


It is possible to put things there, as in this link which is both 
documentation and example:
https://pydata-sphinx-theme.readthedocs.io/en/latest/user_guide/sections.html#the-left-sidebar 



Other projects using this theme have the mostly-empty left sidebar:
https://numpy.org/doc/stable/ 
https://jupyter.readthedocs.io/en/latest/ 



(They also have fancier landing pages, though, which we have been 
discussing).



It goes away on mobile devices or small windows, at least.


On 23.04.2021 at 19:21, Barry Smith wrote:



  There is a lot of empty space on the left side of the website 
pages, under the Search slot. Does this empty left side need to be 
so large? It seems to waste a lot of the screen.


  Barry









Re: [petsc-dev] PETSc issue I cannot post combine WaitForCUDA(); inside PetscLogGpuTimeEnd();

2020-08-28 Thread Karl Rupp



 Since we cannot post issues (reported here 
https://forum.gitlab.com/t/creating-new-issue-gives-cannot-create-issue-getting-whoops-something-went-wrong-on-our-end/41966?u=bsmith) 
here is my issue so I don't forget it.

 I think
err  = WaitForCUDA();CHKERRCUDA(err);
ierr = PetscLogGpuTimeEnd();CHKERRQ(ierr);
should be changed so that WaitForCUDA() (actually WaitForDevice()) is 
included inside PetscLogGpuTimeEnd().
Currently the WaitForCUDA() is missing in a few places, resulting in 
bad timings.
Also, some _SeqCUDA() routines don't have the PetscLogGpuTimeEnd() and 
need to be fixed.

The current model is a maintenance nightmare.
Does anyone see a problem with making this change?


I'm fine with this change, as the maintenance benefits outweigh the 
performance cost for typical use cases.


I propose to also add the WaitForDevice(); at 
PetscLogGpuTimeBegin(). This will ensure that no previous GPU kernel 
executions spill over into the timed section.


   Karl,

    When synchronization is turned on, the previous GPU kernels should 
always have their own WaitForDevice(), so are you concerned about buggy 
code that does not include WaitForDevice()?


I'm primarily thinking of user callback routines here. For example, a 
FormFunction provided by the user that is running some GPU kernels. We 
have no guarantee that these user kernels have completed before entering 
the timed sections inside PETSc, so the logs will be skewed to report an 
unusually slow kernel in PETSc (the one right after the user form 
function). Arguably we could add a WaitForDevice() after user callback 
invocations.


I didn't think of the WaitForDevice() after each kernel call in PETSc; 
with that we do get reasonable timings within PETSc (except for the user 
callbacks mentioned above), so the two-barrier model is not needed.


Best regards,
Karli







 Might this incur an extra overhead checking the device? Or will it 
always be true that if there are no outstanding kernels it will not 
go to the GPU and the check will return immediately?


If we want to have a two barrier model, I propose we log the timing 
for waiting at the first barrier separately.


Barry



Best regards,
Karli






Re: [petsc-dev] PETSc issue I cannot post combine WaitForCUDA(); inside PetscLogGpuTimeEnd();

2020-08-28 Thread Karl Rupp

Hi,

   Since we cannot post issues (reported here 
https://forum.gitlab.com/t/creating-new-issue-gives-cannot-create-issue-getting-whoops-something-went-wrong-on-our-end/41966?u=bsmith) 
here is my issue so I don't forget it.


   I think

  err  = WaitForCUDA();CHKERRCUDA(err);
  ierr = PetscLogGpuTimeEnd();CHKERRQ(ierr);

should be changed so that WaitForCUDA() (actually WaitForDevice()) is 
included inside PetscLogGpuTimeEnd().


Currently the WaitForCUDA() is missing in a few places, resulting in bad 
timings.


Also, some _SeqCUDA() routines don't have the PetscLogGpuTimeEnd() and 
need to be fixed.


The current model is a maintenance nightmare.

Does anyone see a problem with making this change?


I'm fine with this change, as the maintenance benefits outweigh the 
performance cost for typical use cases.


I propose to also add the WaitForDevice(); at PetscLogGpuTimeBegin(). 
This will ensure that no previous GPU kernel executions spill over into 
the timed section.
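
For illustration, a minimal sketch of what the combined call could look 
like; WaitForDevice() is the internal helper discussed above, while 
GpuTimerStop() is a placeholder for the existing timer bookkeeping. This 
is a sketch of the proposal, not the actual PETSc source:

  PetscErrorCode PetscLogGpuTimeEnd(void)
  {
    PetscErrorCode ierr;

    PetscFunctionBegin;
    ierr = WaitForDevice();CHKERRQ(ierr); /* block until all queued GPU kernels finish */
    ierr = GpuTimerStop();CHKERRQ(ierr);  /* placeholder: existing timer bookkeeping */
    PetscFunctionReturn(0);
  }

Callers would then only need ierr = PetscLogGpuTimeEnd();CHKERRQ(ierr); 
and could drop the separate WaitForCUDA().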


Best regards,
Karli


Re: [petsc-dev] https://developer.nvidia.com/nccl

2020-06-16 Thread Karl Rupp
From a practical standpoint it seems to me that NCCL is an offering to 
a community that isn't used to MPI. It's categorized as 'Deep Learning 
Software' on the NVIDIA page ;-)


The section 'NCCL and MPI' has some interesting bits:
 https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/mpi.html

At the bottom of the page there is
 "Using NCCL to perform inter-GPU communication concurrently with 
CUDA-aware MPI may create deadlocks. (...) Using both MPI and NCCL to 
perform transfers between the same sets of CUDA devices concurrently is 
therefore not guaranteed to be safe."


While I'm impressed that NVIDIA even 'reinvents' MPI for their GPUs to 
serve the deep learning community, I don't think NCCL provides enough 
beyond MPI for PETSc.


Best regards,
Karli





On 6/17/20 4:13 AM, Junchao Zhang wrote:
It should be renamed NCL (NVIDIA Communications Library) as it adds 
point-to-point, in addition to collectives. I am not sure whether to 
implement it in PETSc, as no exascale machine uses NVIDIA GPUs.


--Junchao Zhang


On Tue, Jun 16, 2020 at 6:44 PM Matthew Knepley wrote:


It would seem to make more sense to just reverse-engineer this as
another MPI impl.

    Matt

On Tue, Jun 16, 2020 at 6:22 PM Barry Smith <bsm...@petsc.dev> wrote:




-- 
What most experimenters take for granted before they begin their

experiments is infinitely more interesting than any results to which
their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/




Re: [petsc-dev] Valgrind MPI-Related Errors

2020-06-01 Thread Karl Rupp

Hi Jacob,

the recommendation in the past was to use MPICH as it is (was?) 
valgrind-clean. Which MPI do you use? OpenMPI used to have these kinds 
of issues. (My information might be outdated)


Best regards,
Karli

On 6/2/20 2:43 AM, Jacob Faibussowitsch wrote:

Hello All,

TL;DR: valgrind always complains about "Syscall param write(buf) points 
to uninitialised byte(s)” for a LOT of MPI operations in petsc code, 
making debugging using valgrind fairly annoying since I have to sort 
through a ton of unrelated stuff. I have built valgrind from source, 
used apt install valgrind, apt install valgrind-mpi to no avail.


I am using valgrind from docker. Dockerfile is attached below as well. I 
have been unsuccessfully trying to resolve these local valgrind errors, 
but I am running out of ideas. Googling the issue has also not provided 
entirely applicable solutions. Here is an example of the error:


$ make -f gmakefile test VALGRIND=1
...
#==54610== Syscall param write(buf) points to uninitialised byte(s)
#==54610==    at 0x6F63317: write (write.c:26)
#==54610==    by 0x9056AC9: MPIDI_CH3I_Sock_write (in 
/usr/local/lib/libmpi.so.12.1.8)
#==54610==    by 0x9059FCD: MPIDI_CH3_iStartMsg (in 
/usr/local/lib/libmpi.so.12.1.8)
#==54610==    by 0x903F298: MPIDI_CH3_EagerContigShortSend (in 
/usr/local/lib/libmpi.so.12.1.8)

#==54610==    by 0x9049479: MPID_Send (in /usr/local/lib/libmpi.so.12.1.8)
#==54610==    by 0x8FC9B2A: MPIC_Send (in /usr/local/lib/libmpi.so.12.1.8)
#==54610==    by 0x8F86F2E: MPIR_Bcast_intra_binomial (in 
/usr/local/lib/libmpi.so.12.1.8)
#==54610==    by 0x8EE204E: MPIR_Bcast_intra_auto (in 
/usr/local/lib/libmpi.so.12.1.8)
#==54610==    by 0x8EE21F4: MPIR_Bcast_impl (in 
/usr/local/lib/libmpi.so.12.1.8)
#==54610==    by 0x8F887FB: MPIR_Bcast_intra_smp (in 
/usr/local/lib/libmpi.so.12.1.8)
#==54610==    by 0x8EE206E: MPIR_Bcast_intra_auto (in 
/usr/local/lib/libmpi.so.12.1.8)
#==54610==    by 0x8EE21F4: MPIR_Bcast_impl (in 
/usr/local/lib/libmpi.so.12.1.8)

#==54610==    by 0x8EE2A6F: PMPI_Bcast (in /usr/local/lib/libmpi.so.12.1.8)
#==54610==    by 0x4B377B8: PetscOptionsInsertFile (options.c:525)
#==54610==    by 0x4B39291: PetscOptionsInsert (options.c:672)
#==54610==    by 0x4B5B1EF: PetscInitialize (pinit.c:996)
#==54610==    by 0x10A6BA: main (ex9.c:75)
#==54610==  Address 0x1ffeffa944 is on thread 1's stack
#==54610==  in frame #3, created by MPIDI_CH3_EagerContigShortSend (???:)
#==54610==  Uninitialised value was created by a stack allocation
#==54610==    at 0x903F200: MPIDI_CH3_EagerContigShortSend (in 
/usr/local/lib/libmpi.so.12.1.8)


There are probably 20 such errors every single time, regardless of what 
code is being run. I have tried using apt install valgrind, apt install 
valgrind-mpi, and building valgrind from source:


# VALGRIND
WORKDIR /
RUN git clone git://sourceware.org/git/valgrind.git
WORKDIR /valgrind
RUN git pull
RUN ./autogen.sh
RUN ./configure --with-mpicc=/usr/local/bin/mpicc
RUN make -j 5
RUN make install

None of the those approaches lead to these errors disappearing. Perhaps 
I am missing some funky MPI args?


Best regards,

Jacob Faibussowitsch
(Jacob Fai - booss - oh - vitch)
Cell: (312) 694-3391




Re: [petsc-dev] snes_tutorials-ex19_cuda_1

2020-04-01 Thread Karl Rupp
The fluctuations in this example have been fixed a few months ago; the 
issue was the use of multiple streams instead of a single one. Maybe 
additional CUDA streams have been reintroduced recently?


Best regards,
Karli


On 4/2/20 5:02 AM, Junchao Zhang wrote:

I could not reproduce it locally. Even in the CI, it is random.

--Junchao Zhang


On Wed, Apr 1, 2020 at 7:47 PM Matthew Knepley wrote:


I saw Satish talking about this on the CI Tracker MR.

    Matt

On Wed, Apr 1, 2020 at 8:36 PM Lisandro Dalcin <dalc...@gmail.com> wrote:

Well, my request will not fix the problem:
https://gitlab.com/petsc/petsc/-/jobs/495147366#L5231

On Thu, 2 Apr 2020 at 03:26, Lisandro Dalcin <dalc...@gmail.com> wrote:

Can anyone messing with CPUs please update test
snes_tutorials-ex19_cuda_1 to use -ksp_monitor_short and
update its output with REPLACE=1 ?

Please do it in maint, or cherry-pick if already fixed in
master.

Regards,

-- 
Lisandro Dalcin


Research Scientist
Extreme Computing Research Center (ECRC)
King Abdullah University of Science and Technology (KAUST)
http://ecrc.kaust.edu.sa/



-- 
Lisandro Dalcin


Research Scientist
Extreme Computing Research Center (ECRC)
King Abdullah University of Science and Technology (KAUST)
http://ecrc.kaust.edu.sa/



-- 
What most experimenters take for granted before they begin their

experiments is infinitely more interesting than any results to which
their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/




[petsc-dev] No PETSc User Meeting in 2020

2020-03-14 Thread Karl Rupp

Dear PETSc developers and PETSc users,

due to the recent Covid-19 outbreak in Europe there will not be a PETSc 
User Meeting this year. We are looking into alternatives for keeping in 
touch with our user base, e.g. via webinars. Suggestions welcome :-)


Thanks and best regards,
Karl


Re: [petsc-dev] PETSc meeting not on web page ...

2020-02-24 Thread Karl Rupp

Hi Mark,

we are finalizing the registration system these days and announce soon. 
Better to have a fully populated webpage at the point of the 
announcement rather than giving a half-done impression if it's only a 
matter of a few days. :-)


Best regards,
Karli



On 2/23/20 6:32 AM, Mark Adams wrote:
Cool, One of our friends did not know about it because it was not on the 
web page.


No big deal, but it might be nice to put the place and date as soon as 
its set, with a note that more details are to come.


Thanks,
Mark
BTW, I wrote that kernel that we were discussing in CUDA (my first CUDA 
code). Thanks for the discussion at PP20.



On Sat, Feb 22, 2020 at 11:16 PM Karl Rupp <r...@iue.tuwien.ac.at> wrote:


Hi Mark,

we are just finalizing the last few details (in particular: which
registration system to use) before sending out the announcements and
putting the link on the main webpage. Just a matter of a few days. :-)

Best regards,
Karli

On 2/22/20 9:11 PM, Mark Adams wrote:
 > Maybe the announcement is somewhere but its not on the first
Google hit
 > (https://www.mcs.anl.gov/petsc/). The logo should get on there also.
 >
 > (Patrick Farrell did not know about it)
 >
 > Thanks,
 > Mark



Re: [petsc-dev] Proper matrix size to choose when evaluating MatMult?

2020-02-22 Thread Karl Rupp

Hi Junchao,

I want to evaluate MatMult on GPU.  I took a 2M x 2M matrix and ran with 
6 mpi ranks and 6 GPUs.  It took about 0.9 seconds.  


How many nonzeros per row? With 0.9 seconds you should either have many 
runs of MatMult, or a fairly dense matrix; or a really slow MatMult 
kernel ;-)
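
(Back-of-envelope, assuming roughly 5 nonzeros per row as for a 5-point 
stencil:

  2e6 rows x 5 nnz/row x 2 flops/nnz ~= 2e7 flops per MatMult,

i.e. on the order of milliseconds even at a modest few GFlop/s, so 0.9 
seconds only makes sense with on the order of 100 repetitions, a much 
denser matrix, or a very slow kernel.)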


A 2M-by-2M matrix for a 5-point stencil is probably still on the small 
side (I'm assuming that you run 2M-by-2M for *each* GPU), but should 
suffice. Expect that communication costs are significant (i.e. the 
bookkeeping and data exchange between GPUs is on the order of the costs 
for running the MatMult kernel for the respective diagonal block).



A kernel launch or 
a stream synchronization took about 10us.  Compared with MatMult, they 
are tiny. Does it mean we can ignore them?  What is a proper size to 
evaluate MatMult?  I heard it is a few thousand rows per MPI rank.  Why?


That would be a typical strong scaling limit for a CPU-based run on a 
well-tuned BlueGene-type system. With GPUs you will probably need at 
least 100k unknowns (or ~1M nonzeros) per rank in the strong scaling 
limit. Add a factor of ~10 to make latency costs small in comparison.


Best regards,
Karli


Re: [petsc-dev] PETSc meeting not on web page ...

2020-02-22 Thread Karl Rupp

Hi Mark,

we are just finalizing the last few details (in particular: which 
registration system to use) before sending out the announcements and 
putting the link on the main webpage. Just a matter of a few days. :-)


Best regards,
Karli

On 2/22/20 9:11 PM, Mark Adams wrote:
Maybe the announcement is somewhere but its not on the first Google hit 
(https://www.mcs.anl.gov/petsc/). The logo should get on there also.


(Patrick Farrell did not know about it)

Thanks,
Mark


Re: [petsc-dev] First call to cudaMalloc or cudaFree is very slow on summit

2020-02-13 Thread Karl Rupp

Hi Hong,

have you tried running the code through gprof and look at the output 
(e.g. with kcachegrind)?


(apologies if this has been suggested already)

Best regards,
Karli



On 2/12/20 7:29 PM, Zhang, Hong via petsc-dev wrote:




On Feb 12, 2020, at 5:11 PM, Smith, Barry F.  wrote:


  ldd -o on the petsc program (static) and the non petsc program (static), what 
are the differences?


There is no difference in the outputs.



  nm -o both executables | grep cudaFree()


Non petsc program:

[hongzh@login3.summit tests]$ nm ex_simple | grep cudaFree
1ae0 t 0017.plt_call.cudaFree@@libcudart.so.10.1
  U cudaFree@@libcudart.so.10.1

Petsc program:

[hongzh@login3.summit tests]$ nm ex_simple_petsc | grep cudaFree
10016550 t 0017.plt_call.cudaFree@@libcudart.so.10.1
10017010 t 0017.plt_call.cudaFreeHost@@libcudart.so.10.1
124c3f48 V 
_ZGVZN6thrust2mr19get_global_resourceINS_26device_ptr_memory_resou
rceINS_6system4cuda6detail20cuda_memory_resourceIXadL10cudaMallocEEXadL8cudaFreeEENS_
8cuda_cub7pointerIvPT_vE8resource
124c3f50 V 
_ZGVZN6thrust2mr19get_global_resourceINS_6system4cuda6detail20cuda
_memory_resourceIXadL10cudaMallocEEXadL8cudaFreeEENS_8cuda_cub7pointerIvEEPT_vE8r
esource
10726788 W 
_ZN6thrust26device_ptr_memory_resourceINS_6system4cuda6detail20cuda_memory_resourceIXadL10cudaMallocEEXadL8cudaFreeEENS_8cuda_cub7pointerIvE11do_allocateEmm
107267e8 W 
_ZN6thrust26device_ptr_memory_resourceINS_6system4cuda6detail20cuda_memory_resourceIXadL10cudaMallocEEXadL8cudaFreeEENS_8cuda_cub7pointerIvE13do_deallocateENS_10device_ptrIvEEmm
10726878 W 
_ZN6thrust26device_ptr_memory_resourceINS_6system4cuda6detail20cuda_memory_resourceIXadL10cudaMallocEEXadL8cudaFreeEENS_8cuda_cub7pointerIvED0Ev
10726848 W 
_ZN6thrust26device_ptr_memory_resourceINS_6system4cuda6detail20cuda_memory_resourceIXadL10cudaMallocEEXadL8cudaFreeEENS_8cuda_cub7pointerIvED1Ev
10729f78 W 
_ZN6thrust6system4cuda6detail20cuda_memory_resourceIXadL10cudaMallocEEXadL8cudaFreeEENS_8cuda_cub7pointerIvEEE11do_allocateEmm
1072a218 W 
_ZN6thrust6system4cuda6detail20cuda_memory_resourceIXadL10cudaMallocEEXadL8cudaFreeEENS_8cuda_cub7pointerIvEEE13do_deallocateES6_mm
1072a388 W 
_ZN6thrust6system4cuda6detail20cuda_memory_resourceIXadL10cudaMallocEEXadL8cudaFreeEENS_8cuda_cub7pointerIvEEED0Ev
1072a358 W 
_ZN6thrust6system4cuda6detail20cuda_memory_resourceIXadL10cudaMallocEEXadL8cudaFreeEENS_8cuda_cub7pointerIvEEED1Ev
12122300 V 
_ZTIN6thrust26device_ptr_memory_resourceINS_6system4cuda6detail20cuda_memory_resourceIXadL10cudaMallocEEXadL8cudaFreeEENS_8cuda_cub7pointerIvEE
12122370 V 
_ZTIN6thrust6system4cuda6detail20cuda_memory_resourceIXadL10cudaMallocEEXadL8cudaFreeEENS_8cuda_cub7pointerIv
12122410 V 
_ZTSN6thrust26device_ptr_memory_resourceINS_6system4cuda6detail20cuda_memory_resourceIXadL10cudaMallocEEXadL8cudaFreeEENS_8cuda_cub7pointerIvEE
121225f0 V 
_ZTSN6thrust6system4cuda6detail20cuda_memory_resourceIXadL10cudaMallocEEXadL8cudaFreeEENS_8cuda_cub7pointerIv
12120630 V 
_ZTVN6thrust26device_ptr_memory_resourceINS_6system4cuda6detail20cuda_memory_resourceIXadL10cudaMallocEEXadL8cudaFreeEENS_8cuda_cub7pointerIvEE
121205b0 V 
_ZTVN6thrust6system4cuda6detail20cuda_memory_resourceIXadL10cudaMallocEEXadL8cudaFreeEENS_8cuda_cub7pointerIv
124c3f30 V 
_ZZN6thrust2mr19get_global_resourceINS_26device_ptr_memory_resourceINS_6system4cuda6detail20cuda_memory_resourceIXadL10cudaMallocEEXadL8cudaFreeEENS_8cuda_cub7pointerIvPT_vE8resource
124c3f20 V 
_ZZN6thrust2mr19get_global_resourceINS_6system4cuda6detail20cuda_memory_resourceIXadL10cudaMallocEEXadL8cudaFreeEENS_8cuda_cub7pointerIvEEPT_vE8resource
  U cudaFree@@libcudart.so.10.1
  U cudaFreeHost@@libcudart.so.10.1

Hong








On Feb 12, 2020, at 1:51 PM, Munson, Todd via petsc-dev  
wrote:


There are some side effects when loading shared libraries, such as 
initializations of
static variables, etc.  Is something like that happening?

Another place is the initial runtime library that gets linked (libcrt0 maybe?). 
 I
think some MPI compilers insert their own version.

Todd.


On Feb 12, 2020, at 11:38 AM, Zhang, Hong via petsc-dev  
wrote:




On Feb 12, 2020, at 11:09 AM, Matthew Knepley  wrote:

On Wed, Feb 12, 2020 at 11:06 AM Zhang, Hong via petsc-dev 
 wrote:
Sorry for the long post. Here are replies I have got from OLCF so far. We still 
don’t know how to solve the problem.

One interesting thing that Tom noticed is PetscInitialize() may have called 
cudaFree(0) 32 times as NVPROF shows, and they all run very fast. These calls 
may be triggered by some other libraries like cublas. But if PETSc calls 
cudaFree() explicitly, it is always very slow.

It sounds really painful, but I would start removing lines from 
Pets

Re: [petsc-dev] Feed back on report on performance of vector operations on Summit requested

2019-10-09 Thread Karl Rupp via petsc-dev

Hi,

Table 2 reports negative latencies. This doesn't look right to me ;-)
If it's the outcome of a parameter fit to the performance model, then 
use a parameter name (e.g. alpha) instead of the term 'latency'.


Figure 11 has a very narrow range in the y-coordinate and thus 
exaggerates the variation greatly. "GPU performance" should be adjusted 
to something like "execution time" to explain the meaning of the y-axis.


Page 12: The latency for VecDot is higher than for VecAXPY because 
VecDot requires the result to be copied back to the host. This is an 
additional operation.
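
For illustration with cuBLAS, assuming a valid handle and device vectors 
d_x, d_y of length n; this is a sketch, not the PETSc code path itself:

  double alpha = 2.0, result;
  /* AXPY only queues work on the device and returns immediately: */
  cublasDaxpy(handle, n, &alpha, d_x, 1, d_y, 1);
  /* The dot product must deliver a scalar to the host, so the call
     cannot return before the kernel has finished and the result has
     been copied back to host memory: */
  cublasDdot(handle, n, d_x, 1, d_y, 1, &result);

Hence the extra device-to-host transfer shows up as latency in VecDot.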


Regarding performance measurements: Did you synchronize after each 
kernel launch? I.e. did you run (approach A)

 for (many times) {
   synchronize();
   start_timer();
   kernel_launch();
   synchronize();
   stop_timer();
 }
and then take averages over the timings obtained, or did you (approach B)
 synchronize();
 start_timer();
 for (many times) {
   kernel_launch();
 }
 synchronize();
 stop_timer();
and then divide the obtained time by the number of runs?

Approach A will report a much higher latency than the latter, because 
synchronizations are expensive (i.e. your latency consists of kernel 
launch latency plus device synchronization latency). Approach B is 
slightly over-optimistic, but I've found it to better match what one 
observes for an algorithm involving several kernel launches.
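
A concrete version of the two variants above, assuming CUDA and MPI are 
initialized and KernelLaunch() is a placeholder for the operation being 
timed:

  /* Approach A: synchronize around every launch, then average. */
  double total = 0.0;
  for (int i = 0; i < nreps; ++i) {
    cudaDeviceSynchronize();
    double t0 = MPI_Wtime();
    KernelLaunch();                /* placeholder for the timed operation */
    cudaDeviceSynchronize();
    total += MPI_Wtime() - t0;
  }
  double time_A = total / nreps;   /* pays the synchronization latency every time */

  /* Approach B: one synchronization pair around the whole loop. */
  cudaDeviceSynchronize();
  double t1 = MPI_Wtime();
  for (int i = 0; i < nreps; ++i) KernelLaunch();
  cudaDeviceSynchronize();
  double time_B = (MPI_Wtime() - t1) / nreps;  /* amortizes the synchronization cost */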


Best regards,
Karli



On 10/10/19 12:34 AM, Smith, Barry F. via petsc-dev wrote:


    We've prepared a short report on the performance of vector 
operations on Summit and would appreciate any feed back including: 
inconsistencies, lack of clarity, incorrect notation or terminology, etc.


    Thanks

     Barry, Hannah, and Richard







Re: [petsc-dev] Why no SpGEMM support in AIJCUSPARSE and AIJVIENNACL?

2019-10-04 Thread Karl Rupp via petsc-dev

Hi Richard,


Do you have any experience with nsparse?

https://github.com/EBD-CREST/nsparse

I've seen claims that it is much faster than cuSPARSE for sparse
matrix-matrix products.


I haven't tried nsparse, no.

But since the performance comes from a hardware feature (cache), I 
would be surprised if there is a big performance leap over ViennaCL. 
(There's certainly some potential for some tweaking of ViennaCL's 
kernels; but note that even ViennaCL is much faster than cuSPARSE's 
spGEMM on average).


With the libaxb-wrapper we can just add nsparse as an operations 
backend and then easily try it out and compare against the other 
packages. In the end it doesn't matter which package provides the best 
performance; we just want to leverage it :-)
I'd be happy to add support for this (though I suppose I should play 
with it first to verify that it is, in fact, worthwhile). Karl, is your 
branch with libaxb ready for people to start using it, or should we wait 
for you to do more with it? (Or, would you like any help with it?)


I still need to add the matrix class to the merge request. Should only 
take me a couple of hours, but I've got an extremely important deadline 
on October 15 that will prevent me from doing anything before then.



I'd like to try to add support for a few things like cuSPARSE SpGEMM 
before I go to the Summit hackathon, but I don't want to write a bunch 
of code that will be thrown away once your libaxb approach is in place.


I should be able to provide a good playground on time for the Summit 
hackathon. In the meantime you can try the matrix market reader of 
nsparse directly and see what you get, especially compared to cuSPARSE 
and MKL.


Best regards,
Karli






Karl Rupp via petsc-dev  writes:


Hi Richard,

CPU spGEMM is about twice as fast even on the GPU-friendly case of a
single rank: 
http://viennacl.sourceforge.net/viennacl-benchmarks-spmm.html


I agree that it would be good to have a GPU-MatMatMult for the sake of
experiments. Under these performance constraints it's not top priority,
though.

Best regards,
Karli


On 10/3/19 12:00 AM, Mills, Richard Tran via petsc-dev wrote:

Fellow PETSc developers,

I am wondering why the AIJCUSPARSE and AIJVIENNACL matrix types do not
support the sparse matrix-matrix multiplication (SpGEMM, or 
MatMatMult()

in PETSc parlance) routines provided by cuSPARSE and ViennaCL,
respectively. Is there a good reason that I shouldn't add those? My
guess is that support was not added because SpGEMM is hard to do 
well on
a GPU compared to many CPUs (it is hard to compete with, say, Intel 
Xeon

CPUs with their huge caches) and it has been the case that one would
generally be better off doing these operations on the CPU. Since the
trend at the big supercomputing centers seems to be to put more and 
more
of the computational power into GPUs, I'm thinking that I should 
add the
option to use the GPU library routines for SpGEMM, though. Is there 
some
good reason to *not* do this that I am not aware of? (Maybe the 
CPUs are
better for this even on a machine like Summit, but I think we're at 
the

point that we should at least be able to experimentally verify this.)

--Richard




Re: [petsc-dev] Why no SpGEMM support in AIJCUSPARSE and AIJVIENNACL?

2019-10-03 Thread Karl Rupp via petsc-dev

Do you have any experience with nsparse?

https://github.com/EBD-CREST/nsparse

I've seen claims that it is much faster than cuSPARSE for sparse
matrix-matrix products.


I haven't tried nsparse, no.

But since the performance comes from a hardware feature (cache), I would 
be surprised if there is a big performance leap over ViennaCL. (There's 
certainly some potential for some tweaking of ViennaCL's kernels; but 
note that even ViennaCL is much faster than cuSPARSE's spGEMM on average).


With the libaxb-wrapper we can just add nsparse as an operations backend 
and then easily try it out and compare against the other packages. In 
the end it doesn't matter which package provides the best performance; 
we just want to leverage it :-)


Best regards,
Karli





Karl Rupp via petsc-dev  writes:


Hi Richard,

CPU spGEMM is about twice as fast even on the GPU-friendly case of a
single rank: http://viennacl.sourceforge.net/viennacl-benchmarks-spmm.html

I agree that it would be good to have a GPU-MatMatMult for the sake of
experiments. Under these performance constraints it's not top priority,
though.

Best regards,
Karli


On 10/3/19 12:00 AM, Mills, Richard Tran via petsc-dev wrote:

Fellow PETSc developers,

I am wondering why the AIJCUSPARSE and AIJVIENNACL matrix types do not
support the sparse matrix-matrix multiplication (SpGEMM, or MatMatMult()
in PETSc parlance) routines provided by cuSPARSE and ViennaCL,
respectively. Is there a good reason that I shouldn't add those? My
guess is that support was not added because SpGEMM is hard to do well on
a GPU compared to many CPUs (it is hard to compete with, say, Intel Xeon
CPUs with their huge caches) and it has been the case that one would
generally be better off doing these operations on the CPU. Since the
trend at the big supercomputing centers seems to be to put more and more
of the computational power into GPUs, I'm thinking that I should add the
option to use the GPU library routines for SpGEMM, though. Is there some
good reason to *not* do this that I am not aware of? (Maybe the CPUs are
better for this even on a machine like Summit, but I think we're at the
point that we should at least be able to experimentally verify this.)

--Richard


Re: [petsc-dev] Why no SpGEMM support in AIJCUSPARSE and AIJVIENNACL?

2019-10-02 Thread Karl Rupp via petsc-dev

Hi Richard,

CPU spGEMM is about twice as fast even on the GPU-friendly case of a 
single rank: http://viennacl.sourceforge.net/viennacl-benchmarks-spmm.html


I agree that it would be good to have a GPU-MatMatMult for the sake of 
experiments. Under these performance constraints it's not top priority, 
though.


Best regards,
Karli


On 10/3/19 12:00 AM, Mills, Richard Tran via petsc-dev wrote:

Fellow PETSc developers,

I am wondering why the AIJCUSPARSE and AIJVIENNACL matrix types do not 
support the sparse matrix-matrix multiplication (SpGEMM, or MatMatMult() 
in PETSc parlance) routines provided by cuSPARSE and ViennaCL, 
respectively. Is there a good reason that I shouldn't add those? My 
guess is that support was not added because SpGEMM is hard to do well on 
a GPU compared to many CPUs (it is hard to compete with, say, Intel Xeon 
CPUs with their huge caches) and it has been the case that one would 
generally be better off doing these operations on the CPU. Since the 
trend at the big supercomputing centers seems to be to put more and more 
of the computational power into GPUs, I'm thinking that I should add the 
option to use the GPU library routines for SpGEMM, though. Is there some 
good reason to *not* do this that I am not aware of? (Maybe the CPUs are 
better for this even on a machine like Summit, but I think we're at the 
point that we should at least be able to experimentally verify this.)


--Richard


Re: [petsc-dev] Should v->valid_GPU_array be a bitmask?

2019-10-01 Thread Karl Rupp via petsc-dev

Hi Junchao,

I recall that Jed already suggested making this a bitmask ~7 years ago ;-)

On the other hand: If we touch valid_GPU_array, then we should also use 
a better name or refactor completely. Code like


 (V->valid_GPU_array & PETSC_OFFLOAD_GPU)

simply isn't intuitive (nor does it make sense) when read aloud.
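
For example, something along these lines would read more naturally (the 
names below are purely illustrative, not a proposal for the final API; 
offloadmask is a placeholder field name for the renamed valid_GPU_array):

  typedef enum {PETSC_OFFLOAD_UNALLOCATED = 0x0,
                PETSC_OFFLOAD_GPU         = 0x1,
                PETSC_OFFLOAD_CPU         = 0x2,
                PETSC_OFFLOAD_BOTH        = 0x3} PetscOffloadMask;

  /* reads as "is the data valid on the device / on the host?" */
  #define PetscOffloadDevice(mask) (((mask) & PETSC_OFFLOAD_GPU) != 0)
  #define PetscOffloadHost(mask)   (((mask) & PETSC_OFFLOAD_CPU) != 0)

  if (PetscOffloadDevice(V->offloadmask)) { /* ... use the GPU data ... */ }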

Best regards,
Karli


On 10/2/19 5:24 AM, Zhang, Junchao via petsc-dev wrote:

Stefano recently modified the following code,

PetscErrorCode VecCreate_SeqCUDA(Vec V)
{
   PetscErrorCode ierr;

   PetscFunctionBegin;
   ierr = PetscLayoutSetUp(V->map);CHKERRQ(ierr);
   ierr = VecCUDAAllocateCheck(V);CHKERRQ(ierr);
   ierr = 
VecCreate_SeqCUDA_Private(V,((Vec_CUDA*)V->spptr)->GPUarray_allocated);CHKERRQ(ierr);

   ierr = VecCUDAAllocateCheckHost(V);CHKERRQ(ierr);
   ierr = VecSet(V,0.0);CHKERRQ(ierr);
   ierr = VecSet_Seq(V,0.0);CHKERRQ(ierr);
V->valid_GPU_array = PETSC_OFFLOAD_BOTH;
PetscFunctionReturn(0);
}

That means if one creates an SEQCUDA vector V and then immediately tests 
if (V->valid_GPU_array == PETSC_OFFLOAD_GPU), the test will fail. That 
is counterintuitive.  I think we should have


enum 
{PETSC_OFFLOAD_UNALLOCATED=0x0,PETSC_OFFLOAD_GPU=0x1,PETSC_OFFLOAD_CPU=0x2,PETSC_OFFLOAD_BOTH=0x3}


and then use if (V->valid_GPU_array & PETSC_OFFLOAD_GPU). What do you think?

--Junchao Zhang


Re: [petsc-dev] error with karlrupp/fix-cuda-streams

2019-09-27 Thread Karl Rupp via petsc-dev

Hi Mark,

OK, so now the problem has shifted somewhat in that it now manifests 
itself on small cases. In earlier investigation I was drawn to 
MatTranspose but had a hard time pinning it down. The bug seems more 
stable now or you probably fixed what looks like all the other bugs.


I added print statements with norms of vectors in mg.c (v-cycle) and 
found that the diffs between the CPU and GPU runs came in MatRestrict, 
which calls MatMultTranspose. I added identical print statements in the 
two versions of MatMultTranspose and see this. (pinning to the CPU does 
not seem to make any difference). Note that the problem comes in the 2nd 
iteration where the *output* vector is non-zero coming in (this should 
not matter).


Karl, I zeroed out the output vector (yy) when I come into this method 
and it fixed the problem. This is with -n 4, and this always works with 
-n 3. See the attached process layouts. It looks like this comes when 
you use the 2nd socket.


So this looks like an Nvidia bug. Let me know what you think and I can 
pass it on to ORNL.


Hmm, there were some issues with MatMultTranspose_MPIAIJ at some point. 
I've addressed some of them, but I can't confidently say that all of the 
issues were fixed. Thus, I don't think it's a problem in NVIDIA's 
cuSparse, but rather something we need to fix in PETSc. Note that the 
problem shows up with multiple MPI ranks; if it were a problem in 
cuSparse, it would show up on a single rank as well.
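
(For reference, the stopgap Mark describes amounts to zeroing the output 
vector before the transpose product; a diagnostic workaround only, since 
the result of MatMultTranspose should not depend on the prior contents 
of yy.)

  ierr = VecSet(yy,0.0);CHKERRQ(ierr);            /* clear stale values in the output vector */
  ierr = MatMultTranspose(A,xx,yy);CHKERRQ(ierr); /* result should be independent of yy's prior state */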


Best regards,
Karli





06:49  /gpfs/alpine/geo127/scratch/adams$ jsrun -n 4 -a 4 -c 4 -g 1 
./ex56 -cells 8,12,16 -ex56_dm_vec_type cuda -ex56_dm_mat_type aijcusparse

[0] 3465 global equations, 1155 vertices
[0] 3465 equations in vector, 1155 vertices
   0 SNES Function norm 1.725526579328e+01
     0 KSP Residual norm 1.725526579328e+01
         2) call Restrict with |r| = 1.402719214830704e+01
                         MatMultTranspose_MPIAIJCUSPARSE |x in| = 
1.40271921483070e+01
*                        MatMultTranspose_MPIAIJ |y in| = 
0.00e+00
*                        MatMultTranspose_MPIAIJCUSPARSE |a->lvec| = 
0.00e+00
                         *** MatMultTranspose_MPIAIJCUSPARSE |yy| = 
3.43436359545813e+00
                         MatMultTranspose_MPIAIJCUSPARSE final |yy| = 
1.29055494844681e+01

                 3) |R| = 1.290554948446808e+01
         2) call Restrict with |r| = 4.109771717986951e+00
                         MatMultTranspose_MPIAIJCUSPARSE |x in| = 
4.10977171798695e+00
*                        MatMultTranspose_MPIAIJ |y in| = 
0.00e+00
*                        MatMultTranspose_MPIAIJCUSPARSE |a->lvec| = 
0.00e+00
                         *** MatMultTranspose_MPIAIJCUSPARSE |yy| = 
1.79415048609144e-01
                         MatMultTranspose_MPIAIJCUSPARSE final |yy| = 
9.01083013948788e-01

                 3) |R| = 9.010830139487883e-01
                 4) |X| = 2.864698671963022e+02
                 5) |x| = 9.76328911783e+02
                 6) post smooth |x| = 8.940011621494751e+02
                 4) |X| = 8.940011621494751e+02
                 5) |x| = 1.005081556495388e+03
                 6) post smooth |x| = 1.029043994031627e+03
     1 KSP Residual norm 8.102614049404e+00
         2) call Restrict with |r| = 4.402603749876137e+00
                         MatMultTranspose_MPIAIJCUSPARSE |x in| = 
4.40260374987614e+00
*                        MatMultTranspose_MPIAIJ |y in| = 
1.29055494844681e+01
*                        MatMultTranspose_MPIAIJCUSPARSE |a->lvec| = 
0.00e+00
                         *** MatMultTranspose_MPIAIJCUSPARSE |yy| = 
1.68544559626318e+00
                         MatMultTranspose_MPIAIJCUSPARSE final |yy| = 
1.82129824300863e+00

                 3) |R| = 1.821298243008628e+00
         2) call Restrict with |r| = 1.068309793900564e+00
                         MatMultTranspose_MPIAIJCUSPARSE |x in| = 
1.06830979390056e+00
                         MatMultTranspose_MPIAIJ |y in| = 
9.01083013948788e-01
                         MatMultTranspose_MPIAIJCUSPARSE |a->lvec| = 
0.00e+00
                         *** MatMultTranspose_MPIAIJCUSPARSE |yy| = 
1.40519177065298e-01
                         MatMultTranspose_MPIAIJCUSPARSE final |yy| = 
1.01853904152812e-01

                 3) |R| = 1.018539041528117e-01
                 4) |X| = 4.949616392884510e+01
                 5) |x| = 9.309440014159884e+01
                 6) post smooth |x| = 5.432486021529479e+01
                 4) |X| = 5.432486021529479e+01
                 5) |x| = 8.246142532204632e+01
                 6) post smooth |x| = 7.605703654091440e+01
   Linear solve did not converge due to DIVERGED_ITS iterations 1
Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0
06:50  /gpfs/alpine/geo127/scratch/adams$ jsrun -n 4 -a 4 -c 4 -g 1 
./ex56 -cells 8,12,16

[0] 3465 global equations, 1155 vertices
[0] 3465 equations in vector, 1155 ver

Re: [petsc-dev] error with karlrupp/fix-cuda-streams

2019-09-25 Thread Karl Rupp via petsc-dev



I double checked that a clean build of your (master) branch has this 
error, but my branch (mark/fix-cuda-with-gamg-pintocpu), which may include 
stuff from Barry that is not yet in master, works.


so did master work recently (i.e. right before my branch got merged)?

Best regards,
Karli





On Wed, Sep 25, 2019 at 5:26 AM Karl Rupp via petsc-dev 
<petsc-dev@mcs.anl.gov> wrote:




On 9/25/19 11:12 AM, Mark Adams via petsc-dev wrote:
 > I am using karlrupp/fix-cuda-streams, merged with master, and I
get this
 > error:
 >
 > Could not execute "['jsrun -g\\ 1 -c\\ 1 -a\\ 1 --oversubscribe -n 1
 > printenv']":
 > Error, invalid argument:  1
 >
 > My branch mark/fix-cuda-with-gamg-pintocpu seems to work but I
did edit
 > the jsrun command but Karl's branch still fails. (SUMMIT was down
today
 > so there could have been updates).
 >
 > Any suggestions?

Looks very much like a systems issue to me.

Best regards,
Karli



Re: [petsc-dev] error with karlrupp/fix-cuda-streams

2019-09-25 Thread Karl Rupp via petsc-dev




On 9/25/19 11:12 AM, Mark Adams via petsc-dev wrote:
I am using karlrupp/fix-cuda-streams, merged with master, and I get this 
error:


Could not execute "['jsrun -g\\ 1 -c\\ 1 -a\\ 1 --oversubscribe -n 1 
printenv']":

Error, invalid argument:  1

My branch mark/fix-cuda-with-gamg-pintocpu seems to work, though I did edit 
the jsrun command; Karl's branch still fails. (SUMMIT was down today, 
so there could have been updates.)


Any suggestions?


Looks very much like a systems issue to me.

Best regards,
Karli


Re: [petsc-dev] MatMult on Summit

2019-09-24 Thread Karl Rupp via petsc-dev

Hi Mark, Richard, Junchao, et al.,

here we go:
https://gitlab.com/petsc/petsc/merge_requests/2091

This indeed fixes all the inconsistencies in test results for SNES ex19 
and even ex56. A priori I wasn't sure about the latter, but it looks 
like this was the only missing piece.


Mark, this should allow you to move forward with GPUs.

Best regards,
Karli



On 9/24/19 11:05 AM, Mark Adams wrote:

Yes, please, thank you.

On Tue, Sep 24, 2019 at 1:46 AM Mills, Richard Tran via petsc-dev 
<petsc-dev@mcs.anl.gov> wrote:


Karl, that would be fantastic. Much obliged!

--Richard

On 9/23/19 8:09 PM, Karl Rupp wrote:

Hi,

`git grep cudaStreamCreate` reports that vectors, matrices and
scatters create their own streams. This will almost inevitably
create races (there is no synchronization mechanism implemented),
unless one calls WaitForGPU() after each operation. Some of the
non-deterministic tests can likely be explained by this.

I'll clean this up in the next few hours if there are no objections.

Best regards,
Karli



On 9/24/19 1:05 AM, Mills, Richard Tran via petsc-dev wrote:

I'm no CUDA expert (not yet, anyway), but, from what I've read,
the default stream (stream 0) is (mostly) synchronous to host and
device, so WaitForGPU() is not needed in that case. I don't know
if there is any performance penalty in explicitly calling it in
that case, anyway.

In any case, it looks like there are still some cases where
potentially asynchronous CUDA library calls are being "timed"
without a WaitForGPU() to ensure that the calls actually
complete. I will make a pass through the aijcusparse and
aijviennacl code looking for these.

--Richard

On 9/23/19 3:28 PM, Zhang, Junchao wrote:

It looks like cusparsestruct->stream is always created (not NULL). I
don't know the logic of the "if (!cusparsestruct->stream)".
--Junchao Zhang


On Mon, Sep 23, 2019 at 5:04 PM Mills, Richard Tran via
petsc-dev <petsc-dev@mcs.anl.gov> wrote:

    In MatMultAdd_SeqAIJCUSPARSE, before Junchao's changes, towards
    the end of the function it had

      if (!yy) { /* MatMult */
        if (!cusparsestruct->stream) {
      ierr = WaitForGPU();CHKERRCUDA(ierr);
        }
      }

    I assume we don't need the logic to do this only in the
MatMult()
    with no add case and should just do this all the time, for the
    purposes of timing if no other reason. Is there some reason
to NOT
    do this because of worries the about effects that these
    WaitForGPU() invocations might have on performance?

    I notice other problems in aijcusparse.cu,
    now that I look closer. In
MatMultTransposeAdd_SeqAIJCUSPARSE(), I
    see that we have GPU timing calls around the
cusparse_csr_spmv()
    (but no WaitForGPU() inside the timed region). I believe
this is
    another area in which we get a meaningless timing. It looks
like
    we need a WaitForGPU() there, and then maybe inside the timed
    region handling the scatter. (I don't know if this stuff
happens
    asynchronously or not.) But do we potentially want two
    WaitForGPU() calls in one function, just to help with getting
    timings? I don't have a good idea of how much overhead this
adds.

    --Richard

    On 9/21/19 12:03 PM, Zhang, Junchao via petsc-dev wrote:

    I made the following changes:
    1) In MatMultAdd_SeqAIJCUSPARSE, use this code sequence at
the end
      ierr = WaitForGPU();CHKERRCUDA(ierr);
      ierr = PetscLogGpuTimeEnd();CHKERRQ(ierr);
      ierr = PetscLogGpuFlops(2.0*a->nz);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    2) In MatMult_MPIAIJCUSPARSE, use the following code sequence.
    The old code swapped the first two lines. Since with
    -log_view, MatMultAdd_SeqAIJCUSPARSE is blocking, I changed
the
    order to have better overlap.
      ierr = VecScatterBegin(a->Mvctx,xx,a->lvec,INSERT_VALUES,SCATTER_FORWARD);CHKERRQ(ierr);
      ierr = (*a->A->ops->mult)(a->A,xx,yy);CHKERRQ(ierr);
      ierr = VecScatterEnd(a->Mvctx,xx,a->lvec,INSERT_VALUES,SCATTER_FORWARD);CHKERRQ(ierr);
      ierr = (*a->B->ops->multadd)(a->B,a->lvec,yy,yy);CHKERRQ(ierr);
    3) Log time directly in the test code so we can also know
    execution time without -log_view (hence cuda
synchronization). I
    manually calculated the Total Mflop/s for these cases for easy
    comparison.

    <>

   

Re: [petsc-dev] MatMult on Summit

2019-09-23 Thread Karl Rupp via petsc-dev

Hi,

`git grep cudaStreamCreate` reports that vectors, matrices and scatters 
create their own streams. This will almost inevitably create races 
(there is no synchronization mechanism implemented), unless one calls 
WaitForGPU() after each operation. Some of the non-deterministic tests 
can likely be explained by this.
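
A minimal illustration of the hazard (plain CUDA, outside of PETSc; d_x 
and d_y are device buffers of nbytes bytes): work queued on two different 
streams is unordered unless an explicit dependency is inserted.

  cudaStream_t s_vec, s_mat;
  cudaStreamCreate(&s_vec);
  cudaStreamCreate(&s_mat);

  cudaMemsetAsync(d_x, 0, nbytes, s_vec);            /* "producer" writes d_x on s_vec */
  cudaMemcpyAsync(d_y, d_x, nbytes,
                  cudaMemcpyDeviceToDevice, s_mat);  /* "consumer" on s_mat may read d_x too early */

  /* Fix: share a single stream, or make s_mat wait on s_vec's work: */
  cudaEvent_t done;
  cudaEventCreate(&done);
  cudaEventRecord(done, s_vec);
  cudaStreamWaitEvent(s_mat, done, 0);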


I'll clean this up in the next few hours if there are no objections.

Best regards,
Karli



On 9/24/19 1:05 AM, Mills, Richard Tran via petsc-dev wrote:
I'm no CUDA expert (not yet, anyway), but, from what I've read, the 
default stream (stream 0) is (mostly) synchronous to host and device, so 
WaitForGPU() is not needed in that case. I don't know if there is any 
performance penalty in explicitly calling it in that case, anyway.


In any case, it looks like there are still some cases where potentially 
asynchronous CUDA library calls are being "timed" without a WaitForGPU() 
to ensure that the calls actually complete. I will make a pass through 
the aijcusparse and aijviennacl code looking for these.


--Richard

On 9/23/19 3:28 PM, Zhang, Junchao wrote:
It looks like cusparsestruct->stream is always created (not NULL). I don't 
know the logic of the "if (!cusparsestruct->stream)".

--Junchao Zhang


On Mon, Sep 23, 2019 at 5:04 PM Mills, Richard Tran via petsc-dev 
<petsc-dev@mcs.anl.gov> wrote:


In MatMultAdd_SeqAIJCUSPARSE, before Junchao's changes, towards
the end of the function it had

  if (!yy) { /* MatMult */
    if (!cusparsestruct->stream) {
  ierr = WaitForGPU();CHKERRCUDA(ierr);
    }
  }

I assume we don't need the logic to do this only in the MatMult()
with no add case and should just do this all the time, for the
purposes of timing if no other reason. Is there some reason to NOT
do this because of worries the about effects that these
WaitForGPU() invocations might have on performance?

I notice other problems in aijcusparse.cu,
now that I look closer. In MatMultTransposeAdd_SeqAIJCUSPARSE(), I
see that we have GPU timing calls around the cusparse_csr_spmv()
(but no WaitForGPU() inside the timed region). I believe this is
another area in which we get a meaningless timing. It looks like
we need a WaitForGPU() there, and then maybe inside the timed
region handling the scatter. (I don't know if this stuff happens
asynchronously or not.) But do we potentially want two
WaitForGPU() calls in one function, just to help with getting
timings? I don't have a good idea of how much overhead this adds.

--Richard

On 9/21/19 12:03 PM, Zhang, Junchao via petsc-dev wrote:

I made the following changes:
1) In MatMultAdd_SeqAIJCUSPARSE, use this code sequence at the end
  ierr = WaitForGPU();CHKERRCUDA(ierr);
  ierr = PetscLogGpuTimeEnd();CHKERRQ(ierr);
  ierr = PetscLogGpuFlops(2.0*a->nz);CHKERRQ(ierr);
  PetscFunctionReturn(0);
2) In MatMult_MPIAIJCUSPARSE, use the following code sequence.
The old code swapped the first two lines. Since with
-log_view, MatMultAdd_SeqAIJCUSPARSE is blocking, I changed the
order to have better overlap.
  ierr = VecScatterBegin(a->Mvctx,xx,a->lvec,INSERT_VALUES,SCATTER_FORWARD);CHKERRQ(ierr);
  ierr = (*a->A->ops->mult)(a->A,xx,yy);CHKERRQ(ierr);
  ierr = VecScatterEnd(a->Mvctx,xx,a->lvec,INSERT_VALUES,SCATTER_FORWARD);CHKERRQ(ierr);
  ierr = (*a->B->ops->multadd)(a->B,a->lvec,yy,yy);CHKERRQ(ierr);
3) Log time directly in the test code so we can also know
execution time without -log_view (hence cuda synchronization). I
manually calculated the Total Mflop/s for these cases for easy
comparison.

<>



Event                Count      Time (sec)     Flop               --- Global ---  --- Stage   Total   GPU    - CpuToGpu -   - GpuToCpu - GPU
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count   Size  %F
---------------------------------------------------------------------------------------------------------------------------------------------
6 MPI ranks,
MatMult              100 1.0 1.1895e+01 1.0 9.63e+09 1.1 2.8e+03 2.2e+05 0.0e+00 24 99 97 18  0 100100100100  0  4743       0      0 0.00e+00    0 0.00e+00  0
VecScatterBegin      100 1.0 4.9145e-02 3.0 0.00e+00 0.0 2.8e+03 2.2e+05 0.0e+00  0  0 97 18  0   0  0100100  0     0       0      0 0.00e+00    0 0.00e+00  0
VecScatterEnd        100 1.0 2.9441e+00 133 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0  13  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0

24 MPI ranks
MatMult              100 1.0 3.1431e+00 1.0 2.63e+09 1.2 1.9e+04 5.9e+04 0.0e+00

Re: [petsc-dev] MatMult on Summit

2019-09-21 Thread Karl Rupp via petsc-dev




On 9/22/19 6:15 AM, Jed Brown wrote:

Karl Rupp via petsc-dev  writes:


Hi Junchao,

thanks, these numbers are interesting.

Do you have an easy way to evaluate the benefits of a CUDA-aware MPI vs.
a non-CUDA-aware MPI that still keeps the benefits of your
packing/unpacking routines?

I'd like to get a feeling of where the performance gains come from. Is
it due to the reduced PCI-Express transfer


It's NVLink, not PCI-express.


Indeed.




I wonder if the single-node latency bugs on AC922 are related to these
weird performance results.

https://docs.google.com/spreadsheets/d/1amFJIbpvs9oJcUc-WntsFHO_C0LE7xFJeor-oElt0LY/edit#gid=0



Thanks for these numbers!
Intra-Node > Inter-Node is indeed weird. I haven't observed such an 
inversion before.


Best regards,
Karli


Re: [petsc-dev] MatMult on Summit

2019-09-21 Thread Karl Rupp via petsc-dev

Hi Junchao,

thanks, these numbers are interesting.

Do you have an easy way to evaluate the benefits of a CUDA-aware MPI vs. 
a non-CUDA-aware MPI that still keeps the benefits of your 
packing/unpacking routines?


I'd like to get a feeling of where the performance gains come from. Is 
it due to the reduced PCI-Express transfer for the scatters (i.e. 
packing/unpacking and transferring only the relevant entries) on each 
rank, or is it some low-level optimization that makes the MPI-part of 
the communication faster? Your current MR includes both; it would be 
helpful to know whether we can extract similar benefits for other GPU 
backends without having to require "CUDA-awareness" of MPI. If the 
benefits are mostly due to the packing/unpacking, we could carry over 
the benefits to other GPU backends (e.g. upcoming Intel GPUs) without 
having to wait for an "Intel-GPU-aware MPI".


Best regards,
Karli


On 9/21/19 6:22 AM, Zhang, Junchao via petsc-dev wrote:
I downloaded a sparse matrix (HV15R) from the Florida Sparse Matrix 
Collection. Its size is about 2M x 2M. Then I ran the same MatMult 100 
times on one node of Summit with -mat_type aijcusparse -vec_type cuda. I 
found MatMult was almost dominated by VecScatter in this simple test. 
Using 6 MPI ranks + 6 GPUs,  I found CUDA aware SF could improve 
performance. But if I enabled Multi-Process Service on Summit and used 
24 ranks + 6 GPUs, I found CUDA aware SF hurt performance. I don't know 
why and have to profile it. I will also collect  data with multiple 
nodes. Are the matrix and tests proper?



Event                Count      Time (sec)     Flop 
          --- Global ---  --- Stage   Total   GPU    - CpuToGpu -   
- GpuToCpu - GPU
                    Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen 
  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   
Count   Size  %F

---
6 MPI ranks (CPU version)
MatMult              100 1.0 1.1895e+01 1.0 9.63e+09 1.1 2.8e+03 2.2e+05 
0.0e+00 24 99 97 18  0 100100100100  0  4743       0      0 0.00e+00   
  0 0.00e+00  0
VecScatterBegin      100 1.0 4.9145e-02 3.0 0.00e+00 0.0 2.8e+03 2.2e+05 
0.0e+00  0  0 97 18  0   0  0100100  0     0       0      0 0.00e+00   
  0 0.00e+00  0
VecScatterEnd        100 1.0 2.9441e+00133  0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  3  0  0  0  0  13  0  0  0  0     0       0      0 0.00e+00   
  0 0.00e+00  0


6 MPI ranks + 6 GPUs + regular SF
MatMult              100 1.0 1.7800e-01 1.0 9.66e+09 1.1 2.8e+03 2.2e+05 
0.0e+00  0 99 97 18  0 100100100100  0 318057   3084009 100 1.02e+02 
  100 2.69e+02 100
VecScatterBegin      100 1.0 1.2786e-01 1.3 0.00e+00 0.0 2.8e+03 2.2e+05 
0.0e+00  0  0 97 18  0  64  0100100  0     0       0      0 0.00e+00 
  100 2.69e+02  0
VecScatterEnd        100 1.0 6.2196e-02 3.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0  22  0  0  0  0     0       0      0 0.00e+00   
  0 0.00e+00  0
VecCUDACopyTo        100 1.0 1.0850e-02 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   5  0  0  0  0     0       0    100 1.02e+02   
  0 0.00e+00  0
VecCopyFromSome      100 1.0 1.0263e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0  54  0  0  0  0     0       0      0 0.00e+00 
  100 2.69e+02  0


6 MPI ranks + 6 GPUs + CUDA-aware SF
MatMult              100 1.0 1.1112e-01 1.0 9.66e+09 1.1 2.8e+03 2.2e+05 
0.0e+00  1 99 97 18  0 100100100100  0 509496   3133521   0 0.00e+00   
  0 0.00e+00 100
VecScatterBegin      100 1.0 7.9461e-02 1.1 0.00e+00 0.0 2.8e+03 2.2e+05 
0.0e+00  1  0 97 18  0  70  0100100  0     0       0      0 0.00e+00   
  0 0.00e+00  0
VecScatterEnd        100 1.0 2.2805e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0  17  0  0  0  0     0       0      0 0.00e+00   
  0 0.00e+00  0


24 MPI ranks + 6 GPUs + regular SF
MatMult              100 1.0 1.1094e-01 1.0 2.63e+09 1.2 1.9e+04 5.9e+04 
0.0e+00  1 99 97 25  0 100100100100  0 510337   951558  100 4.61e+01 
  100 6.72e+01 100
VecScatterBegin      100 1.0 4.8966e-02 1.8 0.00e+00 0.0 1.9e+04 5.9e+04 
0.0e+00  0  0 97 25  0  34  0100100  0     0       0      0 0.00e+00 
  100 6.72e+01  0
VecScatterEnd        100 1.0 7.2969e-02 4.9 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  1  0  0  0  0  42  0  0  0  0     0       0      0 0.00e+00   
  0 0.00e+00  0
VecCUDACopyTo        100 1.0 4.4487e-03 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   3  0  0  0  0     0       0    100 4.61e+01   
  0 0.00e+00  0
VecCopyFromSome      100 1.0 4.3315e-02 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0  29  0  0  0  0     0       0      0 0.00e+00 
  100 6.72e+01  0


24 MPI ranks + 6 GPUs + CUDA-aware 

Re: [petsc-dev] hypre and CUDA

2019-08-15 Thread Karl Rupp via petsc-dev

Hi,

one way to test is to run a sequential example through nvprof:
 $> nvprof ./ex56 ...

https://devblogs.nvidia.com/cuda-pro-tip-nvprof-your-handy-universal-gpu-profiler/

If it uses the GPU, then you will get some information on the GPU 
kernels called. If it doesn't use the GPU, the list will be (almost) empty.


Best regards,
Karli



On 8/15/19 5:47 PM, Mark Adams via petsc-dev wrote:
I have configured with Hypre on SUMMIT, with cuda, and it ran. I'm now 
trying to verify that it used GPUs (I doubt it). Any ideas on how to 
verify this? Should I use the cuda vecs and mats, or does Hypre not 
care? Can I tell hypre not to use GPUs other than configuring a 
non-CUDA PETSc? I'm not sure how to run a job without GPUs, but I will 
look into it.


Mark


[petsc-dev] Schedule for next release

2019-08-14 Thread Karl Rupp via petsc-dev

Hi all,

let me propose the following schedule for the next release:

* until Sunday, September 15: New pull requests are considered for the 
upcoming release.


* from Monday, September 16, to Sunday, September 22: Fixing and merging 
of open pull requests received by September 15. Extended testing of 
`master`. Simple bug-fix pull requests are still considered.


* from Monday, September 23: Satish spins the release as soon as all 
fixes are in.


With this schedule we will have a release by the end of September. The 
'merging period' is shorter than for previous releases, because the pull 
request workflow requires less bugfixing right before the release.


Satish, does this schedule work for you? Or should we move everything 
forward by one week?


Any comments are of course welcome.

Best regards,
Karli


Re: [petsc-dev] Is master broken?

2019-08-13 Thread Karl Rupp via petsc-dev

Hi,

to clarify, these commits are already in master (either directly, or in 
a condensed way in the course of different pull requests, including 
additional fixes of memory leaks and output files):


> 5e3edcb81f added check in Inode to skip GPUs for not Factor
> 346e0b6564 changed reduction logic a little and cleaned up format
> 3a282b2d9d added guards for empty process solves in CUDA
> af6bc10070 same bug fix
> 0365c3a97f try a fix
> 9c37fbf3e7 bug fix
> 72f2ad35b3 fixed bug with CUDA transpose mat-vec
> ab62ce3476 remove vecset as per Barry
> c26191aaa4 use non-collective VecSet
> 8bcb2d50b7 fixed MPI lock from call to collective method
> 3c46958f6d fix bug with empty processor
> 54cfeb1831 added missing settypes

this commit is not in master because I couldn't get the example to run:
> fd2e5db618 added cude test to ex56

these commits are ViennaCL-specific and don't need further consideration 
(the wrapper will take care of it):


> 12042c4bfa removing ViennaCL fix to GAMG
> 9508265e8e adding support for MatTranspose
> e5a6000419 adding fix for ViennaCL in MG


The *only* commit with functionality that is not in master is:

> 57224a7035 fixed up pinning CUDA to CPUs

for which PR #1954 holds the discussion of why this commit has 
problematic parts.


I've opened a pull request for Barry's fixes to the builds on Summit in 
PR #1963. If there are no surprises, this PR will be in master by the 
end of the day.


**Overall**: Please start freshly off master (and merge PR 1963 if 
needed). Then, have a look at the pinning-to-CPUs commit and decide 
whether it needs to be reworked in light of what is discussed in PR #1954. Do 
not try to rebase

 mark/gamg-fix-viennacl-rebased
or any of its offsprings any further, as this will only cause headaches 
and create conflicts with code that has already been fixed.


Best regards,
Karli



On 8/12/19 5:23 PM, Balay, Satish wrote:

I don't really understand the workflow here [with merged branches (to
master) and cherry-picking other stuff]

I've attempted to rebase this branch against latest master. The result is at:

mark/gamg-fix-viennacl-rebased-v3

There were too many merge conflicts that I had to resolve. Its
possible I made mistakes here.

So if this branch is what you need - its best to check each commit
here and verify the changes - before using them.

As Karl suggests - its probably best to pick commits that you need and
fix them [if they have bad code - from merge conflict resolution]

[and some of them can be collapsed]

Satish

--

balay@sb /home/balay/petsc (mark/gamg-fix-viennacl-rebased-v3=)
$ git log --oneline master..
dee2b8b21b (HEAD -> mark/gamg-fix-viennacl-rebased-v3, 
origin/mark/gamg-fix-viennacl-rebased-v3, mark/gamg-fix-viennacl-rebased-v2) 
protected pinnedtocpu
03d489bde4 1) When detecting version info handle blanks introducted by 
preprocessor, error if needed version cannot be detected
056432fa93 Use outputPreprocess instead of preprocess since it prints source to 
log
94885b4f80 add back code missing from rebaseing over latest master?
2a748c2fa1 fixed compile errors
5e3edcb81f added check in Inode to skip GPUs for not Factor
fc14e5b821 removed comment
57224a7035 fixed up pinning CUDA to CPUs
346e0b6564 changed reduction logic a little and cleaned up format
3a282b2d9d added guards for empty process solves in CUDA
af6bc10070 same bug fix
0365c3a97f try a fix
9c37fbf3e7 bug fix
fd2e5db618 added cude test to ex56
72f2ad35b3 fixed bug with CUDA transpose mat-vec
ab62ce3476 remove vecset as per Barry
c26191aaa4 use non-collective VecSet
12042c4bfa removing ViennaCL fix to GAMG
3c46958f6d fix bug with empty processor
8bcb2d50b7 fixed MPI lock from call to collective method
54cfeb1831 added missing settypes
9508265e8e adding support for MatTranspose
e5a6000419 adding fix for ViennaCL in MG


On Mon, 12 Aug 2019, Karl Rupp via petsc-dev wrote:


Hi Mark,

most of the CUDA-related fixes from your PR are now in master. Thank you!

The pinning of GPU-matrices to CPUs is not in master because it had several
issues:

https://bitbucket.org/petsc/petsc/pull-requests/1954/cuda-fixes-to-pinning-onto-cpu/diff

The ViennaCL-related changes in mark/gamg-fix-viennacl-rebased can be safely
discarded as the new GPU wrapper will come in place over the next days. ex56
has not been pulled over as it's not running properly on GPUs yet (the pinning
in your branch effectively turned GPU matrices into normal PETSc matrices,
effectively running (almost) everything on the CPU again)

So at this point I recommend to start a new branch off master and manually
transfer over any bits from the pinning that you want to keep.

Best regards,
Karli


On 8/3/19 8:47 PM, Mark Adams wrote:

Karl,
Did you want me to do anything at this point? (on vacation this week) I will
verify that master is all fixed if you get all my stuff integrated when I
get back to work in a week.
Thanks,
Mark

On 

Re: [petsc-dev] Is master broken?

2019-08-12 Thread Karl Rupp via petsc-dev

Hi Mark,

most of the CUDA-related fixes from your PR are now in master. Thank you!

The pinning of GPU-matrices to CPUs is not in master because it had 
several issues:


https://bitbucket.org/petsc/petsc/pull-requests/1954/cuda-fixes-to-pinning-onto-cpu/diff

The ViennaCL-related changes in mark/gamg-fix-viennacl-rebased can be 
safely discarded as the new GPU wrapper will come in place over the next 
days. ex56 has not been pulled over as it's not running properly on GPUs 
yet (the pinning in your branch effectively turned GPU matrices into 
normal PETSc matrices, effectively running (almost) everything on the 
CPU again)


So at this point I recommend to start a new branch off master and 
manually transfer over any bits from the pinning that you want to keep.


Best regards,
Karli


On 8/3/19 8:47 PM, Mark Adams wrote:

Karl,
Did you want me to do anything at this point? (on vacation this week) I 
will verify that master is all fixed if you get all my stuff integrated 
when I get back to work in a week.

Thanks,
Mark

On Sat, Aug 3, 2019 at 10:50 AM Karl Rupp <mailto:r...@iue.tuwien.ac.at>> wrote:


If you ignore the initial ViennaCL-related commits and check against
current master (that just received cherry-picked updates from your PR),
then there are really only a few commits left that are not yet
integrated.

(I'll extract two more PRs on Monday, so master will soon have your
fixes in.)

Best regards,
Karli


On 8/3/19 5:21 AM, Balay, Satish wrote:
 > I've attempted to rebase this branch over latest master - and pushed
 > my changes to branch mark/gamg-fix-viennacl-rebased-v2
 >
 > You might want to check each of your commits in this branch to see if
 > they are ok. I had to add one extra commit - to make it match 'merge
 > of mark/gamg-fix-viennacl-rebased and master'.
 >
 > This branch has 21 commits. I think its best if you can collapse them
 > into reasonable chunks of changes. [presumably a single commit
for all
 > the changes is not the correct thing here. But the current set of 21
 > commits are all over the place]
 >
 > If you are able to migrate to this branch - its best to delete
the old
 > one [i.e origin/mark/gamg-fix-viennacl-rebased]
 >
 > Satish
 >
 > On Fri, 2 Aug 2019, Mark Adams via petsc-dev wrote:
 >
 >> I have been cherry-picking, etc, branch
mark/gamg-fix-viennacl-rebased and
 >> it is very messed up. Can someone please update this branch when
all the
 >> fixes are settled down? eg, I am seeing dozens of modified files
that I
 >> don't know anything about and I certainly don't want to put in a
PR for
 >> them.
 >>
 >> I also seem to lose my pinToCPU method for cuda matrices. I don't
 >> understand how that conflicted with anyone else but it did.
 >>
 >> Thanks,
 >> Mark
 >>
 >



Re: [petsc-dev] Is master broken?

2019-08-03 Thread Karl Rupp via petsc-dev

Hi Mark,

it's fine if you just double-check that all your fixes are in master 
when you're back :-)


Best regards and enjoy your vacation,
Karli


On 8/3/19 8:47 PM, Mark Adams wrote:

Karl,
Did you want me to do anything at this point? (on vacation this week) I 
will verify that master is all fixed if you get all my stuff integrated 
when I get back to work in a week.

Thanks,
Mark

On Sat, Aug 3, 2019 at 10:50 AM Karl Rupp <mailto:r...@iue.tuwien.ac.at>> wrote:


If you ignore the initial ViennaCL-related commits and check against
current master (that just received cherry-picked updates from your PR),
then there are really only a few commits left that are not yet
integrated.

(I'll extract two more PRs on Monday, so master will soon have your
fixes in.)

Best regards,
Karli


On 8/3/19 5:21 AM, Balay, Satish wrote:
 > I've attempted to rebase this branch over latest master - and pushed
 > my changes to branch mark/gamg-fix-viennacl-rebased-v2
 >
 > You might want to check each of your commits in this branch to see if
 > they are ok. I had to add one extra commit - to make it match 'merge
 > of mark/gamg-fix-viennacl-rebased and master'.
 >
 > This branch has 21 commits. I think its best if you can collapse them
 > into reasonable chunks of changes. [presumably a single commit
for all
 > the changes is not the correct thing here. But the current set of 21
 > commits are all over the place]
 >
 > If you are able to migrate to this branch - its best to delete
the old
 > one [i.e origin/mark/gamg-fix-viennacl-rebased]
 >
 > Satish
 >
 > On Fri, 2 Aug 2019, Mark Adams via petsc-dev wrote:
 >
 >> I have been cherry-picking, etc, branch
mark/gamg-fix-viennacl-rebased and
 >> it is very messed up. Can someone please update this branch when
all the
 >> fixes are settled down? eg, I am seeing dozens of modified files
that I
 >> don't know anything about and I certainly don't want to put in a
PR for
 >> them.
 >>
 >> I also seem to lose my pinToCPU method for cuda matrices. I don't
 >> understand how that conflicted with anyone else but it did.
 >>
 >> Thanks,
 >> Mark
 >>
 >



Re: [petsc-dev] Is master broken?

2019-08-03 Thread Karl Rupp via petsc-dev
If you ignore the initial ViennaCL-related commits and check against 
current master (that just received cherry-picked updates from your PR), 
then there are really only a few commits left that are not yet integrated.


(I'll extract two more PRs on Monday, so master will soon have your 
fixes in.)


Best regards,
Karli


On 8/3/19 5:21 AM, Balay, Satish wrote:

I've attempted to rebase this branch over latest master - and pushed
my changes to branch mark/gamg-fix-viennacl-rebased-v2

You might want to check each of your commits in this branch to see if
they are ok. I had to add one extra commit - to make it match 'merge
of mark/gamg-fix-viennacl-rebased and master'.

This branch has 21 commits. I think its best if you can collapse them
into reasonable chunks of changes. [presumably a single commit for all
the changes is not the correct thing here. But the current set of 21
commits are all over the place]

If you are able to migrate to this branch - its best to delete the old
one [i.e origin/mark/gamg-fix-viennacl-rebased]

Satish

On Fri, 2 Aug 2019, Mark Adams via petsc-dev wrote:


I have been cherry-picking, etc, branch mark/gamg-fix-viennacl-rebased and
it is very messed up. Can someone please update this branch when all the
fixes are settled down? eg, I am seeing dozens of modified files that I
don't know anything about and I certainly don't want to put in a PR for
them.

I also seem to lose my pinToCPU method for cuda matrices. I don't
understand how that conflicted with anyone else but it did.

Thanks,
Mark





Re: [petsc-dev] Is master broken?

2019-08-02 Thread Karl Rupp via petsc-dev
You should be able to just cherry-pick the commits from Barry's branch 
as well as the two other branches.




On 8/2/19 8:13 PM, Mark Adams wrote:

I picked these two into Barry's branch and it built.

I would like to get them into my cuda branch. Should I just pick them? 
And not worry about Barry's branch. Or will that not work.


On Fri, Aug 2, 2019 at 12:03 PM Karl Rupp <mailto:r...@iue.tuwien.ac.at>> wrote:


FYI: The two branches are currently testing in `next-tmp` and are
likely
to be merged to master in ~5 hours.

Best regards,
Karli


On 8/2/19 4:53 PM, Smith, Barry F. via petsc-dev wrote:
 >
 >    Yes, these are bugs in Stefano's work that got into master
because we didn't have comprehensive testing. There are two branches
in the PR list you can cherry pick that will fix this problem. Sorry
about this. We're trying to get them into master as quickly as
possible but 
 >
 >     Barry
 >
 >
 >> On Aug 2, 2019, at 8:39 AM, Mark Adams mailto:mfad...@lbl.gov>> wrote:
 >>
 >> closer,
 >>
 >> On Fri, Aug 2, 2019 at 9:13 AM Smith, Barry F.
mailto:bsm...@mcs.anl.gov>> wrote:
 >>
 >>    Mark,
 >>
 >>      Thanks, that was not expected to work, I was just verifying
the exact cause of the problem and it was what I was guessing.
 >>
 >>      I believe I have fixed it. Please pull that branch again
and let me know if it works. If it does we'll do rush testing and
get it into master.
 >>
 >>       Thanks
 >>
 >>       Barry
 >>
 >>
 >>> On Aug 1, 2019, at 11:08 AM, Mark Adams mailto:mfad...@lbl.gov>> wrote:
 >>>
 >>>
 >>>
 >>> On Thu, Aug 1, 2019 at 10:30 AM Smith, Barry F.
mailto:bsm...@mcs.anl.gov>> wrote:
 >>>
 >>>    Send
 >>>
 >>> ls arch-linux2-c-debug/include/
 >>>
 >>> That is not my arch name. It is something like
arch-summit-dbg64-pgi-cuda
 >>>
 >>>   arch-linux2-c-debug/include/petscpkg_version.h
 >>>
 >>> and configure.log
 >>>
 >>>
 >>>
 >>>> On Aug 1, 2019, at 5:23 AM, Mark Adams mailto:mfad...@lbl.gov>> wrote:
 >>>>
 >>>> I get the same error with a fresh clone of master.
 >>>>
 >>>> On Thu, Aug 1, 2019 at 6:03 AM Mark Adams mailto:mfad...@lbl.gov>> wrote:
 >>>> Tried again after deleting the arch dirs and still have it.
 >>>> This is my branch that just merged master. I will try with
just master.
 >>>> Thanks,
 >>>>
 >>>> On Thu, Aug 1, 2019 at 1:36 AM Smith, Barry F.
mailto:bsm...@mcs.anl.gov>> wrote:
 >>>>
 >>>>    It is generated automatically and put in
arch-linux2-c-debug/include/petscpkg_version.h  this include file is
included at top of the "bad" source  file crashes so in theory
everything is in order check that
arch-linux2-c-debug/include/petscpkg_version.h contains
PETSC_PKG_CUDA_VERSION_GE and similar macros. If not send configure.lo
 >>>>
 >>>> check what is in
arch-linux2-c-debug/include/petscpkg_version.h it nothing or broken
send configure.lo
 >>>>
 >>>>
 >>>>    Barry
 >>>>
 >>>>
 >>>>
 >>>>> On Jul 31, 2019, at 9:28 PM, Mark Adams via petsc-dev
mailto:petsc-dev@mcs.anl.gov>> wrote:
 >>>>>
 >>>>> I am seeing this when I pull master into my branch:
 >>>>>
 >>>>>

"/autofs/nccs-svm1_home1/adams/petsc/src/mat/impls/dense/seq/cuda/densecuda.cu
<http://densecuda.cu>"
 >>>>>            , line 243: error: function call is not allowed in
a constant
 >>>>>            expression
 >>>>>    #if PETSC_PKG_CUDA_VERSION_GE(10,1,0)
 >>>>>
 >>>>> and I see that this macro does not seem to be defined:
 >>>>>
 >>>>> 22:24 master= ~/Codes/petsc$ git grep PETSC_PKG_CUDA_VERSION_GE
 >>>>> src/mat/impls/dense/seq/cuda/densecuda.cu:#if
PETSC_PKG_CUDA_VERSION_GE(10,1,0)
 >>>>
 >>>
 >>
 >> 
 >



Re: [petsc-dev] Is master broken?

2019-08-02 Thread Karl Rupp via petsc-dev
FYI: The two branches are currently testing in `next-tmp` and are likely 
to be merged to master in ~5 hours.


Best regards,
Karli


On 8/2/19 4:53 PM, Smith, Barry F. via petsc-dev wrote:


   Yes, these are bugs in Stefano's work that got into master because we didn't 
have comprehensive testing. There are two branches in the PR list you can 
cherry pick that will fix this problem. Sorry about this. We're trying to get 
them into master as quickly as possible but 

Barry



On Aug 2, 2019, at 8:39 AM, Mark Adams  wrote:

closer,

On Fri, Aug 2, 2019 at 9:13 AM Smith, Barry F.  wrote:

   Mark,

 Thanks, that was not expected to work, I was just verifying the exact 
cause of the problem and it was what I was guessing.

 I believe I have fixed it. Please pull that branch again and let me know 
if it works. If it does we'll do rush testing and get it into master.

  Thanks

  Barry



On Aug 1, 2019, at 11:08 AM, Mark Adams  wrote:



On Thu, Aug 1, 2019 at 10:30 AM Smith, Barry F.  wrote:

   Send

ls arch-linux2-c-debug/include/

That is not my arch name. It is something like arch-summit-dbg64-pgi-cuda

  arch-linux2-c-debug/include/petscpkg_version.h

and configure.log




On Aug 1, 2019, at 5:23 AM, Mark Adams  wrote:

I get the same error with a fresh clone of master.

On Thu, Aug 1, 2019 at 6:03 AM Mark Adams  wrote:
Tried again after deleting the arch dirs and still have it.
This is my branch that just merged master. I will try with just master.
Thanks,

On Thu, Aug 1, 2019 at 1:36 AM Smith, Barry F.  wrote:

   It is generated automatically and put in 
arch-linux2-c-debug/include/petscpkg_version.h. This include file is included at the top of 
the "bad" source file that crashes, so in theory everything is in order. Check that 
arch-linux2-c-debug/include/petscpkg_version.h contains PETSC_PKG_CUDA_VERSION_GE and 
similar macros. If not, send configure.log.

Check what is in arch-linux2-c-debug/include/petscpkg_version.h; if it is empty or 
broken, send configure.log.


   Barry




On Jul 31, 2019, at 9:28 PM, Mark Adams via petsc-dev  
wrote:

I am seeing this when I pull master into my branch:

"/autofs/nccs-svm1_home1/adams/petsc/src/mat/impls/dense/seq/cuda/densecuda.cu"
   , line 243: error: function call is not allowed in a constant
   expression
   #if PETSC_PKG_CUDA_VERSION_GE(10,1,0)

and I see that this macro does not seem to be defined:

22:24 master= ~/Codes/petsc$ git grep PETSC_PKG_CUDA_VERSION_GE
src/mat/impls/dense/seq/cuda/densecuda.cu:#if PETSC_PKG_CUDA_VERSION_GE(10,1,0)











Re: [petsc-dev] MatPinToCPU

2019-07-28 Thread Karl Rupp via petsc-dev

Hi Mark,

feel free to submit a fresh pull request now. I looked at your latest 
commit in the repository in order to cherry-pick it, but it looked like 
it had a few other bits in it as well.


Best regards,
Karli


On 7/28/19 6:27 PM, Mark Adams via petsc-dev wrote:
This is looking good. I'm not seeing the numerical problems, but I've 
just hidden them by avoiding the GPU on coarse grids.


Should I submit a pull request now or test more or wait for Karl?

On Sat, Jul 27, 2019 at 7:37 PM Mark Adams > wrote:


Barry, I fixed CUDA to pin to CPUs correctly for GAMG at least.
There are some hacks here that we can work on.

I will start testing it tomorrow, but I am pretty sure that I have
not regressed. I am hoping that this will fix the numerical
problems, which seem to be associated with empty processors.

I did need to touch code outside of GAMG and CUDA. It might be nice
to test this in a next.

GAMG now puts all reduced processor grids on the CPU. This could be
looked at in the future.
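
A minimal sketch of what this pinning amounts to, assuming the
MatPinToCPU()/VecPinToCPU() interfaces discussed in this thread (the
level-selection predicate below is hypothetical):

  if (reduced_level) {   /* e.g. a coarse grid living on few (or empty) ranks */
    ierr = MatPinToCPU(Acoarse,PETSC_TRUE);CHKERRQ(ierr);   /* keep this operator on the host */
    ierr = VecPinToCPU(xcoarse,PETSC_TRUE);CHKERRQ(ierr);   /* and its work vectors as well */
    ierr = VecPinToCPU(bcoarse,PETSC_TRUE);CHKERRQ(ierr);
  }

This keeps the GPU path from ever being taken on the reduced levels while
leaving the fine levels on the device.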


On Sat, Jul 27, 2019 at 1:00 PM Smith, Barry F. mailto:bsm...@mcs.anl.gov>> wrote:



 > On Jul 27, 2019, at 11:53 AM, Mark Adams mailto:mfad...@lbl.gov>> wrote:
 >
 >
 > On Sat, Jul 27, 2019 at 11:39 AM Smith, Barry F.
mailto:bsm...@mcs.anl.gov>> wrote:
 >
 >   Good catch. Thanks. Maybe the SeqCUDA has the same problem?
 >
 > THis is done  (I may have done it).
 >
 > Now it seems to me that when you call VecPinToCPU you are
setting up and don't have data, so this copy does not seem
necessary. Maybe remove the copy here:
 >
 > PetscErrorCode VecPinToCPU_MPICUDA(Vec V,PetscBool pin)
 > {
 >   PetscErrorCode ierr;
 >
 >   PetscFunctionBegin;
 >   V->pinnedtocpu = pin;
 >   if (pin) {
 >     ierr = VecCUDACopyFromGPU(V);CHKERRQ(ierr); 

    The copy from GPU should actually only do anything if the
GPU already has data and PETSC_OFFLOAD_GPU. If the GPU does not
have data
the copy doesn't do anything. When one calls VecPinToCPU() one
doesn't know where the data is so the call must be made, but it
may do nothing

   Note that VecCUDACopyFromGPU() calls
VecCUDAAllocateCheckHost(), not VecCUDAAllocateCheck(), so the GPU
will not allocate space;
VecCUDAAllocateCheck() is called from VecCUDACopyToGPU().
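
A rough sketch (not the actual source) of the behavior described above: the
copy is a no-op unless the authoritative data currently lives on the GPU,
and only host storage is allocated lazily. PETSC_OFFLOAD_BOTH is assumed to
be the "both copies valid" state of the offload mask:

  PetscErrorCode VecCUDACopyFromGPU_sketch(Vec v)
  {
    PetscErrorCode ierr;

    PetscFunctionBegin;
    ierr = VecCUDAAllocateCheckHost(v);CHKERRQ(ierr);   /* host array only; no GPU allocation */
    if (v->valid_GPU_array == PETSC_OFFLOAD_GPU) {
      /* ... cudaMemcpy() of the vector entries, device -> host ... */
      v->valid_GPU_array = PETSC_OFFLOAD_BOTH;          /* both copies are now valid */
    }
    PetscFunctionReturn(0);
  }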

    Yes, perhaps the naming could be more consistent:

1) in one place it is Host, in another place it is nothing
2) in some places it is Host/Device, in other places GPU/CPU

    Perhaps Karl can make these all consistent and simpler in
his refactorization


   Barry


 >
 > or
 >
 > Not allocate the GPU if it is pinned by added in a check here:
 >
 > PetscErrorCode VecCUDAAllocateCheck(Vec v)
 > {
 >   PetscErrorCode ierr;
 >   cudaError_t    err;
 >   cudaStream_t   stream;
 >   Vec_CUDA       *veccuda;
 >
 >   PetscFunctionBegin;
 >   if (!v->spptr) {
 >     ierr = PetscMalloc(sizeof(Vec_CUDA),&v->spptr);CHKERRQ(ierr);
 >     veccuda = (Vec_CUDA*)v->spptr;
 > if (v->valid_GPU_array != PETSC_OFFLOAD_CPU) {
 >     err =

cudaMalloc((void**)&veccuda->GPUarray_allocated,sizeof(PetscScalar)*((PetscBLASInt)v->map->n));CHKERRCUDA(err);
 >     veccuda->GPUarray = veccuda->GPUarray_allocated;
 >     err = cudaStreamCreate(&stream);CHKERRCUDA(err);
 >     veccuda->stream = stream;
 >     veccuda->hostDataRegisteredAsPageLocked = PETSC_FALSE;
 >     if (v->valid_GPU_array == PETSC_OFFLOAD_UNALLOCATED) {
 >       if (v->data && ((Vec_Seq*)v->data)->array) {
 >         v->valid_GPU_array = PETSC_OFFLOAD_CPU;
 >       } else {
 >         v->valid_GPU_array = PETSC_OFFLOAD_GPU;
 >       }
 >     }
 > }
 >   }
 >   PetscFunctionReturn(0);
 > }
 >
 >
 >
 >
 >
 > > On Jul 27, 2019, at 10:40 AM, Mark Adams mailto:mfad...@lbl.gov>> wrote:
 > >
 > > Yea, I just figured out the problem. VecDuplicate_MPICUDA
did not call PinToCPU or even copy pinnedtocpu. It just copied
ops, so I added and am testing:
 > >
 > >   ierr =
VecCreate_MPICUDA_Private(*v,PETSC_TRUE,w->nghost,0);CHKERRQ(ierr);
 > >   vw   = (Vec_MPI*)(*v)->data;
 > >   ierr = PetscMemcpy((*v)->ops,win->ops,sizeof(struct
_VecOps));CHKERRQ(ierr);
 > >   ierr = VecPinToCPU(*v,win->pinnedtocpu);CHKERRQ(ierr);
 > >
 > > Thanks,
 > >
 > > On Sat, Jul 27, 2019 at 11:33 AM Smith, Barry F.
mailto:bsm..

Re: [petsc-dev] valid_GPU_matrix in interface code

2019-07-23 Thread Karl Rupp via petsc-dev

Hi Stefano,

I have just noticed we have different occurrences of the 
valid_GPU_matrix flag in src/mat/interface and src/mat/utils
I think that how they are used now is wrong, as they assume that all 
those operations can only be executed on the CPU, irrespective of the 
specific type.
Is there any plan to fix this? If not, I can take care of it, I am 
currently writing a dense matrix for CUDA here 
https://bitbucket.org/petsc/petsc/branch/stefano_zampini/cuda


there is going to be a major reorganization of the GPU code over the 
next few weeks, cf.

 https://bitbucket.org/petsc/petsc/issues/322/working-group-gpus
This, however, is pretty much orthogonal to your development of a dense 
matrix type. Once your branch is ready, it can be rather easily adopted.


As for the valid_GPU_matrix flag: This will be fixed alongside the the 
GPU reorganization. However, it might take 2-3 weeks until I finally get 
to that part. I guess you will need it sooner?
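
To make the concern concrete, a minimal sketch (with a hypothetical helper
name, and configuration guards around the GPU-only field omitted) of what
interface code could do instead of assuming a CPU-only matrix: after
modifying host-side data, only mark the device copy as stale:

  static PetscErrorCode MatMarkHostModified_sketch(Mat A)
  {
    PetscFunctionBegin;
    if (A->valid_GPU_matrix != PETSC_OFFLOAD_UNALLOCATED) {
      A->valid_GPU_matrix = PETSC_OFFLOAD_CPU;   /* device data is now out of date */
    }
    PetscFunctionReturn(0);
  }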


Best regards,
Karli


Re: [petsc-dev] Bitbucket is doomed

2019-05-29 Thread Karl Rupp via petsc-dev

That's just a manifestation of Satish merging really well today ;-)

Best regards,
Karli



On 5/30/19 1:11 AM, Smith, Barry F. via petsc-dev wrote:


    I just got this same merged message sent to me three times.

    In recent days I've received several sent to me twice.

    It's not like we don't get enough email from Bitbucket already.

    Barry



Begin forwarded message:

*From: *Satish Balay >
*Subject: **Re: [Bitbucket] Pull request #1708: Device assignments by 
rank are now printed by -cuda_show_devices. (petsc/petsc)*

*Date: *May 29, 2019 at 8:55:43 AM CDT
*To: *mailto:bsm...@mcs.anl.gov>>

Satish Balay
*Satish Balay* merged pull request #1708:
Device assignments by rank are now printed by -cuda_show_devices. 
 


Approved by 
BarryFSmith

View this pull request 
 
or add a comment by replying to this email.


Unwatch this pull request 
 
to stop receiving email updates. 		Bitbucket 






Re: [petsc-dev] alternatives to alt files

2019-05-02 Thread Karl Rupp via petsc-dev




   Using alt files for testing is painful. Whenever you add, for example, a 
new variable to be output in a viewer it changes the output files and you need 
to regenerate the alt files for all the test configurations. Even though the 
run behavior of the code hasn't changed.

  I'm looking for suggestions on how to handle this kind of alternative 
output in a nicer way (alternative output usually comes from different 
iteration counts due to different precision and often even different 
compilers).

  An idea I was thinking of was: instead of having "alt" files we have 
"patch" files that contain just the patch to the original output file instead of a 
complete copy. Thus in some situations the patch file would still apply even if the original output 
file changed, requiring much less manual work in updating alt files. Essentially the test 
harness would test against the output file; if that fails it would apply the first patch and 
compare again, then try the second patch, etc.


yes, a 'patch' approach would simplify updates to the reference

However: I'm not sure whether we're tackling the right problem here. Our
diff-based testing isn't great. I'd favor more dedicated unit tests,
where the correctness check is embedded in the test (ex*.*}) itself
rather than determined by some text-based diff tool (which, to make
matters worse, even filters out floating point numbers...). Not all
tests can be written as such -- but many can and this would
significantly reduce the burden on alt files.


I agree, but I don't know how to change without it being a ton of work.
We have a huge amount of integration tests, which are basically
tutorials run in a particular way with output that seemed sensible to
the developer at the time.  There are a number of unit tests, but my
impression is that well over half of new tests developed in the past few
years are integration tests.  It would be better for us to have more
actual unit tests so that we'd be less reliant on the relatively
arbitrary convergence characteristics of the integration tests.


yes, moving to more unit tests is a lot of work. At the same time, we 
need a long-term direction; if we just make it easier to do more 
integration tests (what this thread is about) without added incentives 
for unit tests, we should not be surprised if we get even more 
integration tests instead of unit tests in the future ;-)


Best regards,
Karli


Re: [petsc-dev] alternatives to alt files

2019-05-02 Thread Karl Rupp via petsc-dev

Hi,


Scott and PETSc folks,

  Using alt files for testing is painful. Whenever you add, for example, a 
new variable to be output in a viewer it changes the output files and you need 
to regenerate the alt files for all the test configurations. Even though the 
run behavior of the code hasn't changed.

 I'm looking for suggestions on how to handle this kind of alternative 
output in a nicer way (alternative output usually comes from different 
iteration counts due to different precision and often even different 
compilers).

 An idea I was thinking of was: instead of having "alt" files we have "patch" 
files that contain just the patch to the original output file instead of a complete copy. Thus in 
some situations the patch file would still apply even if the original output file changed, 
requiring much less manual work in updating alt files. Essentially the test harness would test 
against the output file; if that fails it would apply the first patch and compare again, then try 
the second patch, etc.


yes, a 'patch' approach would simplify updates to the reference

However: I'm not sure whether we're tackling the right problem here. Our 
diff-based testing isn't great. I'd favor more dedicated unit tests, 
where the correctness check is embedded in the test (ex*.*}) itself 
rather than determined by some text-based diff tool (which, to make 
matters worse, even filters out floating point numbers...). Not all 
tests can be written as such -- but many can and this would 
significantly reduce the burden on alt files.
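
As an illustration of such an embedded check, a hedged sketch (the helper
is hypothetical, not an existing example): the test verifies a property
itself, e.g. that the residual is small, instead of diffing printed
iteration counts:

  static PetscErrorCode CheckResidual(Mat A,Vec b,Vec x,PetscReal tol)
  {
    PetscErrorCode ierr;
    Vec            r;
    PetscReal      rnorm;

    PetscFunctionBegin;
    ierr = VecDuplicate(b,&r);CHKERRQ(ierr);
    ierr = MatMult(A,x,r);CHKERRQ(ierr);       /* r = A x     */
    ierr = VecAYPX(r,-1.0,b);CHKERRQ(ierr);    /* r = b - A x */
    ierr = VecNorm(r,NORM_2,&rnorm);CHKERRQ(ierr);
    ierr = VecDestroy(&r);CHKERRQ(ierr);
    if (rnorm > tol) SETERRQ1(PETSC_COMM_SELF,PETSC_ERR_PLIB,"Residual norm %g exceeds tolerance",(double)rnorm);
    PetscFunctionReturn(0);
  }

Such a test passes or fails deterministically across compilers and
precisions (given a sensible tolerance), so no alt/patch files are needed
for it.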


Best regards,
Karli



Re: [petsc-dev] [petsc-checkbuilds] PETSc blame digest (next) 2019-04-26

2019-04-26 Thread Karl Rupp via petsc-dev

Hi,

I fixed this warning after merge.

Best regards,
Karli


On 4/26/19 2:28 PM, PETSc checkBuilds via petsc-checkbuilds wrote:



Dear PETSc developer,

This email contains listings of contributions attributed to you by
`git blame` that caused compiler errors or warnings in PETSc automated
testing.  Follow the links to see the full log files. Please attempt to fix
the issues promptly or let us know at petsc-dev@mcs.anl.gov if you are unable
to resolve the issues.

Thanks,
   The PETSc development team



warnings attributed to commit https://bitbucket.org/petsc/petsc/commits/beae5ec
rm malloc of iidx array of size nrhs*M for sparse rhs matrix.

   src/mat/impls/aij/mpi/mumps/mumps.c:1068
 
[http://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2019/04/26/build_next_arch-linux-pkgs-gcov_cg.log]
   /sandbox/petsc/petsc.next-2/src/mat/impls/aij/mpi/mumps/mumps.c:1068:54: 
warning: 'iidx' may be used uninitialized in this function 
[-Wmaybe-uninitialized]
 
[http://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2019/04/26/build_next_arch-freebsd-cxx-pkgs-opt_wii.log]
   
/usr/home/balay/petsc.next-2/src/mat/impls/aij/mpi/mumps/mumps.c:1068:54: 
warning: 'iidx' may be used uninitialized in this function 
[-Wmaybe-uninitialized]
 
[http://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2019/04/26/build_next_arch-linux-pkgs-valgrind_es.log]
   /sandbox/petsc/petsc.next/src/mat/impls/aij/mpi/mumps/mumps.c:1068:54: 
warning: 'iidx' may be used uninitialized in this function 
[-Wmaybe-uninitialized]
 
[http://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2019/04/26/build_next_arch-freebsd-cxx-cmplx-pkgs-dbg_wii.log]
   /usr/home/balay/petsc.next/src/mat/impls/aij/mpi/mumps/mumps.c:1068:54: 
warning: 'iidx' may be used uninitialized in this function 
[-Wmaybe-uninitialized]
 
[http://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2019/04/26/build_next_arch-c-exodus-dbg-builder_frog.log]
   /sandbox/petsc/petsc.next-3/src/mat/impls/aij/mpi/mumps/mumps.c:1068:54: 
warning: 'iidx' may be used uninitialized in this function 
[-Wmaybe-uninitialized]
 
[http://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2019/04/26/build_next_arch-linux-pkgs-cxx-mlib_el6.log]
   
/home/sandbox/petsc/petsc.next-3/src/mat/impls/aij/mpi/mumps/mumps.c:1068:54: 
warning: 'iidx' may be used uninitialized in this function 
[-Wmaybe-uninitialized]
 
[http://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2019/04/26/build_next_arch-freebsd-pkgs-opt_wii.log]
   
/usr/home/balay/petsc.next-3/src/mat/impls/aij/mpi/mumps/mumps.c:1068:54: 
warning: 'iidx' may be used uninitialized in this function 
[-Wmaybe-uninitialized]
 
[http://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2019/04/26/build_next_arch-linux-pkgs-opt_crank.log]
   /sandbox/petsc/petsc.next/src/mat/impls/aij/mpi/mumps/mumps.c:1068:54: 
warning: 'iidx' may be used uninitialized in this function 
[-Wmaybe-uninitialized]


To opt-out from receiving these messages - send a request to 
petsc-dev@mcs.anl.gov.



Re: [petsc-dev] Thoughts on pushing current CI infrastructure to the next level

2019-04-25 Thread Karl Rupp via petsc-dev




On 4/25/19 6:53 PM, Jed Brown wrote:

Karl Rupp via petsc-dev  writes:


With some effort we can certainly address 1.) and to some extent 3.),
probably 4.) as well, but I don't know how to solve 2.) and 5.) with
Jenkins. Given that a significant effort is required for 1.), 3.) and
4.) anyway, I'm starting to get more and more comfortable with the idea
of rolling our own CI infrastructure (which has been suggested in some
of Barry's snarky remarks already ;-) ). Small Python scripts for
executing the tests and pushing results to Bitbucket as well as a
central result storage can replicate our existing setup with a few lines
of codes, while being much more flexible.


I think further commitment to Bitbucket would be a liability.


Suggestions? Github, Gitlab, something else?



On existing open source CI tools, I think looking at how the project
itself uses CI is a good indicator.  Some examples of recent PRs with
test failures, see what needs to be done to narrow down what failed.

https://github.com/buildbot/buildbot/pull/4726

https://github.com/drone/drone/pull/2363

https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/27652 (click Expand on 
the test failure)

https://github.com/jenkinsci/jenkins/pull/3991


All of these offer more than what we have now. And yet, many offer 
features we will never need and force upon us some constraints on our 
workflow. Maybe other CI tools are better suited for PETSc than Jenkins 
- I don't know, but maybe others know :-)


Our requirements for a package-manager-like PETSc with all its possible 
external packages and weird software stacks on supercomputers are 
different to 'standard' software, unfortunately.


Thanks for your input and best regards,
Karli


[petsc-dev] Thoughts on pushing current CI infrastructure to the next level

2019-04-25 Thread Karl Rupp via petsc-dev

Dear PETSc developers,

the current Jenkins server went live last summer. Since then, the 
stability of master and next has indeed improved. Who would have thought 
three years ago that `next` is almost as stable as `master`?


However, over the weeks and months some weaknesses of our current 
continuous integration infrastructure became apparent:


1.) Still no Jenkins tests on Windows, because the remote execution of a 
Java application has some issues with Cygwin (which we require for PETSc).


2.) Jenkins workers every once in a while hang on the target machine 
(this has been independently observed in a different setting by Jed as 
well).


3.) Nonscalability of the current setup: The Jenkins server clones a 
separate copy of the repository for each pull request and each test 
arch. Each clone of the PETSc repository is 300 MB, so if we aim at 40 
different arches (i.e. the current coverage of the nightly tests) to 
test for each pull request, 300 MB * 40 = 12 GB of memory is required 
*for each pull request* on the Jenkins master.


4.) Pull requests from external repositories in Bitbucket are currently 
tested by Jenkins, but the results are not visible on the pull requests 
page. This might be a Bitbucket issue rather than a Jenkins issue; and 
yet, it impedes our work flow.


5.) Adding additional workers requires significant configuration effort 
on the Jenkins master and is far from hassle-free. For example, it is 
currently impractical to add my office machine to the pool of workers, 
even though this machine is 99% idle.


With some effort we can certainly address 1.) and to some extent 3.), 
probably 4.) as well, but I don't know how to solve 2.) and 5.) with 
Jenkins. Given that a significant effort is required for 1.), 3.) and 
4.) anyway, I'm starting to get more and more comfortable with the idea 
of rolling our own CI infrastructure (which has been suggested in some 
of Barry's snarky remarks already ;-) ). Small Python scripts for 
executing the tests and pushing results to Bitbucket as well as a 
central result storage can replicate our existing setup with a few lines 
of codes, while being much more flexible.


What do other PETSc developers think about CI infrastructure? Maybe 
suggestions other than Jenkins?


Best regards,
Karli


Re: [petsc-dev] https://www.dursi.ca/post/hpc-is-dying-and-mpi-is-killing-it.html

2019-03-18 Thread Karl Rupp via petsc-dev

Hi Matt,

(...)


His slides have more,
"
PETSc is a widely used library for large sparse iterative solves.
         Excellent and comprehensive library of solvers
         It is the basis of a significant number of home-made
simulation codes
         It is notoriously hard to start getting running with;
nontrivial even for experts to install. 



This is a typical parochial take by someone with very limited 
experience. People who routinely install a lot of libraries
say that the install is very smooth. People like this who deal with only 
their own F90 and nothing else are scared. I would
point out that if you want nothing else, pip install petsc works fine. I 
can't believe we have spent this much time on an idiot.


Please mind your language to meet new PETSc standards:
 https://bitbucket.org/petsc/petsc/src/master/CODE_OF_CONDUCT.md

Best regards,
Karli


Re: [petsc-dev] Is there a good reason that BuildSystem's cuda.py requires GNU compilers?

2019-03-14 Thread Karl Rupp via petsc-dev

Hi Richard,

the check for the GNU compilers is mostly a historic relic. We haven't 
done any systematic tests with other compilers, so that test has just 
remained in place.


It would certainly be good if you could update the check to also work 
well with the default environment on Summit.


Thanks and best regards,
Karli


On 3/13/19 4:40 AM, Mills, Richard Tran via petsc-dev wrote:

Fellow PETSc developers,

If I try to configure PETSc with CUDA support on the ORNL Summit system 
using non-GNU compilers, I run into an error due to the following code 
in packages/cuda.py:


   def configureTypes(self):
     import config.setCompilers
     if not config.setCompilers.Configure.isGNU(self.setCompilers.CC, 
self.log):

   raise RuntimeError('Must use GNU compilers with CUDA')
   ...

Is this just because this code predates support for other host compilers 
with nvcc, or is there perhaps some more subtle reason that I, with my 
inexperience using CUDA, don't know about? I'm guessing that I just need 
to add support for using '-ccbin' appropriately to set the location of 
the non-GNU host compiler, but maybe there is something that I'm 
missing. I poked around in the petsc-dev mailing list archives and can 
find a few old threads on using non-GNU compilers, but I'm not sure what 
conclusions were reached.


Best regards,
Richard




Re: [petsc-dev] PETSc release by March 29, 2019

2019-03-05 Thread Karl Rupp via petsc-dev

Hi,


- its best to submit PRs early - if they are critical [i.e if the
branch should be in release] - or if they are big - and likely to
break builds.

- we should somehow use both next and next-tmp in a way to avoid some
   PRs clogging the process for others.

   perhaps starting March 18 - freeze access to next - and keep
   recreating next & next-tmp dynamically as needed with the goal of
   testing fewer branches together (ideally 1 branch at a time) - so
   that we can:

* easily identify the branch corresponding to test failures and
* easily identify branches that are ready for graduation.


I can be more aggressive with reverting merges until the branches are 
fixed. It's a bit more effort, but certainly justified closer to the 
release. Usually it's not a big problem to have 2-3 new (smallish) PRs 
in next.




- We should accept (minor?) bug-fix PRs even after March 22 [i.e
   anything that would be acceptable in our maint work-flow shouldn't
   be frozen]

- And we should be able to drop troublesome PRs if they are blocking
   the release.


full ack :-)

Best regards,
Karli




Satish

On Tue, 5 Mar 2019, Karl Rupp via petsc-dev wrote:


Dear PETSc developers,

let me suggest Friday, March 22, as the cut-off-date for new Pull Requests for
the upcoming release. This allows for 7 days to iron out any remaining
glitches. (It only took us a few days to release after the cut-off date last
September, so this should be fine)

Also, a clearly communicated cut-off date helps to prevent "may I also squeeze
this in at the very last minute"-PRs, which I may not have the time to deal
with anyway.

Satish, does the above schedule work for you? Since you're creating the
tarballs, you've got the final word on this :-)

Best regards,
Karli




On 3/4/19 4:31 AM, Smith, Barry F. via petsc-dev wrote:


    Due to ECP deliverables there will be a PETSc release by March 29, 2019.

    Please prepare materials you wish to get into the release soon and check
on the progress of your current pull requests to make sure they do not block
beyond the release deadline.

      Thanks

       Barry

    If someone would like to propose an intermediate deadline before the 29th
for testing/etc purposes please feel free, I don't have the energy or
initiative.



Begin forwarded message:

*From: *Jed Brown via petsc-maint mailto:petsc-ma...@mcs.anl.gov>>
*Subject: **Re: [petsc-maint] Release 3.11?*
*Date: *March 3, 2019 at 10:07:26 AM CST
*To: *"Munson, Todd" mailto:tmun...@mcs.anl.gov>>
*Cc: *petsc-maint mailto:petsc-ma...@mcs.anl.gov>>
*Reply-To: *Jed Brown mailto:j...@jedbrown.org>>

Can you, or someone else involved at that level, please propose a timeline
on petsc-dev?

"Munson, Todd" mailto:tmun...@mcs.anl.gov>> writes:


Hi Jed,

Yes, we have a funding milestone due at the end of this month, so we
should push out a release.

Thanks, Todd.


On Mar 2, 2019, at 11:36 PM, Jed Brown mailto:j...@jedbrown.org>> wrote:

Is there a funding milestone to release 3.11 this month?  If so, we need
to publicize a timeline and mention it on petsc-dev?  If not, we can
feature release whenever we feel ready, but probably in the next few
months.






Re: [petsc-dev] PETSc release by March 29, 2019

2019-03-05 Thread Karl Rupp via petsc-dev

Dear PETSc developers,

let me suggest Friday, March 22, as the cut-off-date for new Pull 
Requests for the upcoming release. This allows for 7 days to iron out 
any remaining glitches. (It only took us a few days to release after the 
cut-off date last September, so this should be fine)


Also, a clearly communicated cut-off date helps to prevent "may I also 
squeeze this in at the very last minute"-PRs, which I may not have the 
time to deal with anyway.


Satish, does the above schedule work for you? Since you're creating the 
tarballs, you've got the final word on this :-)


Best regards,
Karli




On 3/4/19 4:31 AM, Smith, Barry F. via petsc-dev wrote:


    Due to ECP deliverables there will be a PETSc release by March 29, 
2019.


    Please prepare materials you wish to get into the release soon and 
check on the progress of your current pull requests to make sure they do 
not block beyond the release deadline.


     Thanks

      Barry

    If someone would like to propose an intermediate deadline before the 
29th for testing/etc purposes please feel free, I don't have the energy 
or initiative.




Begin forwarded message:

*From: *Jed Brown via petsc-maint >

*Subject: **Re: [petsc-maint] Release 3.11?*
*Date: *March 3, 2019 at 10:07:26 AM CST
*To: *"Munson, Todd" mailto:tmun...@mcs.anl.gov>>
*Cc: *petsc-maint >

*Reply-To: *Jed Brown mailto:j...@jedbrown.org>>

Can you, or someone else involved at that level, please propose a 
timeline on petsc-dev?


"Munson, Todd" mailto:tmun...@mcs.anl.gov>> writes:


Hi Jed,

Yes, we have a funding milestone due at the end of this month, so we
should push out a release.

Thanks, Todd.

On Mar 2, 2019, at 11:36 PM, Jed Brown > wrote:


Is there a funding milestone to release 3.11 this month?  If so, we need
to publicize a timeline and mention it on petsc-dev?  If not, we can
feature release whenever we feel ready, but probably in the next few
months.




Re: [petsc-dev] patch for wrong integer type

2019-02-04 Thread Karl Rupp via petsc-dev

Hi Fabian,

I just merged the patch to master and maint. Please let us know whether 
this solves the issue.


Best regards,
Karli


On 2/4/19 8:51 PM, Matthew Knepley via petsc-dev wrote:
On Mon, Feb 4, 2019 at 2:42 PM Fabian.Jakub via petsc-dev 
mailto:petsc-dev@mcs.anl.gov>> wrote:


Dear Petsc Team,

I recently had segfaults when dumping DMPlexs through the
PetscObjectViewer into hdf5 files

This happens to me with 64 bit integers and I think there is a PetscInt
where an int should be placed.


Thanks! I think Vaclav caught this last week

https://bitbucket.org/petsc/petsc/commits/15b861d2d1f810846f37f539a33922bed3cacdd7?at=haplav/fix-cray-warnings

Should be merged soon.

   Matt

Please have a look at the attached patch.

Yours,

Fabian



--
What most experimenters take for granted before they begin their 
experiments is infinitely more interesting than any results to which 
their experiments lead.

-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ 


Re: [petsc-dev] [SPAM?] Re: Checkpoint-restart with DMPlex objects

2018-12-18 Thread Karl Rupp via petsc-dev






I have not yet found how that "VTK ordering" is defined, but hopefully 
it's a well-defined, unambiguous cell-local numbering. I will try to 
find out soon and get back to you.


Hope this helps:
https://www.vtk.org/wp-content/uploads/2015/04/file-formats.pdf
(page 9)

Best regards,
Karli


Re: [petsc-dev] Error running on Titan with GPUs & GNU

2018-10-31 Thread Karl Rupp via petsc-dev

Hi Mark,

ah, I was confused by the Python information at the beginning of 
configure.log. So it is picking up the correct compiler.


Have you tried uncommenting the check for GNU?

Best regards,
Karli


On 10/31/18 11:40 AM, Mark Adams wrote:
It looks like configure is not finding the correct cc. It does not seem 
hard to find.


06:37 master= /lustre/atlas/proj-shared/geo127/petsc$ cc --version
gcc (GCC) 6.3.0 20161221 (Cray Inc.)
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

06:37 master= /lustre/atlas/proj-shared/geo127/petsc$ which cc
/opt/cray/craype/2.5.13/bin/cc
06:38 master= /lustre/atlas/proj-shared/geo127/petsc$ which gcc
/opt/gcc/6.3.0/bin/gcc


On Wed, Oct 31, 2018 at 6:34 AM Mark Adams <mailto:mfad...@lbl.gov>> wrote:




On Wed, Oct 31, 2018 at 5:05 AM Karl Rupp mailto:r...@iue.tuwien.ac.at>> wrote:

Hi Mark,

please comment or remove lines 83 and 84 in
   config/BuildSystem/config/packages/cuda.py

Is there a compiler newer than GCC 4.3 available?


You mean 6.3?

06:33  ~$ module avail gcc

-
/opt/modulefiles -
gcc/4.8.1          gcc/4.9.3          gcc/6.1.0 
gcc/6.3.0(default) gcc/7.2.0
gcc/4.8.2          gcc/5.3.0          gcc/6.2.0          gcc/7.1.0 
         gcc/7.3.0



Best regards,
Karli



On 10/31/18 8:15 AM, Mark Adams via petsc-dev wrote:
 > After loading a cuda module ...
 >
 > On Wed, Oct 31, 2018 at 2:58 AM Mark Adams mailto:mfad...@lbl.gov>
 > <mailto:mfad...@lbl.gov <mailto:mfad...@lbl.gov>>> wrote:
 >
 >     I get an error with --with-cuda=1
 >
 >     On Tue, Oct 30, 2018 at 4:44 PM Smith, Barry F.
mailto:bsm...@mcs.anl.gov>
 >     <mailto:bsm...@mcs.anl.gov <mailto:bsm...@mcs.anl.gov>>>
wrote:
 >
 >         --with-cudac=1 should be --with-cuda=1
 >
 >
 >
 >          > On Oct 30, 2018, at 12:35 PM, Smith, Barry F. via
petsc-dev
 >         mailto:petsc-dev@mcs.anl.gov>
<mailto:petsc-dev@mcs.anl.gov <mailto:petsc-dev@mcs.anl.gov>>>
wrote:
 >          >
 >          >
 >          >
 >          >> On Oct 29, 2018, at 8:09 PM, Mark Adams
mailto:mfad...@lbl.gov>
 >         <mailto:mfad...@lbl.gov <mailto:mfad...@lbl.gov>>> wrote:
 >          >>
 >          >> And a debug build seems to work:
 >          >
 >          >    Well ok.
 >          >
 >          >    Are there newer versions of the Gnu compiler
for this
 >         system? Are there any other compilers on the system
that would
 >         likely be less buggy? IBM compilers? If this simple code
 >         generates a gross error with optimization who's to
say how many
 >         more subtle bugs may be induced in the library by the
buggy
 >         optimizer (there may be none but IMHO probability
says there
 >         will be others).
 >          >
 >          >    Is there any chance that valgrind runs on this
machine;
 >         you could run the optimized version through it and
see what it says.
 >          >
 >          >   Barry
 >          >
 >          >>
 >          >> 21:04 1 master=
/lustre/atlas/proj-shared/geo127/petsc$ make
 >   
  PETSC_DIR=/lustre/atlas/proj-shared/geo127/petsc_titan_dbg64idx_gnu_cuda

 >         PETSC_ARCH="" test
 >          >> Running test examples to verify correct installation
 >          >> Using
 >   
  PETSC_DIR=/lustre/atlas/proj-shared/geo127/petsc_titan_dbg64idx_gnu_cuda

 >         and PETSC_ARCH=
 >          >> ***Error detected during compile or
 >         link!***
 >          >> See
http://www.mcs.anl.gov/petsc/documentation/faq.html
 >          >>
 >   
  /lustre/atlas/proj-shared/geo127/petsc/src/snes/examples/tutorials

 >         ex19
 >          >>
 >   
  *

Re: [petsc-dev] Error running on Titan with GPUs & GNU

2018-10-31 Thread Karl Rupp via petsc-dev

Hi Mark,

please comment or remove lines 83 and 84 in
 config/BuildSystem/config/packages/cuda.py

Is there a compiler newer than GCC 4.3 available?

Best regards,
Karli



On 10/31/18 8:15 AM, Mark Adams via petsc-dev wrote:

After loading a cuda module ...

On Wed, Oct 31, 2018 at 2:58 AM Mark Adams > wrote:


I get an error with --with-cuda=1

On Tue, Oct 30, 2018 at 4:44 PM Smith, Barry F. mailto:bsm...@mcs.anl.gov>> wrote:

--with-cudac=1 should be --with-cuda=1



 > On Oct 30, 2018, at 12:35 PM, Smith, Barry F. via petsc-dev <petsc-dev@mcs.anl.gov> wrote:
 >
 >
 >
 >> On Oct 29, 2018, at 8:09 PM, Mark Adams <mfad...@lbl.gov> wrote:
 >>
 >> And a debug build seems to work:
 >
 >    Well ok.
 >
 >    Are there newer versions of the Gnu compiler for this
system? Are there any other compilers on the system that would
likely be less buggy? IBM compilers? If this simple code
generates a gross error with optimization who's to say how many
more subtle bugs may be induced in the library by the buggy
optimizer (there may be none but IMHO probability says there
will be others).
 >
 >    Is there any chance that valgrind runs on this machine;
you could run the optimized version through it and see what it says.
 >
 >   Barry
 >
 >>
 >> 21:04 1 master= /lustre/atlas/proj-shared/geo127/petsc$ make
PETSC_DIR=/lustre/atlas/proj-shared/geo127/petsc_titan_dbg64idx_gnu_cuda
PETSC_ARCH="" test
 >> Running test examples to verify correct installation
 >> Using
PETSC_DIR=/lustre/atlas/proj-shared/geo127/petsc_titan_dbg64idx_gnu_cuda
and PETSC_ARCH=
 >> ***Error detected during compile or
link!***
 >> See http://www.mcs.anl.gov/petsc/documentation/faq.html
 >>
/lustre/atlas/proj-shared/geo127/petsc/src/snes/examples/tutorials
ex19
 >>

*
 >> cc -o ex19.o -c -g 
  -I/lustre/atlas/proj-shared/geo127/petsc_titan_dbg64idx_gnu_cuda/include    `pwd`/ex19.c
 >> cc -g  -o ex19 ex19.o 
-L/lustre/atlas/proj-shared/geo127/petsc_titan_dbg64idx_gnu_cuda/lib


-Wl,-rpath,/lustre/atlas/proj-shared/geo127/petsc_titan_dbg64idx_gnu_cuda/lib
-L/lustre/atlas/proj-shared/geo127/petsc_titan_dbg64idx_gnu_cuda/lib
-lpetsc -lHYPRE -lflapack -lfblas -lparmetis -lmetis -ldl
 >>

/lustre/atlas/proj-shared/geo127/petsc_titan_dbg64idx_gnu_cuda/lib/libpetsc.a(dlimpl.o):
In function `PetscDLOpen':
 >>
/lustre/atlas1/geo127/proj-shared/petsc/src/sys/dll/dlimpl.c:108: 
warning:
Using 'dlopen' in statically linked applications requires at
runtime the shared libraries from the glibc version used for linking
 >>

/lustre/atlas/proj-shared/geo127/petsc_titan_dbg64idx_gnu_cuda/lib/libpetsc.a(send.o):
In function `PetscOpenSocket':
 >>

/lustre/atlas1/geo127/proj-shared/petsc/src/sys/classes/viewer/impls/socket/send.c:108:
warning: Using 'gethostbyname' in statically linked applications
requires at runtime the shared libraries from the glibc version
used for linking
 >> true ex19
 >> rm ex19.o
 >> Possible error running C/C++
src/snes/examples/tutorials/ex19 with 1 MPI process
 >> See http://www.mcs.anl.gov/petsc/documentation/faq.html
 >> lid velocity = 0.0016, prandtl # = 1., grashof # = 1.
 >> Number of SNES iterations = 2
 >> Application 19081049 resources: utime ~1s, stime ~1s, Rss
~17112, inblocks ~36504, outblocks ~111043
 >> Possible error running C/C++
src/snes/examples/tutorials/ex19 with 2 MPI processes
 >> See http://www.mcs.anl.gov/petsc/documentation/faq.html
 >> lid velocity = 0.0016, prandtl # = 1., grashof # = 1.
 >> Number of SNES iterations = 2
 >> Application 19081050 resources: utime ~1s, stime ~1s, Rss
~19816, inblocks ~36527, outblocks ~111043
 >> 5a6
 >>> Application 19081051 resources: utime ~1s, stime ~0s, Rss
~13864, inblocks ~36527, outblocks ~111043
 >>
/lustre/atlas/proj-shared/geo127/petsc/src/snes/examples/tutorials
 >> Possible problem with ex19_hypre, diffs above
 >> =
 >> ***Error detected during compile or
link!***
 >> See http://www.mcs.anl.gov/petsc/documentation/faq.html
 >>
/lustre/atlas/proj-shared/geo127/petsc/src/snes/examples/tutorials
ex5f
 >> **

Re: [petsc-dev] Code of Conduct [ACTION REQUIRED]

2018-10-26 Thread Karl Rupp

Dear PETSc folks,

after broad approval, the pull request for the Code of Conduct has been 
merged to master and is now active.


Best regards,
Karli


On 10/23/18 12:52 PM, Karl Rupp wrote:

Dear PETSc folks,

I ask all members of the PETSc team to review the following proposal for 
adopting a code of conduct:


https://bitbucket.org/petsc/petsc/pull-requests/1196/code-of-conduct-adopt-contributor-covenant/diff 



If you have questions, concerns, etc., please reply to this email thread.

ACTION REQUIRED: If you agree with adopting the proposed Code of 
Conduct, please click on "Approve" on the pull request webpage. This 
signals that the whole team agrees to and respects the code of conduct.


Thanks and best regards,
Karli


[petsc-dev] Code of Conduct [ACTION REQUIRED]

2018-10-23 Thread Karl Rupp

Dear PETSc folks,

I ask all members of the PETSc team to review the following proposal for 
adopting a code of conduct:


https://bitbucket.org/petsc/petsc/pull-requests/1196/code-of-conduct-adopt-contributor-covenant/diff

If you have questions, concerns, etc., please reply to this email thread.

ACTION REQUIRED: If you agree with adopting the proposed Code of 
Conduct, please click on "Approve" on the pull request webpage. This 
signals that the whole team agrees to and respects the code of conduct.


Thanks and best regards,
Karli


Re: [petsc-dev] PETSc 3.10: September 4, 2018, is the cutoff-date for new features

2018-09-13 Thread Karl Rupp
Thank you, Satish, for taking care of the finishing touches and for 
releasing PETSc 3.10 :-)


Best regards,
Karli




On 09/12/2018 05:01 PM, Satish Balay wrote:

v3.10 tag is pushed, maint & master branches are updated, and tarballs are now 
available for download.

Satish

On Wed, 12 Sep 2018, Balay, Satish wrote:


All release relevant PRs are now merged to master. And I'm ready to tag the 
release [and spin the tarballs]

Any potential (code, doc) fixes can be queued for 3.10.1

Satish

On Tue, 11 Sep 2018, Satish Balay wrote:


Reminder!

please check
http://www.mcs.anl.gov/petsc/documentation/changes/dev.html and update
src/docs/website/documentation/changes/dev.html with relevant changes.

Thanks,
Satish

On Mon, 10 Sep 2018, Satish Balay wrote:


Also - please check and update src/docs/website/documentation/changes/dev.html 
as needed.

Thanks,
Satish

On Wed, 5 Sep 2018, Karl Rupp wrote:


Dear PETSc developers,

please open any outstanding pull requests for the upcoming PETSc 3.10 release
in the next few hours. After that, please do not merge anything to `next` or
`master` unless it is integration work for existing open PRs.

You can open up new pull requests in the next days and weeks as usual. If
these are documentation enhancements, we will most likely integrate them
quickly. If these are bugfixes, we may still integrate them for PETSc 3.10,
decided on a per-case basis. New features, however, will not be integrated
until PETSc 3.10 is out.

Thanks and best regards,
Karli



On 08/30/2018 06:09 PM, Karl Rupp wrote:

Dear PETSc developers,

this is a gentle reminder for the cutoff-date on September 4.

Best regards,
Karli



On 07/27/2018 02:41 AM, Karl Rupp wrote:

Dear PETSc developers,

in order to ensure a PETSc 3.10 release no later than by the end of
September (possibly earlier), we agreed on September 4, 2018, as the
cut-off date for new features. Please make sure that a pull request has
been opened on Bitbucket by this time. This is the preferred model over
merging to next directly, since our Jenkins test infrastructure will test
each PR separately (thus providing faster feedback and allowing for
faster integration).

And while we are at it: Please keep in mind that new features also require
appropriate documentation. :-)

Thanks and best regards,
Karli













Re: [petsc-dev] PETSc 3.10: September 4, 2018, is the cutoff-date for new features

2018-09-04 Thread Karl Rupp

Dear PETSc developers,

please open any outstanding pull requests for the upcoming PETSc 3.10 
release in the next few hours. After that, please do not merge anything 
to `next` or `master` unless it is integration work for existing open PRs.


You can open up new pull requests in the next days and weeks as usual. 
If these are documentation enhancements, we will most likely integrate 
them quickly. If these are bugfixes, we may still integrate them for 
PETSc 3.10, decided on a per-case basis. New features, however, will not 
be integrated until PETSc 3.10 is out.


Thanks and best regards,
Karli



On 08/30/2018 06:09 PM, Karl Rupp wrote:

Dear PETSc developers,

this is a gentle reminder for the cutoff-date on September 4.

Best regards,
Karli



On 07/27/2018 02:41 AM, Karl Rupp wrote:

Dear PETSc developers,

in order to ensure a PETSc 3.10 release no later than by the end of 
September (possibly earlier), we agreed on September 4, 2018, as the 
cut-off date for new features. Please make sure that a pull request 
has been opened on Bitbucket by this time. This is the preferred model 
over merging to next directly, since our Jenkins test infrastructure 
will test each PR separately (thus providing faster feedback and 
allowing for faster integration).


And while we are at it: Please keep in mind that new features also 
require appropriate documentation. :-)


Thanks and best regards,
Karli


Re: [petsc-dev] PETSc 3.10: September 4, 2018, is the cutoff-date for new features

2018-08-30 Thread Karl Rupp

Dear PETSc developers,

this is a gentle reminder for the cutoff-date on September 4.

Best regards,
Karli



On 07/27/2018 02:41 AM, Karl Rupp wrote:

Dear PETSc developers,

in order to ensure a PETSc 3.10 release no later than by the end of 
September (possibly earlier), we agreed on September 4, 2018, as the 
cut-off date for new features. Please make sure that a pull request has 
been opened on Bitbucket by this time. This is the preferred model over 
merging to next directly, since our Jenkins test infrastructure will 
test each PR separately (thus providing faster feedback and allowing 
for faster integration).


And while we are at it: Please keep in mind that new features also 
require appropriate documentation. :-)


Thanks and best regards,
Karli


[petsc-dev] PETSc 3.10: September 4, 2018, is the cutoff-date for new features

2018-07-26 Thread Karl Rupp

Dear PETSc developers,

in order to ensure a PETSc 3.10 release no later than by the end of 
September (possibly earlier), we agreed on September 4, 2018, as the 
cut-off date for new features. Please make sure that a pull request has 
been opened on Bitbucket by this time. This is the preferred model over 
merging to next directly, since our Jenkins test infrastructure will 
test each PR separately (thus providing faster feedback and allowing 
for faster integration).


And while we are at it: Please keep in mind that new features also 
require appropriate documentation. :-)


Thanks and best regards,
Karli


Re: [petsc-dev] Bad scaling of GAMG in FieldSplit

2018-07-26 Thread Karl Rupp

Hi Pierre,


I’m using GAMG on a shifted Laplacian with these options:
-st_fieldsplit_pressure_ksp_type preonly
-st_fieldsplit_pressure_pc_composite_type additive
-st_fieldsplit_pressure_pc_type composite
-st_fieldsplit_pressure_sub_0_ksp_pc_type jacobi
-st_fieldsplit_pressure_sub_0_pc_type ksp
-st_fieldsplit_pressure_sub_1_ksp_pc_gamg_square_graph 10
-st_fieldsplit_pressure_sub_1_ksp_pc_type gamg
-st_fieldsplit_pressure_sub_1_pc_type ksp

and I end up with the following logs on 512 (top) and 2048 (bottom) processes:
MatMult          1577790 1.0 3.1967e+03   1.2 4.48e+12 1.6 7.6e+09 5.6e+03 0.0e+00  7 71 75 63  0   7 71 75 63  0 650501
MatMultAdd        204786 1.0 1.3412e+02   5.5 1.50e+10 1.7 5.5e+08 2.7e+02 0.0e+00  0  0  5  0  0   0  0  5  0  0  50762
MatMultTranspose  204786 1.0 4.6790e+01   4.3 1.50e+10 1.7 5.5e+08 2.7e+02 0.0e+00  0  0  5  0  0   0  0  5  0  0 145505
[..]
KSPSolve_FS_3       7286 1.0 7.5506e+02   1.0 9.14e+11 1.8 7.3e+09 1.5e+03 2.6e+05  2 14 71 16 34   2 14 71 16 34 539009

MatMult          1778795 1.0 3.5511e+03   4.1 1.46e+12 1.9 4.0e+10 2.4e+03 0.0e+00  7 66 75 61  0   7 66 75 61  0 728371
MatMultAdd        222360 1.0 2.5904e+03  48.0 4.31e+09 1.9 2.4e+09 1.3e+02 0.0e+00 14  0  4  0  0  14  0  4  0  0   2872
MatMultTranspose  222360 1.0 1.8736e+03 421.8 4.31e+09 1.9 2.4e+09 1.3e+02 0.0e+00  0  0  4  0  0   0  0  4  0  0   3970
[..]
KSPSolve_FS_3       7412 1.0 2.8939e+03   1.0 2.66e+11 2.1 3.5e+10 6.1e+02 2.7e+05 17 11 67 14 28  17 11 67 14 28 148175

MatMultAdd and MatMultTranspose (performed by GAMG) somehow ruin the scalability of the overall solver. The pressure space “only” has 3M unknowns so I’m guessing that’s why GAMG is having a hard time strong scaling. 


3M unknowns divided by 512 processes implies less than 10k unknowns per 
process. It is not unusual to see strong scaling roll off at this size. 
Also note that the time per call(!) for "MatMult" is the same for both 
cases, indicating that you run into a latency-limited regime.


Also, have a look at the time ratios: With 2048 processes, MatMultAdd 
and MatMultTranspose show a time ratio of 48 and 421, respectively. 
Maybe one of your MPI ranks is getting a huge workload?




For the other fields, the matrix is somehow distributed nicely, i.e., I don’t 
want to change the overall distribution of the matrix.
Do you have any suggestion to improve the performance of GAMG in that scenario? 
I had two ideas in mind but please correct me if I’m wrong or if this is not 
doable:
1) before setting up GAMG, first use a PCTELESCOPE to avoid having too many 
processes work on this small problem
2) have the sub_0_ and the sub_1_ work on two different nonoverlapping 
communicators of size PETSC_COMM_WORLD/2, do the solve concurrently, and then 
sum the solutions (only worth doing because of -pc_composite_type additive). I 
have no idea if this easily doable with PETSc command line arguments


1) is the more flexible approach, as you have better control over the 
system sizes after 'telescoping'.


Best regards,
Karli


Re: [petsc-dev] PETSc goes Jenkins

2018-07-20 Thread Karl Rupp

Hi Patrick,

Once tuning is complete, how is one intended to interpret the nice green 
check marks? "The library compiles" or "All the tests passed"?


So far the green checkmark essentially means "The library compiles". We 
are working towards "All the tests passed", but that requires some more 
tinkering in processing test output. That is, we need to extract 
information about failed tests and present that in a prominent way.


Best regards,
Karli




I ask because in the demo PR there is the reassuring check mark and "3 
of 3 builds passed", even though failed tests are reported (timeouts).



2018-07-20 3:35 GMT+02:00 Karl Rupp <r...@iue.tuwien.ac.at>:


Hi all,

we now have a first step towards full continuous integration via
Jenkins completed. Thus, every new pull request that is (re-)based
on a commit in master not older than today will be automatically
tested with a subset of common tests that are intended to expose the
most frequent issues. This, in particular, includes configurations
with 64 bit integers as well as complex arithmetic.

The integration of Jenkins into Bitbucket is smooth: You will notice
on our demo pull request


https://bitbucket.org/petsc/petsc/pull-requests/1039/jenkinsfile-for-build-pipelines-tied-to/diff

<https://bitbucket.org/petsc/petsc/pull-requests/1039/jenkinsfile-for-build-pipelines-tied-to/diff>
that on the right it says "3 of 3 builds passed". If you click on
the link, you will get further details on the individual builds and
find further links to the test output stored on the Jenkins server.

Implications on our development workflow: Currently 'next' gets
(ab)used for all kinds of portability tests. As a consequence, every
buggy merge clogs the whole integration pipeline, making it hard to
integrate other PRs. With the Jenkins server in place, all pull
requests will receive a good share of portability testing *before*
they reach next. This reduces the burden on next, (hopefully)
leading to faster code integration.

Corollary: I strongly encourage all PETSc developers to issue
pull requests rather than merging to next directly (use your own
judgment for exceptions!).

Please note that we are still fine-tuning various aspects of the
Jenkins infrastructure (location of the Jenkins server, which test
nodes to use, which configurations to test, etc.). Most of these
things are changes under the hood, though. If something still
bubbles up and causes the testing to choke, please be considerate
with us ;-)

Finally, I'd like to explicitly thank Alp Dener for his help on
getting Jenkins to run smoothly. Any credit should go to him.

Best regards,
Karli




[petsc-dev] PETSc goes Jenkins

2018-07-19 Thread Karl Rupp

Hi all,

we now have a first step towards full continuous integration via Jenkins 
completed. Thus, every new pull request that is (re-)based on a commit 
in master not older than today will be automatically tested with a 
subset of common tests that are intended to expose the most frequent 
issues. This, in particular, includes configurations with 64 bit 
integers as well as complex arithmetic.


The integration of Jenkins into Bitbucket is smooth: You will notice on 
our demo pull request


https://bitbucket.org/petsc/petsc/pull-requests/1039/jenkinsfile-for-build-pipelines-tied-to/diff
that on the right it says "3 of 3 builds passed". If you click on the 
link, you will get further details on the individual builds and find 
further links to the test output stored on the Jenkins server.


Implications on our development workflow: Currently 'next' gets (ab)used 
for all kinds of portability tests. As a consequence, every buggy merge 
clogs the whole integration pipeline, making it hard to integrate other 
PRs. With the Jenkins server in place, all pull requests will receive a 
good share of portability testing *before* they reach next. This reduces 
the burden on next, (hopefully) leading to faster code integration.


Corollary: I strongly encourage all PETSc developers to issue pull 
requests rather than merging to next directly (use your own judgment for 
exceptions!).


Please note that we are still fine-tuning various aspects of the Jenkins 
infrastructure (location of the Jenkins server, which test nodes to use, 
which configurations to test, etc.). Most of these things are changes 
under the hood, though. If something still bubbles up and causes the 
testing to choke, please be considerate with us ;-)


Finally, I'd like to explicitly thank Alp Dener for his help on getting 
Jenkins to run smoothly. Any credit should go to him.


Best regards,
Karli


Re: [petsc-dev] Could we replace SNESTestJacobian() in maint with that in master?

2018-07-19 Thread Karl Rupp

Hi Fande,



It looks like SNESTestJacobian() in master is more reliable than
that in maint.  Especially, the petsc options name is changed.
For example, from "-snes_test_jacobian_display" to
"-snes_test_jacobian_view".

There are some MOOSE tests that fail with the maint version of
SNESTestJacobian(), but they are just fine with the master
version of SNESTestJacobian().


are you able to provide us with a standalone test that exposes the
problem so that we can fix it?


I do not think it is a good idea to fix the thing in maint that has 
already been fixed in master.  Is it really that hard to just copy the whole 
SNESTestJacobian() from master to maint? 


sorry, it was not clear from your email that the issue is local to the 
implementation of SNESTestJacobian and that it is enough to "just copy 
it over".


The change from "-snes_test_jacobian_display" to 
"-snes_test_jacobian_view" is a problem, however, because existing code 
interfacing PETSc 3.9 may now break.


I could do a PR, if you guys 
agree with this.


Thanks for the PR. I'll pull it in and add the old option names as a 
fallback so that existing code remains working.


Best regards,
Karli


Re: [petsc-dev] Could we replace SNESTestJacobian() in maint with that in master?

2018-07-18 Thread Karl Rupp

Hi Fande,

It looks like SNESTestJacobian() in master is more reliable than that in 
maint.  Especially, the petsc options name is changed. For example, from 
"-snes_test_jacobian_display" to "-snes_test_jacobian_view".


There are some MOOSE tests that fail with the maint version of 
SNESTestJacobian(), but they are just fine with the master version of 
SNESTestJacobian().


are you able to provide us with a standalone test that exposes the 
problem so that we can fix it?



What is the release date for PETSc-3.10.x? I could just skip 
PETSc-3.9.x, if we would have the next release soon.


Late September 2018.

Best regards,
Karli


Re: [petsc-dev] PetscSF and/or VecScatter with device pointers

2018-07-14 Thread Karl Rupp

Hi,



we're starting to explore (with Andreas cc'd) residual assembly on
GPUs.  The question naturally arises: how to do GlobalToLocal and
LocalToGlobal.

I have:

A PetscSF describing the communication pattern.

A Vec holding the data to communicate.  This will have an up-to-date
device pointer.

I would like:

PetscSFBcastBegin/End (and ReduceBegin/End, etc...) to (optionally)
work with raw device pointers.  I am led to believe that modern MPIs
can plug directly into device memory, so I would like to avoid copying
data to the host, doing the communication there, and then going back
up to the device.


I don't know how the CUDA software stack has advanced recently, but 
usually you want to try your best at avoiding any latency hits due to 
PCI Express. That is, packing the ghost data you want to communicate (as 
described by the SF) on the GPU, sending the packed data over, then 
unpacking on the host (note: here one could further optimize if needed) 
will most likely be much better in terms of latency and efficient use of 
low PCI-Express bandwidth than what Unified Memory approaches can provide.


If you want to use OpenCL, you'll have to do the above anyway.



Given that I think that the window implementation (which just
delegates the MPI for all the packing) is not considered prime time
(mostly due to MPI implementation bugs, I think), I think this means
implementing a version of PetscSF_Basic that can handle the
pack/unpack directly on the device, and then just hands off to MPI.

The next thing is how to put a higher-level interface on top of this.
What, if any, suggestions are there for doing something where the
top-level API is agnostic to whether the data are on the host or the
device.

We had thought something like:

- Make PetscSF handle device pointers (possibly with new implementation?)

- Make VecScatter use SF.

Calling VecScatterBegin/End on a Vec with up-to-date device pointers
just uses the SF directly.


There are already optimizations available for VecScatter when using CUDA. 
I'm happy to help you with adapting that to SF within the next 
week if needed.




Have there been any thoughts about how you want to do multi-GPU
interaction?


Just use MPI with one GPU per MPI rank :-)

Best regards,
Karli


Re: [petsc-dev] GAMG error with MKL

2018-07-07 Thread Karl Rupp

Hi all,


(...)Since it looks like MPI endpoints are going to be a long time (or 
possibly forever) in coming, I think we need (a) stopgap plan(s) to 
support this crappy MPI + OpenMP model in the meantime. One possible 
approach is to do what Mark is trying with to do with MKL: Use a third 
party library that provides optimized OpenMP implementations of 
computationally expensive kernels. It might make sense to also consider 
using Karl's ViennaCL library in this manner, which we already use to 
support GPUs, but which I believe (Karl, please let me know if I am 
off-base here) we could also use to provide OpenMP-ized linear algebra 
operations on CPUs as well. Such approaches won't use threads for lots 
of the things that a PETSc code will do, but might be able to provide 
decent resource utilization for the most expensive parts for some codes.


A lot of tweaks for making GPUs run well immediately translate to making 
OpenMP code run well. At least in theory. In practice I've observed just 
the same issues as we've seen in the past: If we run with MPI+OpenMP 
instead of just plain MPI, performance is less reproducible, lower on 
average, etc.


Still, I think that injecting OpenMP kernels via a third-party library 
is probably the "best" way of offering OpenMP:

 - it keeps the PETSc code base clean
 - tuning the OpenMP-kernels is now somebody else's problem
 - it helps with providing GPU support, because plugin interfaces improve

Yes, "OpenMP to help GPU support" and vice versa feels like "running 
even faster in the wrong direction". At the same time, however, we have 
to acknowledge that nobody will listen to our opinions/experiences/facts 
if we don't offer something that works OK (not necessarily great) with 
whatever they start with - too often MPI+OpenMP.


Best regards,
Karli


Re: [petsc-dev] building with MKL

2018-07-01 Thread Karl Rupp

Hi Mark,

have a look at config/examples/arch-linux-knl.py, which contains on line 20:
 '--with-blaslapack-dir='+os.environ['MKLROOT'],

It's important that you specify the BLAS library *and* the MKL include 
directory (either via --with-blaslapack-dir or via a pair of 
--with-blaslapack-include and --with-blaslapack-lib), otherwise it's not 
possible to compile the aijmkl code.
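
For illustration, a minimal configure script in the style of 
config/examples/arch-linux-knl.py could look like the sketch below; the 
MKLROOT environment variable and the extra --with-debugging option are 
assumptions to be adapted to your system:

#!/usr/bin/env python
# Sketch of a configure script (run from PETSC_DIR) that hands the MKL root
# to --with-blaslapack-dir, so both the MKL libraries and the MKL include
# directory are picked up and the aijmkl code can be compiled.
if __name__ == '__main__':
  import sys
  import os
  sys.path.insert(0, os.path.abspath('config'))
  import configure
  configure_options = [
    '--with-blaslapack-dir=' + os.environ['MKLROOT'],
    '--with-debugging=0',
  ]
  configure.petsc_configure(configure_options)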


Best regards,
Karli




On 06/30/2018 09:55 PM, Mark Adams wrote:

It builds and runs but looks like PETSc does not register aijmkl matrices.


---
 > [0]PETSC ERROR: - Error Message 
--
 > [0]PETSC ERROR: Unknown type. Check for miss-spelling or missing 
package: 
http://www.mcs.anl.gov/petsc/documentation/installation.html#external

 > [0]PETSC ERROR: Unknown Mat type given: aijmkl
 > [0]PETSC ERROR: See 
http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.

 > [0]PETSC ERROR: Petsc Release Version 3.9.2, unknown
 > [0]PETSC ERROR: 
/global/u2/m/madams/petsc_install/petsc/src/snes/examples/tutorials/./ex19 
on a  named nid02516 by madams Sat Jun 30 12:48:10 2018
 > [0]PETSC ERROR: Configure options --known-level1-dcache-size=32768 
--known-level1-dcache-linesize=64 --known-level1-dcache-assoc=8 
--known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2 
--known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8 
--known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8 
--known-bits-per-byte=8 --known-memcmp-ok=1 --known-sizeof-MPI_Comm=4 
--known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 
--known-mpi-int64_t=1 --known-mpi-c-double-complex=1 
--known-has-attribute-aligned=1 --with-cc=cc --with-cxx=CC --with-fc=ftn 
COPTFLAGS="  -g -O0 -hcpu=mic-knl -qopenmp-simd" CXXOPTFLAGS="-g -O0 
-hcpu=mic-knl -qopenmp-simd" FOPTFLAGS="  -g -O0 -hcpu=mic-knl 
-qopenmp-simd" --download-metis=1 
--with-hypre-dir=/global/homes/m/madams/tmp/hypre-2.14.0 
--download-parmetis=1 
--with-blaslapack-lib=/opt/intel/compilers_and_libraries_2018.1.163/linux/mkl/lib/intel64/libmkl_intel_thread.a 
--with-debugging=1 --with-mpiexec=srun --with-batch=1 
--known-mpi-shared-libraries=1 --known-64-bit-blas-indices=0 
--with-64-bit-indices=1 PETSC_ARCH=arch-cori-knl-dbg64-intel-omp 
--with-openmp=1 --download-p4est=0 --with-x=0 
--prefix=/global/homes/m/madams/petsc_install/petsc-cori-knl-dbg64-intel-omp 
PETSC_DIR=/global/homes/m/madams/petsc_install/petsc
 > [0]PETSC ERROR: #1 MatSetType() line 61 in 
/global/u2/m/madams/petsc_install/petsc/src/mat/interface/matreg.c
 > [0]PETSC ERROR: #2 MatSetFromOptions() line 229 in 
/global/u2/m/madams/petsc_install/petsc/src/mat/utils/gcreate.c
 > [0]PETSC ERROR: #3 DMCreateMatrix_DA() line 793 in 
/global/u2/m/madams/petsc_install/petsc/src/dm/impls/da/fdda.c
 > [0]PETSC ERROR: #4 DMCreateMatrix() line 1262 in 
/global/u2/m/madams/petsc_install/petsc/src/dm/interface/dm.c
 > [0]PETSC ERROR: #5 SNESSetUpMatrices() line 646 in 
/global/u2/m/madams/petsc_install/petsc/src/snes/interface/snes.c
 > [0]PETSC ERROR: #6 SNESSetUp_NEWTONLS() line 296 in 
/global/u2/m/madams/petsc_install/petsc/src/snes/impls/ls/ls.c
 > [0]PETSC ERROR: #7 SNESSetUp() line 2908 in 
/global/u2/m/madams/petsc_install/petsc/src/snes/interface/snes.c
 > [0]PETSC ERROR: #8 SNESSolve() line 4300 in 
/global/u2/m/madams/petsc_install/petsc/src/snes/interface/snes.c
 > [0]PETSC ERROR: #9 main() line 161 in 
/global/homes/m/madams/petsc_install/petsc/src/snes/examples/tutorials/ex19.c

 > [0]PETSC ERROR: PETSc Option Table entries:
 > [0]PETSC ERROR: -da_refine 3
 > [0]PETSC ERROR: -ksp_monitor
 > [0]PETSC ERROR: -mat_type aijmkl
 > [0]PETSC ERROR: -options_left
 > [0]PETSC ERROR: -pc_type gamg
 > [0]PETSC ERROR: -snes_monitor_short
 > [0]PETSC ERROR: -snes_view
 > [0]PETSC ERROR: End of Error Message ---se

On Sat, Jun 30, 2018 at 3:08 PM Mark Adams <mfad...@lbl.gov> wrote:


OK, that got further.

On Sat, Jun 30, 2018 at 3:03 PM Mark Adams <mfad...@lbl.gov> wrote:

Like this?


'--with-blaslapack-lib=/opt/intel/compilers_and_libraries_2018.1.163/linux/mkl/lib/intel64/libmkl_intel_thread.a',



On Sat, Jun 30, 2018 at 3:00 PM Mark Adams <mfad...@lbl.gov> wrote:


Specify either "--with-blaslapack-dir" or
"--with-blaslapack-lib --with-blaslapack-include".
But not both!


Get rid of the dir option, and give the full path to the
library.


What is the syntax for giving the full path?



Re: [petsc-dev] HDF5 download error

2018-06-20 Thread Karl Rupp

Hi Mark,

the FTP server at MCS is down today. It should come back up later today.
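
In the meantime, a workaround is to download the tarball manually (as the 
configure error message suggests) and point configure at it. A minimal 
sketch of such a configure script follows; the tarball location is just a 
placeholder:

#!/usr/bin/env python
# Sketch of a configure script (run from PETSC_DIR) that uses a manually
# downloaded HDF5 tarball instead of the currently unreachable FTP mirror.
if __name__ == '__main__':
  import sys
  import os
  sys.path.insert(0, os.path.abspath('config'))
  import configure
  configure_options = [
    '--download-hdf5=/yourselectedlocation/hdf5-1.8.18.tar.gz',
  ]
  configure.petsc_configure(configure_options)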

Best regards,
Karli

On 06/20/2018 01:17 PM, Mark Adams wrote:

This looks like it is a problem with NERSC, this does not work:

04:14 cori04 maint= ~/petsc_install/petsc$ ping ftp.mcs.anl.gov
PING ftp.mcs.anl.gov (140.221.6.23) 56(84) bytes of data.



On Wed, Jun 20, 2018 at 7:08 AM Mark Adams <mfad...@lbl.gov> wrote:


I get this error downloading HDF5 on Cori at NERSC and it has worked
before. This is on maint.

===============================================================================
          Trying to download https://support.hdfgroup.org/ftp/HDF5/current18/src/hdf5-1.8.18.tar.gz for HDF5
===============================================================================
          Trying to download http://ftp.mcs.anl.gov/pub/petsc/externalpackages/hdf5-1.8.18.tar.gz for HDF5
===============================================================================
          Trying to download ftp://ftp.mcs.anl.gov/pub/petsc/externalpackages/hdf5-1.8.18.tar.gz for HDF5
===============================================================================
*******************************************************************************
          UNABLE to CONFIGURE with GIVEN OPTIONS    (see configure.log for details):

---
Error during download/extract/detection of HDF5:
file could not be opened successfully
Downloaded package HDF5 from:
https://support.hdfgroup.org/ftp/HDF5/current18/src/hdf5-1.8.18.tar.gz
is not a tarball.
[or installed python cannot process compressed files]
* If you are behind a firewall - please fix your proxy and rerun
./configure
   For example at LANL you may need to set the environmental
variable http_proxy (or HTTP_PROXY?) to http://proxyout.lanl.gov
* You can run with --with-packages-dir=/adirectory and ./configure
will instruct you what packages to download manually
* or you can download the above URL manually, to
/yourselectedlocation/hdf5-1.8.18.tar.gz
   and use the configure option:
   --download-hdf5=/yourselectedlocation/hdf5-1.8.18.tar.gz
Unable to download package HDF5 from:
http://ftp.mcs.anl.gov/pub/petsc/externalpackages/hdf5-1.8.18.tar.gz
* If URL specified manually - perhaps there is a typo?
* If your network is disconnected - please reconnect and rerun
./configure
* Or perhaps you have a firewall blocking the download
* You can run with --with-packages-dir=/adirectory and ./configure
will instruct you what packages to download manually
* or you can download the above URL manually, to
/yourselectedlocation/hdf5-1.8.18.tar.gz
   and use the configure option:
   --download-hdf5=/yourselectedlocation/hdf5-1.8.18.tar.gz
Unable to download package HDF5 from:
ftp://ftp.mcs.anl.gov/pub/petsc/externalpackages/hdf5-1.8.18.tar.gz
* If URL specified manually - perhaps there is a typo?
* If your network is disconnected - please reconnect and rerun
./configure
* Or perhaps you have a firewall blocking the download
* You can run with --with-packages-dir=/adirectory and ./configure
will instruct you what packages to download manually
* or you can download the above URL manually, to
/yourselectedlocation/hdf5-1.8.18.tar.gz
   and use the configure option:
   --download-hdf5=/yourselectedlocation/hdf5-1.8.18.tar.gz

***



Re: [petsc-dev] Tiny pull requests via Bitbucket web interface

2018-06-06 Thread Karl Rupp

Hi all,

yes, I support Patrick's idea of actively encouraging such simple pull 
requests. Particularly when it comes to documentation, it would be very 
handy to also add a link to the manual pages on the top right. For example,


http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatSetValue.html
currently states "Report Typos and Errors". We could add another line, 
e.g. "Fix Typos and Errors via Bitbucket" pointing to the respective 
source file (maybe even the source line can be pointed to).


Best regards,
Karli



On 06/05/2018 07:02 PM, Patrick Sanan wrote:
After Karl's nice talk on contributing to PETSc, I was reminded of a 
similar talk that I saw at the Julia conference. The title was also 
something like "Contributing is easy!" but used an even more extreme 
example of making your first contribution. It was (as Barry encouraged) 
a small documentation change, and they demonstrated how to do this via 
the GitHub web interface.



This could be a great way to lower the "activation energy" for these 
kinds of tiny, trivially-reviewable changes.



The practical steps with Bitbucket are approximately:

  * go to the PETSc Bitbucket site
  * navigate to the source file you want to change
  * "edit"
  * make sure you are at "master" (I had to select this from the
pull-down, otherwise "edit" was greyed out and and gives a hint on
mouseover)
  * make your small, innocuous edit
  * "commit"
  * select "create a pull request" if needbe, and fill out comments /
reviewers as usual.

I believe that if you don't have write access, you can still do this and 
it will create a fork for you automatically.



Here's a test:

https://bitbucket.org/petsc/petsc/pull-requests/975/docs-manual-makefile-fix-typo-in-error/diff


Thoughts? Should this be actively encouraged?



Re: [petsc-dev] Figures 20, 21 in Chapter 13.1 of PETSc manual are out of date

2018-04-18 Thread Karl Rupp

Hi Junchao,

1) The manual says the example is src/ksp/ksp/examples/ex10.c, but it 
actually links to src/ksp/ksp/examples/tutorial/ex10.c. This is a minor 
issue.
2) One could not use the same command line options (-f0 medium -f1 
arco6) as shown in the figures. There are no such matrices so one can't 
simply copy & paste.
3) It seems ex10.c has gone through large changes and it cannot produce 
a log summary similar to the figures anymore (e.g., no stages like 
"Event Stage 4: KSPSetUp 1" ).


such things (unfortunately) tend to happen over time. Please feel free 
to fix it :-)


Best regards,
Karli


Re: [petsc-dev] upcoming release and testing [v2]

2018-04-06 Thread Karl Rupp

Hi again,

the reason for the higher number of timeouts is likely due to the 
higher number of GPU tests. GPU builds that formerly only used CUSP now 
run against the CUDA backend, which has a higher number of tests. Also, 
the CUDA backend uses CUBLAS and CUSPARSE, whereas CUSP used its own 
kernels. As far as I know, CUBLAS and CUSPARSE initialization is fairly 
slow on the M2090.


Best regards,
Karli


On 04/06/2018 09:13 PM, Karl Rupp wrote:

Hi,


The CUDA tests are hanging/timing-out more often now. For eg:
http://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2018/04/06/examples_next_arch-cuda-double_es.log 



And I did see some build where they didn't get killed due to timeout. 
For eg:
http://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2018/04/05/examples_next_arch-cuda-double_es.log 



This is on M2090.  I can see them getting stuck on es.mcs [when I run 
manually - and check with nvidia-smi]


When i run these tests manually on GTX1050 (frog.mcs) - they zip 
through..
Any idea why they get stuck on M2090? [more frequently than random 
hangs..]


no, I don't know why this is the case. All my local tests finish 
quickly, too. I noticed last summer that there is higher startup 
overhead on the M2090 than on more recent GPUs, but that was in the 
seconds regime, not in minutes.


Are the tests run in parallel? If so, then maybe the parallel 
initialization of GPUs is slowing things down.


Best regards,
Karli


Re: [petsc-dev] upcoming release and testing [v2]

2018-04-06 Thread Karl Rupp

Hi,


The CUDA tests are hanging/timing-out more often now. For eg:
http://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2018/04/06/examples_next_arch-cuda-double_es.log

And I did see some build where they didn't get killed due to timeout. For eg:
http://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2018/04/05/examples_next_arch-cuda-double_es.log

This is on M2090.  I can see them getting stuck on es.mcs [when I run manually 
- and check with nvidia-smi]

When i run these tests manually on GTX1050 (frog.mcs) - they zip through..
Any idea why they get stuck on M2090? [more frequently than random hangs..]


no, I don't know why this is the case. All my local tests finish 
quickly, too. I noticed last summer that there is higher startup 
overhead on the M2090 than on more recent GPUs, but that was in the 
seconds regime, not in minutes.


Are the tests run in parallel? If so, then maybe the parallel 
initialization of GPUs is slowing things down.


Best regards,
Karli


Re: [petsc-dev] upcoming release and testing

2018-04-05 Thread Karl Rupp

Hi Satish,

FYI: I added a mention of GPU backends available in the release and 
fixed missing ul-tags in src/docs/website/documentation/changes/39.html 
in your balay/release-3.9 branch.


Best regards,
Karli

On 04/02/2018 08:18 PM, Satish Balay wrote:

All,

It would be good if http://www.mcs.anl.gov/petsc/documentation/changes/dev.html 
is cheked and updated with any obvious missing stuff.

Thanks,
Satish



Re: [petsc-dev] upcoming release and testing

2018-04-02 Thread Karl Rupp

Hi Satish,

CUDA and ViennaCL can be enabled for the first time in this release.

Best regards,
Karli



On 04/02/2018 08:15 PM, Satish Balay wrote:

Karl,

Are we disabling CUDA usage for this release as well?

Thanks,
Satish



Re: [petsc-dev] upcoming release and testing

2018-04-01 Thread Karl Rupp

Hi Satish,


I'll try to send follow-up emails on master breakages.


Karl,

http://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2018/03/31/examples_master_arch-c-exodus-dbg-builder_es.log

not ok ksp_ksp_tests-ex43_1
#   terminate called after throwing an instance of 
'thrust::system::system_error'
# what():  cudaFree in free: an illegal memory access was encountered


Is this related to cuda change? [I haven't explored yet]


this looks like a one-off test failure. The previous day that test ran 
through. Also, I only removed CUSP stuff, but did not alter the existing 
CUDA backend.


Best regards,
Karli


Re: [petsc-dev] PETSc blame digest (next) 2018-03-28

2018-03-28 Thread Karl Rupp

Hi,

fixes for these are now in next.

Best regards,
Karli


On 03/28/2018 04:00 PM, PETSc checkBuilds wrote:



Dear PETSc developer,

This email contains listings of contributions attributed to you by
`git blame` that caused compiler errors or warnings in PETSc automated
testing.  Follow the links to see the full log files. Please attempt to fix
the issues promptly or let us know at petsc-dev@mcs.anl.gov if you are unable
to resolve the issues.

Thanks,
   The PETSc development team



warnings attributed to commit https://bitbucket.org/petsc/petsc/commits/2ea74b3
CUSP: Converted all CUSP-tests to CUDA.

   src/snes/examples/tutorials/ex47cu.cu:134
 
[http://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2018/03/28/examples_next_arch-cuda-single_es.log]
   /sandbox/petsc/petsc.next-2/src/snes/examples/tutorials/ex47cu.cu:134: 
undefined reference to `VecCUDARestoreArrayRead'
 
[http://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2018/03/28/examples_next_arch-cuda-double_es.log]
   /sandbox/petsc/petsc.next/src/snes/examples/tutorials/ex47cu.cu:134: 
undefined reference to `VecCUDARestoreArrayRead'

   src/snes/examples/tutorials/ex47cu.cu:135
 
[http://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2018/03/28/examples_next_arch-cuda-single_es.log]
   /sandbox/petsc/petsc.next-2/src/snes/examples/tutorials/ex47cu.cu:135: 
undefined reference to `VecCUDARestoreArrayWrite'
 
[http://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2018/03/28/examples_next_arch-cuda-double_es.log]
   /sandbox/petsc/petsc.next/src/snes/examples/tutorials/ex47cu.cu:135: 
undefined reference to `VecCUDARestoreArrayWrite'

   src/snes/examples/tutorials/ex47cu.cu:98
 
[http://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2018/03/28/examples_next_arch-cuda-single_es.log]
   /sandbox/petsc/petsc.next-2/src/snes/examples/tutorials/ex47cu.cu:98: 
undefined reference to `VecCUDAGetArrayRead'
 
[http://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2018/03/28/examples_next_arch-cuda-double_es.log]
   /sandbox/petsc/petsc.next/src/snes/examples/tutorials/ex47cu.cu:98: 
undefined reference to `VecCUDAGetArrayRead'

   src/snes/examples/tutorials/ex47cu.cu:99
 
[http://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2018/03/28/examples_next_arch-cuda-single_es.log]
   /sandbox/petsc/petsc.next-2/src/snes/examples/tutorials/ex47cu.cu:99: 
undefined reference to `VecCUDAGetArrayWrite'
 
[http://ftp.mcs.anl.gov/pub/petsc/nightlylogs/archive/2018/03/28/examples_next_arch-cuda-double_es.log]
   /sandbox/petsc/petsc.next/src/snes/examples/tutorials/ex47cu.cu:99: 
undefined reference to `VecCUDAGetArrayWrite'


To opt-out from receiving these messages - send a request to 
petsc-dev@mcs.anl.gov.



Re: [petsc-dev] Target next PETSc/TAO release for March 30

2018-03-27 Thread Karl Rupp

Hi Stefano,

alright, thank you, the branch is now in master.

Best regards,
Karli


On 03/26/2018 06:04 PM, Stefano Zampini wrote:

Karl,

thanks. I have deleted the "not working properly commit" and merged to 
next for testing the last commit.

Ready to go...

Stefano

2018-03-26 10:31 GMT+02:00 Karl Rupp <r...@iue.tuwien.ac.at>:


Hi Stefano,


Next is quite messy at the moment. How are we going to handle
merging to master? I have a couple of branches that are sitting
in next for a while, and that can be merged safely
: stefano_zampini/fix-matis-dmda-l2g
and stefano_zampini/add-pod-test


I merged stefano_zampini/add-pod-test to master.

stefano_zampini/fix-matis-dmda-l2g has two commits that are not in
next. The latest one says "still does not work properly", hence I
don't think it's a good idea to merge to master right away... ;-)

Best regards,
Karli



     On 23 Mar 2018 8:21 PM, "Jed Brown" <j...@jedbrown.org> wrote:

     I think it's feasible, but are we ever going to have more
than a week's
     notice so that we can have a known merge window followed by
a few days
     of feature freeze?

      Richard Tran Mills <rtmi...@anl.gov> writes:

      > All,
      >
      > To meet deliverables for one the DOE projects providing
major
     funding for
      > PETSc development, Barry, Todd, and I agree that we
should target
     Friday,
      > March 30 for the next PETSc/TAO release.
      >
      > Does anyone see any major obstacles to this? Does anyone
have
     anything that
      > will require some extra help/coordination?
      >
      > Thanks,
      > Richard





--
Stefano


Re: [petsc-dev] Target next PETSc/TAO release for March 30

2018-03-26 Thread Karl Rupp

Hi Stefano,


Next is quite messy at the moment. How are we going to handle merging to 
master? I have a couple of branches that are sitting in next for a 
while, and that can be merged safely 
: stefano_zampini/fix-matis-dmda-l2g and stefano_zampini/add-pod-test


I merged stefano_zampini/add-pod-test to master.

stefano_zampini/fix-matis-dmda-l2g has two commits that are not in next. 
The latest one says "still does not work properly", hence I don't think 
it's a good idea to merge to master right away... ;-)


Best regards,
Karli




On 23 Mar 2018 8:21 PM, "Jed Brown" <j...@jedbrown.org> wrote:


I think it's feasible, but are we ever going to have more than a week's
notice so that we can have a known merge window followed by a few days
of feature freeze?

Richard Tran Mills <rtmi...@anl.gov> writes:

 > All,
 >
 > To meet deliverables for one the DOE projects providing major
funding for
 > PETSc development, Barry, Todd, and I agree that we should target
Friday,
 > March 30 for the next PETSc/TAO release.
 >
 > Does anyone see any major obstacles to this? Does anyone have
anything that
 > will require some extra help/coordination?
 >
 > Thanks,
 > Richard




Re: [petsc-dev] [petsc-maint] Installing with CUDA on a cluster

2018-03-12 Thread Karl Rupp

Hi Satish,


diff --git a/config/BuildSystem/config/packages/cuda.py 
b/config/BuildSystem/config/packages/cuda.py
index f5d1395e54..b80ef88c35 100644
--- a/config/BuildSystem/config/packages/cuda.py
+++ b/config/BuildSystem/config/packages/cuda.py
@@ -13,7 +13,7 @@ class Configure(config.package.Package):
  self.complex  = 1
  self.cudaArch = ''
  self.CUDAVersion  = ''
-self.CUDAMinVersion   = '5000' # Minimal cuda version is 5.0
+self.CUDAMinVersion   = '7050' # Minimal cuda version is 7.5
  self.hastests = 0
  self.hastestsdatafiles= 0
  return
@@ -160,7 +160,6 @@ class Configure(config.package.Package):
  
def configureLibrary(self):

  config.package.Package.configureLibrary(self)
-if self.defaultScalarType.lower() == 'complex': self.CUDAMinVersion = 
'7050'
  self.checkCUDAVersion()
  self.checkNVCCDoubleAlign()
  self.configureTypes()
diff --git a/config/BuildSystem/config/packages/cusp.py 
b/config/BuildSystem/config/packages/cusp.py
index e6bf4cc118..9ed82f7e78 100644
--- a/config/BuildSystem/config/packages/cusp.py
+++ b/config/BuildSystem/config/packages/cusp.py
@@ -13,7 +13,7 @@ class Configure(config.package.Package):
  self.cxx = 0
  self.complex = 0   # Currently CUSP with complex numbers is not 
supported
  self.CUSPVersion = ''
-self.CUSPMinVersion  = '400' # Minimal cusp version is 0.4
+self.CUSPMinVersion  = '500' # Minimal cusp version is 0.5.0
  return
  
def setupDependencies(self, framework):

<<<<


yep, that's good.



One issue is - I do not know if we have any users using [or requiring]
older versions. [due to code issues or cuda+os compatibility issues]

And same with cuda.

So I guess we could keep the current defaults - and update them when
any relavent issues come up..


well, our GPU stuff is disabled in the release, so it should be clear 
that these things are in flux. For GPU systems it's reasonable to 
require a software stack that is not more than ~3 years old, 
particularly since driver updates are much easier than full OS upgrades.


Best regards,
Karli




And will plan on using M2090 testbed [with either cuda-7.5 or cuda-8
-arch=sm_20] for the foreseeable future.

Satish

On Mon, 12 Mar 2018, Karl Rupp wrote:


Hi Satish,

thanks for the pull request. I approve the changes, improved appending the
-Wno-deprecated-gpu-targets to also work on my machine, and have merged
everything to next.



* My fixes should alleviate some of the CUSP installation issues. I
don't know enough about CUSP interface wrt the useful features vs
other burdens - and if it's good to drop it or not. [If needed - we
can add in more version dependencies in configure]


This should be fine for now. In the long term CUSP may be completely
superseded by NVIDIA's AMGX. Let's see how things develop...




* Wrt CUDA - currently my test is with CUDA-7.5. I can try migrating a
couple of tests to CUDA-9.1 [on frog]. But what about older
releases?  Any reason we should drop them? I.e any reason to up the
following values?

  self.CUDAMinVersion   = '5000' # Minimal cuda version is 5.0
  self.CUSPMinVersion  = '400' # Minimal cusp version is 0.4


See the answer here for a list of CUDA capabilities and defaults:
https://stackoverflow.com/questions/28932864/cuda-compute-capability-requirements

We definitely don't need to support compute architecture 1.x (~10 years old),
as there is no double precision support and hence fairly useless for our
purposes. Thus, we should be absolutely fine with requiring CUDA 7.0 or
higher.




We do change it for complex build [we don't have a test for this case]

  if self.defaultScalarType.lower() == 'complex': self.CUDAMinVersion =
  '7050'


I don't remember the exact reason, but I remember that there is one for
requiring CUDA 7.5 here. Let's use CUDA 7.5 as the minimum for both real and
complex then?




* Our test GPU is M2090 - with Compute capability (version) 2.0.
CUDA-7.5 works on it. CUDA-8 gives deprecated warnings. CUDA-9 does
not work? So what do we do for such old hardware? Do we keep
CUDA-7.5 is the minimum supported version for extended time? [At
some point we could switch to minimum version CUDA-8 - if we can get
rid of the warnings]


Your PR silences the deprecation warnings.
Compute capability 2.0 is fine for our tests for some time to come. We should
certainly upgrade at some point, yet my experience with GPUs is that older
GPUs are actually the better test environment, as they tend to reveal bugs
quicker than newer hardware.

Best regards,
Karli




* BTW: Wrt --with-cuda-arch, I'm hoping we can get rid of it in favor
of CUDAFLAGS [with defaults similar to CFLAGS d

Re: [petsc-dev] [petsc-maint] Installing with CUDA on a cluster

2018-03-12 Thread Karl Rupp

Hi Satish,

thanks for the pull request. I approve the changes, improved appending 
the -Wno-deprecated-gpu-targets to also work on my machine, and have 
merged everything to next.




* My fixes should alleviate some of the CUSP installation issues. I
   don't know enough about CUSP interface wrt the useful features vs
   other burdens - and if it's good to drop it or not. [If needed - we
   can add in more version dependencies in configure]


This should be fine for now. In the long term CUSP may be completely 
superseded by NVIDIA's AMGX. Let's see how things develop...





* Wrt CUDA - currently my test is with CUDA-7.5. I can try migrating a
   couple of tests to CUDA-9.1 [on frog]. But what about older
   releases?  Any reason we should drop them? I.e any reason to up the
   following values?

 self.CUDAMinVersion   = '5000' # Minimal cuda version is 5.0
 self.CUSPMinVersion  = '400' # Minimal cusp version is 0.4


See the answer here for a list of CUDA capabilities and defaults:
https://stackoverflow.com/questions/28932864/cuda-compute-capability-requirements

We definitely don't need to support compute architecture 1.x (~10 years 
old), as there is no double precision support and hence fairly useless 
for our purposes. Thus, we should be absolutely fine with requiring CUDA 
7.0 or higher.





   We do change it for complex build [we don't have a test for this case]

 if self.defaultScalarType.lower() == 'complex': self.CUDAMinVersion = 
'7050'


I don't remember the exact reason, but I remember that there is one for 
requiring CUDA 7.5 here. Let's use CUDA 7.5 as the minimum for both real 
and complex then?





* Our test GPU is M2090 - with Compute capability (version) 2.0.
   CUDA-7.5 works on it. CUDA-8 gives deprecated warnings. CUDA-9 does
   not work? So what do we do for such old hardware? Do we keep
   CUDA-7.5 is the minimum supported version for extended time? [At
   some point we could switch to minimum version CUDA-8 - if we can get
   rid of the warnings]


Your PR silences the deprecation warnings.
Compute capability 2.0 is fine for our tests for some time to come. We 
should certainly upgrade at some point, yet my experience with GPUs is 
that older GPUs are actually the better test environment, as they tend 
to reveal bugs quicker than newer hardware.


Best regards,
Karli




* BTW: Wrt --with-cuda-arch, I'm hoping we can get rid of it in favor
   of CUDAFLAGS [with defaults similar to CFLAGS defaults] - but its
   not clear if I can easily untangle the dependencies we have [wrt CPP
   - and others]
   
   Or can we get rid of this default alltogether [currently

   -arch=sm_20] - and expect nvcc to have sane defaults? Then we can
   probably eliminate all this complicated code. [If cuda-7.5 and higer
   do this properly - we could use that as the minimum supported version?]

Satish

On Sat, 10 Mar 2018, Karl Rupp wrote:


Hi all,

a couple of notes here, particularly for Manuel:

  * CUSP is repeatedly causing such installation problems, hence we will soon
drop it as a vector backend and instead only provide a native
CUBLAS/CUSPARSE-based backend.

  * you can use this native CUDA backend already now. Just configure with only
--with-cuda=1 --with-cuda-arch=sm_60 (sm_30 should also work and is compatible
with Tesla K20 GPUs you may find on other clusters).

  * The multigrid preconditioner from CUSP is selected via
-pc_type sa_cusp
Make sure you also use -vec_type cusp -mat_type aijcusp
If you don't need the multigrid preconditioner from CUSP, please
just reconfigure and use the native CUDA-backend with -vec_type cuda -mat_type
aijcusparse

  * Right now only one of {native CUDA, CUSP, ViennaCL} can be activated at
configure time. This will be fixed later this month.

If you're looking for a GPU-accelerated multigrid preconditioner: I just heard
yesterday that NVIDIA's AMGX is now open source. I'll provide a wrapper within
PETSc soon.

As Matt already said: Don't expect much more than a modest speedup over your
existing CPU-based code - provided that your setup is GPU-friendly and your
problem size is appropriate.
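
For illustration, a minimal configure sketch for the native CUDA backend 
mentioned above is given below; the compute architecture and any further 
options are assumptions that need to be adapted to your cluster:

#!/usr/bin/env python
# Sketch of a configure script (run from PETSC_DIR) enabling the native
# CUDA backend; use sm_30 instead of sm_60 for Tesla K20-class GPUs.
if __name__ == '__main__':
  import sys
  import os
  sys.path.insert(0, os.path.abspath('config'))
  import configure
  configure_options = [
    '--with-cuda=1',
    '--with-cuda-arch=sm_60',
  ]
  configure.petsc_configure(configure_options)

At run time you would then select the CUDA-backed classes as described 
above, e.g. via -vec_type cuda -mat_type aijcusparse on the command line 
of your application.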

Best regards,
Karli




On 03/10/2018 03:38 AM, Satish Balay wrote:

I've updated configure so that --download-cusp gets the
correct/compatible cusp version - for cuda 7,8 vs 9

The changes are in branch balay/cuda-cusp-cleanup - and merged to next.

Satish

On Wed, 7 Mar 2018, Satish Balay wrote:


--download-cusp gets hardly ever used so likely broken.

It needs to be updated to somehow use the correct cusp version based
on the cuda version thats being used.

[and since we can't easily check for cusp compatibility - we should
probably
remove checkCUSPVersion() code]

When using Cuda-9 - you can try options:

--download-cusp=1 --download-cusp-commit=116b090








Re: [petsc-dev] plans for preconditioners for SeqSELL

2018-03-06 Thread Karl Rupp


Karl, are you thinking of a matrix subclass that has everything that an 
AIJ matrix does, but also keeps a SELL copy around for operations like 
MatMult()? Would it make sense to just keep another Mat inside (like how 
MPIAIJ keeps multiple Mat instances) that *is* of type MATSELL, that 
gets built/updated as needed? Would this involve carrying too much 
baggage around, associated with a complete Mat instance?


What I have in mind is to put the SELL datastructures into A->spptr, 
very much like you did for AIJMKL.



I like the idea 
of having a MATSELL type available that is lean (insofar as not having 
storing an AIJ matrix) for those cases when a user really knows that the 
AIJ stuff will not be needed. But maybe it makes sense to be able to use 
that inside another matrix class. Perhaps we could have something, 
called, say, MATAIJMUTABLE that uses AIJ but might also create copies in 
SELL (or other formats, potentially) when appropriate -- perhaps based 
on a some performance model indicating which format is fastest for 
MatMult() or whatever.


The actual overhead of also storing a SELL datastructure in terms of 
memory footprint is at most 2x. When you keep in mind that extra memory 
during the matrix assembly is also needed in the stash, then the impact 
on overall memory consumption is about 50 percent extra. Given that SELL 
is a performance optimization for SpMV (and hence the SELL datastructure 
is only populated if you call MatMult on the particular matrix), I'm not 
too worried about the increased memory footprint at this time.



Having a class that's an AIJ but can also use SELL is more convenient 
than adding a fallback to AIJ format inside MATSELL. I wonder if the 
latter option might be preferable in some circumstances, however, 
because it can avoid the extra memory footprint of also keeping the 
matrix in AIJ -- maybe AIJ operations are rarely needed and the AIJ 
conversion can just happen on a lazy basis.


I advocate an 'optimize as needed' approach here. Let's first make SELL 
compatible with the full range of AIJ operations and preconditioners.


Best regards,
Karli





--Richard

On 3/4/18 2:58 AM, Karl Rupp wrote:

Hi all,

I'm getting increasingly concerned about SELL not being a subclass of 
AIJ. As such, we have to deal with all these fallback operations now, 
whereas as a subclass of AIJ we could just selectively make use of the 
SELL format where we really benefit from it. "Use AIJ by default 
unless we have something optimized for SELL" is just much more 
appropriate for the few use cases of SELL than the current "SELL has 
to implement everything and usually this means to manually convert 
back to AIJ".


If there are no objections I'd like to clean this up. (Subclassing AIJ 
was unfortunately not available at the time Hong started his great 
work on SELL)


Best regards,
Karli



On 03/03/2018 07:52 AM, Richard Tran Mills wrote:

Resurrecting a few weeks old thread:

Stefano, did you get around to coding something up to do an automatic 
conversion to SeqAIJ for operations unsupported by the SELL format? I did 
some hacking the other day to try to get PCGAMG to use SELL inside 
the smoothers, and this turns out to be way more complicated than I'd 
like and very bug-prone (I haven't found all of mine, anyway). I 
think it may be preferable to be able to pass a SELL matrix to PCGAMG 
and have an internal conversion happen in the SELL matrix to AIJ 
format for doing the MatPtAP and LU solves. Support for this would 
certainly make it easier for users in a lot of other cases as well, and 
might make the use of SELL much more likely. If no one has already 
done some work on this, I'll take a stab at it.


--Richard

On Mon, Feb 12, 2018 at 10:04 AM, Richard Tran Mills <rtmi...@anl.gov> wrote:


    On Mon, Feb 12, 2018 at 8:47 AM, Smith, Barry F. <bsm...@mcs.anl.gov> wrote:



     > On Feb 12, 2018, at 10:25 AM, Stefano Zampini 
<stefano.zamp...@gmail.com> wrote:

    >
    > Barry,
    >
    > for sure Amat,Pmat is the right approach; however, with 
complicated user codes, we are not always in control of having a 
different Jacobian matrix.
    > Since Mat*SELL does not currently support any 
preconditioning except PCSOR and PCJACOBI, we ask the user to put 
codes like

    >
    > if (type is SELL)
    >  create two matrices (and maybe modify the code in many 
other parts)

    > else
    >   ok with the previous code

    I don't disagree with what you are saying and am not opposed
    to the proposed work.

    Perhaps we need to do a better job with making the mat,pmat
    approach simpler or better documented so more people use it
    naturally in their applications.
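
For reference, a minimal sketch of the Amat/Pmat approach, assuming a 
user-provided FormJacobian() and a problem size n (both placeholders); the 
operator applied by the Krylov method can be SELL while the preconditioner 
is built from a plain AIJ matrix:

#include <petscsnes.h>

/* assumed to exist elsewhere: fills both J (applied) and Jpre (used for the PC) */
extern PetscErrorCode FormJacobian(SNES,Vec,Mat,Mat,void*);

PetscErrorCode SetupJacobians(SNES snes,PetscInt n)
{
  Mat            J,Jpre;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = MatCreate(PETSC_COMM_WORLD,&J);CHKERRQ(ierr);
  ierr = MatSetSizes(J,PETSC_DECIDE,PETSC_DECIDE,n,n);CHKERRQ(ierr);
  ierr = MatSetType(J,MATSELL);CHKERRQ(ierr);    /* fast MatMult for the Krylov solver */
  ierr = MatSetUp(J);CHKERRQ(ierr);              /* preallocation omitted for brevity */

  ierr = MatCreate(PETSC_COMM_WORLD,&Jpre);CHKERRQ(ierr);
  ierr = MatSetSizes(Jpre,PETSC_DECIDE,PETSC_DECIDE,n,n);CHKERRQ(ierr);
  ierr = MatSetType(Jpre,MATAIJ);CHKERRQ(ierr);  /* full preconditioner support (ILU, GAMG, ...) */
  ierr = MatSetUp(Jpre);CHKERRQ(ierr);

  ierr = SNESSetJacobian(snes,J,Jpre,FormJacobian,NULL);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}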


    I wrote some code like that in some of the Jacobian/function
    routines in P

Re: [petsc-dev] Handling pull requests in a better way

2018-03-06 Thread Karl Rupp

Hi Richard,


I'm a bit late to the discussion, but I want to point out one of the 
issues I've encountered with pull requests: Often a pull request is 
submitted with multiple reviewers listed, and it's sometimes not clear 
how many of the reviewers need to look at it. I've spent some time 
looking over pull requests that I'm a reviewer for, and then when things 
look satisfactory, I mark it "approved". But sometimes my expertise only 
pertains to a portion of the pull request, and at least one of the other 
reviewers needs to look at it. Maybe those reviewers, in turn, figure 
they don't need to look because I've already approved it. And sometimes 
a pull request lists multiple reviewers, simply because any single one 
of several people could probably review the request and then merge it. 
In short: I think that one problem we have with pull requests is that 
it's not exactly clear how many people need to bless a particular 
request before approving and merging to next. On any request where Barry 
is also listed as a reviewer, even if I've thoroughly reviewed something 
and approved, I feel most comfortable waiting for Barry's blessing 
before any "integration" happens. But this is not scalable in Barrys. 
What is a better system?


well, we have several subsystems in PETSc that are best understood by 
devs other than Barry, so I'm not too worried about the scalability.


The shared responsibility of PR integration was (imho) the major problem 
of handling PRs in the past, much like a chore. By explicitly assigning 
integration responsibility to one (who may decide to delegate it), there 
should be no more implicit (and possibly circular) waiting for others to 
take final action (as you described above).


Best regards,
Karli


Re: [petsc-dev] Handling pull requests in a better way

2018-03-04 Thread Karl Rupp

Hi,

since nobody explicitly objected and since nobody volunteered for the PR 
integrator role, I'll take over this role for the next month or two. 
Let's evaluate the process then.


Best regards,
Karli


On 03/01/2018 12:33 PM, Karl Rupp wrote:

Dear PETSc folks,

I think we can do a better job when it comes to handling pull requests 
(PRs). We have several PRs piling up, which after some time (imho) get 
merged relatively carelessly instead of reaping the full benefits of a 
thorough review.


In order to improve the integration of pull requests, I propose to 
nominate a PR integrator, who is a-priori responsible for *all* incoming 
PRs. The PR integrator is free to delegate a particular PR integration 
to someone with the relevant domain-specific knowledge (e.g. Matt for 
DMPlex-related things) by appropriate comments on Bitbucket. In case of 
delays, the PR integrator is also responsible for issuing reminders over 
time (like Barry has done in the past).


The idea is to make daily progress with the PRs. One integration step 
per day (e.g. testing or merging to next) is presumably enough to handle 
the load, whereas things get messy if we let things pile up. Automated 
testing may help a bit in the future, but it doesn't release us from 
properly reviewing the contributed code.


Any objections to my PR integrator proposal? Any volunteers? ;-)
If nobody else wants to be the highly esteemed PR integrator, I can do 
it. ;-)


Best regards,
Karli


Re: [petsc-dev] plans for preconditioners for SeqSELL

2018-03-04 Thread Karl Rupp
   >
 >   Why not use the mat, pmat feature of the solvers to pass in
both matrices and have the solvers handle using two formats
simultaneously instead of burdening the MatSELL code with tons
of special code for automatically converting to AIJ for solvers etc?
 >
 >
     > >
 > > 2018-02-12 18:06 GMT+03:00 Stefano Zampini
<stefano.zamp...@gmail.com>:
 > >
 > >
 > > 2018-02-12 17:36 GMT+03:00 Jed Brown <j...@jedbrown.org>:
 > > Karl Rupp <r...@iue.tuwien.ac.at> writes:
 > >
 > > > Hi Stefano,
 > > >
 > > >> Is there any plan to write code for native ILU/ICC etc
for SeqSELL, at least to have BJACOBI in parallel?
 > > >
 > > > (imho) ILU/ICC is a pain to do with SeqSELL. Point-Jacobi
should be
 > > > possible, yes. SELL is really just tailored to MatMults
and a pain for
 > > > anything that is not very similar to a MatMult...
 > >
 > > There is already MatSOR_*SELL.  MatSolve_SeqSELL wouldn't
be any harder.
 > > I think it would be acceptable to convert to SeqAIJ,
factor, and convert
 > > the factors back to SELL.
 > >
 > > Yes, this was my idea. Today I have started coding
something. I'll push the branch whenever I have anything working
 > >
 > >
 > >
 > > --
 > > Stefano
 > >
 > >
 > >
 > > --
 > > Stefano
 >
 >
 >
 >
 > --
 > Stefano





Re: [petsc-dev] Handling pull requests in a better way

2018-03-01 Thread Karl Rupp

Hey,


I think people who are integrators should be responsible for their own pull 
requests.


yes, agreed, that's the most efficient way.

Still, the majority of open PRs is from developers who are not 
integrators. These are the ones that tend to hang around for too long.


Best regards,
Karli







On Mar 1, 2018, at 5:33 AM, Karl Rupp  wrote:

Dear PETSc folks,

I think we can do a better job when it comes to handling pull requests (PRs). 
We have several PRs piling up, which after some time (imho) get merged 
relatively carelessly instead of reaping the full benefits of a thorough review.

In order to improve the integration of pull requests, I propose to nominate a 
PR integrator, who is a-priori responsible for *all* incoming PRs. The PR 
integrator is free to delegate a particular PR integration to someone with the 
relevant domain-specific knowledge (e.g. Matt for DMPlex-related things) by 
appropriate comments on Bitbucket. In case of delays, the PR integrator is also 
responsible for issuing reminders over time (like Barry has done in the past).

The idea is to make daily progress with the PRs. One integration step per day 
(e.g. testing or merging to next) is presumably enough to handle the load, 
whereas things get messy if we let things pile up. Automated testing may help a 
bit in the future, but it doesn't release us from properly reviewing the 
contributed code.

Any objections to my PR integrator proposal? Any volunteers? ;-)
If nobody else wants to be the highly esteemed PR integrator, I can do it. ;-)

Best regards,
Karli




[petsc-dev] Handling pull requests in a better way

2018-03-01 Thread Karl Rupp

Dear PETSc folks,

I think we can do a better job when it comes to handling pull requests 
(PRs). We have several PRs piling up, which after some time (imho) get 
merged relatively carelessly instead of reaping the full benefits of a 
thorough review.


In order to improve the integration of pull requests, I propose to 
nominate a PR integrator, who is a-priori responsible for *all* incoming 
PRs. The PR integrator is free to delegate a particular PR integration 
to someone with the relevant domain-specific knowledge (e.g. Matt for 
DMPlex-related things) by appropriate comments on Bitbucket. In case of 
delays, the PR integrator is also responsible for issuing reminders over 
time (like Barry has done in the past).


The idea is to make daily progress with the PRs. One integration step 
per day (e.g. testing or merging to next) is presumably enough to handle 
the load, whereas things get messy if we let things pile up. Automated 
testing may help a bit in the future, but it doesn't release us from 
properly reviewing the contributed code.


Any objections to my PR integrator proposal? Any volunteers? ;-)
If nobody else wants to be the highly esteemed PR integrator, I can do 
it. ;-)


Best regards,
Karli


[petsc-dev] Release schedule?

2018-02-26 Thread Karl Rupp

Hi,

what is the current release schedule? Are we supposed to release a new 
PETSc version in March (maybe ECP-related)?


I'd like to prevent a mess similar to last summer ("release tomorrow no 
matter what!") and get GPU-features/cleanup ready on time.


Thanks and best regards,
Karli


Re: [petsc-dev] Request for comments on paper on PETSc TS

2018-02-21 Thread Karl Rupp

Hi,

thanks, that's a nice manuscript! I like the general setup and flow of 
discussion. Here are a couple of further comments; feel free to consider 
or ignore as you see fit:


Page 1: The references to MATLAB and NAG are incomplete, as they only 
show the year. Instead of "MATLAB [2014]" it should be something like 
"MATLAB [MATLAB 2014]" in order to be consistent with other references. 
Compare also with the chemkin reference on page 25.


Page 2: Consider removing the sentence "The document is organized as 
follows."


Page 3: Finish item "XXXSetUp(...) ... to be used" with a dot like for 
the other items in the list.


Page 4: (shift) is a somewhat ugly notation, because the parentheses are 
also used for mathematical expressions. Consider "[shift]" as an 
alternative.


Page 4: Swap order of the expressions (shift)F_{\dot u^n}(...) and 
F_{u^n}(...) in order to match the order used in the first line of the 
page: F_u + (shift)F_{\dot u}


Page 4: The sentence "For example, the backward Euler method \dot{u}^n = 
(u^n - u^{n-1})/\Delta t." is not a complete sentence.


Page 5/6: The code listing may benefit from more explanations, 
especially for readers who are not familiar with PETSc.


Page 7: The opening sentence "Oregonator: stiff ..." is not a full sentence.

Page 7/8: Again, a discussion of the code example is desirable.

Page 11: The table heading for Table 2 is too generic and does not 
properly explain the table content. Also, a table header like for Table 
1 should be added.


Page 12: Consider rotating Table 3 by 90 degrees and making it a full-page 
table. This way the columns "Embed.", "Dense Output" and "Remarks" can 
be integrated. Also, prssp2 should have an entry in column 'SA'.


Page 14: Consider the same tweaks to Table 4 as for Table 3.

Page 13-16: The code listing is fairly heavy. Consider a reduction to 
the relevant parts. More importantly: Explain what is going on: Which 
PDE is solved ("a reaction-diffusion equation" is too vague), how are 
things discretized, etc.


Page 17: Replace "... registered vis the PETSc API." by "... registered 
via the PETSc API."


Page 17: Add blank after "estimation" in Table 5.

Page 18: Replace "... with adjoint method" by "... with the adjoint method".

Page 18: Replace "The features of PETSc adjoint solver" by "The features 
of the PETSc adjoint solver"


Page 19: "The paper [Marin et al. 2017] contains ..." is not a very 
elegant formulation. Consider to reformulate the sentence to something 
like "Details on using the infrastructure discussed here for solving 
PDE-constrained optimization problems utilizing the spectral element 
method can be found in the literature [Marin et al. 2017]."


Page 20: 'TSSetCostIntegrand()' is not formatted correctly.

Page 20: "apporach" -> "approach"

Page 21: "Figure 4 presents ..." should be supplemented by further 
explanations of the bouncing ball example, e.g.: Where does the 
event-handling kick in? What would happen without proper event handling? 
How much effort is required to set this up?
This would also convert an unpleasant one-sentence paragraph into a more 
pleasant multi-sentence paragraph.


Page 22: Consider rephrasing "Often users of ODE solver packages do not 
know ..." into "Users of ODE solver packages often do not know ..."


Page 22: Replace "... by the PETSc libraries" by "... by PETSc."

Page 23: Explain the TSView() output.

Page 25: Here I noticed the spelling "time-step". In earlier parts of 
the paper it was spelled "timestep". Please unify.


Page 26: Explain the code listing.

Page 26: A conclusion should be added, summarizing the paper and 
outlining the potential impact of having TS available for tackling 
problems in CSE.


References: Some entries contain full first names, others only the first 
letter. Please unify. Similarly, check for consistent capitalization of 
titles, e.g. "Using PETSc to Develop Scalable Applications ..." vs. 
"Evaluation of overlapping restricted additive ..."


Best regards,
Karli




On 02/19/2018 10:49 PM, Smith, Barry F. wrote:



    PETSc developers,

  We have recently completed a draft manuscript on the PETSc TS 
component that includes a discussion of adjoints and forward 
sensitivities. It is attached. We'd appreciate any feedback on the 
manuscript.


     Thanks

  Barry






Re: [petsc-dev] CUDA white paper of interest

2018-02-14 Thread Karl Rupp

Hi Jonathan,

thanks for your message and the pointer.

The incomplete factorizations have been around for a while, and with 
recent hardware they tend to be less competitive (note that they use a 
Tesla 2050 in their benchmarks, which is ~7 years old).


The fine-grained parallel version here:
 http://epubs.siam.org/doi/abs/10.1137/140968896
is an attractive alternative (and available in the master-branch of 
PETSc through ViennaCL), yet it also has drawbacks.


Best regards,
Karli


On 02/14/2018 12:02 AM, Jonathan Perry-Houts wrote:

Hi all,

I'm not sure if this is the right place to post this, but I wanted to
point out a new white paper I stumbled across about preconditioned
iterative solvers on GPU's:
http://docs.nvidia.com/cuda/incomplete-lu-cholesky/index.html
The speed-ups are not huge, but they're not negligible either. I thought
it might be of interest to some of you.

Cheers,
Jonathan



Re: [petsc-dev] plans for preconditioners for SeqSELL

2018-02-12 Thread Karl Rupp

Hi Stefano,


Is there any plan to write code for native ILU/ICC etc for SeqSELL, at least to 
have BJACOBI in parallel?


(imho) ILU/ICC is a pain to do with SeqSELL. Point-Jacobi should be 
possible, yes. SELL is really just tailored to MatMults and a pain for 
anything that is not very similar to a MatMult...


Best regards,
Karli



Re: [petsc-dev] why does veccuda.py exist?

2018-02-02 Thread Karl Rupp



Why can't the VECCUDA type coexist with the VECCUSP or VECVIENNACL types? If it 
can't coexist, can the code be reworked to allow it to coexist?


Currently it can't coexist because some variables are conditionally compiled 
and may be multiply defined (e.g. spptr).


   Hmm, I don't think so. The use of spptr shouldn't mean there cannot be both 
VECCUDA and VECCUSP at the same time (with different vectors obviously).


There is no fundamental reason why it can't work. The relevant GPU code 
just hasn't been cleaned up yet.


Currently we have in vecimpl.h:

#if defined(PETSC_HAVE_CUSP)
  PetscCUSPFlag      valid_GPU_array;  /* indicates where the most recently modified vector data is (GPU or CPU) */
  void              *spptr;            /* if we're using CUSP, then this is the special pointer to the array on the GPU */
#elif defined(PETSC_HAVE_VIENNACL)
  PetscViennaCLFlag  valid_GPU_array;  /* indicates where the most recently modified vector data is (GPU or CPU) */
  void              *spptr;            /* if we're using ViennaCL, then this is the special pointer to the array on the GPU */
#elif defined(PETSC_HAVE_VECCUDA)
  PetscCUDAFlag      valid_GPU_array;  /* indicates where the most recently modified vector data is (GPU or CPU) */
  void              *spptr;            /* if we're using CUDA, then this is the special pointer to the array on the GPU */
#endif


Words can't tell how ugly this is. Fixing the spptr-thing is trivial, 
valid_GPU_array requires a bit more work to achieve consistent behavior 
across multiple different GPU vector types.
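
For illustration only, a cleaned-up layout could look roughly like this 
(PetscGPUFlag is a made-up placeholder name for a unified flag type, not an 
existing PETSc type):

#if defined(PETSC_HAVE_CUSP) || defined(PETSC_HAVE_VIENNACL) || defined(PETSC_HAVE_VECCUDA)
  PetscGPUFlag  valid_GPU_array;  /* where the most recently modified vector data lives (CPU, GPU, or both) */
  void         *spptr;            /* backend-specific GPU data, interpreted by the active vector type */
#endif

This way several GPU vector types can be compiled in at the same time and 
spptr is defined exactly once.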




I'm willing to do the refactorization and simplification but I need to know 
there is not some secret reason for these complications.


Unless you have to deliver something specific within the next few days, I'll 
(finally!) do it next week together with getting rid of VECCUSP.


So are we giving up on CUSP? And just using CUDA directly and ViennaCL?


We don't need both VECCUDA and VECCUSP. VECCUDA does not require an 
external library (part of the CUDA SDK!) and is at least as fast as 
VECCUSP, so the latter is obsolete (feature-wise they are the same).


However, I'm not saying that we give up on CUSP completely. CUSP's 
SA-AMG preconditioner is still useful. It just doesn't need a separate 
VECCUSP backend to operate (just like e.g. Hypre doesn't need a separate 
VECHYPRE).




These two steps should be done concurrently to avoid needless work.


There is no hurry; except I hate ugliness hanging around once I see it ;( 
It makes my skin itch, just knowing it exists ;)


Pain relief is on the way! ;-)

Best regards,
Karli


Re: [petsc-dev] why does veccuda.py exist?

2018-02-02 Thread Karl Rupp

Hey,


I'm am totally confused by

1) the existence of veccuda.py


if I remember correctly, its purpose is to make sure that one of the GPU 
backends is enabled if a user configures --with-cuda.




2) the fact that veccuda.py depends on some packages but is not a package and 
is not in packages/


I don't know this. In any case, veccuda.py is an artifact of a too rigid 
GPU backend implementation and should be removed once the GPU backend 
implementation is fixed.




Why can't the VECCUDA type coexist with the VECCUSP or VECVIENNACL types? If it 
can't coexist, can the code be reworked to allow it to coexist?


Currently it can't coexist because some variables are conditionally 
compiled and may be multiply defined (e.g. spptr).




Can we get rid of the veccuda.py and the PETSC_HAVE_VECCUDA flag and just 
always have the VECCUDA type if cuda is available?


Yes, that's possible after some refactorization.



I'm willing to do the refactorization and simplification but I need to know 
there is not some secret reason for these complications.


Unless you have to deliver something specific within the next few days, 
I'll (finally!) do it next week together with getting rid of VECCUSP. 
These two steps should be done concurrently to avoid needless work.


Best regards,
Karli


Re: [petsc-dev] [SPAM?] Re: [SPAM *****] Re: Issue with Lapack names

2017-12-18 Thread Karl Rupp




 > > This is related to a message I sent 2 years ago to petsc-maint
"Inconsistent naming of one Lapack subroutine", where I advocated
renaming LAPACKungqr_ --> LAPACKorgqr_. But that thread did not end
up in any modification...
 > >
 > > I can't find the thread. I also do not understand the problem.
Are you saying that the check succeeds but the routines is still
missing?
 >
 > No, the opposite. The routines are there, but since configure
decided (wrongly) that they are missing, the check would fail at run
time complaining that the routines are missing.
 >
 > Ah. Why does the check fail? It does succeed for a number of them.

I don't know the exact reason, but it has to do with the names of
real/complex subroutines. I guess the test is checking for dungqr,
which does not exist - it should check for either dorgqr or zungqr.
Before that commit, there were only checks for "real" names, but
after the commit there is a mix of real and complex subroutines.


Now I really want to punch one of the LAPACK guys in the face. Which one...

Karl, I think it is enough right now to change the complex names, like 
ungqr to orgqr as Jose suggests. Will this work for you?


works for me, yes.
If possible, I'd like to preserve the auto-generated nature of this 
list. If 'dungqr' is the only exception, then please adjust the list of 
tests accordingly *and* add a comment to BlasLapack.py saying why 
'dungqr' is special.


Best regards,
Karli




 >
 >   Thanks,
 >
 >     Matt
 >
 > Jose
 >
 > >
 > >   Thanks,
 > >
 > >      Matt
 > >
 > >
 > > Jose
 > > --
 > > What most experimenters take for granted before they begin
their experiments is infinitely more interesting than any results to
which their experiments lead.
 > > -- Norbert Wiener
 > >
 > > https://www.cse.buffalo.edu/~knepley/

 >
 >
 >
>
> --
> What most experimenters take for granted before they begin their 
experiments is infinitely more interesting than any results to which their 
experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/





--
What most experimenters take for granted before they begin their 
experiments is infinitely more interesting than any results to which 
their experiments lead.

-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ 


Re: [petsc-dev] PETSc Quarterly Telecon

2017-12-18 Thread Karl Rupp

Hi all,

here are my coarse-grained notes from the meeting:

* 'next' branch: Currently we are fighting the problem of integration 
testing vs. basic testing in the same branch.

  Barry: Run basic tests before merging to next!
  'make alltest' for ~3 configurations should cover most problems.

* Test harness:
  Migration to new test harness will take another month or two.
  Most directories already converted, but a couple of tricky 
directories are left.
  Discussion about speeding up tests, new test harness should provide 
better fine-grained control.
  Jed makes a case for Github instead of Bitbucket. Will be considered 
in the future, but no urgency.


* NUMFOCUS:
  Fell off the urgency list.
  Still considered to be a good idea.

* PETSc User Meeting:
  Contact sponsors by the end of the year.
  Make a list for people to contact at institutions in Europe.

* SIAM Annual Meeting:
  Richard and Karl will continue the idea of hosting a minisymposium.
  Other participants of the call are most likely not going there.

* Documentation:
  Patrick wants to set up better introductory tutorials.
  Deal.ii-style tutorials desirable, yet there's a significant 
maintenance cost attached to them.

  Split up tutorial slides into smaller chunks (one PDF per chapter).
  Stop duplicating code tutorials where only a few lines of code are 
actually different.



Best regards,
Karli



On 12/18/2017 05:46 PM, Richard Tran Mills wrote:

Karl,

Thanks for organizing this. Unfortunately, I'll be on a plane at that 
time so I won't be able to join. I would be grateful if someone could 
write some brief meeting minutes for those of us who are interested but 
cannot make it.


Thanks,
Richard

On Mon, Dec 18, 2017 at 2:51 AM, Karl Rupp <r...@iue.tuwien.ac.at> wrote:


Hi all,

the PETSc quarterly telecon will take place today, Monday, at 2pm
Chicago time.

If any of you has good software for holding the telecon (maybe
BlueJeans or Zoom), please post a link to the channel. If no link has
been posted by 1:30pm CT, I'll post a Google Hangouts link.

Incomplete list of topics to discuss:
   * User Meeting 2018
   * Organizational Ideas (e.g. NUMFOCUS)
   * Test Harness
   * Documentation improvements
   * Integration management for branch 'next'

Please feel free to propose additional items to discuss by replying to
this email.

    Best regards,
Karli




On 12/13/2017 07:12 AM, Karl Rupp wrote:
 > Dear PETSc folks,
 >
 > the next quarterly teleconference is due. It is intended to
discuss "the
 > bigger picture" rather than technical details. Thus, feel free to
join
 > even if you have not yet contributed thousands of lines of code
(yet) ;-)
 >
 > Please state your availability by Sunday, December 17, here:
 > https://www.when2meet.com/?6542211-1rhUE
 > All times are relative to the PETSc meridian (Chicago/Central Time).
 >
 > (and let Jed know if you think that Doodle has a better interface
;-) )
 >
 > If we can't find a reasonable time slot due to Christmas, I'll
 > open up another poll in early January.
 >
 > Thanks and best regards,
 > Karli




Re: [petsc-dev] PETSc Quarterly Telecon

2017-12-18 Thread Karl Rupp

Additional topic to discuss:
  * Minisymposium at SIAM Annual Meeting



On 12/18/2017 11:51 AM, Karl Rupp wrote:

Hi all,

the PETSc quarterly telecon will take place today, Monday, at 2pm 
Chicago time.


If any of you has good software for holding the telecon (maybe 
BlueJeans or Zoom), please post a link to the channel. If no link has 
been posted by 1:30pm CT, I'll post a Google Hangouts link.


Incomplete list of topics to discuss:
  * User Meeting 2018
  * Organizational Ideas (e.g. NUMFOCUS)
  * Test Harness
  * Documentation improvements
  * Integration management for branch 'next'

Please feel free to propose additional items to discuss by replying to 
this email.


Best regards,
Karli




On 12/13/2017 07:12 AM, Karl Rupp wrote:

Dear PETSc folks,

the next quarterly teleconference is due. It is intended to discuss 
"the bigger picture" rather than technical details. Thus, feel free to 
join even if you have not yet contributed thousands of lines of code 
(yet) ;-)


Please state your availability by Sunday, December 17, here: 
https://www.when2meet.com/?6542211-1rhUE

All times are relative to the PETSc meridian (Chicago/Central Time).

(and let Jed know if you think that Doodle has a better interface ;-) )

If we can't find a reasonable time slot due to Christmas, I'll open up 
another poll in early January.


Thanks and best regards,
Karli


Re: [petsc-dev] PETSc Quarterly Telecon

2017-12-18 Thread Karl Rupp

Hi all,

the PETSc quarterly telecon will take place today, Monday, at 2pm 
Chicago time.


If any of you has good software for holding the telecon (maybe 
BlueJeans or Zoom), please post a link to the channel. If no link has 
been posted by 1:30pm CT, I'll post a Google Hangouts link.


Incomplete list of topics to discuss:
 * User Meeting 2018
 * Organizational Ideas (e.g. NUMFOCUS)
 * Test Harness
 * Documentation improvements
 * Integration management for branch 'next'

Please feel free to propose additional items to discuss by replying to 
this email.


Best regards,
Karli




On 12/13/2017 07:12 AM, Karl Rupp wrote:

Dear PETSc folks,

the next quarterly teleconference is due. It is intended to discuss "the 
bigger picture" rather than technical details. Thus, feel free to join 
even if you have not yet contributed thousands of lines of code (yet) ;-)


Please state your availability by Sunday, December 17, here: 
https://www.when2meet.com/?6542211-1rhUE

All times are relative to the PETSc meridian (Chicago/Central Time).

(and let Jed know if you think that Doodle has a better interface ;-) )

If we can't find a reasonable time slot due to Christmas, I'll open up 
another poll in early January.


Thanks and best regards,
Karli


[petsc-dev] PETSc Quarterly Telecon

2017-12-12 Thread Karl Rupp

Dear PETSc folks,

the next quarterly teleconference is due. It is intended to discuss "the 
bigger picture" rather than technical details. Thus, feel free to join 
even if you have not yet contributed thousands of lines of code (yet) ;-)


Please state your availability by Sunday, December 17, here: 
https://www.when2meet.com/?6542211-1rhUE

All times are relative to the PETSc meridian (Chicago/Central Time).

(and let Jed know if you think that Doodle has a better interface ;-) )

If we can't find a reasonable time slot due to Christmas, I'll open up 
another poll in early January.


Thanks and best regards,
Karli


Re: [petsc-dev] Random123

2017-09-05 Thread Karl Rupp

Hi Toby,

FYI: the Random123 license is just what is commonly referred to as the 
2-clause BSD license:

 https://opensource.org/licenses/BSD-2-Clause
I don't see any problem with making a repo for it and using it for a 
PetscRandom implementation (IANAL).


Best regards,
Karli



On 09/05/2017 06:26 PM, Tobin Isaac wrote:

I just came across Random123 [1]: portable, parallel, high-quality pseudorandom 
number generators.  It's developed by DE Shaw, but the license [2] looks to me 
like we should be able to make a repo for it and use it as a PetscRandom 
implementation.  Does anyone savvier than me want to look at the license and 
see if I'm missing something?

Cheers,
   Toby

[1]: https://doi.org/10.1145/2063384.2063405
[2]: 
http://www.deshawresearch.com/downloads/download_random123.cgi/Random123_License.txt



Re: [petsc-dev] Routines that change matrix entries without requiring another MatAssemblyBegin/End?

2017-07-30 Thread Karl Rupp


Well, I thought of AIJMKL keeping track of the state for which 
mkl_sparse_optimize() was called. If the matrix state changes, the next call to 
MatMult()


is MatMult the only operation that needs this check, or do many Mat methods 
need this check?


Looks like it is needed for MatMult(), MatMatMul() and triangular solves:
https://software.intel.com/en-us/mkl-developer-reference-fortran-inspector-executor-sparse-blas-analysis-routines



For parallel matrices are all the operations that need to do this kind of 
update collective over the Mat?


According to the list above, all operations are collective.



will detect that the current state does not match the state it was optimized 
for and hence trigger another optimization. This isn't too different from what 
we do for GPU stuff.


We do this for cusparse

static PetscErrorCode MatMult_SeqAIJCUSPARSE(Mat A,Vec xx,Vec yy)
{
   PetscErrorCode ierr;

   PetscFunctionBegin;
   /* The line below is necessary due to the operations that modify
      the matrix on the CPU (axpy, scale, etc.) */
   ierr = MatSeqAIJCUSPARSECopyToGPU(A);CHKERRQ(ierr);

But for CUSP (deprecated) we do it at MatAssembly time

PetscErrorCode MatAssemblyEnd_SeqAIJCUSP(Mat A,MatAssemblyType mode)
{
   PetscErrorCode ierr;

   PetscFunctionBegin;
   ierr = MatAssemblyEnd_SeqAIJ(A,mode);CHKERRQ(ierr);
   ierr = MatCUSPCopyToGPU(A);CHKERRQ(ierr);


   as we do it for ViennaCL

PetscErrorCode MatAssemblyEnd_SeqAIJViennaCL(Mat A,MatAssemblyType mode)
{
   PetscErrorCode ierr;

   PetscFunctionBegin;
   ierr = MatAssemblyEnd_SeqAIJ(A,mode);CHKERRQ(ierr);
   ierr = MatViennaCLCopyToGPU(A);CHKERRQ(ierr);

   For SeqAIJPERM we do that extra processing of the data in MatAssemblyEnd and 
don't need to do anything if only the numerical values change.

   So we seem to have some inconsistencies in how we handle this which will 
lead to more errors.


ViennaCL should adopt the CUSPARSE model then (and CUSP be removed).
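
Concretely, adopting the CUSPARSE model would mean doing the copy lazily in 
MatMult (and friends) rather than in MatAssemblyEnd; a rough sketch of how 
MatMult_SeqAIJViennaCL might then start (the actual ViennaCL SpMV call is 
elided):

PetscErrorCode MatMult_SeqAIJViennaCL(Mat A,Vec xx,Vec yy)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  /* pick up CPU-side modifications (MatScale, MatAXPY, ...) lazily,
     just like MatMult_SeqAIJCUSPARSE does via MatSeqAIJCUSPARSECopyToGPU() */
  ierr = MatViennaCLCopyToGPU(A);CHKERRQ(ierr);
  /* ... ViennaCL SpMV on the now up-to-date GPU data ... */
  PetscFunctionReturn(0);
}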

Best regards,
Karli


Re: [petsc-dev] Routines that change matrix entries without requiring another MatAssemblyBegin/End?

2017-07-29 Thread Karl Rupp



   In theory PETSc Mat have two "state" values,

1) nonzerostate - this is increased anytime the nonzero structure changes

2) state - this is increased anytime any numerical values are changed

These are used by PCSetUp() to determine if the preconditioner needs to be 
updated and if that involves a new nonzero structure.
The reason I say "in theory" is that it looks like certain changes to matrices 
do not properly update the state.

Hi Barry: Sounds good. In the meantime, I'm simply making a corresponding 
_SeqAIJMKL version of all of the functions in question that will just call the 
_SeqAIJ version and then call a function that creates an updated MKL sparse 
matrix handle. It appears that several of these do not have function prototypes 
in any of the .h files (they haven't been needed outside of aij.c). I assume I 
can just add PETSC_INTERN prototypes in src/mat/impls/aij/seq/aij.h (where we 
have things like prototypes for MatAssemblyEnd_SeqAIJ)?

   I think this is the correct place. If they have static in front of them you 
will need to remove that also.


I'm just wondering: Isn't fixing the state variables the faster, easier and 
more maintainable approach?


  So you fix the state variable by resetting it in each function that changes the 
matrix entries, but how does that magically cause the MKL "convert to MKL format" 
in these cases? It is a slightly orthogonal issue, I think.


Well, I thought of AIJMKL keeping track of the state for which 
mkl_sparse_optimize() was called. If the matrix state changes, the next 
call to MatMult() will detect that the current state does not match the 
state it was optimized for and hence trigger another optimization. This 
isn't too different from what we do for GPU stuff.




   Or for every PETSc object do we optionally provide a callback function that is called 
with each change to state? Currently this would not be efficient because there may be 
several (many) changes to state before we want the subtype data structure to be updated. 
At the moment we have a "hard" MatAssemblyEnd() that handles changes in nonzero 
structure, may be we need something similar for changes in numerical value? BTW this is 
also true for CUDA/CL representations?


A callback for every state change is too greedy, yes. As long as state 
and nonzerostate are reliable, we should be good in terms of outside 
dependencies on the matrix values and structure, respectively.


Best regards,
Karli


Re: [petsc-dev] Routines that change matrix entries without requiring another MatAssemblyBegin/End?

2017-07-28 Thread Karl Rupp

Hey,



   In theory PETSc Mat have two "state" values,

1) nonzerostate - this is increased anytime the nonzero structure changes

2) state - this is increased anytime any numerical values are changed

These are used by PCSetUp() to determine if the preconditioner needs to be 
updated and if that involves a new nonzero structure.

The reason I say "in theory" is that it looks like certain changes to matrices 
do not properly update the state.

Hi Barry: Sounds good. In the meantime, I'm simply making a corresponding 
_SeqAIJMKL version of all of the functions in question that will just call the 
_SeqAIJ version and then call a function that creates an updated MKL sparse 
matrix handle. It appears that several of these do not have function prototypes 
in any of the .h files (they haven't been needed outside of aij.c). I assume I 
can just add PETSC_INTERN prototypes in src/mat/impls/aij/seq/aij.h (where we 
have things like prototypes for MatAssemblyEnd_SeqAIJ)?


   I think this is the correct place. If they have static in front of them you 
will need to remove that also.


I'm just wondering: Isn't fixing the state variables the faster, easier 
and more maintainable approach?
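
As a minimal sketch of what "fixing the state variables" amounts to (the 
routines named in the comments are examples only, and direct access to 
A->nonzerostate assumes the private headers of a Mat implementation):

/* at the end of any routine that modifies numerical values
   (MatScale, MatAXPY, MatZeroEntries, ...): */
ierr = PetscObjectStateIncrease((PetscObject)A);CHKERRQ(ierr);

/* and additionally, whenever the nonzero structure changes: */
A->nonzerostate++;

Subtypes such as AIJMKL can then compare the stored state against the 
current one and re-run their optimization step only when needed.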


Best regards,
Karli


Re: [petsc-dev] GPU regression tests

2017-07-28 Thread Karl Rupp

Hi Alejandro,


I have tested the branch and it seems to fix the problem. Our software runs fine
with it.


great, thanks for the quick feedback.

Best regards,
Karli

