Re: [petsc-dev] Slowness of PetscSortIntWithArrayPair in MatAssembly

Zhang, Junchao via petsc-dev Wed, 10 Jul 2019 14:52:08 -0700

Fande,
  I ran your code with two processes and found the poor performance of 
PetscSortIntWithArrayPair() was due to duplicates.  In particular,  rank 0 has 
array length = 0 and rank 1 has array length = 4,180,070. On rank 1, each 
unique array value has ~95 duplicates; The duplicates are already clustered 
together before sorting.
  The old quick sort algorithm has an O(n^2) complexity on these duplicates. I 
added a three-way-partition algorithm to split the array into <, =, >pivot 
parts and get these numbers:


Master:
$ time mpirun -n 2 ./ex_petsc_only
real 0m55.359s
user 1m7.807s
sys 0m42.651s

Master:
$ time mpirun -n 2 ./ex_petsc_only -matstash_legacy
real 0m0.987s
user 0m1.565s
sys 0m0.285s

Three way partition
$ time mpirun -n 2 ./ex_petsc_only
real 0m1.015s
user 0m1.535s
sys 0m0.392s

We can see the new sort algorithm gives a 55x speedup and can almost catch up 
with -matstash_legacy. So I think it is good to have no matter whether we will 
do hashing or not in mat assembly.
Please try branch jczhang/feature-better-quicksort-pivot on your side to see if 
it helps and if the segfault persist.
Thanks.

--Junchao Zhang


On Tue, Jul 9, 2019 at 10:18 AM Fande Kong 
<[email protected]<mailto:[email protected]>> wrote:
Hi Junchao,

If you are struggling on building the right package environment, here is a 
native PETSc example.


Fande,

On Mon, Jul 8, 2019 at 3:00 PM Fande Kong 
<[email protected]<mailto:[email protected]>> wrote:
I guess John has a pure PETSc example,

John, could you share the pure PETSc example with Junchao?


Fande,

On Mon, Jul 8, 2019 at 2:58 PM Fande Kong 
<[email protected]<mailto:[email protected]>> wrote:
Yes, here it is https://github.com/fdkong/matrixsparsity


You need to follow instructions here to install MOOSE 
https://www.mooseframework.org/getting_started/installation/mac_os.html


Thanks for your help.


Fande



On Mon, Jul 8, 2019 at 2:28 PM Zhang, Junchao 
<[email protected]<mailto:[email protected]>> wrote:
Is the code public for me to test?
--Junchao Zhang


On Mon, Jul 8, 2019 at 3:06 PM Fande Kong 
<[email protected]<mailto:[email protected]>> wrote:
Thanks Junchao,

Tried your code. I did not hit seg fault this time, but the assembly was still 
slow


time mpirun -n 2 ./matrix_sparsity-opt   -matstash_legacy
Close matrix for np = 2 ...
Matrix successfully closed

real 0m2.009s
user 0m3.324s
sys 0m0.575s




 time mpirun -n 2 ./matrix_sparsity-opt
Close matrix for np = 2 ...
Matrix successfully closed

real 3m39.235s
user 6m42.184s
sys 0m35.084s




Fande,




On Mon, Jul 8, 2019 at 8:47 AM Fande Kong 
<[email protected]<mailto:[email protected]>> wrote:
Will let you know soon.

Thanks,

Fande,

On Mon, Jul 8, 2019 at 8:41 AM Zhang, Junchao 
<[email protected]<mailto:[email protected]>> wrote:
Fande or John,
  Could any of you have a try? Thanks
--Junchao Zhang


---------- Forwarded message ---------
From: Junchao Zhang <[email protected]<mailto:[email protected]>>
Date: Thu, Jul 4, 2019 at 8:21 AM
Subject: Re: [petsc-dev] Slowness of PetscSortIntWithArrayPair in MatAssembly
To: Fande Kong <[email protected]<mailto:[email protected]>>


Fande,
  I wrote tests but could not reproduce the error. I pushed a commit that 
changed the MEDIAN macro to a function to make it easier to debug.  Could you 
run and debug it again? It should be easy to see what is wrong in gdb.
  Thanks.
--Junchao Zhang


On Wed, Jul 3, 2019 at 6:48 PM Fande Kong 
<[email protected]<mailto:[email protected]>> wrote:
Process 3915 resuming
Process 3915 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS 
(code=2, address=0x7ffee9b91fc8)
    frame #0: 0x000000010cbaa031 
libpetsc.3.011.dylib`PetscSortIntWithArrayPair_Private(L=0x0000000119fc5480, 
J=0x000000011bfaa480, K=0x000000011ff74480, right=13291) at sorti.c:298
   295      }
   296      PetscFunctionReturn(0);
   297    }
-> 298    i    = MEDIAN(L,right);
   299    SWAP3(L[0],L[i],J[0],J[i],K[0],K[i],tmp);
   300    vl   = L[0];
   301    last = 0;
(lldb)


On Wed, Jul 3, 2019 at 4:32 PM Zhang, Junchao 
<[email protected]<mailto:[email protected]>> wrote:
Could you debug it or paste the stack trace? Since it is a segfault, it should 
be easy.
--Junchao Zhang


On Wed, Jul 3, 2019 at 5:16 PM Fande Kong 
<[email protected]<mailto:[email protected]>> wrote:
Thanks Junchao,

But there is still segment fault. I guess you could write some continuous 
integers to test your changes.


Fande

On Wed, Jul 3, 2019 at 12:57 PM Zhang, Junchao 
<[email protected]<mailto:[email protected]>> wrote:
Fande and John,
  Could you try jczhang/feature-better-quicksort-pivot? It passed Jenkins tests 
and I could not imagine why it failed on yours.
  Hash table has its own cost. We'd better get quicksort right and see how it 
performs before rewriting code.
--Junchao Zhang


On Tue, Jul 2, 2019 at 2:37 PM Fande Kong 
<[email protected]<mailto:[email protected]>> wrote:
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault: 11 
(signal 11)

Segmentation fault :-)


As Jed said, it might be a good idea to rewrite the code using the hashing 
table.


Fande,


On Tue, Jul 2, 2019 at 1:27 PM Zhang, Junchao 
<[email protected]<mailto:[email protected]>> wrote:
Try this to see if it helps:

diff --git a/src/sys/utils/sorti.c b/src/sys/utils/sorti.c
index 1b07205a..90779891 100644
--- a/src/sys/utils/sorti.c
+++ b/src/sys/utils/sorti.c
@@ -294,7 +294,8 @@ static PetscErrorCode 
PetscSortIntWithArrayPair_Private(PetscInt *L,PetscInt *J,
     }
     PetscFunctionReturn(0);
   }
-  SWAP3(L[0],L[right/2],J[0],J[right/2],K[0],K[right/2],tmp);
+  i = MEDIAN(L,right);
+  SWAP3(L[0],L[i],J[0],J[i],K[0],K[i],tmp);
   vl   = L[0];
   last = 0;
   for (i=1; i<=right; i++) {


On Tue, Jul 2, 2019 at 12:14 PM Fande Kong via petsc-dev 
<[email protected]<mailto:[email protected]>> wrote:
BTW,

PetscSortIntWithArrayPair is used in MatStashSortCompress_Private.

Any way to avoid to use PetscSortIntWithArrayPair in 
MatStashSortCompress_Private?

Fande,

On Tue, Jul 2, 2019 at 11:09 AM Fande Kong 
<[email protected]<mailto:[email protected]>> wrote:
Hi Developers,

John just noticed that the matrix assembly was slow when having sufficient 
amount of off-diagonal entries. It was not a MPI issue since I was  able to 
reproduce the issue using two cores on my desktop, that is, "mpirun -n 2".

I turned  on a profiling, and 99.99% of the time was spent on 
PetscSortIntWithArrayPair (recursively calling).   It took THREE MINUTES  to 
get the assembly done. And then changed to use the option "-matstash_legacy" to 
restore
the code to the old assembly routine, and the same code took ONE SECOND to get 
the matrix assembly done.

Should write any better sorting algorithms?


Fande,

Re: [petsc-dev] Slowness of PetscSortIntWithArrayPair in MatAssembly

Reply via email to