[julia-users] Re: Can you make this linear algebra code faster?

2015-03-25 Thread dextorious
I hope to look at this when I get some time, but as a preliminary note, 
merely applying the @inbounds and @simd macros to the main for loop yields 
an increase in performance of about 15-20% on my machine.
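[A minimal sketch of where the macros go, assuming the elimination-update loop from the posted lucompletepiv!; update_plain!/update_fast! are hypothetical helper names, not from the thread:]

```julia
# Rank-1 update of the trailing submatrix, without and with the macros.
# @inbounds drops bounds checks; @simd permits vectorization of the inner loop.
function update_plain!(A, k)
    n = size(A, 1)
    for j in k+1:n
        Akj = A[k, j]
        for i in k+1:n
            A[i, j] -= A[i, k] * Akj
        end
    end
    return A
end

function update_fast!(A, k)
    n = size(A, 1)
    @inbounds for j in k+1:n
        Akj = A[k, j]
        @simd for i in k+1:n
            A[i, j] -= A[i, k] * Akj
        end
    end
    return A
end
```

Both produce identical results; only the checked/vectorized code generation differs.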


Re: [julia-users] Re: Can you make this linear algebra code faster?

2015-03-25 Thread Jiahao Chen
Thanks all for the suggestions so far. Yes, I'm using Julia 0.4-dev as the
basis of this discussion.


Re: [julia-users] Re: Can you make this linear algebra code faster?

2015-03-25 Thread Jiahao Chen
On Wed, Mar 25, 2015 at 1:13 PM, Jason Riedy ja...@lovesgoodfood.com wrote:
 Similarly for moving the row scaling and next pivot search into
 the loop.

I tried to manually inline idxmaxabs. It made absolutely no difference
on my machine. The row scaling takes ~0.05% of total execution time.

Thanks,

Jiahao Chen
Staff Research Scientist
MIT Computer Science and Artificial Intelligence Laboratory


[julia-users] Re: Can you make this linear algebra code faster?

2015-03-25 Thread Jiahao Chen
 The swap could be done without temporaries, but I assume you're also 
trying to match the look of the pseudocode?

It would be interesting to see how fast the code can get without 
significantly altering its look, or alternatively how much one would have 
to change to achieve speedups.

I profiled the code for a 500 x 500 random matrix and the swaps took ~ 0.5% 
of the execution time, IIRC. I'm not too concerned with those particular 
lines.


[julia-users] Re: Can you make this linear algebra code faster?

2015-03-25 Thread Matt Bauman
The swap could be done without temporaries, but I assume you're also trying 
to match the look of the pseudocode?

On Wednesday, March 25, 2015 at 11:22:41 AM UTC-4, Jiahao Chen wrote:

 Here is some code I wrote for completely pivoted LU factorizations. 
 Can you make it even faster? 

 Anyone who can demonstrate verifiable speedups (or find bugs relative 
 to the textbook description) while sticking to pure Julia code wins an 
 acknowledgment in an upcoming paper I'm writing about Julia, and a 
 small token of my appreciation with no cash value. :) 

 Reference: G. H. Golub and C. F. Van Loan, Matrix Computations 4/e, 
 Algorithm 3.4.3, p. 132. 

 Thanks, 

 Jiahao Chen 
 Staff Research Scientist 
 MIT Computer Science and Artificial Intelligence Laboratory 



[julia-users] Re: Can you make this linear algebra code faster?

2015-03-25 Thread Jiahao Chen
Also, Andreas just pointed out that the loop in idxmaxabs traverses the matrix
in row-major order, not column-major. (for j in r, i in r is faster)
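[For illustration, a sketch of the column-major traversal; maxabs_colmajor is a hypothetical name, not the thread's function:]

```julia
# Julia stores arrays column by column, so for A[i, j] the row index i
# should vary in the innermost loop to walk memory contiguously.
function maxabs_colmajor(A, r)
    μ, λ = first(r), first(r)
    themax = abs(A[μ, λ])
    for j in r, i in r        # j outer, i inner: walks down each column
        a = abs(A[i, j])
        if a > themax
            μ, λ, themax = i, j, a
        end
    end
    return μ, λ
end
```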


Re: [julia-users] Re: Can you make this linear algebra code faster?

2015-03-25 Thread Tim Holy
If you want it to look nice and are running on 0.4, just switching to

slice(A, 1:n, k) ↔ slice(A, 1:n, λ)

should also get you a performance boost (especially for large matrices). 
Obviously you could do even better by devectorizing, but it wouldn't be as 
pretty.
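[The distinction is that indexing with a range allocates, while 0.4's slice returns a view into the parent array. A small demo, written with view (the equivalent in later Julia) so it runs as-is:]

```julia
A = [1.0 2.0; 3.0 4.0]
c = A[:, 1]          # range indexing allocates a new vector (a copy)
v = view(A, :, 1)    # a view shares memory with A (0.4's slice)
c[1] = 99.0          # mutating the copy leaves A untouched
v[2] = -7.0          # mutating the view writes through to A
```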

Off-topic, but your use of unicode for this is very elegant, and eye-opening 
for me.

Best,
--Tim

On Wednesday, March 25, 2015 09:24:09 AM Matt Bauman wrote:
 The swap could be done without temporaries, but I assume you're also trying
 to match the look of the pseudocode?



[julia-users] Re: Can you make this linear algebra code faster?

2015-03-25 Thread Jason Riedy
And Tim Holy writes:
 Obviously you could do even better by devectorizing, but it
 wouldn't be as pretty.

Similarly for moving the row scaling and next pivot search into
the loop.



Re: [julia-users] Re: Can you make this linear algebra code faster?

2015-03-25 Thread Tim Holy
Actually, didn't the original implementation have a couple of bugs?

- A[1:n, k] makes a copy, so I'm not sure you were actually swapping elements 
in the original A
- If A[i,j] < 0, you're storing a negative value in themax, making it easy for 
the next nonnegative value to beat it. You presumably want to store 
abs(A[i,j]).
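[The first pitfall can be demonstrated in a few lines; swap! here is a hypothetical stand-in for the thread's ↔ operator, using view (later Julia's equivalent of 0.4's slice):]

```julia
# Swapping the results of A[:, k]-style indexing exchanges two
# freshly allocated temporaries, not the columns of A itself.
function swap!(x, y)
    for i in eachindex(x)
        x[i], y[i] = y[i], x[i]
    end
end

A = [1.0 2.0; 3.0 4.0]
swap!(A[:, 1], A[:, 2])                  # swaps two copies only
unchanged = (A == [1.0 2.0; 3.0 4.0])    # true: A was never touched
swap!(view(A, :, 1), view(A, :, 2))      # swaps through views: columns exchange
```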

See attached (which is also faster, on 0.4).

--Tim

On Wednesday, March 25, 2015 11:37:07 AM you wrote:
 If you want it to look nice and are running on 0.4, just switching to
 
 slice(A, 1:n, k) ↔ slice(A, 1:n, λ)
 
 should also get you a performance boost (especially for large matrices).
 Obviously you could do even better by devectorizing, but it wouldn't be as
 pretty.
 
 Off-topic, but your use of unicode for this is very elegant, and eye-opening
 for me.
 
 Best,
 --Tim
 
x ↔ y = for i = 1:length(x)  # define the ↔ swap function
    x[i], y[i] = y[i], x[i]
end

function idxmaxabs(A, r)
    r1 = r[1]
    μ, λ = r1, r1
    themax = abs(A[r1, r1])
    @inbounds for j in r, i in r
        a = abs(A[i, j])
        if a > themax
            μ, λ, themax = i, j, a  # store the absolute value, not A[i, j]
        end
    end
    return μ, λ
end

function lucompletepiv!(A)
    n = size(A, 1)
    rowpiv = zeros(Int, n - 1)
    colpiv = zeros(Int, n - 1)
    for k = 1:n-1
        μ, λ = idxmaxabs(A, k:n)
        rowpiv[k] = μ
        slice(A, k, 1:n) ↔ slice(A, μ, 1:n)
        colpiv[k] = λ
        slice(A, 1:n, k) ↔ slice(A, 1:n, λ)
        if A[k, k] ≠ 0
            ρ = k+1:n
            scale!(1/A[k, k], sub(A, ρ, k))
            @inbounds for j in ρ
                Akj = A[k, j]
                @simd for i in ρ
                    A[i, j] -= A[i, k] * Akj
                end
            end
        end
    end
    return (A, rowpiv, colpiv)
end


[julia-users] Re: Can you make this linear algebra code faster?

2015-03-25 Thread Jason Riedy
And Jiahao Chen writes:
 I tried to manually inline idxmaxabs. It made absolutely no difference
 on my machine. The row scaling takes ~0.05% of total execution time.

Simply inlining, sure, but you could scale inside the outer loop
and find the next pivot in the inner loop.  Making only a single
pass over the data should save more than 0.05% once you leave
cache.  But as long as you're in cache (500x500 is approx. 2MiB),
not much will matter.
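[The single-pass idea can be sketched as follows; update_and_findpivot! is a hypothetical helper, not code from the thread, and it assumes column k has already been scaled:]

```julia
# Fold the next pivot search into the rank-1 update loop, so the
# trailing submatrix is traversed once per step instead of twice.
function update_and_findpivot!(A, k)
    n = size(A, 1)
    μ, λ, themax = k + 1, k + 1, -1.0
    @inbounds for j in k+1:n
        Akj = A[k, j]
        for i in k+1:n
            A[i, j] -= A[i, k] * Akj
            a = abs(A[i, j])   # search the just-updated entry in the same pass
            if a > themax
                μ, λ, themax = i, j, a
            end
        end
    end
    return μ, λ
end
```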

Ultimately, I'm not sure who's interested in complete pivoting
for LU.  That choice alone kills performance on modern machines
for negligible benefit.  You likely would find more interest for
column-pivoted QR or rook pivoting in LDL^T.