[julia-users] Re: Can you make this linear algebra code faster?
I hope to look at this when I get some time, but as a preliminary note, merely applying the @inbounds and @simd macros to the main for loop yields an increase in performance of about 15-20% on my machine.
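A minimal sketch of the point above, assuming the same trailing-submatrix update that appears later in the thread: `elim_update!` is a hypothetical helper name, and it assumes the pivot column has already been scaled by the pivot.

```julia
# Hedged sketch: the inner elimination update with @inbounds and @simd applied,
# as described above. A is a dense matrix; k is the current pivot step; n = size(A, 1).
# Assumes A[k+1:n, k] already holds the multipliers (scaled by 1/A[k,k]).
function elim_update!(A, k, n)
    ρ = k+1:n
    @inbounds for j in ρ
        Akj = A[k, j]
        @simd for i in ρ
            A[i, j] -= A[i, k] * Akj   # rank-1 update of the trailing block
        end
    end
    return A
end
```

`@inbounds` removes bounds checks and `@simd` lets the compiler vectorize the innermost loop, which is where the claimed 15-20% would come from.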
Re: [julia-users] Re: Can you make this linear algebra code faster?
Thanks all for the suggestions so far. Yes, I'm using Julia 0.4-dev as the basis for this discussion.
Re: [julia-users] Re: Can you make this linear algebra code faster?
On Wed, Mar 25, 2015 at 1:13 PM, Jason Riedy ja...@lovesgoodfood.com wrote:
> Similarly for moving the row scaling and next pivot search into the loop.

I tried to manually inline idxmaxabs. It made absolutely no difference on my machine. The row scaling takes ~0.05% of total execution time.

Thanks, Jiahao Chen
Staff Research Scientist
MIT Computer Science and Artificial Intelligence Laboratory
[julia-users] Re: Can you make this linear algebra code faster?
The swap could be done without temporaries, but I assume you're also trying to match the look of the pseudocode? It would be interesting to see how fast the code can get without significantly altering its look, or alternatively how much one would have to change to achieve speedups. I profiled the code for a 500 x 500 random matrix and the swaps took ~0.5% of the execution time, IIRC. I'm not too concerned with those particular lines.
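A sketch of the temporary-free swap being suggested, under the assumption that the copies come from slicing with `A[k, 1:n]`: `swaprows!` is a hypothetical helper name, swapping element by element so no intermediate array is allocated.

```julia
# Hedged sketch: swap rows k and μ of A in place, element by element,
# avoiding the temporary arrays that indexing like A[k, 1:n] would allocate.
function swaprows!(A, k, μ)
    @inbounds for j in 1:size(A, 2)
        A[k, j], A[μ, j] = A[μ, j], A[k, j]
    end
    return A
end
```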
[julia-users] Re: Can you make this linear algebra code faster?
The swap could be done without temporaries, but I assume you're also trying to match the look of the pseudocode? On Wednesday, March 25, 2015 at 11:22:41 AM UTC-4, Jiahao Chen wrote: Here is some code I wrote for completely pivoted LU factorizations. Can you make it even faster? Anyone who can demonstrate verifiable speedups (or find bugs relative to the textbook description) while sticking to pure Julia code wins an acknowledgment in an upcoming paper I'm writing about Julia, and a small token of my appreciation with no cash value. :) Reference: G. H. Golub and C. F. Van Loan, Matrix Computations 4/e, Algorithm 3.4.3, p. 132. Thanks, Jiahao Chen Staff Research Scientist MIT Computer Science and Artificial Intelligence Laboratory
[julia-users] Re: Can you make this linear algebra code faster?
Also, Andreas just pointed out that the loop in idxmaxabs traverses the matrix in row-major order, not column-major. (for j in r, i in r is faster)
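A small illustration of the loop-order point, assuming a plain dense matrix: with j in the outer position and i in the inner position, the inner index walks down a column, which matches Julia's column-major memory layout. `colsum` is a hypothetical demo function, not part of the code under discussion.

```julia
# Column-major traversal: j outer, i inner means consecutive iterations
# touch consecutive memory addresses, which is cache-friendly in Julia.
function colsum(A)
    s = 0.0
    @inbounds for j in 1:size(A, 2), i in 1:size(A, 1)  # i varies fastest
        s += A[i, j]
    end
    return s
end
```

Reversing the two loop variables gives the same answer but strides through memory, which is what the original idxmaxabs was doing.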
Re: [julia-users] Re: Can you make this linear algebra code faster?
If you want it to look nice and are running on 0.4, just switching to slice(A, 1:n, k) ↔ slice(A, 1:n, λ) should also get you a performance boost (especially for large matrices). Obviously you could do even better by devectorizing, but it wouldn't be as pretty.

Off-topic, but your use of unicode for this is very elegant, and eye-opening for me.

Best, --Tim

On Wednesday, March 25, 2015 09:24:09 AM Matt Bauman wrote:
> The swap could be done without temporaries, but I assume you're also trying to match the look of the pseudocode?
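For readers on current Julia: `slice` and `sub` were 0.4-era names that later became `view`. A hedged sketch of why the suggestion helps, using the modern spelling: indexing like `A[1:n, k]` allocates a copy, while a view aliases the original storage, so mutating through it mutates A itself.

```julia
# A view aliases the parent array (in Julia 0.4, slice/sub played this role;
# in current Julia it is view). Swapping through views mutates A in place,
# with no temporary column copies.
A = [1.0 2.0; 3.0 4.0]
c1, c2 = view(A, :, 1), view(A, :, 2)
for i in eachindex(c1)
    c1[i], c2[i] = c2[i], c1[i]
end
# A's columns are now exchanged.
```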
[julia-users] Re: Can you make this linear algebra code faster?
And Tim Holy writes:
> Obviously you could do even better by devectorizing, but it wouldn't be as pretty.

Similarly for moving the row scaling and next pivot search into the loop.
Re: [julia-users] Re: Can you make this linear algebra code faster?
Actually, didn't the original implementation have a couple of bugs?
- A[1:n, k] makes a copy, so I'm not sure you were actually swapping elements in the original A
- If A[i,j] < 0, you're storing a negative value in themax, making it easy for the next nonnegative value to beat it. You presumably want to store abs(A[i,j]).

See attached (which is also faster, on 0.4).

--Tim

On Wednesday, March 25, 2015 11:37:07 AM you wrote:
> If you want it to look nice and are running on 0.4, just switching to slice(A, 1:n, k) ↔ slice(A, 1:n, λ) should also get you a performance boost (especially for large matrices). Obviously you could do even better by devectorizing, but it wouldn't be as pretty.
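A sketch of the second fix Tim describes, applied to the idxmaxabs from the thread: store the absolute value in themax rather than the raw (possibly negative) entry. The `_fixed` suffix is mine, to distinguish it from the buggy original.

```julia
# Corrected pivot search: themax must hold abs(A[i,j]), not A[i,j],
# otherwise a negative entry poisons the running maximum.
function idxmaxabs_fixed(A, r)
    r1 = r[1]
    μ, λ = r1, r1
    themax = abs(A[r1, r1])
    @inbounds for j in r, i in r   # column-major: i varies fastest
        a = abs(A[i, j])
        if a > themax
            μ, λ, themax = i, j, a   # the original stored A[i, j] here
        end
    end
    return μ, λ
end
```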
Thanks, Jiahao Chen
Staff Research Scientist
MIT Computer Science and Artificial Intelligence Laboratory

x ↔ y = for i=1:length(x) #Define swap function
    x[i], y[i] = y[i], x[i]
end

function idxmaxabs(A, r)
    r1 = r[1]
    μ, λ = r1, r1
    themax = abs(A[r1, r1])
    @inbounds for j in r, i in r
        a = abs(A[i,j])
        if a > themax
            μ, λ, themax = i, j, A[i, j]
        end
    end
    return μ, λ
end

function lucompletepiv!(A)
    n = size(A, 1)
    rowpiv = zeros(Int, n-1)
    colpiv = zeros(Int, n-1)
    for k = 1:n-1
        μ, λ = idxmaxabs(A, k:n)
        rowpiv[k] = μ
        slice(A, k, 1:n) ↔ slice(A, μ, 1:n)
        colpiv[k] = λ
        slice(A, 1:n, k) ↔ slice(A, 1:n, λ)
        if A[k,k] ≠ 0
            ρ = k+1:n
            scale!(1/A[k,k], sub(A, ρ, k))
            @inbounds for j in ρ
                Akj = A[k, j]
                @simd for i in ρ
                    A[i, j] -= A[i, k] * Akj
                end
            end
        end
    end
    return (A, rowpiv, colpiv)
end
[julia-users] Re: Can you make this linear algebra code faster?
And Jiahao Chen writes:
> I tried to manually inline idxmaxabs. It made absolutely no difference on my machine. The row scaling takes ~0.05% of total execution time.

Simply inlining, sure, but you could scale inside the outer loop and find the next pivot in the inner loop. Making only a single pass over the data should save more than 0.05% once you leave cache. But as long as you're in cache (500x500 is approx. 2MiB), not much will matter.

Ultimately, I'm not sure who's interested in complete pivoting for LU. That choice alone kills performance on modern machines for negligible benefit. You would likely find more interest in column-pivoted QR or rook pivoting in LDL^T.
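A hedged sketch of the single-pass idea Jason describes: fold the next pivot search into the elimination sweep itself, so the trailing submatrix is read once per step instead of being traversed again by a separate idxmaxabs pass. `lu_step_fused!` is a hypothetical name; it performs one elimination step and returns the location of the next pivot.

```julia
# One step of elimination that also tracks the next pivot in the same sweep.
# Assumes A[k,k] is the (already swapped-in) pivot; returns (μ, λ), the
# indices of the largest |entry| in the trailing submatrix A[k+1:n, k+1:n].
function lu_step_fused!(A, k, n)
    invpiv = 1 / A[k, k]
    @inbounds @simd for i in k+1:n
        A[i, k] *= invpiv            # scale the pivot column (multipliers)
    end
    μ, λ, themax = k+1, k+1, -1.0
    @inbounds for j in k+1:n
        Akj = A[k, j]
        for i in k+1:n
            A[i, j] -= A[i, k] * Akj # update and search in one pass
            a = abs(A[i, j])
            if a > themax
                μ, λ, themax = i, j, a
            end
        end
    end
    return μ, λ
end
```

The payoff only shows up once the trailing block no longer fits in cache, which matches Jason's caveat about the 500x500 case.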