[julia-users] Re: Optim/DualNumbers question (maybe a math question?)
Thomas, if you're trying to invert (factorize) a matrix as part of your objective function, it's generally a much better idea to reformulate your optimization problem and use a different algorithm. Anywhere that you're essentially trying to use inv(A)*x, you should try introducing a new variable y and a corresponding linear equality constraint x = A * y. Most of the algorithms in Optim If your objective function in terms of x and y is linear, then you can use Clp, GLPK, etc. If it's still some complicated nonlinear thing even in terms of x and y, use JuMP with Ipopt. NLopt might work too, but it's not tied into the convenient JuMP reverse-mode autodiff framework yet.

-Tony

On Monday, June 2, 2014 11:43:32 AM UTC-7, Thomas Covert wrote:

I'm really excited to use DualNumbers and the autodiff option in Optim.jl. However, the functions I would like to optimize often make use of linear algebra, including Cholesky factorizations. At present, DualNumbers.jl does not implement a Cholesky factorization. My first pass at a workaround was to just write out a standard Cholesky solver (i.e., just writing out the equations listed on Wikipedia). This seems to work fine - i.e., my choldn() function factors a DualNumber array into upper and lower triangular matrices whose product is the input. However, my choldn function listed below is MUCH slower than the LAPACK call built into chol(). I can't say I'm surprised by this, but the speed hit makes the whole enterprise of using autodiff with DualNumbers considerably less useful.

I was hoping to find some neat linear algebra trick that would let me compute a DualNumber Cholesky factorization without having to resort to non-LAPACK code, but I haven't found it yet. That is, I figured that I could compute the Cholesky factorization of the real part of the matrix and then separately compute a matrix representing the DualNumber part. However, I've not yet found a math text describing such a beast. Any ideas? Am I wrong in thinking that a Cholesky decomposition of a square array of DualNumbers is a well-defined mathematical object? I've included my code for choldn() below. Thanks in advance.

CODE

function choldn!(A::Array)
    N = size(A, 1)
    T = eltype(A)
    tmp = zero(T)
    for j = 1:N      # cols
        for i = j:N  # rows
            if i == j
                if i == 1
                    A[i,j] = sqrt(A[i,j])
                else
                    tmp = zero(T)
                    for k = 1:j-1
                        tmp += A[j,k]^2
                    end
                    A[i,j] = sqrt(A[i,j] - tmp)
                end
            else
                if j == 1
                    A[i,j] = A[i,j] / A[j,j]
                else
                    tmp = zero(T)
                    for k = 1:j-1
                        tmp += A[i,k] * A[j,k]
                    end
                    A[i,j] = (A[i,j] - tmp) / A[j,j]
                end
            end
        end
    end
    # then zero out the non-cholesky stuff
    for i = 1:N
        if i < N
            for j = i+1:N
                A[i,j] = zero(T)
            end
        end
    end
end
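The "matrix representing the DualNumber part" that Thomas asks about is a well-defined object. Writing the dual matrix as A = A0 + eps*Adot with A0 = L0*L0', the dual part Ldot of the factor satisfies Adot = L0*Ldot' + Ldot*L0', whose unique lower-triangular solution is Ldot = L0 * Phi(L0^-1 * Adot * L0^-T), where Phi takes the lower triangle and halves the diagonal. A minimal sketch, in current Julia syntax rather than the 2014 API; chol_dual is an illustrative name, not code from the thread:

    using LinearAlgebra

    # Forward-mode Cholesky: given A0 = L0*L0' and a symmetric perturbation
    # Adot, the dual part Ldot of the factor solves Adot = L0*Ldot' + Ldot*L0'.
    # Both solves below are triangular operations at BLAS/LAPACK speed, so no
    # element-by-element dual-number Cholesky loop is needed.
    function chol_dual(A0::Matrix{Float64}, Adot::Matrix{Float64})
        L0 = cholesky(Symmetric(A0)).L
        M = L0 \ Adot / L0'                        # L0^{-1} * Adot * L0^{-T}
        Phi = LowerTriangular(M) - Diagonal(M) / 2 # lower part, half diagonal
        Ldot = L0 * Phi
        return L0, Ldot                            # factor is L0 + eps * Ldot
    end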
[julia-users] Re: Optim/DualNumbers question (maybe a math question?)
Whoops, didn't finish writing a sentence there... was saying that most of the algorithms in Optim don't implement equality constraints, but you've got other choices of optimization solvers.
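To make the reformulation concrete, here is a minimal sketch of the x = A * y idea in JuMP. It uses current JuMP syntax (Model(Ipopt.Optimizer), @variable, and so on) rather than the 2014-era macros, and the tiny A, b, and objective are made-up placeholders:

    using JuMP, Ipopt

    A = [4.0 1.0; 1.0 3.0]          # placeholder data matrix
    b = [1.0, 2.0]                  # placeholder right-hand side

    model = Model(Ipopt.Optimizer)
    @variable(model, y[1:2])
    # Instead of evaluating inv(A) * b inside the objective, introduce y
    # together with the linear equality constraint A * y == b.
    @constraint(model, A * y .== b)
    @objective(model, Min, sum(y[i]^2 for i in 1:2))
    optimize!(model)

The solver then never needs a derivative of a matrix factorization; the linear algebra moves into constraints it handles natively.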
Re: [julia-users] Re: Optim/DualNumbers question (maybe a math question?)
Thanks for the suggestion, Tony. The problem I'm working with is fitting a Gaussian process to data, so my objective function (the log likelihood of a GP given some data x) includes both an x' inv(K(theta)) x term and a log determinant of K(theta). Here K() is a big, dense, positive definite weight matrix and theta is the parameter(s) controlling the weights. As far as I can tell, the fastest and most numerically stable way to implement these calculations is with a Cholesky decomposition of K(theta).

In my code where I've already done the work of finding derivatives by hand, I use a PDMat for K and the associated fast routines (like invquad and logdet) available for PDMats. To use the DualNumbers autodiff code, I needed a way to Cholesky factorize a symmetric matrix of dual numbers (which I assume was something like K(theta + epsilon) in the autodiff code). However, if you think there's a smarter way to do this kind of calculation that avoids a Cholesky factorization, I would definitely take a look.
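For reference, the two terms Thomas mentions fall out of a single factorization. A minimal sketch of the GP negative log likelihood via one Cholesky, in current Julia syntax (cholesky is the modern name for the cholfact of this thread; gp_neg_loglik is an illustrative name):

    using LinearAlgebra

    # -log p(x | theta) for a zero-mean GP with covariance K = K(theta):
    # 0.5 * (x' * inv(K) * x + logdet(K) + N * log(2*pi)).
    function gp_neg_loglik(K::AbstractMatrix, x::AbstractVector)
        C = cholesky(Symmetric(K))   # K = C.L * C.L'
        alpha = C \ x                # solves K * alpha = x using the factors
        return 0.5 * (dot(x, alpha) + logdet(C) + length(x) * log(2pi))
    end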
Re: [julia-users] Re: Optim/DualNumbers question (maybe a math question?)
Alright, if K is dense then Ipopt may not be your best choice. How do the theta parameters enter into K(theta)? Is the problem convex? Would the dual problem be easier to work with? Have you read Boyd and Vandenberghe more recently than I have? This sounds suspiciously like it can be expressed as a semidefinite program, though I think the Julia interfaces to good SDP solvers are still WIP.

Yes, the majority of solvers for convex optimization rely on a Cholesky decomposition at some point; the question is whether or not you pose the problem in such a way that you have to differentiate the factors. I suspect you shouldn't have to, but I also haven't worked with this particular type of problem in a few years.
Re: [julia-users] Re: Optim/DualNumbers question (maybe a math question?)
In the GP setting I'm using, K is the set of pairwise Gaussian kernel evaluations between all the points in my data. So if a point is (x_i, z_i) (x is the dependent variable and the z's are effectively explanatory variables), K_ij is more or less exp(-0.5 * (z_i - z_j)^2 * theta). I don't think the problem is convex, but MATLAB's fminunc plus tens of random starting values always converges to the same point. Anyway, a previous commenter pointed out that LU factorization is implemented for dual numbers, so I'll give that a try.
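In code, the kernel matrix Thomas describes would look something like the sketch below (kernel_matrix is an illustrative name; the thread gives only the formula):

    # Squared-exponential kernel: K[i,j] = exp(-0.5 * (z[i] - z[j])^2 * theta).
    function kernel_matrix(z::AbstractVector, theta)
        N = length(z)
        return [exp(-0.5 * (z[i] - z[j])^2 * theta) for i in 1:N, j in 1:N]
    end

Note that nothing here restricts theta to Float64: passing a dual-number theta yields a dual-valued K, which is exactly the matrix the autodiff path then has to factorize.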
[julia-users] Re: Optim/DualNumbers question (maybe a math question?)
Hi Thomas,

This is definitely a reasonable thing to want to do, and there are a number of methods to differentiate through a matrix factorization. Julia doesn't yet have a generic chol() method (see https://github.com/JuliaLang/julia/issues/1629), but it does have a generic LU decomposition method that works with DualNumbers. So if you replace cholfact with lufact, your code might just work. However, this won't use LAPACK, and off-hand I'm not sure if it's possible to reformulate a Cholesky with dual numbers into something LAPACK knows how to compute.

There also might be some performance issues with your hand-written Cholesky factorization. Try:

    function choldn!{T}(A::Matrix{T})

instead of

    function choldn!(A::Array)
    ...
    T = eltype(A);

Exactly how much slower is it?

On Monday, June 2, 2014 2:43:32 PM UTC-4, Thomas Covert wrote:
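A quick sketch of the lufact route Miles suggests, in current syntax (lu replaced lufact after this thread, and the tiny data here is made up). It assumes DualNumbers supplies the abs and comparison methods the generic LU needs for pivoting; the derivative with respect to theta then flows through both the solve and the determinant:

    using LinearAlgebra, DualNumbers

    theta = Dual(1.0, 1.0)          # value 1.0, derivative seed 1.0
    z = [0.0, 0.5, 1.3]
    K = [exp(-0.5 * (zi - zj)^2 * theta) for zi in z, zj in z]
    x = [1.0, -2.0, 0.5]

    F = lu(K)                       # generic (non-LAPACK) LU on the Dual eltype
    alpha = F \ x                   # K \ x, carrying the derivative
    quad = dot(x, alpha)            # the x' * inv(K) * x term as a Dual
    ld = log(det(F))                # logdet(K); det > 0 since K is PD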
Re: [julia-users] Re: Optim/DualNumbers question (maybe a math question?)
Thanks for the quick response. I'll check and see if an LU decomposition will do the trick.

As far as my own Cholesky code goes, it is quite a bit slower, even after your edits. For a 1000x1000 Float64 matrix (i.e., X = randn(1000,1000); X = X' * X), here are the timing results I get:

3 calls of @time choldn(X), the slow one I wrote:

    elapsed time: 1.313375682 seconds (9102620 bytes allocated)
    elapsed time: 1.231566056 seconds (896 bytes allocated)
    elapsed time: 1.240786689 seconds (896 bytes allocated)

3 calls of @time chol(X), the built-in Cholesky factorizer:

    elapsed time: 0.032613575 seconds (8000736 bytes allocated)
    elapsed time: 0.041127357 seconds (8000688 bytes allocated)
    elapsed time: 0.066425175 seconds (8000688 bytes allocated)

If I had to guess, the significant timing differences here are due to smart multi-threading in the LAPACK Cholesky code. For reference, these are running times on my mid-2009 MacBook Pro with 8 GB of RAM and a 2.53 GHz Core 2 Duo.

On Mon, Jun 2, 2014 at 4:36 PM, Miles Lubin miles.lu...@gmail.com wrote:
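For anyone re-running this comparison today, a slightly fairer harness warms up the compiler first and charges both versions the same copy cost (a sketch: cholesky is the current name for chol, and the shift by 1000I is only there to guarantee a well-conditioned positive definite test matrix):

    using LinearAlgebra

    X = randn(1000, 1000); X = X' * X + 1000I   # well-conditioned PD matrix

    choldn!(copy(X)); cholesky(X)               # warm up: compile before timing
    @time choldn!(copy(X))                      # hand-written loop version
    @time cholesky(X)                           # LAPACK-backed factorization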