Thanks, the dsoftmax is going to be used for a toy Multilayer Perceptron Classifier I am writing. Using my original dsoftmax, which is being called hundreds of thousands of times, it took way too long.
Using sm D. 1 has made things quite a bit faster. Thanks.

--------------------------------------------
On Tue, 2/28/17, Raul Miller <[email protected]> wrote:

Subject: Re: [Jprogramming] Fast derivative of Softmax function
To: "Programming forum" <[email protected]>
Date: Tuesday, February 28, 2017, 12:24 AM

I was about to point out the same thing.

   a =: 0.5 0.6 0.23 0.66
   sm=:(] % +/ )@:^ NB. softmax
   softmax=: { sm

dsoftmax=: 4 : 0
  idx=. x
  vals=. y
  smx=. idx softmax vals
  rx=. ''
  for_j. i.#vals do.
    if. j = idx do. rx=. rx , smx * (1 - smx)
    elseif. 1 do. rx=. rx ,(j softmax vals)* (0 - smx) end.
  end.
  rx
)

   sm D.1 a
0.186192 _0.0676431 _0.0467234 _0.0718259
_0.0676431 0.19866 _0.0516374 _0.0793799
_0.0467234 _0.0516374 0.153191 _0.0548305
_0.0718259 _0.0793799 _0.0548305 0.206036

   (i.# a) dsoftmax"0 _ a
0.186192 _0.0676431 _0.0467234 _0.0718259
_0.0676431 0.19866 _0.0516374 _0.0793799
_0.0467234 _0.0516374 0.153191 _0.0548305
_0.0718259 _0.0793799 _0.0548305 0.206036

The speedup is not too impressive, but it is a speedup (probably because we are retaining and reusing all results from sm rather than recomputing it so many times -- I imagine using sm directly and lifting it out of the loop ):

   timespacex '(i.# a) dsoftmax"0 _ a'
5.5e_5 6528
   timespacex 'sm D.1 a'
2.3e_5 7424

That said, note that we can approximate this speedup by using a variant on what Pascal proposed:

d_softmax=: 4 : 0
  rx=. i.0 0
  smv=. sm y
  for_i. x do.
    ry=. i.0
    smx=. i { smv
    for_j. i.#y do.
      if. j_index = i_index do.
        ry=. ry , smx * (1 - smx)
      else.
        ry=. ry ,(j {smv)* (0 - smx)
      end.
    end.
    rx=.rx, ry
  end.
  rx
)

   (i.# a) d_softmax a
0.186192 _0.0676431 _0.0467234 _0.0718259
_0.0676431 0.19866 _0.0516374 _0.0793799
_0.0467234 _0.0516374 0.153191 _0.0548305
_0.0718259 _0.0793799 _0.0548305 0.206036

   timespacex '(i.# a) d_softmax a'
3e_5 6016

(Remember that it's generally a good idea to ignore speedups which are less than a factor of 2, because of scheduling issues within the machine itself - you can see this by inspecting multiple timing runs)

   timespacex '(i.# a) d_softmax a'
3e_5 6016
   timespacex 'sm D.1 a'
2.3e_5 7424
   timespacex '(i.# a) d_softmax a'
2.9e_5 6016
   timespacex 'sm D.1 a'
2.3e_5 7424
   timespacex '(i.# a) d_softmax a'
2.8e_5 6016
   timespacex 'sm D.1 a'
3.7e_5 7424
   timespacex '(i.# a) d_softmax a'
3.2e_5 6016

I hope this helps,

-- 
Raul

On Mon, Feb 27, 2017 at 8:05 AM, Louis de Forcrand <[email protected]> wrote:
> You probably know about it, but I'll mention it anyway: there's a primitive partial derivative operator in J. I think it would do exactly what you want (numerically), and it's probably reasonably fast. It's not too hard to use either:
>
> dsoftmax=: sm D.1
>
> Louis
>
>> On 27 Feb 2017, at 10:20, 'Pascal Jasmin' via Programming <[email protected]> wrote:
>>
>> one optimization is removing the rank "0 _, so that the function does not need to be reparsed for each x
>>
>> untested.
>>
>>
>> dsoftmax=: 4 : 0
>> rx=. ''
>> for_i. x do.
>>   smx=. i softmax y
>>   for_j. i.#vals do.
>>     if. j_index. = i_index. do. rx=. rx , smx * (1 - smx)
>>     else. rx=. rx ,(j softmax y)* (0 - smx) end.
>>   end. end.
>> rx
>> )
>>
>>
>> ----- Original Message -----
>> From: 'Jon Hough' via Programming <[email protected]>
>> To: Programming Forum <[email protected]>
>> Sent: Monday, February 27, 2017 3:09 AM
>> Subject: [Jprogramming] Fast derivative of Softmax function
>>
>> Given an array, we can calculate the softmax function
>> https://en.wikipedia.org/wiki/Softmax_function
>>
>> a =: 0.5 0.6 0.23 0.66
>> sm=:(] % +/ )@:^ NB. softmax
>>
>> sm a
>> 0.247399 0.273418 0.188859 0.290325
>>
>> The (partial) derivative of softmax is a little more complicated:
>>
>> If the array is of length N, we need an NxN matrix of partial derivatives where (in pseudo code)
>>
>> derivatives[i,j] = sm(array[i]) * (1 - sm(array[j])) if i == j
>> or
>> derivatives[i,j] = -1 * sm(array[i]) * sm(array[j]) if i != j
>>
>> ( see here for the reasoning: http://eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative/ )
>>
>> My implementation of the partial derivatives is this:
>>
>>
>> NB. x value is index, y value is the whole array
>> dsoftmax=: 4 : 0
>> idx=. x
>> vals=. y
>> smx=. idx softmax vals
>> rx=. ''
>> for_j. i.#vals do.
>>   if. j = idx do. rx=. rx , smx * (1 - smx)
>>   elseif. 1 do. rx=. rx ,(j softmax vals)* (0 - smx) end.
>> end.
>> rx
>> )
>>
>>
>> Then, for example using above array a,
>>
>> (i.# a) dsoftmax"0 _ a
>>
>> gives the values, in a 4x4 matrix.
>>
>> This is quite slow. I have tried to do this without iterating and branching, but cannot figure out a way to do it.
>> Any help appreciated.
>> Thanks,
>>
>> Jon
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm
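
Jon's original post asks for a version without iterating and branching. As a hedged sketch that is not from the thread: the matrix in the pseudo code equals diag(s) minus the outer product of s with itself, where s is sm y. That can be built from a single softmax call, an identity matrix, and a multiplication table. The name jac is hypothetical; sm and a are defined as in the thread.

```j
sm  =: (] % +/ )@:^             NB. softmax, as defined in the thread
NB. jac is a hypothetical name, not from the thread.
NB. (* =@i.@#) scales the rows of an identity matrix by s, giving diag(s)
NB. */~        is the multiplication table of s with itself
jac =: ((* =@i.@#) - */~)@sm
a   =: 0.5 0.6 0.23 0.66
jac a                           NB. 4x4 table; expected to agree with sm D.1 a
```

Because sm runs once and the rest is whole-array arithmetic, this avoids the explicit loops and the if. branch entirely. Whether it is actually faster than sm D.1 on a given machine would need to be checked with several timespacex runs, as Raul suggests.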
