Yes, thanks for the warning. I remember this one:
(^&_2) d. _1
and it seems this, too, is a bug:
(%@:*:) d. _1
I once tried to figure out how J implements d. and D., but unfortunately I found J's C source code impenetrable. I assume fixing d. and D. is not a priority for new J releases.
Regards,
Jon
--------------------------------------------
On Tue, 2/28/17, Henry Rich <[email protected]> wrote:

Subject: Re: [Jprogramming] Fast derivative of Softmax function
To: [email protected]
Date: Tuesday, February 28, 2017, 8:47 AM

The Bugs page is festooned with anomalies in D. and d., especially higher derivatives. Have a look at it before you get too far committed.

Henry Rich

On 2/27/2017 6:45 PM, 'Jon Hough' via Programming wrote:
> Thanks, the dsoftmax is going to be used for a toy Multilayer Perceptron Classifier I am writing. Using my original dsoftmax, which is called hundreds of thousands of times, it took way too long.
>
> Using sm D. 1 has made things quite a bit faster.
>
> Thanks.
> --------------------------------------------
> On Tue, 2/28/17, Raul Miller <[email protected]> wrote:
>
> Subject: Re: [Jprogramming] Fast derivative of Softmax function
> To: "Programming forum" <[email protected]>
> Date: Tuesday, February 28, 2017, 12:24 AM
>
> I was about to point out the same thing.
>
>    a =: 0.5 0.6 0.23 0.66
>    sm=:(] % +/ )@:^ NB. softmax
>    softmax=: { sm
>    dsoftmax=: 4 : 0
> idx=. x
> vals=. y
> smx=. idx softmax vals
> rx=. ''
> for_j. i.#vals do.
>   if. j = idx do. rx=. rx , smx * (1 - smx)
>   elseif. 1 do. rx=. rx ,(j softmax vals)* (0 - smx) end.
> end.
> rx
> )
>
>    sm D.1 a
>   0.186192 _0.0676431 _0.0467234 _0.0718259
> _0.0676431    0.19866 _0.0516374 _0.0793799
> _0.0467234 _0.0516374   0.153191 _0.0548305
> _0.0718259 _0.0793799 _0.0548305   0.206036
>    (i.# a) dsoftmax"0 _ a
>   0.186192 _0.0676431 _0.0467234 _0.0718259
> _0.0676431    0.19866 _0.0516374 _0.0793799
> _0.0467234 _0.0516374   0.153191 _0.0548305
> _0.0718259 _0.0793799 _0.0548305   0.206036
>
> The speedup is not too impressive, but it is a speedup (probably because we are retaining and reusing all results from sm rather than recomputing it so many times -- I imagine using sm directly and lifting it out of the loop):
>
>    timespacex '(i.# a) dsoftmax"0 _ a'
> 5.5e_5 6528
>    timespacex 'sm D.1 a'
> 2.3e_5 7424
>
> That said, note that we can approximate this speedup by using a variant on what Pascal proposed:
>
>    d_softmax=: 4 : 0
> rx=. i.0 0
> smv=. sm y
> for_i. x do.
>   ry=. i.0
>   smx=. i { smv
>   for_j. i.#y do.
>     if. j_index = i_index do. ry=. ry , smx * (1 - smx)
>     else. ry=. ry ,(j {smv)* (0 - smx) end.
>   end.
>   rx=.rx, ry
> end.
> rx
> )
>
>    (i.# a) d_softmax a
>   0.186192 _0.0676431 _0.0467234 _0.0718259
> _0.0676431    0.19866 _0.0516374 _0.0793799
> _0.0467234 _0.0516374   0.153191 _0.0548305
> _0.0718259 _0.0793799 _0.0548305   0.206036
>    timespacex '(i.# a) d_softmax a'
> 3e_5 6016
>
> (Remember that it's generally a good idea to ignore speedups of less than a factor of 2, because of scheduling issues within the machine itself; you can see this by inspecting multiple timing runs.)
>
>    timespacex '(i.# a) d_softmax a'
> 3e_5 6016
>    timespacex 'sm D.1 a'
> 2.3e_5 7424
>    timespacex '(i.# a) d_softmax a'
> 2.9e_5 6016
>    timespacex 'sm D.1 a'
> 2.3e_5 7424
>    timespacex '(i.# a) d_softmax a'
> 2.8e_5 6016
>    timespacex 'sm D.1 a'
> 3.7e_5 7424
>    timespacex '(i.# a) d_softmax a'
> 3.2e_5 6016
>
> I hope this helps,
>
> --
> Raul
>
> On Mon, Feb 27, 2017 at 8:05 AM, Louis de Forcrand <[email protected]> wrote:
> > You probably know about it, but I'll mention it anyway: there's a primitive partial derivative operator in J. I think it would do exactly what you want (numerically), and it's probably reasonably fast. It's not too hard to use either:
> >
> >    dsoftmax=: sm D.1
> >
> > Louis
> >
> >> On 27 Feb 2017, at 10:20, 'Pascal Jasmin' via Programming <[email protected]> wrote:
> >>
> >> One optimization is removing the rank "0 _, so that the function does not need to be reparsed for each x.
> >>
> >> Untested.
> >>
> >>
> >> dsoftmax=: 4 : 0
> >> rx=. ''
> >> for_i. x do.
> >>   smx=. i softmax y
> >>   for_j. i.#y do.
> >>     if. j_index = i_index do. rx=. rx , smx * (1 - smx)
> >>     else. rx=. rx ,(j softmax y)* (0 - smx) end.
> >> end. end.
> >> rx
> >> )
> >>
> >>
> >> ----- Original Message -----
> >> From: 'Jon Hough' via Programming <[email protected]>
> >> To: Programming Forum <[email protected]>
> >> Sent: Monday, February 27, 2017 3:09 AM
> >> Subject: [Jprogramming] Fast derivative of Softmax function
> >>
> >> Given an array, we can calculate the softmax function
> >> https://en.wikipedia.org/wiki/Softmax_function
> >>
> >>    a =: 0.5 0.6 0.23 0.66
> >>    sm=:(] % +/ )@:^ NB. softmax
> >>
> >>    sm a
> >> 0.247399 0.273418 0.188859 0.290325
> >>
> >> The (partial) derivative of softmax is a little more complicated.
> >>
> >> If the array is of length N, we need an NxN matrix of partial derivatives where (in pseudo code):
> >>
> >> derivatives[i,j] = sm(array[i]) * (1 - sm(array[j]))   if i == j
> >> or
> >> derivatives[i,j] = -1 * sm(array[i]) * sm(array[j])     if i != j
> >>
> >> (See here for the reasoning: http://eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative/ )
> >>
> >> My implementation of the partial derivatives is this:
> >>
> >>
> >> NB. x value is index, y value is the whole array
> >> dsoftmax=: 4 : 0
> >> idx=. x
> >> vals=. y
> >> smx=. idx softmax vals
> >> rx=. ''
> >> for_j. i.#vals do.
> >>   if. j = idx do. rx=. rx , smx * (1 - smx)
> >>   elseif. 1 do. rx=. rx ,(j softmax vals)* (0 - smx) end.
> >> end.
> >> rx
> >> )
> >>
> >>
> >> Then, for example, using the above array a,
> >>
> >>    (i.# a) dsoftmax"0 _ a
> >>
> >> gives the values in a 4x4 matrix.
> >>
> >> This is quite slow. I have tried to do this without iterating and branching, but cannot figure out a way to do it.
> >> Any help appreciated.
> >> Thanks,
> >>
> >> Jon
> >> ----------------------------------------------------------------------
> >> For information about J forums see http://www.jsoftware.com/forums.htm
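Jon's original question asks for a version without iterating and branching. One way to get there, sketched here under the assumption that the pseudo code above is the intended formula, is to note that it says entry [i;j] is s_i * (δ_ij - s_j), where s is sm applied to the whole array. The whole Jacobian is therefore diag(s) minus the outer product of s with itself. This sketch does not appear in the thread; sm and a are copied from it, while diagm and jsm are hypothetical names.

```j
NB. Sketch, not from the thread: loop-free softmax Jacobian.
NB. Entry [i;j] is s_i * ((i = j) - s_j), i.e. diag(s) minus (s outer s).
sm    =: (] % +/)@:^        NB. softmax, as defined in the thread
diagm =: ] * =@i.@#         NB. hypothetical: s_i on the diagonal, 0 elsewhere
jsm   =: (diagm - */~)@:sm  NB. hypothetical: diag(s) - s */ s, with sm computed once

a =: 0.5 0.6 0.23 0.66
jsm a
```

Here =@i.@# builds the identity matrix, ] * scales row i by s_i, and */~ is the outer product table. Because sm is evaluated once and everything after it is whole-array arithmetic, there is no loop or branch. For the a above, the result should agree with the sm D.1 a matrix printed earlier in the thread to the six digits shown, and each row sums to zero (up to rounding) because the softmax outputs always sum to 1. The closed form also does not depend on D. at all.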
