Thanks, the dsoftmax is going to be used for a toy Multilayer Perceptron 
Classifier I am writing. Using my original dsoftmax, which is being called 
hundreds of thousands of times, it took way too long.

Using sm D. 1 has made things quite a bit faster.

Thanks.
--------------------------------------------
On Tue, 2/28/17, Raul Miller <[email protected]> wrote:

 Subject: Re: [Jprogramming] Fast derivative of Softmax function
 To: "Programming forum" <[email protected]>
 Date: Tuesday, February 28, 2017, 12:24 AM
 
 I was about to point out
 the same thing.
 
    a =: 0.5 0.6 0.23 0.66
    sm=:(] % +/ )@:^ NB. softmax
    softmax=: { sm
    dsoftmax=: 4 : 0
      idx=. x
  
    vals=. y
  
    smx=. idx softmax vals
  
    rx=. ''
  
    for_j. i.#vals do.
    
    if. j = idx do. rx=. rx , smx * (1 - smx)
        elseif. 1 do. rx=. rx ,(j
 softmax vals)* (0 - smx) end.
  
    end.
      rx
   )
 
    sm D.1 a
  
 0.186192 _0.0676431 _0.0467234 _0.0718259
 _0.0676431    0.19866 _0.0516374
 _0.0793799
 _0.0467234
 _0.0516374   0.153191 _0.0548305
 _0.0718259 _0.0793799
 _0.0548305   0.206036
    (i.# a) dsoftmax"0 _  a
   0.186192 _0.0676431 _0.0467234 _0.0718259
 _0.0676431    0.19866 _0.0516374
 _0.0793799
 _0.0467234
 _0.0516374   0.153191 _0.0548305
 _0.0718259 _0.0793799
 _0.0548305   0.206036
 
 The speedup is not too impressive, but it is a
 speedup (probably
 because we are retaining
 and reusing all results from sm rather than
 recomputing it so many times -- I imagine using
 sm directly and
 lifting it out of the loop
 ):
 
    timespacex
 '(i.# a) dsoftmax"0 _  a'
 5.5e_5 6528
    timespacex 'sm D.1 a'
 2.3e_5 7424
 
 That said, note that we can approximate this
 speedup by using a
 variant on what Pascal
 proposed:
 
 d_softmax=: 4 :
 0
   rx=. i.0 0
   smv=. sm
 y
   for_i. x do.
     ry=.
 i.0
     smx=. i { smv
  
   for_j. i.#y do.
       if. j_index =
 i_index do. ry=. ry , smx * (1 - smx)
    
   else.  ry=. ry ,(j {smv)* (0 - smx) end.
     end.
     rx=.rx, ry
   end.
   rx
 )
 
    (i.# a) d_softmax  a
   0.186192 _0.0676431 _0.0467234 _0.0718259
 _0.0676431    0.19866 _0.0516374
 _0.0793799
 _0.0467234
 _0.0516374   0.153191 _0.0548305
 _0.0718259 _0.0793799
 _0.0548305   0.206036
    timespacex '(i.# a)
 d_softmax  a'
 3e_5 6016
 
 (Remember that it's
 generally a good idea to ignore speedups which are
 less than a factor of 2, because of scheduling
 issues within the
 machine itself - you can
 see this by inspecting multiple timing runs)
 
    timespacex
 '(i.# a) d_softmax  a'
 3e_5 6016
    timespacex 'sm D.1 a'
 2.3e_5 7424
    timespacex '(i.# a)
 d_softmax  a'
 2.9e_5 6016
    timespacex 'sm D.1 a'
 2.3e_5 7424
    timespacex '(i.# a)
 d_softmax  a'
 2.8e_5 6016
    timespacex 'sm D.1 a'
 3.7e_5 7424
    timespacex '(i.# a)
 d_softmax  a'
 3.2e_5 6016
 
 I hope this helps,
 
 -- 
 Raul
 
 On Mon, Feb 27, 2017 at 8:05
 AM, Louis de Forcrand <[email protected]>
 wrote:
 > You probably know about it, but
 I'll mention it anyway: there's a primitive partial
 derivative operator in J. I think it would do exactly what
 you want (numerically), and it's probably reasonably
 fast. It's not too hard to use either:
 >
 > dsoftmax=: sm D.1
 >
 > Louis
 >
 >> On 27 Feb 2017,
 at 10:20, 'Pascal Jasmin' via Programming <[email protected]>
 wrote:
 >>
 >> one
 optimization is removing the rank"0 _, so that the
 function not need to be reparsed for each x
 >>
 >> untested.
 >>
 >>
 >> dsoftmax=: 4 : 0
 >> rx=. ''
 >> for_i. x do.
 >>
 smx=. i softmax y
 >> for_j. i.#vals
 do.
 >> if. j_index. = i_index. do.
 rx=. rx , smx * (1 - smx)
 >> else. 
 rx=. rx ,(j softmax y)* (0 - smx) end.
 >> end. end.
 >>
 rx
 >> )
 >>
 >>
 >> -----
 Original Message -----
 >> From:
 'Jon Hough' via Programming <[email protected]>
 >> To: Programming Forum <[email protected]>
 >> Sent: Monday, February 27, 2017 3:09
 AM
 >> Subject: [Jprogramming] Fast
 derivative of Softmax function
 >>
 >> Given an array, we can calculate the
 softmax function
 >> https://en.wikipedia.org/wiki/Softmax_function
 >>
 >> a =: 0.5 0.6
 0.23 0.66
 >> sm=:(] % +/ )@:^ NB.
 softmax
 >>
 >> sm
 a
 >> 0.247399 0.273418 0.188859
 0.290325
 >>
 >>
 The (partial) derivative of softmax is a little more
 complicated:
 >>
 >> If the array is of length N, we need
 an NxN matrix of partial derivatives where (in pseudo
 code)
 >>
 >>
 derivatives[i,j] = sm (array[i] )  *( 1 -
 sm(array[j])   if i == j
 >>
 or
 >> derivatives[i,j] =  -1 * sm
 (array[i] )  * ( sm(array[j])   if i != j
 >>
 >> ( see here
 for the reasoning: 
http://eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative/
 )
 >>
 >> My
 implementation of the partial derivatives is this:
 >>
 >>
 >> NB. x value is index, y value is the
 whole array
 >> dsoftmax=: 4 : 0
 >> idx=. x
 >>
 vals=. y
 >> smx=. idx softmax vals
 >> rx=. ''
 >> for_j. i.#vals do.
 >>  if. j = idx do. rx=. rx , smx * (1 -
 smx)
 >>  elseif. 1 do. rx=. rx ,(j
 softmax vals)* (0 - smx) end.
 >>
 end.
 >> rx
 >>
 )
 >>
 >>
 >> Then, for example using above array
 a,
 >>
 >> (i.# a)
 dsoftmax"0 _  a
 >>
 >> gives the values, in a 4x4 matrix.
 >>
 >> This is quite
 slow. I have tried to do this without iterating and
 branching, but cannot figure out a way to do it.
 >> Any help appreciated.
 >> Thanks,
 >>
 >> Jon
 >>
 ----------------------------------------------------------------------
 >> For information about J forums see http://www.jsoftware.com/forums.htm
 >>
 ----------------------------------------------------------------------
 >> For information about J forums see http://www.jsoftware.com/forums.htm
 >
 >
 ----------------------------------------------------------------------
 > For information about J forums see http://www.jsoftware.com/forums.htm
 ----------------------------------------------------------------------
 For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Reply via email to