The Bugs page is festooned with anomalies in D. and d., especially higher derivatives. Have a look at it before you get too far committed.

Henry Rich

On 2/27/2017 6:45 PM, 'Jon Hough' via Programming wrote:
Thanks, the dsoftmax is going to be used for a toy Multilayer Perceptron 
Classifier I am writing. Using my original dsoftmax, which is being called 
hundreds of thousands of times, it took way too long.

Using sm D. 1 has made things quite a bit faster.

Thanks.
--------------------------------------------
On Tue, 2/28/17, Raul Miller <[email protected]> wrote:

  Subject: Re: [Jprogramming] Fast derivative of Softmax function
  To: "Programming forum" <[email protected]>
  Date: Tuesday, February 28, 2017, 12:24 AM
I was about to point out the same thing.

   a =: 0.5 0.6 0.23 0.66
   sm=:(] % +/ )@:^ NB. softmax
   softmax=: { sm
   dsoftmax=: 4 : 0
     idx=. x
     vals=. y
     smx=. idx softmax vals
     rx=. ''
     for_j. i.#vals do.
       if. j = idx do. rx=. rx , smx * (1 - smx)
       elseif. 1 do. rx=. rx ,(j softmax vals)* (0 - smx) end.
     end.
     rx
   )
   sm D.1 a
  0.186192 _0.0676431 _0.0467234 _0.0718259
_0.0676431    0.19866 _0.0516374 _0.0793799
_0.0467234 _0.0516374   0.153191 _0.0548305
_0.0718259 _0.0793799 _0.0548305   0.206036
   (i.# a) dsoftmax"0 _  a
  0.186192 _0.0676431 _0.0467234 _0.0718259
_0.0676431    0.19866 _0.0516374 _0.0793799
_0.0467234 _0.0516374   0.153191 _0.0548305
_0.0718259 _0.0793799 _0.0548305   0.206036

The speedup is not too impressive, but it is a speedup (probably
because we are retaining and reusing all results from sm rather than
recomputing it so many times -- I imagine using sm directly and
lifting it out of the loop):

   timespacex '(i.# a) dsoftmax"0 _  a'
5.5e_5 6528
   timespacex 'sm D.1 a'
2.3e_5 7424

That said, note that we can approximate this speedup by using a
variant on what Pascal proposed:

   d_softmax=: 4 : 0
     rx=. i.0 0
     smv=. sm y
     for_i. x do.
       ry=. i.0
       smx=. i { smv
       for_j. i.#y do.
         if. j_index = i_index do. ry=. ry , smx * (1 - smx)
         else. ry=. ry ,(j {smv)* (0 - smx) end.
       end.
       rx=.rx, ry
     end.
     rx
   )
   (i.# a) d_softmax a
  0.186192 _0.0676431 _0.0467234 _0.0718259
_0.0676431    0.19866 _0.0516374 _0.0793799
_0.0467234 _0.0516374   0.153191 _0.0548305
_0.0718259 _0.0793799 _0.0548305   0.206036
   timespacex '(i.# a) d_softmax  a'
3e_5 6016
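
A loop-free formulation is also possible. Writing s for sm y, the
matrix above is diag(s) minus the outer product of s with itself,
i.e. entry (i,j) is s[i] * ((i = j) - s[j]). A minimal sketch of that
idea, assuming sm as defined above; the name jsm is made up for this
example and does not appear elsewhere in the thread:

```j
sm  =: (] % +/ )@:^          NB. softmax, as defined earlier in the thread
jsm =: 3 : 0                 NB. hypothetical name: softmax Jacobian without a loop
  s =. sm y                  NB. softmax values, computed once
  (s * = i. # s) - s */ s    NB. diag(s) minus the outer product table of s
)
```

With the earlier definitions loaded, (jsm a) -: (i.# a) d_softmax a is
one way to compare the two results. No timing claim is made for this
version.
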

(Remember that it's generally a good idea to ignore speedups which are
less than a factor of 2, because of scheduling issues within the
machine itself - you can see this by inspecting multiple timing runs)

   timespacex '(i.# a) d_softmax  a'
3e_5 6016
   timespacex 'sm D.1 a'
2.3e_5 7424
   timespacex '(i.# a) d_softmax  a'
2.9e_5 6016
   timespacex 'sm D.1 a'
2.3e_5 7424
   timespacex '(i.# a) d_softmax  a'
2.8e_5 6016
   timespacex 'sm D.1 a'
3.7e_5 7424
   timespacex '(i.# a) d_softmax  a'
3.2e_5 6016
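
One way to damp that run-to-run noise is to give timespacex a left
argument, so that the sentence is executed that many times and the
reported time is the average. A sketch, assuming a, sm and d_softmax
from above are defined and that D. is available in the J version at
hand; the repeat count 1000 is an arbitrary choice:

```j
NB. left argument = number of repetitions; the time reported is the average
1000 timespacex 'sm D.1 a'
1000 timespacex '(i.# a) d_softmax  a'
```

Differences that survive averaging are more trustworthy than a single
run, though the factor-of-2 rule of thumb above still seems a
reasonable guide.
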

I hope this helps,

--
Raul

On Mon, Feb 27, 2017 at 8:05 AM, Louis de Forcrand <[email protected]> wrote:
> You probably know about it, but I'll mention it anyway: there's a
> primitive partial derivative operator in J. I think it would do
> exactly what you want (numerically), and it's probably reasonably
> fast. It's not too hard to use either:
>
> dsoftmax=: sm D.1
>
> Louis
>
>> On 27 Feb 2017, at 10:20, 'Pascal Jasmin' via Programming <[email protected]> wrote:
>>
>> one optimization is removing the rank"0 _, so that the function
>> need not be reparsed for each x
>>
>> untested.
>>
>>
>> dsoftmax=: 4 : 0
>> rx=. ''
>> for_i. x do.
>> smx=. i softmax y
>> for_j. i.#y do.
>> if. j_index = i_index do. rx=. rx , smx * (1 - smx)
>> else. rx=. rx ,(j softmax y)* (0 - smx) end.
>> end. end.
>> rx
>> )
>>
>>
>> ----- Original Message -----
>> From: 'Jon Hough' via Programming <[email protected]>
>> To: Programming Forum <[email protected]>
>> Sent: Monday, February 27, 2017 3:09 AM
>> Subject: [Jprogramming] Fast derivative of Softmax function
>>
>> Given an array, we can calculate the softmax function
>> https://en.wikipedia.org/wiki/Softmax_function
>>
>> a =: 0.5 0.6 0.23 0.66
>> sm=:(] % +/ )@:^ NB. softmax
>>
>> sm a
>> 0.247399 0.273418 0.188859 0.290325
>>
>>
>> The (partial) derivative of softmax is a little more complicated:
>>
>> If the array is of length N, we need an NxN matrix of partial
>> derivatives where (in pseudo code)
>>
>> derivatives[i,j] = sm(array)[i] * (1 - sm(array)[j])   if i == j
>> or
>> derivatives[i,j] = -1 * sm(array)[i] * sm(array)[j]    if i != j
>>
>> (see here for the reasoning:
>> http://eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative/ )
>>
>> My implementation of the partial derivatives is this:
>>
>>
>> NB. x value is index, y value is the whole array
>> dsoftmax=: 4 : 0
>> idx=. x
>> vals=. y
>> smx=. idx softmax vals
>> rx=. ''
>> for_j. i.#vals do.
>>  if. j = idx do. rx=. rx , smx * (1 - smx)
>>  elseif. 1 do. rx=. rx ,(j softmax vals)* (0 - smx) end.
>> end.
>> rx
>> )
>>
>>
>> Then, for example using above array a,
>>
>> (i.# a) dsoftmax"0 _  a
>>
>> gives the values, in a 4x4 matrix.
>>
>> This is quite slow. I have tried to do this without iterating and
>> branching, but cannot figure out a way to do it.
>> Any help appreciated.
>> Thanks,
>>
>> Jon
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm