Yes, thanks for the warning.
I remember this one:
(^&_2) d. _1
and it seems this one is a bug, too:
(%@:*:) d. _1
I once tried to figure out how J implements d. and D., but unfortunately I
found J's C source code impenetrable.
I assume fixing d. and D. is not a priority for new J releases.
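Given those known problems with d. and D., a cheap safeguard is to check a D. result against a numerical approximation before relying on it. The sketch below builds the softmax Jacobian from central differences. The helper name fd and the step size h = 1e_6 are illustrative assumptions, not part of J or of the code in this thread.

```j
sm =: (] % +/)@:^              NB. softmax, as defined in this thread
a =: 0.5 0.6 0.23 0.66

NB. hypothetical helper: central-difference Jacobian of sm at y
fd =: 3 : 0
h =. 1e_6                      NB. step size (an assumption; tune as needed)
E =. h * = i. # y              NB. row k is h times the k-th unit vector
d =. ((sm"1 y +"1 E) - sm"1 y -"1 E) % 2 * h   NB. row k approximates d(sm)/dx_k
|: d                           NB. transpose: row i now holds the partials of s_i
)

fd a
```

Where D. is available, `>./ , | (sm D.1 a) - fd a` should then be close to zero; for this a, fd a should also agree to about six digits with the sm D.1 a matrix quoted further down.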
Regards,
Jon
--------------------------------------------
On Tue, 2/28/17, Henry Rich <[email protected]> wrote:
Subject: Re: [Jprogramming] Fast derivative of Softmax function
To: [email protected]
Date: Tuesday, February 28, 2017, 8:47 AM
The Bugs page is
festooned with anomalies in D. and d., especially
higher derivatives. Have a look at it before
you get too far committed.
Henry Rich
On 2/27/2017 6:45 PM, 'Jon Hough' via Programming wrote:
> Thanks, the dsoftmax is going to be used for a toy Multilayer
> Perceptron Classifier I am writing. Using my original dsoftmax, which
> is being called hundreds of thousands of times, it took way too long.
>
> Using sm D. 1 has made things quite a bit faster.
>
> Thanks.
>
> --------------------------------------------
> On Tue, 2/28/17, Raul Miller <[email protected]> wrote:
>
>  Subject: Re: [Jprogramming] Fast derivative of Softmax function
>  To: "Programming forum" <[email protected]>
>  Date: Tuesday, February 28, 2017, 12:24 AM
>
>  I was about to point out the same thing.
>
>    a =: 0.5 0.6 0.23 0.66
>    sm=:(] % +/ )@:^ NB. softmax
>    softmax=: { sm
>    dsoftmax=: 4 : 0
> idx=. x
> vals=. y
> smx=. idx softmax vals
> rx=. ''
> for_j. i.#vals do.
>   if. j = idx do. rx=. rx , smx * (1 - smx)
>   elseif. 1 do. rx=. rx ,(j softmax vals)* (0 - smx) end.
> end.
> rx
> )
>
>    sm D.1 a
>   0.186192 _0.0676431 _0.0467234 _0.0718259
> _0.0676431    0.19866 _0.0516374 _0.0793799
> _0.0467234 _0.0516374   0.153191 _0.0548305
> _0.0718259 _0.0793799 _0.0548305   0.206036
>    (i.# a) dsoftmax"0 _ a
>   0.186192 _0.0676431 _0.0467234 _0.0718259
> _0.0676431    0.19866 _0.0516374 _0.0793799
> _0.0467234 _0.0516374   0.153191 _0.0548305
> _0.0718259 _0.0793799 _0.0548305   0.206036
>
> The speedup is not too impressive, but it is a speedup (probably
> because we are retaining and reusing all results from sm rather than
> recomputing them so many times; I imagine using sm directly and
> lifting it out of the loop would capture most of it):
>
>    timespacex '(i.# a) dsoftmax"0 _ a'
> 5.5e_5 6528
>    timespacex 'sm D.1 a'
> 2.3e_5 7424
>
> That said, note that we can approximate this speedup by using a
> variant on what Pascal proposed:
>
>    d_softmax=: 4 : 0
> rx=. i.0 0
> smv=. sm y
> for_i. x do.
>   ry=. i.0
>   smx=. i { smv
>   for_j. i.#y do.
>     if. j = i do. ry=. ry , smx * (1 - smx)
>     else. ry=. ry ,(j {smv)* (0 - smx) end.
>   end.
>   rx=.rx, ry
> end.
> rx
> )
>
>    (i.# a) d_softmax a
>   0.186192 _0.0676431 _0.0467234 _0.0718259
> _0.0676431    0.19866 _0.0516374 _0.0793799
> _0.0467234 _0.0516374   0.153191 _0.0548305
> _0.0718259 _0.0793799 _0.0548305   0.206036
>    timespacex '(i.# a) d_softmax a'
> 3e_5 6016
>
> (Remember that it's generally a good idea to ignore speedups of less
> than a factor of 2, because of scheduling issues within the machine
> itself; you can see this by inspecting multiple timing runs.)
>
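One way to follow that advice without eyeballing several timespacex runs is to let J average over many executions. Dyadic 6!:2 runs a sentence x times and returns the mean time in seconds. The helper name cmp and the repeat counts below are illustrative assumptions.

```j
NB. hypothetical helper:  x cmp 'sentence1' ; 'sentence2'
NB. returns (mean time of sentence1) % (mean time of sentence2),
NB. each sentence being executed x times via dyadic 6!:2
cmp =: 4 : 0
'e1 e2' =. y
(x (6!:2) e1) % x (6!:2) e2
)

sm =: (] % +/)@:^
a =: 0.5 0.6 0.23 0.66
1000 cmp 'sm a' ; '+/ a'
```

With the definitions from this thread loaded, `1000 cmp '(i.# a) d_softmax a' ; 'sm D.1 a'` gives a single ratio; by the factor-of-2 rule above, anything between 0.5 and 2 is better treated as noise than as a real speedup.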
>    timespacex '(i.# a) d_softmax a'
> 3e_5 6016
>    timespacex 'sm D.1 a'
> 2.3e_5 7424
>    timespacex '(i.# a) d_softmax a'
> 2.9e_5 6016
>    timespacex 'sm D.1 a'
> 2.3e_5 7424
>    timespacex '(i.# a) d_softmax a'
> 2.8e_5 6016
>    timespacex 'sm D.1 a'
> 3.7e_5 7424
>    timespacex '(i.# a) d_softmax a'
> 3.2e_5 6016
>
> I hope this helps,
>
> --
> Raul
>
> On Mon, Feb 27, 2017 at 8:05 AM, Louis de Forcrand <[email protected]> wrote:
> > You probably know about it, but I'll mention it anyway: there's a
> > primitive partial derivative operator in J. I think it would do
> > exactly what you want (numerically), and it's probably reasonably
> > fast. It's not too hard to use either:
> >
> >    dsoftmax=: sm D.1
> >
> > Louis
> >
> >> On 27 Feb 2017, at 10:20, 'Pascal Jasmin' via Programming <[email protected]> wrote:
> >>
> >> one optimization is removing the rank"0 _, so that the function
> >> does not need to be reparsed for each x
> >>
> >> untested.
> >>
> >> dsoftmax=: 4 : 0
> >> rx=. ''
> >> for_i. x do.
> >>   smx=. i softmax y
> >>   for_j. i.#y do.
> >>     if. j = i do. rx=. rx , smx * (1 - smx)
> >>     else. rx=. rx ,(j softmax y)* (0 - smx) end.
> >>   end.
> >> end.
> >> ((#x),#y) $ rx   NB. reshape the flat list to one row per index in x
> >> )
> >>
> >> ----- Original Message -----
> >> From: 'Jon Hough' via Programming <[email protected]>
> >> To: Programming Forum <[email protected]>
> >> Sent: Monday, February 27, 2017 3:09 AM
> >> Subject: [Jprogramming] Fast derivative of Softmax function
> >>
> >> Given an array, we can calculate the softmax function
> >> https://en.wikipedia.org/wiki/Softmax_function
> >>
> >>    a =: 0.5 0.6 0.23 0.66
> >>    sm=:(] % +/ )@:^ NB. softmax
> >>
> >>    sm a
> >> 0.247399 0.273418 0.188859 0.290325
> >>
> >> The (partial) derivative of softmax is a little more complicated:
> >>
> >> If the array is of length N, we need an NxN matrix of partial
> >> derivatives where (in pseudo code)
> >>
> >>   derivatives[i,j] = sm(array)[i] * (1 - sm(array)[j])    if i == j
> >> or
> >>   derivatives[i,j] = -1 * sm(array)[i] * sm(array)[j]     if i != j
> >>
> >> (See here for the reasoning:
> >> http://eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative/ )
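The two cases above collapse into a single expression with the Kronecker delta (1 when i = j, 0 otherwise). Writing s for sm(array) and x for the array, the matrix form is a diagonal matrix minus an outer product, which is what removes the need for looping or branching:

```latex
\frac{\partial s_i}{\partial x_j} = s_i\,(\delta_{ij} - s_j),
\qquad
J = \operatorname{diag}(s) - s\,s^{\mathsf{T}}
```

For i = j this gives s_i(1 - s_i), and for i != j it gives -s_i s_j, matching the pseudo code term by term.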
> >>
> >> My implementation of the partial derivatives is this:
> >>
> >>    softmax=: { sm   NB. x softmax y is item x of sm y
> >>    NB. x value is index, y value is the whole array
> >>    dsoftmax=: 4 : 0
> >> idx=. x
> >> vals=. y
> >> smx=. idx softmax vals
> >> rx=. ''
> >> for_j. i.#vals do.
> >>   if. j = idx do. rx=. rx , smx * (1 - smx)
> >>   elseif. 1 do. rx=. rx ,(j softmax vals)* (0 - smx) end.
> >> end.
> >> rx
> >> )
> >>
> >> Then, for example, using the above array a,
> >>
> >>    (i.# a) dsoftmax"0 _ a
> >>
> >> gives the values in a 4x4 matrix.
> >>
> >> This is quite slow. I have tried to do this without iterating and
> >> branching, but cannot figure out a way to do it.
> >> Any help appreciated.
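One loop-free answer follows from writing the two cases as diag(s) - s s^T with s = sm y: multiplying s by an identity matrix gives the diagonal term, and */~ s gives the outer product. The sketch below is a tacit version of that idea; the name dsm is hypothetical and does not come from the thread.

```j
sm =: (] % +/)@:^              NB. softmax, as defined in this thread

NB. hypothetical name: full softmax Jacobian, diag(s) - s */ s
NB.   (* =@i.@#) y   is  y * identity, i.e. y on the diagonal
NB.   */~ y          is  the outer product y */ y
dsm =: ((* =@i.@#) - */~)@:sm

a =: 0.5 0.6 0.23 0.66
dsm a
```

For this a, dsm a should give the same 4x4 matrix as (i.# a) dsoftmax"0 _ a, while evaluating sm once per call rather than once per element (the recomputation Raul suspected was costing time).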
> >>
> >> Thanks,
> >>
> >> Jon
> >>
> >> ----------------------------------------------------------------------
> >> For information about J forums see http://www.jsoftware.com/forums.htm