(This also answers Bill’s post, just in.)
I think I misled you. Brian’s “dot” is more correctly the matrix product, as in
2 3 (+/ . *)&i. 3 4
20 23 26 29
56 68 80 92
so we’re talking about dot =: +/ . *
In some cases, Brian needs to multiply an mxn matrix A by a kxn matrix B for an
mxk result,
A dot |: B
In others, he needs C, shape mxn, by D, shape mxk, for an nxk result,
(|: C) dot D
and of course, some are straight matrix multiplications.
I defined Tdot =: |:@:[ +/ .* ] and dotT =: dot |:
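For example, checking the shapes with some made-up matrices (illustrative data
only):

   dot  =: +/ . *
   Tdot =: |:@:[ +/ . * ]   NB. (|: x) dot y
   dotT =: dot |:           NB. x dot (|: y)
   A =: ? 5 3 $ 0           NB. shape mxn (m=5, n=3)
   B =: ? 4 3 $ 0           NB. shape kxn (k=4)
   $ A dotT B               NB. mxk
5 4
   C =: ? 5 3 $ 0           NB. shape mxn
   D =: ? 5 4 $ 0           NB. shape mxk
   $ C Tdot D               NB. nxk
3 4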
Are matrix multiplications going to be enhanced? And what about such variants
as these?
Thanks,
Mik
> On 16 May 2019, at 18:43, Henry Rich <[email protected]> wrote:
>
> In the next beta +/@:*"1 uses 256-bit instructions, which should help with
> dot-products.
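>
> For instance (a minimal illustration, made-up vectors):
>
>    3 1 4 (+/@:*"1) 2 7 1
> 17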
>
> Henry Rich
>
>> On 5/16/2019 1:27 PM, 'Mike Day' via Programming wrote:
>> I've tried various timings and tweaks - the dot products seem to consume the
>> most time.
>>
>> It's marginally worth dividing by num_examples after summing correct_logprobs,
>> rather than summing the quotient correct_logprobs % num_examples.
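>>
>> For instance, the two phrasings agree (a minimal sketch with stand-in data,
>> not the actual training values):
>>
>>    correct_logprobs =. ? 1000 $ 0    NB. hypothetical stand-in data
>>    num_examples =. # correct_logprobs
>>    ((+/correct_logprobs) % num_examples) -: +/ correct_logprobs % num_examples
>> 1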
>>
>> I added a couple of dot fns, Tdot =: |:@[ dot ] and dotT =: dot |:
>> to neaten up the code a bit. Those transposes seem unavoidable.
>>
>> In a practical application, you'd probably run cycles until either a
>> suitable level of convergence is achieved, or until it's obvious that the
>> process is divergent.
>>
>> Cheers,
>>
>> Mike
>>
>>
>>> On 16/05/2019 15:20, Brian Schott wrote:
>>> Mike,
>>>
>>> Yes, I knew the reason that the calculation was done, but was surprised by
>>> the manner in which these authors applied the calculation (without the
>>> multiplication) and I applied the Amend incorrectly, by not remembering
>>> that it was being applied to an array.
>>>
>>> And you are correct that the Amend approach is slower and more space
>>> consuming than the Product approach. I re-applied -- correctly, this time,
>>> finally🤞 -- the Amend approach on a 'dbstopped' version of `train` and
>>> got the following timings. In retrospect, both methods require the condition
>>> check, and then the multiplication by 0 and 1 may be very fast relative to
>>> Amend's needs.
>>>
>>> mnd =: 0:`(I.@(0&>:)@[)`]}"1   NB. zero out y, row by row, where x <: 0
>>> ((hidden_layer>0)*dscores dot|:W2)-:hidden_layer mnd dscores dot|:W2
>>> 1
>>> 10 timespacex'(hidden_layer>0)*dscores dot|:W2'
>>> 0.0004102 301568
>>> 10 timespacex'hidden_layer mnd dscores dot|:W2'
>>> 0.0006501 535360
>>>
>>> And btw, mnd1 =: 0:`(I.@(0>:[))`]}"1 using a fork is very slightly faster
>>> than mnd.
>>>
>>>
>>> Thanks, again,
>>>
>>> On Thu, May 16, 2019 at 5:32 AM 'Mike Day' via Programming <
>>> [email protected]> wrote:
>>>
>>>> The Python authors' comments here explain (well, they assert) why we're
>>>> doing that filtering for hidden_layer > 0:
>>>>
>>>> " Now we have the gradient on the outputs of the hidden layer. Next, we
>>>> have to backpropagate the ReLU non-linearity. This turns out to be easy
>>>> because ReLU during the backward pass is effectively a switch. Since
>>>> r = max(0,x), we have that dr/dx = 1 (x>0). Combined with the chain
>>>> rule, we see that the ReLU unit lets the gradient pass through unchanged
>>>> if its input was greater than 0, but kills it if its input was less than
>>>> zero [or equal to zero - Mike's edit] during the forward pass."
>>>>
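>>>> In J, that switch is just a boolean mask (a minimal sketch with made-up
>>>> numbers):
>>>>
>>>>    x =. _2 _1 0 1 2
>>>>    0 >. x                NB. forward pass: r = max(0,x)
>>>> 0 0 0 1 2
>>>>    x > 0                 NB. backward pass: dr/dx is a 0-1 switch
>>>> 0 0 0 1 1
>>>>    (x > 0) * 5 5 5 5 5   NB. upstream gradient passes only where x > 0
>>>> 0 0 0 5 5
>>>>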
>>>> Isn't it curious that the J-way of doing it,
>>>>
>>>> if. # ilow =. (<"1@:($ #: I.@:(0 >: ,))) hidden_layer do.
>>>>   NB. ilow holds the indices of elements <: 0
>>>>   dhidden =. 0 ilow } dhidden
>>>> end.
>>>>
>>>> is much slower than the naive
>>>>
>>>> dhidden =. (hidden_layer > 0) * dscores dotT W2
>>>> ?
>>>>
>>>> Mike
>>>>
>>>>
>>>> --
>>> (B=)