This looks very interesting. Sorry, I am traveling until next week so cannot 
give it much more than a quick look through at the moment. Next week I will try 
to run it.
By the way, following your advice and the issues you discovered with my 
convnet (the bias shape in particular), I am refactoring my source code. I am 
struggling to get much more than 65% accuracy on CIFAR-10... very irritating.

It looks like your backprop padding is much nicer than mine. Once I have a 
chance to look it over properly I will try to integrate that into my source 
code.

Thanks,
Jon

     On Tuesday, May 21, 2019, 12:11:51 PM GMT+9, Brian Schott <[email protected]> wrote:
 I have been developing a toy convolutional neural network patterned after
the toy non-convolutional nn discussed in this thread. The current state of
the toy is reproduced below for any comments.

NB. simpler_conv_test.ijs
NB. 5/19/19
NB. based on Jon Hough's simple_conv_test.ijs
NB. especially Jon's conv and convFunc in backprop
NB. and patterned after toy nn case study at
NB. http://cs231n.github.io/neural-networks-case-study/#net

NB. This example demonstrates a 3-layer neural net:
NB.  an input layer receiving 8x8 black-and-white (binary) images,
NB.  a hidden convolutional layer with 2 4x4 filters
NB.  which stride 2 over the 8x8 input images with no zero-padding,
NB.  and 3 classes in the softmax-activated output layer.
NB. This combination of filter size, image size, stride, and
NB.  lack of zero-padding produces 9 readings by each filter
NB.  (see the quick check below).
NB. The names W and W2 for filters/weights are kept from the
NB.  original toy case study which had only fully connected
NB.  layers, and no convolutional layer.
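NB. A quick sanity check of that count, as a small session sketch
NB.  using the same regular tessellation (;._3) that tesl applies below:

Note 'check: readings per filter'
  $ (2 2,:4 4) <;._3 i. 8 8   NB. 4x4 windows, moved by 2, over an 8x8 array
3 3
NB. a 3x3 grid of window positions, i.e. 9 readings per filter
)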

NB. ==============================
NB. revised data to use only a single image channel instead of 3 channels
A1=:8 8$"."0'1111111100000000000000001111111100000000111111111111111100000000'
A2=:8 8$"."0'1111111100000000111111111111111100000000'
A3=:8 8$"."0'111111110000000000000000'
A4=:8 8$"."0'111111110000000000000000111111111111111100000000'
A5=: 2 |. A4

B1=: |:"2 A1
B2=: |:"2 A2
B3=: |:"2 A3
B4=: |:"2 A4
B5=: |:"2 A5

C1=:8 8$"."0'1000000101000010001001000001100000011000001001000100001010000001'
C2=:8 8$"."0'1000000001000000001000000001000000001000000001000000001000000001'
C3=:8 8$"."0'1010100001010100001010100001010110001010010001011010001001010001'
C4=: |."1 C3
C5=:8 8$"."0'1111000000111100000011111100001111110000001111000000111111000011'

A=: 5 8 8 $, A1, A2, A3, A4, A5
B=: 5 8 8 $, B1, B2, B3, B4, B5
C=: 5 8 8 $, C1, C2, C3, C4, C5
X =: INPUT=: A,B,C
Y =: 5#0 1 2                NB. specific target values for this case

NB. some utility verbs
dot =: +/ . *
probs =: (%+/)@:^"1          NB. used in softmax activation
amnd =: (1-~{)`[`]}
mean =: +/ % #
rand01 =: ?.@$ 0:  NB. ? replaced with ?. for demo purposes only
normalrand =: (2 o. [: +: [: o. rand01) * [: %: [: - [: +: [: ^. rand01
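NB.  (normalrand follows the Box-Muller recipe cos(2*pi*u1) * sqrt(-2*ln u2),
NB.   turning uniform draws into standard normal samples)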
NB. Hough's backprop magic verb
deconv =: 4 : '(1 1,:5 5) x&(conv"2 _) tesl y'
conv =: +/@:,@:*            NB. Jon Hough's sum-of-products applied per window
tesl =: ;._3                NB. tessellation (cut) that feeds conv the windows
NB. backprop requires some zero-padding, thus msk
msk =: (1j1 1j1 1)&#        NB. interleaves zeros; applied "1 and "2 it dilates a 3x3 array to 5x5
NB. tessellation requires and produces rectangular data
NB.  so collapse flattens that into lists and sqr rebuilds square arrays
collapse =: (,"2)&(1 2&|:)  NB. list from internal array
sqr =: $&,~}:@$,2&$@:%:@{:@$ NB. square array from list
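NB. A small sketch of what msk does: applied along both axes it dilates
NB.  a 3x3 grid by interleaving zeros, so the stride-2 gradients land back
NB.  on the spatial layout the forward pass walked over.

Note 'msk example'
  msk"1 msk"2 >: i. 3 3
1 0 2 0 3
0 0 0 0 0
4 0 5 0 6
0 0 0 0 0
7 0 8 0 9
NB. deconv then convolves each 5x5 window (stride 1) of an 8x8 input
NB.  with this dilated array, producing a 4x4 gradient per filter
)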

NB. some training parameters
step_size =: 1e_0
reg =: 1e_3
N =: 5 NB. number of points per class
D =: 2 NB. dimensionality
K =: 3 NB. number of classes
bias =: 0


train =: dyad define
'W b W2 b2' =. x    NB. weights and biases
num_examples =. #X
for_i. 1+i. y do.    NB. y is number of batches
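    NB. forward pass: 4x4 windows, stride 2, over each 8x8 image in X;
    NB.  conv sums filter*window for each of the 2 filters in W,
    NB.  b is the per-filter bias, and 0>. is the ReLU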
    hidden_layer =. 0>.b +"1(2 2,:4 4) W&(conv"2 _) tesl"3 2 X
    c_hidden_layer =. collapse hidden_layer
    scores =. mean 0>.b2 +"1 (1 0 2|:c_hidden_layer) dot"2 W2

    prob =. probs scores
    correct_logprobs =. -^.Y}|:prob

    data_loss =. (+/correct_logprobs)%num_examples
    reg_loss =. (0.5*reg*+/,W*W) + 0.5*reg*+/,W2*W2
    loss =. data_loss + reg_loss
    if. 0=(y%10)|i do.
    smoutput i,loss
    end.
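    NB. backward pass: softmax cross-entropy gradient, i.e. the class
    NB.  probabilities with 1 subtracted at each example's target class
    NB.  (via amnd), divided by the number of examples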
    dscores =. prob
    dscores =. Y amnd"0 1 dscores
    dscores =. dscores%num_examples

    dW2 =. 1 0 2|:(|:collapse hidden_layer)dot dscores
    db2 =. mean dscores

    NB. Mike Day fixed next line 5/16/19
    dhidden =. (0<|:"2 c_hidden_layer)*dscores dot|:W2
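    NB. reshape each filter's 9 hidden-layer gradients back to a 3x3 grid
    NB.  (sqr), interleave zeros to undo the stride-2 spacing (msk), then
    NB.  deconv slides the result over the inputs to give 4x4 filter gradients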
    padded =. msk"1 msk"2 sqr |:"2 dhidden
    dW =. mean 0 3 1 2|: padded deconv"3 2  X
    db =. mean mean dhidden

    dW2 =. dW2 + reg*W2
    dW =. dW + reg*W

    W =. W-step_size * dW
    b =. b-step_size * db

    W2 =. W2-step_size * dW2
    b2 =. b2-step_size * db2

end.
W;b;W2;b2
)

Note 'example start of training'
NB. 2 4 4 is the shape of W.
NB.  That shape reflects 2 filters of wxh = 4x4
NB.  over the 8x8 image layer
NB. The hidden layer is convolutional; W defines the
NB.  weights (filters) between input layer and hidden layer.
  $xW =. 0.026*normalrand 2 4 4  NB. 0.026 is wild-ass guess
2 4 4

NB. 2 9 3 is the shape of W2.
NB.  W2 are weights that receive the 2 conv filters, and
NB.  produce the 3 output classes, and each filter
NB.  takes 9 readings over 8x8 image of 4x4 filters
NB.  using a stride of 2.
NB. The output or scores layer is fully connected; W2
NB.  defines the weights between the hidden and output layers.
  $xW2 =. 0.026*normalrand 2 9 3
2 9 3
  cc =. (xW;0 0;xW2;0 0 0) train 200  NB. start run like this
  NB. then continue with 200 more batches with the following
  cx =. cc train 200
)

Note 'check prediction proportion'
  $h_l =. 0>.(>1{cc) +"1(2 2,:4 4) (>0{cc)&(conv"2 _) tesl"3 2 X
  $c_h_l =. collapse h_l
  $sc =. mean 0>.(>3{cc) +"1 (1 0 2|:c_h_l) dot"2 (>2{cc)
  $predicted_class =. (i.>./)"1 sc
  mean predicted_class = Y

)



-- 
(B=) <-----my sig
Brian Schott
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
  