This looks very interesting. Sorry, I am traveling until next week so I
cannot give it much more than a quick look-through at the moment. Next week
I will try to run it.
By the way, following your advice and the issues you discovered with my
convnet (the bias shape in particular), I am refactoring my source code. I am
struggling to get much more than 65% accuracy on CIFAR-10... very irritating.
It looks like your backprop padding is much nicer than mine. Once I have a
chance to look it over properly I will try to integrate that into my source
code.
Thanks,
Jon
On Tuesday, May 21, 2019, 12:11:51 PM GMT+9, Brian Schott
<[email protected]> wrote:
I have been developing a toy convolutional neural network patterned after
the toy non-convolutional nn discussed in this thread. The current state of
the toy is reproduced below for any comments.
NB. simpler_conv_test.ijs
NB. 5/19/19
NB. based on Jon Hough's simple_conv_test.ijs
NB. especially Jon's conv and convFunc in backprop
NB. and patterned after the toy nn case study at
NB. http://cs231n.github.io/neural-networks-case-study/#net
NB. This example demonstrates a 3-layer neural net:
NB. an input layer that receives 8x8 black-and-white (binary) images,
NB. a hidden convolutional layer with two 4x4 filters
NB. that stride by 2 over the 8x8 input images with no zero-padding,
NB. and a softmax-activated output layer with 3 classes.
NB. Together, the filter size, image size, stride, and lack of
NB. zero-padding produce 9 readings by each filter.
NB. The names W and W2 for the filters/weights are kept from the
NB. original toy case study, which had only fully connected
NB. layers and no convolutional layer.
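NB. (A quick check of that count, assuming the stated 8x8 image,
NB. 4x4 filter, and stride of 2: readings per axis is
NB. 1 + (image size - filter size) % stride.)
>: 2 %~ 8 - 4      NB. 3 readings per axis
*: >: 2 %~ 8 - 4   NB. 3*3 = 9 readings per filter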
NB. ==============================
NB. revised data to use only a single image channel instead of 3 channels
A1=:8 8$"."0'1111111100000000000000001111111100000000111111111111111100000000'
A2=:8 8$"."0'1111111100000000111111111111111100000000'
A3=:8 8$"."0'111111110000000000000000'
A4=:8 8$"."0'111111110000000000000000111111111111111100000000'
A5=: 2 |. A4
B1=: |:"2 A1
B2=: |:"2 A2
B3=: |:"2 A3
B4=: |:"2 A4
B5=: |:"2 A5
C1=:8 8$"."0'1000000101000010001001000001100000011000001001000100001010000001'
C2=:8 8$"."0'1000000001000000001000000001000000001000000001000000001000000001'
C3=:8 8$"."0'1010100001010100001010100001010110001010010001011010001001010001'
C4=: |."1 C3
C5=:8 8$"."0'1111000000111100000011111100001111110000001111000000111111000011'
A=: 5 8 8 $, A1, A2, A3, A4, A5
B=: 5 8 8 $, B1, B2, B3, B4, B5
C=: 5 8 8 $, C1, C2, C3, C4, C5
X =: INPUT=: A,B,C
Y =: 5#0 1 2 NB. specific target values for this case
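NB. quick shape check: 5 images per class, 3 classes, 15 labels
$X   NB. 15 8 8
$Y   NB. 15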
NB. some utility verbs
dot =: +/ . *
probs =: (%+/)@:^"1 NB. used in softmax activation
amnd =: (1-~{)`[`]}
mean =: +/ % #
rand01 =: ?.@$ 0: NB. ? replaced with ?. for demo purposes only
normalrand =: (2 o. [: +: [: o. rand01) * [: %: [: - [: +: [: ^. rand01
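NB. e.g. probs turns each row of scores into probabilities summing to 1:
probs 1 2 3          NB. 0.0900306 0.244728 0.665241
NB. and amnd subtracts 1 at the index of the correct class, which is
NB. the softmax/cross-entropy gradient step used in train below:
1 amnd 0.2 0.5 0.3   NB. 0.2 _0.5 0.3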
NB. Hough's backprop magic verb
deconv =: 4 : '(1 1,:5 5) x&(conv"2 _) tesl y'
conv =: +/@:,@:* NB. Jon Hough's tessellation verb for conv
tesl =: ;._3
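NB. e.g. a 4x4 window moved by 2 over an 8x8 array visits a 3x3 grid
NB. of positions, which is where the 9 readings per filter come from:
$ (2 2,:4 4) ] tesl i. 8 8   NB. 3 3 4 4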
NB. backprop requires the 3x3 gradient to be expanded with zeros, thus msk
msk =: (1j1 1j1 1)&# NB. interleaves zeros along an axis: 3 items become 5
NB. tessellation requires and produces rectangular data,
NB. so collapse and sqr rework and create such data
collapse =: (,"2)&(1 2&|:) NB. list from internal array
sqr =: $&,~}:@$,2&$@:%:@{:@$ NB. square array from list
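NB. e.g. msk"1 msk"2 >: i. 3 3 spreads a 3x3 array out to 5x5:
NB.    1 0 2 0 3
NB.    0 0 0 0 0
NB.    4 0 5 0 6
NB.    0 0 0 0 0
NB.    7 0 8 0 9
NB. undoing the stride of 2 before the deconv correlation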
NB. some training parameters
step_size =: 1e_0
reg =: 1e_3
N =: 5 NB. number of points per class
D =: 2 NB. dimensionality
K =: 3 NB. number of classes
bias =: 0
train =: dyad define
'W b W2 b2' =. x NB. weights and biases
num_examples =. #X
for_i. 1+i. y do. NB. y is number of batches
hidden_layer =. 0>.b +"1(2 2,:4 4) W&(conv"2 _) tesl"3 2 X
c_hidden_layer =. collapse hidden_layer
scores =. mean 0>.b2 +"1 (1 0 2|:c_hidden_layer) dot"2 W2
prob =. probs scores
correct_logprobs =. -^.Y}|:prob
data_loss =. (+/correct_logprobs)%num_examples
reg_loss =. (0.5*reg*+/,W*W) + 0.5*reg*+/,W2*W2
loss =. data_loss + reg_loss
if. 0=(y%10)|i do.
smoutput i,loss
end.
dscores =. prob
dscores =. Y amnd"0 1 dscores
dscores =. dscores%num_examples
dW2 =. 1 0 2|:(|:collapse hidden_layer)dot dscores
db2 =. mean dscores
NB. Mike Day fixed next line 5/16/19
dhidden =. (0<|:"2 c_hidden_layer)*dscores dot|:W2
padded =. msk"1 msk"2 sqr |:"2 dhidden
dW =. mean 0 3 1 2|: padded deconv"3 2 X
db =. mean mean dhidden
dW2 =. dW2 + reg*W2
dW =. dW + reg*W
W =. W-step_size * dW
b =. b-step_size * db
W2 =. W2-step_size * dW2
b2 =. b2-step_size * db2
end.
W;b;W2;b2
)
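NB. Shape trace through one batch, assuming W is 2 4 4, b is 0 0,
NB. W2 is 2 9 3, b2 is 0 0 0, and X is 15 8 8 (my reading of the ranks):
NB.   hidden_layer    15 3 3 2   (conv + bias, ReLU)
NB.   c_hidden_layer  15 2 9     (9 readings for each of the 2 filters)
NB.   scores, prob    15 3       (mean over the 2 filters, softmax rows)
NB.   dscores         15 3
NB.   dW2  2 9 3    db2  3    dhidden  15 9 2
NB.   padded          15 2 5 5   (3x3 grads spread out by msk)
NB.   dW   2 4 4    db   2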
Note 'example start of training'
NB. 2 4 4 is the shape of W.
NB. That shape reflects 2 filters of wxh = 4x4
NB. over the 8x8 image layer
NB. The hidden layer is convolutional; W defines the
NB. weights (filters) between input layer and hidden layer.
$xW =. 0.026*normalrand 2 4 4 NB. 0.026 is wild-ass guess
2 4 4
NB. 2 9 3 is the shape of W2.
NB. W2 holds the weights that take the outputs of the 2 conv
NB. filters to the 3 output classes; each filter takes
NB. 9 readings over the 8x8 image with its 4x4 window
NB. and a stride of 2.
NB. The output or scores layer is fully connected; W2
NB. defines the weights between the hidden and output layers.
$xW2 =. 0.026*normalrand 2 9 3
2 9 3
cc =. (xW;0 0;xW2;0 0 0) train 200 NB. start run like this
NB. then continue with 200 more batches with the following
cx =. cc train 200
)
Note 'check prediction proportion'
$h_l =. 0>.(>1{cc) +"1(2 2,:4 4) (>0{cc)&(conv"2 _) tesl"3 2 X
$c_h_l =. collapse h_l
$sc =. mean 0>.(>3{cc) +"1 (1 0 2|:c_h_l) dot"2 (>2{cc)
$predicted_class =. (i.>./)"1 sc
mean predicted_class = Y
)
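NB. ((i.>./)"1 in the prediction check is a per-row argmax:
NB.  e.g. (i.>./) 0.1 0.7 0.2 is 1, the index of the largest score.)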
--
(B=) <-----my sig
Brian Schott
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm