Jessy, yes, this is exactly the idea. I don't want to use the global average
because I don't want to include zero-activated points in the cost value
or in the gradient calculation. So local pooling might help by giving
the average over just the locally activated area.
by applying the sum the trace
I think the idea would be something like:
y = [1, 2, 3, 0]
y_current_avgpool = (1 + 2 + 3 + 0) / 4
y_new_avgpool = (1 + 2 + 3) / 3
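For what it's worth, here is a minimal sketch of that arithmetic in plain Python, assuming non-overlapping 1-D windows: sum each window, then divide by the number of non-zero entries in it. The function name `nonzero_avgpool_1d` and the choice to return 0.0 for an all-zero window are assumptions made for illustration, not part of any library.

```python
def nonzero_avgpool_1d(values, window):
    """Average each non-overlapping window over its non-zero entries only."""
    pooled = []
    for start in range(0, len(values), window):
        chunk = values[start:start + window]
        total = sum(chunk)                        # sum pooling over the window
        count = sum(1 for v in chunk if v != 0)   # divisor: number of non-zero entries
        # Assumption: a window with no active points pools to 0.0.
        pooled.append(total / count if count else 0.0)
    return pooled


y = [1, 2, 3, 0]
y_current_avgpool = sum(y) / len(y)           # (1 + 2 + 3 + 0) / 4 = 1.5
y_new_avgpool = nonzero_avgpool_1d(y, 4)[0]   # (1 + 2 + 3) / 3 = 2.0
```

Note that this treats an exact zero as "not activated", so a unit that is active but happens to output exactly 0 would also be excluded from the divisor.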
I'm not sure that there is a simple way to do this currently. You could do
sum pooling first, then compute the divisors by looking at the number of
non-zero