Hi nouiz,
I've attached an example. You can run the same file with either the new or the
old conv2d by removing the strides and padding.
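In case it helps, the swap I mean looks roughly like this (a minimal sketch, not
the exact diff; the 'valid' border mode for the old call is my assumption, since
the old interface only takes the pre-defined modes):

import theano
from theano import tensor as T
from theano.tensor.nnet import conv2d as conv2d_new        # new interface
from theano.tensor.nnet.conv import conv2d as conv2d_old   # old interface

X = T.tensor4('X')
W = T.tensor4('W')

# new interface, as used in the attached script (explicit padding and stride):
h_new = conv2d_new(X, W, border_mode=(3, 3), subsample=(3, 3))

# old interface, with the strides and padding removed (pre-defined mode only):
h_old = conv2d_old(X, W, border_mode='valid')
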
On another note, since I upgraded gcc/g++ from 4.9 to 6.1 I've been having
problems trying to compile the Theano tests. The issues are similar to
those here <https://github.com/Theano/Theano/issues/4955>. Could you guys
have a look and let us know how to proceed? Everything else
I've tried has failed, even when using the correct compiler and linker
settings taken from the Intel MKL Link Line Advisor. The errors persist even
if I don't use Intel MKL and simply link to OpenBLAS. I also tried compiling
the tests without OpenBLAS, with plain g++, and still got errors.
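For reference, this is roughly the configuration I've been using (a sketch; the
paths below are placeholders for my setup, and the MKL line is the single-library
variant rather than the full set of flags from the link line advisor). I check
which BLAS gets picked up with theano/misc/check_blas.py and control the thread
count through OMP_NUM_THREADS:

# ~/.theanorc (sketch; the paths below are placeholders for my machine)
[global]
device = cpu
floatX = float32
# point Theano at the g++ 6.1 install
cxx = /usr/local/bin/g++

[blas]
# Intel MKL, single dynamic library form:
ldflags = -L/opt/intel/mkl/lib/intel64 -lmkl_rt
# for the OpenBLAS runs I switch to:
# ldflags = -lopenblas
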
On Saturday, September 10, 2016 at 12:56:54 AM UTC+1, nouiz wrote:
>
> I'm really surprised that the old one is faster on multiple cores than the
> new one. Can you provide a script that shows that? It could be just for some
> shapes.
>
> On 4 Sept. 2016 at 12:41, <[email protected]> wrote:
>
>> Thank you nouiz! Great to know. Does that imply that the new conv2d
>> doesn't use the full parallelism capabilities that the old one does?
>> BTW, in both experiments I've tested the new conv2d
>> against the old one, with the OpenBLAS and Intel MKL libraries.
>> The results were the same: the old one makes better use of the available
>> cores, while the new one fluctuates a lot. The downside is that
>> the old conv2d doesn't support arguments for strides and padding besides
>> the pre-defined ones.
>>
>> Thanks again.
>>
>>
>> On Thursday, September 1, 2016 at 4:55:05 PM UTC+1, nouiz wrote:
>>>
>>> They use different implementations. Make sure Theano uses a parallel BLAS
>>> library. The new conv2d uses it for part of the parallelism.
>>>
>>> On 25 Aug. 2016 at 21:53, <[email protected]> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> I've recently come across some weird behaviour regarding the new
>>>> theano.tensor.nnet.conv2d
>>>> and the old
>>>> theano.tensor.nnet.conv.conv2d
>>>> convolution functions.
>>>>
>>>> I have two different models: one uses the old conv2d method, the other
>>>> uses the new one.
>>>> The difference between the two is that the model that uses the new
>>>> conv2d method has more layers than the other one, and I've
>>>> explicitly defined padding and stride.
>>>>
>>>> Other than that, everything else is the same: amount of data, training
>>>> algorithm, batch size, etc. are pretty much identical.
>>>>
>>>> Once I execute them, the smaller model with the old conv2d method
>>>> utilizes all the cores in my system ;) great.
>>>> The bigger model with the new conv2d method doesn't, which is strange
>>>> because the bigger the model, the more resources it should need.
>>>>
>>>> Are there any differences in the way the two conv2d methods utilize
>>>> OpenMP?
>>>>
import theano
import numpy as np
from theano import tensor as T
from theano.tensor.signal.pool import pool_2d
from theano.tensor.nnet import conv2d
from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams
from load import mnist
srng = RandomStreams(123)
rng = np.random.RandomState(123)
floatX = theano.config.floatX


def init_weights(shape, poolsize):
    # Glorot/Xavier-style uniform initialisation; fan-in/fan-out depend on
    # whether the weights are for a conv layer (4D shape) or a dense layer (2D).
    if len(shape) > 2:
        fan_in = np.prod(shape[1:])
        fan_out = (shape[0] * np.prod(shape[2:]) // np.prod(poolsize))
    else:
        fan_in = shape[0]
        fan_out = shape[1]
    return theano.shared(
        rng.uniform(
            low=-np.sqrt(6. / (fan_in + fan_out)),
            high=np.sqrt(6. / (fan_in + fan_out)),
            size=shape
        ).astype(dtype=floatX), allow_downcast=True, borrow=True)


def rectify(X):
    return T.maximum(X, 0.)


def softmax(X):
    # numerically stable softmax: subtract the row max before exponentiating
    e_x = T.exp(X - X.max(axis=1).dimshuffle(0, 'x'))
    return e_x / e_x.sum(axis=1).dimshuffle(0, 'x')


def dropout(X, p=0.):
    # inverted dropout: rescale at training time so nothing changes at test time
    if p > 0:
        retain_prob = 1 - p
        X *= srng.binomial(X.shape, p=retain_prob, dtype=floatX)
        X /= retain_prob
    return X


def optimize(cost, params, lr=0.001, rho=0.9, epsilon=1e-6):
    # RMSprop: scale each gradient by a running average of its recent magnitude
    grads = T.grad(cost=cost, wrt=params)
    updates = []
    for p, g in zip(params, grads):
        acc = theano.shared(p.get_value() * 0., allow_downcast=True)
        acc_new = rho * acc + (1 - rho) * g ** 2
        gradient_scaling = T.sqrt(acc_new + epsilon)
        g = g / gradient_scaling
        updates.append((acc, acc_new))
        updates.append((p, p - lr * g))
    return updates


def model(X, W, p_drop_conv, p_drop_hidden):
    # Network-in-Network style blocks: a conv layer followed by two 1x1 convs,
    # then pooling and dropout.  The new conv2d interface is used throughout,
    # with explicit border_mode (padding) and subsample (stride) arguments.
    h1 = rectify(conv2d(X, W[0], border_mode=(3, 3), subsample=(3, 3)))
    h1_1x1 = rectify(conv2d(h1, W[1], border_mode=(0, 0), subsample=(1, 1)))
    h1_1x1 = rectify(conv2d(h1_1x1, W[2], border_mode=(0, 0), subsample=(1, 1)))
    h1_pool = pool_2d(h1_1x1, ds=(3, 3), st=(2, 2), padding=(1, 1), ignore_border=True)
    h1_drop = dropout(h1_pool, p_drop_conv)

    h2 = rectify(conv2d(h1_drop, W[3], border_mode=(2, 2), subsample=(1, 1)))
    h2_1x1 = rectify(conv2d(h2, W[4], border_mode=(0, 0), subsample=(1, 1)))
    h2_1x1 = rectify(conv2d(h2_1x1, W[5], border_mode=(0, 0), subsample=(1, 1)))
    h2_pool = pool_2d(h2_1x1, ds=(3, 3), st=(2, 2), padding=(1, 1), ignore_border=True)
    h2_drop = dropout(h2_pool, p_drop_conv)

    h3 = rectify(conv2d(h2_drop, W[6], border_mode=(1, 1), subsample=(1, 1)))
    h3_1x1 = rectify(conv2d(h3, W[7], border_mode=(0, 0), subsample=(1, 1)))
    h3_1x1 = rectify(conv2d(h3_1x1, W[8], border_mode=(0, 0), subsample=(1, 1)))
    h3_pool = pool_2d(h3_1x1, ds=(3, 3), st=(2, 2), padding=(1, 1), ignore_border=True)
    h3_drop = dropout(h3_pool, p_drop_conv)

    h4 = rectify(conv2d(h3_drop, W[9], border_mode=(3, 3), subsample=(1, 1)))
    h4_1x1 = rectify(conv2d(h4, W[10], border_mode=(0, 0), subsample=(1, 1)))
    h4_1x1 = rectify(conv2d(h4_1x1, W[11], border_mode=(0, 0), subsample=(1, 1)))
    h5_pool = pool_2d(h4_1x1, ds=(4, 4), st=(2, 2), padding=(1, 1),
                      mode='average_inc_pad', ignore_border=True)

    h5_flat = T.flatten(h5_pool, outdim=2)
    h5_drop = dropout(h5_flat, p_drop_conv)
    h5_relu = rectify(T.dot(h5_drop, W[12]))
    h5_drop = dropout(h5_relu, p_drop_hidden)
    pyx = softmax(T.dot(h5_drop, W[13]))
    return (h1_drop, h2_drop, h3_drop, h5_drop, pyx)


# load the data
trX, teX, trY, teY = mnist(onehot=True)
trX = trX.reshape(60000, 1, 28, 28)
teX = teX.reshape(10000, 1, 28, 28)
trY = trY.astype(dtype=floatX)
teY = teY.astype(dtype=floatX)

# build the model
X = T.tensor4(name='X')  # dtype defaults to floatX
Y = T.matrix(name='Y')
w1 = init_weights((96, 1, 7, 7), (2, 2))
w2 = init_weights((96, 96, 1, 1), (2, 2))
w3 = init_weights((96, 96, 1, 1), (2, 2))
w4 = init_weights((256, 96, 5, 5), (2, 2))
w5 = init_weights((256, 256, 1, 1), (2, 2))
w6 = init_weights((256, 256, 1, 1), (2, 2))
w7 = init_weights((384, 256, 3, 3), (2, 2))
w8 = init_weights((384, 384, 1, 1), (2, 2))
w9 = init_weights((384, 384, 1, 1), (2, 2))
w10 = init_weights((512, 384, 3, 3), (2, 2))
w11 = init_weights((512, 512, 1, 1), (2, 2))
w12 = init_weights((512, 512, 1, 1), (2, 2))
w13 = init_weights((512 * 3 * 3, 512), (2, 2))
w_o = init_weights((512, 10), (2, 2))
W = [w1, w2, w3, w4, w5, w6, w7, w8, w9, w10, w11, w12, w13, w_o]
drop_activation_layers = model(X, W, 0.2, 0.5)
drop_py_x = drop_activation_layers[4]
activation_layers = model(X, W, 0., 0.)
py_x = activation_layers[4]
y_x = T.argmax(py_x, axis=1)
cost = T.mean(T.nnet.categorical_crossentropy(drop_py_x, Y))
updates = optimize(cost, W, lr=0.01)
train = theano.function(inputs=[X, Y], outputs=cost, updates=updates,
                        allow_input_downcast=True, mode='FAST_RUN')
predict = theano.function(inputs=[X], outputs=y_x,
                          allow_input_downcast=True, mode='FAST_RUN')


def main_loop():
    # mini-batch training; report test-set accuracy and last-batch cost each epoch
    for i in range(100):
        for start, end in zip(range(0, len(trX), 256),
                              range(256, len(trX), 256)):
            cost = train(trX[start:end], trY[start:end])
        print("epoch = {:d}, accuracy = {:.4f}, cost = {:.4f}".format(
            i + 1, np.mean(np.argmax(teY, axis=1) == predict(teX)), cost))


main_loop()
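
In case a stripped-down script is easier to experiment with than the full model
above, here is a rough sketch of the kind of timing comparison I have in mind
(the shapes and iteration count are arbitrary, not the exact benchmark I ran;
run it with different OMP_NUM_THREADS values to see how each implementation
scales across cores):

import time
import numpy as np
import theano
from theano import tensor as T
from theano.tensor.nnet import conv2d as conv2d_new
from theano.tensor.nnet.conv import conv2d as conv2d_old

x = T.tensor4('x')
w = T.tensor4('w')
f_new = theano.function([x, w], conv2d_new(x, w, border_mode='valid'))
f_old = theano.function([x, w], conv2d_old(x, w, border_mode='valid'))

# arbitrary but reasonably heavy shapes: batch 64, 96 -> 256 channels, 5x5 kernels
img = np.random.randn(64, 96, 28, 28).astype(theano.config.floatX)
ker = np.random.randn(256, 96, 5, 5).astype(theano.config.floatX)

for name, f in [('new conv2d', f_new), ('old conv2d', f_old)]:
    f(img, ker)  # warm-up / compilation
    t0 = time.time()
    for _ in range(10):
        f(img, ker)
    print('{}: {:.4f} sec per call'.format(name, (time.time() - t0) / 10))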