Hi nouiz,
I've attached an example. You can run the same file with either the new or the
old conv2d by removing the strides and padding.
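In case it helps, the swap I mean looks roughly like this (a minimal sketch, not
the exact diff; the 'valid' border mode for the old call is my assumption, since
the old interface only takes the pre-defined modes):

import theano
from theano import tensor as T
from theano.tensor.nnet import conv2d as conv2d_new        # new interface
from theano.tensor.nnet.conv import conv2d as conv2d_old   # old interface

X = T.tensor4('X')
W = T.tensor4('W')

# new interface, as used in the attached script (explicit padding and stride):
h_new = conv2d_new(X, W, border_mode=(3, 3), subsample=(3, 3))

# old interface, with the strides and padding removed (pre-defined mode only):
h_old = conv2d_old(X, W, border_mode='valid')
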
On another note, since I upgraded gcc/g++ from 4.9 to 6.1 I've been having
problems trying to compile the Theano tests. The issues are similar to
those here <https://github.com/Theano/Theano/issues/4955>. Could you guys
have a look and let us know how to proceed? Everything else
I've tried has failed, even when using the correct compiler and linker
settings taken from the Intel MKL Link Line Advisor. The errors persist even
if I don't use Intel MKL and simply link to OpenBLAS. I also tried compiling
the tests without OpenBLAS, with plain g++, and still got errors.
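For reference, this is roughly the configuration I've been using (a sketch; the
paths below are placeholders for my setup, and the MKL line is the single-library
variant rather than the full set of flags from the link line advisor). I check
which BLAS gets picked up with theano/misc/check_blas.py and control the thread
count through OMP_NUM_THREADS:

# ~/.theanorc (sketch; the paths below are placeholders for my machine)
[global]
device = cpu
floatX = float32
# point Theano at the g++ 6.1 install
cxx = /usr/local/bin/g++

[blas]
# Intel MKL, single dynamic library form:
ldflags = -L/opt/intel/mkl/lib/intel64 -lmkl_rt
# for the OpenBLAS runs I switch to:
# ldflags = -lopenblas
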
On Saturday, September 10, 2016 at 12:56:54 AM UTC+1, nouiz wrote:
>
> I'm really surprised that the old one is faster on multiple cores than the
> new one. Can you provide a script that shows that? It could be just for some
> shapes.
>
> On 4 Sept. 2016 at 12:41, <[email protected]> wrote:
>
>> Thank you nouiz! Great to know. Does that imply that the new conv2d
>> doesn't use the full parallelism capabilities that the old one does?
>> BTW, in both experiments I've tested the new conv2d
>> against the old one, with the OpenBLAS and Intel MKL libraries.
>> The results were the same: the old one makes better use of the available
>> cores, while the new one fluctuates a lot. The downside is that
>> the old conv2d doesn't support arguments for strides and padding besides
>> the pre-defined ones.
>>
>> Thanks again.
>>
>>
>> On Thursday, September 1, 2016 at 4:55:05 PM UTC+1, nouiz wrote:
>>>
>>> They use different implementations. Make sure Theano uses a parallel BLAS
>>> library. The new conv2d uses it for part of the parallelism.
>>>
>>> On 25 Aug. 2016 at 21:53, <[email protected]> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> I've recently come across some weird behaviour regarding the new
>>>> theano.tensor.nnet.conv2d
>>>> and the old
>>>> theano.tensor.nnet.conv.conv2d
>>>> convolution functions.
>>>>
>>>> I have two different models: one uses the old conv2d method, the other
>>>> uses the new one.
>>>> The difference between the two is that the model that uses the new
>>>> conv2d method has more layers than the other one, and I've
>>>> explicitly defined padding and stride.
>>>>
>>>> Other than that, everything else is the same: amount of data, training
>>>> algorithm, batch size, etc. are pretty much identical.
>>>>
>>>> Once I execute them, the smaller model with the old conv2d method
>>>> utilizes all the cores in my system ;) great.
>>>> The bigger model with the new conv2d method doesn't, which is strange
>>>> because the bigger the model, the more resources it should need.
>>>>
>>>> Are there any differences in the way the two conv2d methods utilize
>>>> OpenMP?
>>>>
import theano
import numpy as np
from theano import tensor as T
from theano.tensor.signal.pool import pool_2d
from theano.tensor.nnet import conv2d
from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams
from load import mnist
srng = RandomStreams(123)
rng = np.random.RandomState(123)
floatX = theano.config.floatX


def init_weights(shape, poolsize):
    # Glorot/Xavier-style uniform initialisation; fan-in/fan-out depend on
    # whether the weights are for a conv layer (4D shape) or a dense layer (2D).
    if len(shape) > 2:
        fan_in = np.prod(shape[1:])
        fan_out = (shape[0] * np.prod(shape[2:]) // np.prod(poolsize))
    else:
        fan_in = shape[0]
        fan_out = shape[1]
    return theano.shared(
        rng.uniform(
            low=-np.sqrt(6. / (fan_in + fan_out)),
            high=np.sqrt(6. / (fan_in + fan_out)),
            size=shape
        ).astype(dtype=floatX), allow_downcast=True, borrow=True)


def rectify(X):
    return T.maximum(X, 0.)


def softmax(X):
    # numerically stable softmax: subtract the row max before exponentiating
    e_x = T.exp(X - X.max(axis=1).dimshuffle(0, 'x'))
    return e_x / e_x.sum(axis=1).dimshuffle(0, 'x')


def dropout(X, p=0.):
    # inverted dropout: rescale at training time so nothing changes at test time
    if p > 0:
        retain_prob = 1 - p
        X *= srng.binomial(X.shape, p=retain_prob, dtype=floatX)
        X /= retain_prob
    return X


def optimize(cost, params, lr=0.001, rho=0.9, epsilon=1e-6):
    # RMSprop: scale each gradient by a running average of its recent magnitude
    grads = T.grad(cost=cost, wrt=params)
    updates = []
    for p, g in zip(params, grads):
        acc = theano.shared(p.get_value() * 0., allow_downcast=True)
        acc_new = rho * acc + (1 - rho) * g ** 2
        gradient_scaling = T.sqrt(acc_new + epsilon)
        g = g / gradient_scaling
        updates.append((acc, acc_new))
        updates.append((p, p - lr * g))
    return updates


def model(X, W, p_drop_conv, p_drop_hidden):
    # Network-in-Network style blocks: a conv layer followed by two 1x1 convs,
    # then pooling and dropout.  The new conv2d interface is used throughout,
    # with explicit border_mode (padding) and subsample (stride) arguments.
    h1 = rectify(conv2d(X, W[0], border_mode=(3, 3), subsample=(3, 3)))
    h1_1x1 = rectify(conv2d(h1, W[1], border_mode=(0, 0), subsample=(1, 1)))
    h1_1x1 = rectify(conv2d(h1_1x1, W[2], border_mode=(0, 0), subsample=(1, 1)))
    h1_pool = pool_2d(h1_1x1, ds=(3, 3), st=(2, 2), padding=(1, 1), ignore_border=True)
    h1_drop = dropout(h1_pool, p_drop_conv)

    h2 = rectify(conv2d(h1_drop, W[3], border_mode=(2, 2), subsample=(1, 1)))
    h2_1x1 = rectify(conv2d(h2, W[4], border_mode=(0, 0), subsample=(1, 1)))
    h2_1x1 = rectify(conv2d(h2_1x1, W[5], border_mode=(0, 0), subsample=(1, 1)))
    h2_pool = pool_2d(h2_1x1, ds=(3, 3), st=(2, 2), padding=(1, 1), ignore_border=True)
    h2_drop = dropout(h2_pool, p_drop_conv)

    h3 = rectify(conv2d(h2_drop, W[6], border_mode=(1, 1), subsample=(1, 1)))
    h3_1x1 = rectify(conv2d(h3, W[7], border_mode=(0, 0), subsample=(1, 1)))
    h3_1x1 = rectify(conv2d(h3_1x1, W[8], border_mode=(0, 0), subsample=(1, 1)))
    h3_pool = pool_2d(h3_1x1, ds=(3, 3), st=(2, 2), padding=(1, 1), ignore_border=True)
    h3_drop = dropout(h3_pool, p_drop_conv)

    h4 = rectify(conv2d(h3_drop, W[9], border_mode=(3, 3), subsample=(1, 1)))
    h4_1x1 = rectify(conv2d(h4, W[10], border_mode=(0, 0), subsample=(1, 1)))
    h4_1x1 = rectify(conv2d(h4_1x1, W[11], border_mode=(0, 0), subsample=(1, 1)))
    h5_pool = pool_2d(h4_1x1, ds=(4, 4), st=(2, 2), padding=(1, 1),
                      mode='average_inc_pad', ignore_border=True)

    h5_flat = T.flatten(h5_pool, outdim=2)
    h5_drop = dropout(h5_flat, p_drop_conv)
    h5_relu = rectify(T.dot(h5_drop, W[12]))
    h5_drop = dropout(h5_relu, p_drop_hidden)
    pyx = softmax(T.dot(h5_drop, W[13]))
    return (h1_drop, h2_drop, h3_drop, h5_drop, pyx)


# load the data
trX, teX, trY, teY = mnist(onehot=True)
trX = trX.reshape(60000, 1, 28, 28)
teX = teX.reshape(10000, 1, 28, 28)
trY = trY.astype(dtype=floatX)
teY = teY.astype(dtype=floatX)

# build the model
X = T.tensor4(name='X')  # dtype defaults to floatX
Y = T.matrix(name='Y')
w1 = init_weights((96, 1, 7, 7), (2, 2))
w2 = init_weights((96, 96, 1, 1), (2, 2))
w3 = init_weights((96, 96, 1, 1), (2, 2))
w4 = init_weights((256, 96, 5, 5), (2, 2))
w5 = init_weights((256, 256, 1, 1), (2, 2))
w6 = init_weights((256, 256, 1, 1), (2, 2))
w7 = init_weights((384, 256, 3, 3), (2, 2))
w8 = init_weights((384, 384, 1, 1), (2, 2))
w9 = init_weights((384, 384, 1, 1), (2, 2))
w10 = init_weights((512, 384, 3, 3), (2, 2))
w11 = init_weights((512, 512, 1, 1), (2, 2))
w12 = init_weights((512, 512, 1, 1), (2, 2))
w13 = init_weights((512 * 3 * 3, 512), (2, 2))
w_o = init_weights((512, 10), (2, 2))
W = [w1, w2, w3, w4, w5, w6, w7, w8, w9, w10, w11, w12, w13, w_o]
drop_activation_layers = model(X, W, 0.2, 0.5)
drop_py_x = drop_activation_layers[4]
activation_layers = model(X, W, 0., 0.)
py_x = activation_layers[4]
y_x = T.argmax(py_x, axis=1)
cost = T.mean(T.nnet.categorical_crossentropy(drop_py_x, Y))
updates = optimize(cost, W, lr=0.01)
train = theano.function(inputs=[X, Y], outputs=cost, updates=updates,
                        allow_input_downcast=True, mode='FAST_RUN')
predict = theano.function(inputs=[X], outputs=y_x,
                          allow_input_downcast=True, mode='FAST_RUN')


def main_loop():
    # mini-batch training; report test-set accuracy and last-batch cost each epoch
    for i in range(100):
        for start, end in zip(range(0, len(trX), 256),
                              range(256, len(trX), 256)):
            cost = train(trX[start:end], trY[start:end])
        print("epoch = {:d}, accuracy = {:.4f}, cost = {:.4f}".format(
            i + 1, np.mean(np.argmax(teY, axis=1) == predict(teX)), cost))


main_loop()
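
In case a stripped-down script is easier to experiment with than the full model
above, here is a rough sketch of the kind of timing comparison I have in mind
(the shapes and iteration count are arbitrary, not the exact benchmark I ran;
run it with different OMP_NUM_THREADS values to see how each implementation
scales across cores):

import time
import numpy as np
import theano
from theano import tensor as T
from theano.tensor.nnet import conv2d as conv2d_new
from theano.tensor.nnet.conv import conv2d as conv2d_old

x = T.tensor4('x')
w = T.tensor4('w')
f_new = theano.function([x, w], conv2d_new(x, w, border_mode='valid'))
f_old = theano.function([x, w], conv2d_old(x, w, border_mode='valid'))

# arbitrary but reasonably heavy shapes: batch 64, 96 -> 256 channels, 5x5 kernels
img = np.random.randn(64, 96, 28, 28).astype(theano.config.floatX)
ker = np.random.randn(256, 96, 5, 5).astype(theano.config.floatX)

for name, f in [('new conv2d', f_new), ('old conv2d', f_old)]:
    f(img, ker)  # warm-up / compilation
    t0 = time.time()
    for _ in range(10):
        f(img, ker)
    print('{}: {:.4f} sec per call'.format(name, (time.time() - t0) / 10))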