Hi Fred,
I'm attaching the example file (tested this time, it works) and also the load
file for the dataset.
Change the path to the MNIST dataset in load.py and run; it should be
fine this time.
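In case it helps, the only line that should need editing in load.py is the
dataset root; the path below is just a placeholder:

    # in load.py -- point this at the directory that contains the mnist/ folder
    datasets_dir = '/path/to/datasets/'
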
On Tuesday, September 13, 2016 at 3:39:29 PM UTC+1, nouiz wrote:
>
> For the slowdown: if you change the stride or padding pattern, it is
> normal that this changes the speed. The old convolution supports only some
> stride patterns and doesn't support padding.
>
> The only way to compare the speed between the old and new back-ends is with
> the exact same parameters and by calling the two different implementations
> (theano.tensor.nnet.conv2d vs theano.tensor.nnet.conv.conv2d).
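
Something along these lines would be a minimal like-for-like timing of the two
implementations (the shapes, batch size, and repeat count are only
illustrative; both graphs use the default 'valid' border mode and unit
strides, which is all the old interface supports):

    import time
    import numpy as np
    import theano
    import theano.tensor as T
    from theano.tensor.nnet import conv2d as new_conv2d
    from theano.tensor.nnet.conv import conv2d as old_conv2d

    X = T.tensor4('X')
    # One shared filter bank, so both implementations see identical weights.
    W = theano.shared(np.random.randn(96, 1, 7, 7).astype(theano.config.floatX))

    f_new = theano.function([X], new_conv2d(X, W))
    f_old = theano.function([X], old_conv2d(X, W))

    x = np.random.randn(256, 1, 28, 28).astype(theano.config.floatX)
    for name, f in [('new', f_new), ('old', f_old)]:
        t0 = time.time()
        for _ in range(10):
            f(x)
        print("{}: {:.4f} sec/call".format(name, (time.time() - t0) / 10))
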
>
> I can't run your code. It doesn't work. It is missing other files and it
> doesn't call the init_weight correctly.
>
> For g++, thanks for reporting it. We should keep that discussion in that
> thread. Theano probably doesn't compile with it for now.
>
> Fred
>
> On Sun, Sep 11, 2016 at 7:38 PM, <[email protected]> wrote:
>
>> Hi nouiz,
>>
>> I've attached an example. You can run the same file on the new and old
>> conv2d by removing the strides and padding.
>>
>> On another note, since I upgraded gcc/g++ from 4.9 to 6.1 I've been having
>> problems compiling the Theano tests. The issues are similar to those here
>> <https://github.com/Theano/Theano/issues/4955>. Could you guys have a look
>> and let us know how to proceed, because everything else I've tried has
>> failed, even when using the correct compiler and linker settings taken from
>> the Intel MKL link line advisor. The errors persist even if I don't use
>> Intel MKL and simply link to OpenBLAS. I also tried compiling a test with
>> plain g++, without OpenBLAS, and still got errors.
>>
>>
>>
>> On Saturday, September 10, 2016 at 12:56:54 AM UTC+1, nouiz wrote:
>>>
>>> I'm really surprised that the old one is faster on multiple cores than
>>> the new one. Can you provide a script that shows that? It could be just
>>> for some shapes.
>>>
>>> On Sep 4, 2016 12:41, <[email protected]> wrote:
>>>
>>>> Thank you nouiz! Good to know. Does that imply that the new conv2d
>>>> doesn't use the full parallelism capabilities that the old one does?
>>>> BTW, in both experiments I've run I tested the new conv2d against the
>>>> old one with the OpenBLAS and Intel MKL libraries.
>>>> The results were the same: the old one makes better use of the available
>>>> cores, while the new one fluctuates a lot. The downside is that the old
>>>> conv2d doesn't support arguments for strides and padding beyond the
>>>> pre-defined ones.
>>>>
>>>> Thanks again.
>>>>
>>>>
>>>> On Thursday, September 1, 2016 at 4:55:05 PM UTC+1, nouiz wrote:
>>>>>
>>>>> They use different implementations. Make sure Theano uses a parallel
>>>>> BLAS library. The new conv2d uses it for part of the parallelism.
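
For what it's worth, a rough way to check what Theano is actually using (these
config options are read from ~/.theanorc or the THEANO_FLAGS environment
variable):

    import theano
    # Link flags for the BLAS library Theano compiles against,
    # e.g. '-lopenblas' or an MKL link line; empty means a fallback BLAS.
    print(theano.config.blas.ldflags)
    # Whether Theano's OpenMP support is enabled.
    print(theano.config.openmp)
    # OMP_NUM_THREADS in the environment controls how many threads
    # OpenBLAS/MKL and OpenMP may use.

Theano also ships theano/misc/check_blas.py, which benchmarks the BLAS it is
linked against.
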
>>>>>
>>>>> On Aug 25, 2016 21:53, <[email protected]> wrote:
>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> I've recently come across some weird behaviour regarding the new
>>>>>> theano.tensor.nnet.conv2d
>>>>>> and the old
>>>>>> theano.tensor.nnet.conv.conv2d
>>>>>> convolution functions.
>>>>>>
>>>>>> I have two different models: one uses the old conv2d method and the
>>>>>> other the new one.
>>>>>> The difference between the two is that the model that uses the new
>>>>>> conv2d method has more layers than the other one, plus I've explicitly
>>>>>> defined padding and stride.
>>>>>>
>>>>>> Other than that, everything else is pretty much the same: amount of
>>>>>> data, training algorithm, batch size, etc.
>>>>>>
>>>>>> Once I execute them, the smaller model with the old conv2d method
>>>>>> utilizes all the cores in my system ;) great.
>>>>>> The bigger model with the new conv2d method doesn't, which is strange
>>>>>> because the bigger the model, the more resources it would need.
>>>>>>
>>>>>> Are there any differences in the way the two conv2d methods utilize
>>>>>> OpenMP?
>>>>>>

# load.py -- helper that loads the MNIST dataset (adjust datasets_dir below).
import numpy as np
import os

datasets_dir = '../media/datasets/'


def one_hot(x, n):
    # Convert a vector of integer labels into a one-hot matrix of width n.
    if type(x) == list:
        x = np.array(x)
    x = x.flatten()
    o_h = np.zeros((len(x), n))
    o_h[np.arange(len(x)), x] = 1
    return o_h


def mnist(ntrain=60000, ntest=10000, onehot=True):
    # Read the raw MNIST idx files and return (trX, teX, trY, teY).
    data_dir = os.path.join(datasets_dir, 'mnist/')

    # Training images: skip the 16-byte idx header, reshape to (60000, 784).
    with open(os.path.join(data_dir, 'train-images-idx3-ubyte'), 'rb') as fd:
        loaded = np.fromfile(fd, dtype=np.uint8)
    trX = loaded[16:].reshape((60000, 28 * 28)).astype(dtype='float32')

    # Training labels: skip the 8-byte idx header.
    with open(os.path.join(data_dir, 'train-labels-idx1-ubyte'), 'rb') as fd:
        loaded = np.fromfile(fd, dtype=np.uint8)
    trY = loaded[8:].reshape(60000)

    # Test images and labels.
    with open(os.path.join(data_dir, 't10k-images-idx3-ubyte'), 'rb') as fd:
        loaded = np.fromfile(fd, dtype=np.uint8)
    teX = loaded[16:].reshape((10000, 28 * 28)).astype(dtype='float32')

    with open(os.path.join(data_dir, 't10k-labels-idx1-ubyte'), 'rb') as fd:
        loaded = np.fromfile(fd, dtype=np.uint8)
    teY = loaded[8:].reshape(10000)

    # Scale pixels to [0, 1] and keep the requested number of examples.
    trX = trX / 255.
    teX = teX / 255.
    trX = trX[:ntrain]
    trY = trY[:ntrain]
    teX = teX[:ntest]
    teY = teY[:ntest]

    if onehot:
        trY = one_hot(trY, 10)
        teY = one_hot(teY, 10)
    else:
        trY = np.asarray(trY)
        teY = np.asarray(teY)

    return trX, teX, trY, teY

# Example file: a small all-convolutional network on MNIST, written against
# the new theano.tensor.nnet.conv2d interface (strided, padded convolutions).
import theano
import numpy as np
from theano import tensor as T
from theano.tensor.signal.pool import pool_2d
from theano.tensor.nnet import conv2d
from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams

from load import mnist

srng = RandomStreams(123)
rng = np.random.RandomState(123)
floatX = theano.config.floatX


def init_weights(shape, poolsize):
    # Glorot-style uniform initialisation for conv and fully connected weights.
    if len(shape) > 2:
        # Convolution kernels: shape is (n_out_maps, n_in_maps, rows, cols).
        fan_in = np.prod(shape[1:])
        fan_out = (shape[0] * np.prod(shape[2:]) // np.prod(poolsize))
    else:
        fan_in = shape[0]
        fan_out = shape[1]
    return theano.shared(
        rng.uniform(
            low=-np.sqrt(6. / (fan_in + fan_out)),
            high=np.sqrt(6. / (fan_in + fan_out)),
            size=shape
        ).astype(dtype=floatX), allow_downcast=True, borrow=True)


def rectify(X):
    # ReLU non-linearity.
    return T.maximum(X, 0.)


def softmax(X):
    # Numerically stable softmax: subtract the row max before exponentiating.
    e_x = T.exp(X - X.max(axis=1).dimshuffle(0, 'x'))
    return e_x / e_x.sum(axis=1).dimshuffle(0, 'x')


def dropout(X, p=0.):
    # Inverted dropout: drop units with probability p and rescale at train time.
    if p > 0:
        retain_prob = 1 - p
        X *= srng.binomial(X.shape, p=retain_prob, dtype=floatX)
        X /= retain_prob
    return X


def optimize(cost, params, lr=0.001, rho=0.9, epsilon=1e-6):
    # RMSprop updates: scale each gradient by a running RMS of its magnitude.
    grads = T.grad(cost=cost, wrt=params)
    updates = []
    for p, g in zip(params, grads):
        acc = theano.shared(p.get_value() * 0., allow_downcast=True)
        acc_new = rho * acc + (1 - rho) * g ** 2
        gradient_scaling = T.sqrt(acc_new + epsilon)
        g = g / gradient_scaling
        updates.append((acc, acc_new))
        updates.append((p, p - lr * g))
    return updates


def model(X, W, p_drop_conv, p_drop_hidden):
    # Block 1: strided 7x7 convolution followed by two 1x1 convolutions.
    h1 = rectify(conv2d(X, W[0], border_mode=(3, 3), subsample=(3, 3)))
    h1_1x1 = rectify(conv2d(h1, W[1], border_mode=(0, 0), subsample=(1, 1)))
    h1_1x1 = rectify(conv2d(h1_1x1, W[2], border_mode=(0, 0), subsample=(1, 1)))
    h1_pool = pool_2d(h1_1x1, ds=(3, 3), st=(2, 2), padding=(1, 1), ignore_border=True)
    h1_drop = dropout(h1_pool, p_drop_conv)

    # Block 2: 5x5 convolution plus two 1x1 convolutions.
    h2 = rectify(conv2d(h1_drop, W[3], border_mode=(2, 2), subsample=(1, 1)))
    h2_1x1 = rectify(conv2d(h2, W[4], border_mode=(0, 0), subsample=(1, 1)))
    h2_1x1 = rectify(conv2d(h2_1x1, W[5], border_mode=(0, 0), subsample=(1, 1)))
    h2_pool = pool_2d(h2_1x1, ds=(3, 3), st=(2, 2), padding=(1, 1), ignore_border=True)
    h2_drop = dropout(h2_pool, p_drop_conv)

    # Block 3: 3x3 convolution plus two 1x1 convolutions.
    h3 = rectify(conv2d(h2_drop, W[6], border_mode=(1, 1), subsample=(1, 1)))
    h3_1x1 = rectify(conv2d(h3, W[7], border_mode=(0, 0), subsample=(1, 1)))
    h3_1x1 = rectify(conv2d(h3_1x1, W[8], border_mode=(0, 0), subsample=(1, 1)))
    h3_pool = pool_2d(h3_1x1, ds=(3, 3), st=(2, 2), padding=(1, 1), ignore_border=True)
    h3_drop = dropout(h3_pool, p_drop_conv)

    # Block 4: 3x3 convolution, two 1x1 convolutions and average pooling.
    h4 = rectify(conv2d(h3_drop, W[9], border_mode=(3, 3), subsample=(1, 1)))
    h4_1x1 = rectify(conv2d(h4, W[10], border_mode=(0, 0), subsample=(1, 1)))
    h4_1x1 = rectify(conv2d(h4_1x1, W[11], border_mode=(0, 0), subsample=(1, 1)))
    h5_pool = pool_2d(h4_1x1, ds=(4, 4), st=(2, 2), padding=(1, 1),
                      mode='average_inc_pad', ignore_border=True)

    # Classifier: flatten, dropout, one hidden layer and a softmax output.
    h5_flat = T.flatten(h5_pool, outdim=2)
    h5_drop = dropout(h5_flat, p_drop_conv)
    h5_relu = rectify(T.dot(h5_drop, W[12]))
    h5_drop = dropout(h5_relu, p_drop_hidden)
    pyx = softmax(T.dot(h5_drop, W[13]))
    return (h1_drop, h2_drop, h3_drop, h5_drop, pyx)


# load the data
trX, teX, trY, teY = mnist(onehot=True)
trX = trX.reshape(60000, 1, 28, 28)
teX = teX.reshape(10000, 1, 28, 28)
trY = trY.astype(dtype=floatX)
teY = teY.astype(dtype=floatX)

# build the model
X = T.tensor4(name='X')  # input images; tensor4 already defaults to floatX
Y = T.matrix(name='Y')   # one-hot target matrix
# Filter shapes: (output maps, input maps, kernel rows, kernel cols) for the
# convolutions, (n_in, n_out) for the two fully connected layers.
w1 = init_weights((96, 1, 7, 7), (2, 2))
w2 = init_weights((96, 96, 1, 1), (2, 2))
w3 = init_weights((96, 96, 1, 1), (2, 2))
w4 = init_weights((256, 96, 5, 5), (2, 2))
w5 = init_weights((256, 256, 1, 1), (2, 2))
w6 = init_weights((256, 256, 1, 1), (2, 2))
w7 = init_weights((384, 256, 3, 3), (2, 2))
w8 = init_weights((384, 384, 1, 1), (2, 2))
w9 = init_weights((384, 384, 1, 1), (2, 2))
w10 = init_weights((512, 384, 3, 3), (2, 2))
w11 = init_weights((512, 512, 1, 1), (2, 2))
w12 = init_weights((512, 512, 1, 1), (2, 2))
w13 = init_weights((512 * 3 * 3, 512), (2, 2))
w_o = init_weights((512, 10), (2, 2))
W = [w1, w2, w3, w4, w5, w6, w7, w8, w9, w10, w11, w12, w13, w_o]

# One graph with dropout for training, one without for prediction.
drop_activation_layers = model(X, W, 0.2, 0.5)
drop_py_x = drop_activation_layers[4]
activation_layers = model(X, W, 0., 0.)
py_x = activation_layers[4]
y_x = T.argmax(py_x, axis=1)

cost = T.mean(T.nnet.categorical_crossentropy(drop_py_x, Y))
updates = optimize(cost, W, lr=0.01)

train = theano.function(inputs=[X, Y], outputs=cost, updates=updates,
                        allow_input_downcast=True, mode='FAST_RUN')
predict = theano.function(inputs=[X], outputs=y_x,
                          allow_input_downcast=True, mode='FAST_RUN')


def main_loop():
    # Train for 100 epochs in mini-batches of 256, report test accuracy per epoch.
    for i in range(100):
        for start, end in zip(range(0, len(trX), 256),
                              range(256, len(trX), 256)):
            cost = train(trX[start:end], trY[start:end])
        print("epoch = {:d}, accuracy = {:.4f}, cost = {}".format(
            i + 1, np.mean(np.argmax(teY, axis=1) == predict(teX)), cost))


main_loop()