[GitHub] HyperZealot commented on issue #12997: A better take forward kernel for CPU

2018-11-02 Thread GitBox
HyperZealot commented on issue #12997: A better take forward kernel for CPU URL: https://github.com/apache/incubator-mxnet/pull/12997#issuecomment-435561378 @pengzhao-intel Thanks for your approval!

[GitHub] HyperZealot commented on issue #12997: A better take forward kernel for CPU

2018-11-01 Thread GitBox
HyperZealot commented on issue #12997: A better take forward kernel for CPU URL: https://github.com/apache/incubator-mxnet/pull/12997#issuecomment-434961612 @pengzhao-intel I think what @rongzha1 is suggesting here is to have a special case for smaller sizes? I've shown that "memcpy"

[GitHub] HyperZealot commented on issue #12997: A better take forward kernel for CPU

2018-10-31 Thread GitBox
HyperZealot commented on issue #12997: A better take forward kernel for CPU URL: https://github.com/apache/incubator-mxnet/pull/12997#issuecomment-434862551 I don't think the workloads provided by @rongzha1 are suitable for determining the memory bandwidth, based on the following arguments:

[GitHub] HyperZealot commented on issue #12997: A better take forward kernel for CPU

2018-10-30 Thread GitBox
HyperZealot commented on issue #12997: A better take forward kernel for CPU URL: https://github.com/apache/incubator-mxnet/pull/12997#issuecomment-434555666 @rongzha1 BTW I don't think your machine is a representative example for most users'...

[GitHub] HyperZealot commented on issue #12997: A better take forward kernel for CPU

2018-10-30 Thread GitBox
HyperZealot commented on issue #12997: A better take forward kernel for CPU URL: https://github.com/apache/incubator-mxnet/pull/12997#issuecomment-434536712 @rongzha1 TBH that's also what I got on my side. My CPU has 4 physical cores, so when I'm using OMP_NUM_THREADS=4 I get the best

[GitHub] HyperZealot commented on issue #12997: A better take forward kernel for CPU

2018-10-30 Thread GitBox
HyperZealot commented on issue #12997: A better take forward kernel for CPU URL: https://github.com/apache/incubator-mxnet/pull/12997#issuecomment-434436260 @rongzha1 Your memory bandwidth seems suspiciously high to me (>100 GB/s). Are you using a special type of memory? You can measure the

[GitHub] HyperZealot commented on issue #12997: A better take forward kernel for CPU

2018-10-30 Thread GitBox
HyperZealot commented on issue #12997: A better take forward kernel for CPU URL: https://github.com/apache/incubator-mxnet/pull/12997#issuecomment-434408459 @rongzha1 From your code, M should be num_cols (61400), which is a pretty big number.

[GitHub] HyperZealot commented on issue #12997: A better take forward kernel for CPU

2018-10-29 Thread GitBox
HyperZealot commented on issue #12997: A better take forward kernel for CPU URL: https://github.com/apache/incubator-mxnet/pull/12997#issuecomment-434043378 @rongzha1 Hi, all shapes here are inferred so they are guaranteed to satisfy divisibility requirements.

[GitHub] HyperZealot commented on issue #12997: A better take forward kernel for CPU

2018-10-28 Thread GitBox
HyperZealot commented on issue #12997: A better take forward kernel for CPU URL: https://github.com/apache/incubator-mxnet/pull/12997#issuecomment-433747551 @pengzhao-intel script: ```Python import mxnet as mx import random from mxnet.test_utils import rand_ndarray,

[GitHub] HyperZealot commented on issue #12997: A better take forward kernel for CPU

2018-10-28 Thread GitBox
HyperZealot commented on issue #12997: A better take forward kernel for CPU URL: https://github.com/apache/incubator-mxnet/pull/12997#issuecomment-433739297 @pengzhao-intel Hello, I have encountered this problem on my own model so I don't want to share that directly, but I can provide a