HyperZealot commented on issue #12997: A better take forward kernel for CPU
URL: https://github.com/apache/incubator-mxnet/pull/12997#issuecomment-435561378
@pengzhao-intel Thanks for your approval!
HyperZealot commented on issue #12997: A better take forward kernel for CPU
URL: https://github.com/apache/incubator-mxnet/pull/12997#issuecomment-434961612
@pengzhao-intel I think what @rongzha1 is suggesting here is to have a
special case for smaller sizes?
I've shown that "memcpy"
HyperZealot commented on issue #12997: A better take forward kernel for CPU
URL: https://github.com/apache/incubator-mxnet/pull/12997#issuecomment-434862551
I don't think the workloads provided by @rongzha1 are suitable for
determining the memory bandwidth, based on the following arguments:
HyperZealot commented on issue #12997: A better take forward kernel for CPU
URL: https://github.com/apache/incubator-mxnet/pull/12997#issuecomment-434555666
@rongzha1 BTW I don't think your machine is a representative example for
most users'...
HyperZealot commented on issue #12997: A better take forward kernel for CPU
URL: https://github.com/apache/incubator-mxnet/pull/12997#issuecomment-434536712
@rongzha1 TBH that's also what I got on my side. My CPU has 4 physical
cores, so when I'm using OMP_NUM_THREADS=4 I get the best
HyperZealot commented on issue #12997: A better take forward kernel for CPU
URL: https://github.com/apache/incubator-mxnet/pull/12997#issuecomment-434436260
@rongzha1 Your memory bandwidth seems suspiciously high to me (>100GB/s),
are you using a special type of memory? You can measure the
HyperZealot commented on issue #12997: A better take forward kernel for CPU
URL: https://github.com/apache/incubator-mxnet/pull/12997#issuecomment-434408459
@rongzha1 From your code M should be num_cols(61400), that is a pretty big
number.
HyperZealot commented on issue #12997: A better take forward kernel for CPU
URL: https://github.com/apache/incubator-mxnet/pull/12997#issuecomment-434043378
@rongzha1 Hi, all shapes here are inferred so they are guaranteed to satisfy
divisibility requirements.
HyperZealot commented on issue #12997: A better take forward kernel for CPU
URL: https://github.com/apache/incubator-mxnet/pull/12997#issuecomment-433747551
@pengzhao-intel
script:
```Python
import mxnet as mx
import random
from mxnet.test_utils import rand_ndarray,
HyperZealot commented on issue #12997: A better take forward kernel for CPU
URL: https://github.com/apache/incubator-mxnet/pull/12997#issuecomment-433739297
@pengzhao-intel Hello, I have encountered this problem on my own model so I
don't want to share that directly, but I can provide a