Re: [PyCUDA] Create gpuarrays on different GPUs

2018-05-27 Thread Zhangsheng Lai
My 'can access' simply means that I'm able to read the values of the
variable in Python by typing x1 or x2. My understanding is that if the
variables are stored on different GPUs, then I should be able to type x1
and get its values when ctx1 is active, and similarly get the x2 values by
typing x2 when ctx2 is active, but not when ctx1 is active.

On 24 May 2018 at 18:56, Andreas Kloeckner <li...@informa.tiker.net> wrote:

> Zhangsheng Lai <dunno@gmail.com> writes:
> > with the setup above, I tried to check by poping ctx2 and pushing ctx1,
> can
> > I access x1 and not x2 and vice versa, popping ctx1 and pushing ctx2, I
> can
> > access x2 and not x1. However, I realise that I can access x1 and x2 in
> > both contexts.
>
> Can you clarify what you mean by 'can access'? I'm guessing 'submit
> kernel launches with that pointer as an argument'?
>
> Andreas
>
___
PyCUDA mailing list
PyCUDA@tiker.net
https://lists.tiker.net/listinfo/pycuda


[PyCUDA] Create gpuarrays on different GPUs

2018-05-23 Thread Zhangsheng Lai
Hi,

I'm trying to create different GPU arrays on different GPUs.

```
import numpy as np
import pycuda
import pycuda.driver as cuda
from pycuda.compiler import SourceModule
import pycuda.curandom as curandom

d = 2 ** 15

cuda.init()
dev1 = cuda.Device(1)
ctx1 = dev1.make_context()

curng1 = curandom.XORWOWRandomNumberGenerator()

x1 = curng1.gen_normal((d, d), dtype=np.float32)  # so x1 is stored in GPU 1 memory

ctx1.pop()  # clearing ctx of GPU1

dev2 = cuda.Device(2)  # a different device than dev1
ctx2 = dev2.make_context()

curng2 = curandom.XORWOWRandomNumberGenerator()

x2 = curng2.gen_normal((d, d), dtype=np.float32)  # so x2 is stored in GPU 2 memory
```

With the setup above, I tried to check whether, by popping ctx2 and pushing
ctx1, I can access x1 and not x2, and vice versa: popping ctx1 and pushing
ctx2, whether I can access x2 and not x1. However, I realise that I can
access x1 and x2 in both contexts.

Thus I'm wondering if my assumption that x1 is stored on GPU 1 and x2 is
stored on GPU 2 is correct, or if it is actually UVA and peer access that
allow me to access both x1 and x2 even when only one of the two contexts is
active.
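One way to probe the peer-access hypothesis is to query every ordered device pair with `Device.can_access_peer` (PyCUDA's wrapper around `cuDeviceCanAccessPeer`); if it reports true, UVA plus peer access could indeed make one context's pointer usable from the other. A minimal sketch (the pair-enumeration helper and its name are mine; the PyCUDA calls in the comment assume a machine with at least two GPUs):

```python
import itertools

def peer_matrix(devices, can_access):
    """Map each ordered pair (i, j) to whether device i can access device j's memory."""
    return {(i, j): bool(can_access(di, dj))
            for (i, di), (j, dj) in itertools.permutations(enumerate(devices), 2)}

# With PyCUDA (assumes >= 2 GPUs; Device.can_access_peer wraps cuDeviceCanAccessPeer):
#   import pycuda.driver as cuda
#   cuda.init()
#   devs = [cuda.Device(i) for i in range(cuda.Device.count())]
#   print(peer_matrix(devs, lambda a, b: a.can_access_peer(b)))
```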

Thanks,
Zhangsheng


Re: [PyCUDA] Invalid resource handle error

2018-05-11 Thread Zhangsheng Lai
I'm not very familiar with CUDA, so I would like to ask if you have any
guesses about what is leading to my on-device segfault?
I'm guessing that saving the ctx in the GPU thread class and pushing and
popping it before I run my code might have caused it.
If so, is there any way I can avoid it?
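If an unbalanced push/pop is the culprit (the pop never runs when the launch raises, leaving the context stack corrupted), the pair can be made exception-safe with a small context manager. A sketch, not PyCUDA API: `active` is a hypothetical helper name, and `ctx` stands for whatever context object the GPU thread class stores:

```python
from contextlib import contextmanager

@contextmanager
def active(ctx):
    """Push a CUDA context and guarantee the matching pop, even if the body raises."""
    ctx.push()
    try:
        yield ctx
    finally:
        ctx.pop()

# Intended use (gputid.ctx is the context stored by the GPU thread class):
#   with active(gputid.ctx):
#       func.prepared_call(grid, block, x_gpu, n_gpu)
```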

Many thanks,
Zhangsheng

On 12 May 2018 at 12:34, Andreas Kloeckner <li...@informa.tiker.net> wrote:

> Zhangsheng Lai <dunno@gmail.com> writes:
>
> > Hi,
> >
> > I'm trying to do some updates to a state which is a binary array. gputid
> > is a GPU thread class (https://wiki.tiker.net/PyCuda/Examples/MultipleThreads)
> > and it stores the state and the index of the array to be updated in
> > another class, which can be accessed with gputid.mp.x_gpu and
> > gputid.mp.neuron_gpu respectively. Below is my kernel that takes in the
> > gputid and performs the update of the state. However, the output of the
> > code is not consistent: it sometimes runs into errors and sometimes
> > executes perfectly when I run it multiple times. The error msg makes no
> > sense to me:
> >
> > File "/root/anaconda3/lib/python3.6/site-packages/pycuda/driver.py",
> > line 447, in function_prepared_call
> > func._set_block_shape(*block)
> > pycuda._driver.LogicError: cuFuncSetBlockShape failed: invalid resource
> > handle
>
> I think the right way to interpret this is that if you cause an
> on-device segfault, the GPU context dies, and all the handles of objects
> contained in it (including the function) become invalid.
>
> HTH,
> Andreas
>


[PyCUDA] Invalid resource handle error

2018-05-11 Thread Zhangsheng Lai
Hi,

I'm trying to do some updates to a state which is a binary array. gputid is
a GPU thread class (https://wiki.tiker.net/PyCuda/Examples/MultipleThreads)
and it stores the state and the index of the array to be updated in another
class, which can be accessed with gputid.mp.x_gpu and gputid.mp.neuron_gpu
respectively. Below is my kernel that takes in the gputid and performs the
update of the state. However, the output of the code is not consistent: it
sometimes runs into errors and sometimes executes perfectly when I run it
multiple times. The error msg makes no sense to me:

```
  File "/root/anaconda3/lib/python3.6/site-packages/pycuda/driver.py", line 447, in function_prepared_call
    func._set_block_shape(*block)
pycuda._driver.LogicError: cuFuncSetBlockShape failed: invalid resource handle
```


My code:

```
from pycuda.compiler import SourceModule

def local_update(gputid):
    mod = SourceModule("""
    __global__ void local_update(int *x_gpu, float *n_gpu)
    {
        int tid = threadIdx.x + blockDim.x * blockIdx.x;

        if (tid == (int)(n_gpu[0]))
        {
            x_gpu[tid] = 1 - x_gpu[tid];
        }
    }
    """)

    gputid.ctx.push()
    x_gpu = gputid.mp.x_gpu
    n_gpu = gputid.mp.neuron_gpu

    func = mod.get_function("local_update")
    func.prepare("PP")

    grid = (1, 1)
    block = (gputid.mp.d, 1, 1)

    func.prepared_call(grid, block, x_gpu, n_gpu)
    gputid.ctx.pop()
    print('1Pain')
```
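For reference, the kernel's intended effect is easy to pin down with a host-side mirror: each thread compares its id against `n_gpu[0]`, and only the matching entry of the binary array is flipped. A NumPy sketch of that semantics (the helper name is mine, not part of the original code):

```python
import numpy as np

def local_update_host(x, n):
    """Host-side mirror of the kernel: flip the single entry of the binary
    array x selected by the float-valued index n; all other entries stay."""
    x = x.copy()
    idx = int(n)             # mirrors (int)(n_gpu[0]) in the kernel
    x[idx] = 1 - x[idx]      # 0 -> 1, 1 -> 0
    return x
```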


[PyCUDA] cuModuleLoadDataEx failed: device kernel image is invalid

2018-04-18 Thread Zhangsheng Lai
I'm encountering this error when I run my code in the same Docker
environment but on a different workstation.

```
Traceback (most recent call last):
  File "simple_peer.py", line 76, in 
tslr_gpu, lr_gpu = mp.initialise()
  File "/root/distributed-mpp/naive/mccullochpitts.py", line 102, in
initialise
""", arch='sm_60')
  File "/root/anaconda3/lib/python3.6/site-packages/pycuda/compiler.py",
line 294, in __init__
self.module = module_from_buffer(cubin)
pycuda._driver.LogicError: cuModuleLoadDataEx failed: device kernel image
is invalid -

```
I did a quick search and only found this:
https://github.com/inducer/pycuda/issues/45 , but it doesn't seem relevant
to my problem, as the code runs fine on my initial workstation. Can anyone
see what the issue is?
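An "device kernel image is invalid" error from cuModuleLoadDataEx is the classic symptom of a cubin built for an arch the target GPU cannot run, and the traceback shows arch='sm_60' hardcoded. One way around it is to derive the arch string from the device's compute capability instead; the formatting itself is trivial (the helper name is mine):

```python
def arch_string(cc):
    """Format a compute-capability tuple, e.g. (6, 0), as an nvcc arch value."""
    return "sm_%d%d" % cc

# With PyCUDA, instead of hardcoding arch='sm_60':
#   import pycuda.driver as cuda
#   arch = arch_string(cuda.Context.get_device().compute_capability())
#   mod = SourceModule(source, arch=arch)
```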

Below is my code that I'm trying to run:
```
def initialise(self):
    """
    Documentation here
    """

    mod = SourceModule("""
    #include 
    __global__ void initial(float *tslr_out, float *lr_out, float *W_gpu,
                            float *b_gpu, int *x_gpu, int d, float temp)
    {
        int tx = threadIdx.x;

        // Wx stores the W_ji x_i product value
        float Wx = 0;

        // Matrix multiplication of W and x
        for (int k=0; k