Re: [PyCUDA] Create gpuarrays on different GPUs
My 'can access' simply means that I'm able to access the values of the variable in Python by typing x1 or x2. My understanding is that if the variables are stored on different GPUs, then I should be able to type x1 and get its values when ctx1 is active, and similarly get the x2 values when ctx2 is active, but not when ctx1 is active.

On 24 May 2018 at 18:56, Andreas Kloeckner <li...@informa.tiker.net> wrote:
> Zhangsheng Lai <dunno@gmail.com> writes:
> > with the setup above, I tried to check by popping ctx2 and pushing ctx1,
> > can I access x1 and not x2 and vice versa, popping ctx1 and pushing ctx2,
> > I can access x2 and not x1. However, I realise that I can access x1 and x2
> > in both contexts.
>
> Can you clarify what you mean by 'can access'? I'm guessing 'submit
> kernel launches with that pointer as an argument'?
>
> Andreas

___
PyCUDA mailing list
PyCUDA@tiker.net
https://lists.tiker.net/listinfo/pycuda
[PyCUDA] Create gpuarrays on different GPUs
Hi,

I'm trying to create different GPU arrays on different GPUs.

```
import numpy as np
import pycuda
import pycuda.driver as cuda
from pycuda.compiler import SourceModule
import pycuda.curandom as curandom

d = 2 ** 15
cuda.init()

dev1 = cuda.Device(1)
ctx1 = dev1.make_context()
curng1 = curandom.XORWOWRandomNumberGenerator()
x1 = curng1.gen_normal((d, d), dtype=np.float32)  # so x1 is stored in GPU 1 memory
ctx1.pop()  # clearing ctx of GPU 1

dev2 = cuda.Device(2)
ctx2 = dev2.make_context()
curng2 = curandom.XORWOWRandomNumberGenerator()
x2 = curng2.gen_normal((d, d), dtype=np.float32)  # so x2 is stored in GPU 2 memory
```

With the setup above, I tried to check that by popping ctx2 and pushing ctx1 I can access x1 and not x2, and vice versa: popping ctx1 and pushing ctx2, I can access x2 and not x1. However, I realise that I can access both x1 and x2 in both contexts. Thus I'm wondering whether my assumptions (x1 stored on GPU 1, x2 stored on GPU 2) are correct, or whether it is actually UVA and peer access that let me reach both x1 and x2 even when only one of the two contexts is active.

Thanks,
Zhangsheng
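As a side note on the sizes involved here (my own arithmetic, not part of the original question): with d = 2**15, each (d, d) float32 array occupies 4 GiB of device memory, which is worth keeping in mind when allocating one per GPU.

```python
d = 2 ** 15
n_elements = d * d            # 2**30 elements per array
bytes_per_elem = 4            # sizeof(float32)
total_bytes = n_elements * bytes_per_elem
print(total_bytes // 2 ** 30) # prints 4 (GiB per array)
```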
Re: [PyCUDA] Invalid resource handle error
I'm not very familiar with CUDA, so I would like to ask if you have any guesses about what is leading to my on-device segfault. My guess is that saving the ctx in the GPU thread class and pushing and popping it before running my code might have caused it. If so, is there any way I can avoid it?

Many thanks,
Zhangsheng

On 12 May 2018 at 12:34, Andreas Kloeckner <li...@informa.tiker.net> wrote:
> Zhangsheng Lai <dunno@gmail.com> writes:
> > Hi,
> >
> > I'm trying to do some updates to a state which is a binary array. gputid is
> > a GPU thread class (https://wiki.tiker.net/PyCuda/Examples/MultipleThreads)
> > and it stores the state and the index of the array to be updated in another
> > class, which can be accessed with gputid.mp.x_gpu and gputid.mp.neuron_gpu
> > respectively. Below is my kernel that takes in the gputid and performs the
> > update of the state. However, the output of the code is not consistent: it
> > runs into errors on some runs and executes perfectly on others. The error
> > msg makes no sense to me:
> >
> > File "/root/anaconda3/lib/python3.6/site-packages/pycuda/driver.py", line
> > 447, in function_prepared_call
> >     func._set_block_shape(*block)
> > pycuda._driver.LogicError: cuFuncSetBlockShape failed: invalid resource
> > handle
>
> I think the right way to interpret this is that if you cause an
> on-device segfault, the GPU context dies, and all the handles of objects
> contained in it (including the function) become invalid.
>
> HTH,
> Andreas
[PyCUDA] Invalid resource handle error
Hi,

I'm trying to do some updates to a state which is a binary array. gputid is a GPU thread class (https://wiki.tiker.net/PyCuda/Examples/MultipleThreads) and it stores the state and the index of the array to be updated in another class, which can be accessed with gputid.mp.x_gpu and gputid.mp.neuron_gpu respectively. Below is my kernel that takes in the gputid and performs the update of the state. However, the output of the code is not consistent: it runs into errors on some runs and executes perfectly on others. The error msg makes no sense to me:

```
File "/root/anaconda3/lib/python3.6/site-packages/pycuda/driver.py", line 447, in function_prepared_call
    func._set_block_shape(*block)
pycuda._driver.LogicError: cuFuncSetBlockShape failed: invalid resource handle
```

My code:

```
def local_update(gputid):
    mod = SourceModule("""
    __global__ void local_update(int *x_gpu, float *n_gpu)
    {
        int tid = threadIdx.x + blockDim.x * blockIdx.x;
        if (tid == (int)(n_gpu[0])) {
            x_gpu[tid] = 1 - x_gpu[tid];
        }
    }
    """)

    gputid.ctx.push()
    x_gpu = gputid.mp.x_gpu
    n_gpu = gputid.mp.neuron_gpu

    func = mod.get_function("local_update")
    func.prepare("PP")
    grid = (1, 1)
    block = (gputid.mp.d, 1, 1)
    func.prepared_call(grid, block, x_gpu, n_gpu)
    gputid.ctx.pop()
    print('1Pain')
```
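For reference, the update this kernel performs can be expressed on the host with NumPy; this is a small sketch mirroring the kernel's logic (the names and the toy size d = 8 are my own stand-ins, not the author's code): exactly one thread, the one whose tid equals (int)(n_gpu[0]), flips its entry of the binary state.

```python
import numpy as np

d = 8                                  # small stand-in for the real state size
x = np.zeros(d, dtype=np.int32)        # binary state, mirrors x_gpu
n = np.array([3.0], dtype=np.float32)  # index to update, mirrors n_gpu

tid = int(n[0])      # only the thread with tid == (int)(n_gpu[0]) does work
x[tid] = 1 - x[tid]  # flip that one bit of the state
print(x)             # prints [0 0 0 1 0 0 0 0]
```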
[PyCUDA] cuModuleLoadDataEx failed: device kernel image is invalid
I'm encountering this error when I run my code in the same Docker environment but on a different workstation.

```
Traceback (most recent call last):
  File "simple_peer.py", line 76, in <module>
    tslr_gpu, lr_gpu = mp.initialise()
  File "/root/distributed-mpp/naive/mccullochpitts.py", line 102, in initialise
    """, arch='sm_60')
  File "/root/anaconda3/lib/python3.6/site-packages/pycuda/compiler.py", line 294, in __init__
    self.module = module_from_buffer(cubin)
pycuda._driver.LogicError: cuModuleLoadDataEx failed: device kernel image is invalid -
```

I did a quick search and only found this: https://github.com/inducer/pycuda/issues/45 , but it doesn't seem to be relevant to my problem, as my code runs fine on my initial workstation. Can anyone see what the issue is? Below is the code I'm trying to run:

```
def initialise(self):
    """
    Documentation here
    """
    mod = SourceModule("""
    #include
    __global__ void initial(float *tslr_out, float *lr_out, float *W_gpu,\
                            float *b_gpu, int *x_gpu, int d, float temp)
    {
        int tx = threadIdx.x;
        // Wx stores the W_ji x_i product value
        float Wx = 0;
        // Matrix multiplication of W and x
        for (int k=0; k
```
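The kernel source above is cut off mid-loop, but from the comments its intent is clear: each thread tx accumulates the dot product of row tx of W with x. A host-side NumPy sketch of that computation (shapes and names are my own stand-ins for illustration, not the author's code):

```python
import numpy as np

# Hypothetical small size; the real kernel works on d x d weights.
d = 4
rng = np.random.default_rng(0)
W = rng.standard_normal((d, d)).astype(np.float32)  # mirrors W_gpu
x = rng.integers(0, 2, size=d).astype(np.int32)     # binary state, mirrors x_gpu

# Thread tx computes sum_k W[tx, k] * x[k]; across all threads this
# is the matrix-vector product W @ x.
Wx = W @ x
tx = 1
assert np.isclose(Wx[tx], sum(W[tx, k] * x[k] for k in range(d)))
```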