[PyCUDA] Incorrect shared memory size for kernel

2010-02-07 Thread Bogdan Opanchuk
Hello, I noticed a strange thing recently. Consider the following kernel: __global__ void test(float *out) { float a[2] = {0,0}; a[0] = 1; a[1] = 2; out[0] = a[0]; out[1] = a[1]; } As far as I understand, a[2] should go into registers. According to PTX

[PyCUDA] Garbage after copying to and from shared memory

2010-02-09 Thread Bogdan Opanchuk
Hello, Yet another stupid question. Most probably I missed something obvious, but anyway - can someone explain why I get NaNs in the output of the program (listed below)? Surprisingly, the bug disappears if I send '1' instead of '-1' as the third parameter to the function (or remove 'int' parameters

[PyCUDA] FFT for PyCuda

2010-02-14 Thread Bogdan Opanchuk
Hello, The project I am working on relies heavily on batched 3D FFTs. You all know about the situation with CUFFT and PyCuda, and I decided that I had to put some effort into it. So, I ported Apple's OpenCL implementation of FFT to PyCuda. You can see the result on

Re: [PyCUDA] FFT for PyCuda

2010-02-14 Thread Bogdan Opanchuk
Hi Daniel, (sort of an awkward situation, I do not know which one I should use as your first name) Thank you for telling me about Parret, I did not know that your CUFFT wrapper code can be found outside this mailing list. Nevertheless, I'll stick to the version I'm currently using (and remove it from

Re: [PyCUDA] Garbage after copying to and from shared memory

2010-02-28 Thread Bogdan Opanchuk
2010, Bogdan Opanchuk wrote: Hello, Yet another stupid question. Most probably I missed something obvious, but anyway - can someone explain why I get NaNs in the output of the program (listed below)? Surprisingly, the bug disappears if I send '1' instead of '-1' as the third parameter to the function

Re: [PyCUDA] FFT for PyCuda

2010-03-02 Thread Bogdan Opanchuk
have no other complaints about pycuda. It just works! Best regards, Bogdan On Tue, Mar 2, 2010 at 6:51 AM, Andreas Klöckner li...@informa.tiker.net wrote: Hi Bogdan, On Sonntag 14 Februar 2010, Bogdan Opanchuk wrote: The project I am working on relies heavily on batched 3D FFTs. You all know

[PyCUDA] Pycudafft becomes Pyfft

2010-03-20 Thread Bogdan Opanchuk
Hello all, I fixed some bugs in my pycudafft module and added PyOpenCL support, so it is called just pyfft now (and it sort of resolves the question about including it in the PyCuda distribution). At the moment, the most annoying (to me, at least) things are: 1. OpenCL performance tests show up to 6

Re: [PyCUDA] Pycudafft becomes Pyfft

2010-03-24 Thread Bogdan Opanchuk
some version check too, because there will definitely be other bugs on Python 2.4, which is still used by some Linux distros). Best regards, Bogdan On Thu, Mar 25, 2010 at 11:36 AM, Bogdan Opanchuk manti...@gmail.com wrote: Hello Imran, I tested it only on 2.6, so that may be the case. Thanks

Re: [PyCUDA] Pycudafft becomes Pyfft

2010-03-24 Thread Bogdan Opanchuk
, Imran Bogdan Opanchuk wrote: Hello Imran, kernel.py requires patching too: - from .kernel_helpers import * + from .kernel_helpers import log2, getRadixArray, getGlobalRadixInfo, getPadding, getSharedMemorySize I hope this will be enough. Sorry for the inconvenience, I'm going to commit

Re: [PyCUDA] How to manually free GPUarray to avoid leak?

2010-04-25 Thread Bogdan Opanchuk
Hi Gerald, I can watch the memory pointers of the gpuarrays increase until I get a launch error... presumably due to lack of memory. Are you sure that the failure is caused by lack of memory? I think this would rather result in an error during memory allocation, not during kernel execution.

Re: [PyCUDA] question from a lazy person : *** CUDA_ROOT not set, and nvcc not in path. Giving up.

2010-06-03 Thread Bogdan Opanchuk
Hi Michael, The error message is sort of self-explanatory: you need to make 'nvcc' (the CUDA compiler) available to the installer. There are two ways to do it: either add its path (usually /usr/local/cuda/bin) to your $PATH variable (by modifying your bash profile, for example), or pass the path to CUDA
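
The first option above can also be sketched from Python; the path below is the usual default install location and is an assumption, so adjust it for your system:

```python
import os

# Hypothetical default CUDA install location; adjust to your system.
cuda_bin = "/usr/local/cuda/bin"

# Prepend the CUDA bin directory so the installer can find nvcc.
os.environ["PATH"] = cuda_bin + os.pathsep + os.environ.get("PATH", "")

assert cuda_bin in os.environ["PATH"].split(os.pathsep)
```

The equivalent shell form is an `export PATH=/usr/local/cuda/bin:$PATH` line in your bash profile.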

[PyCUDA] Cannot import both pycuda and pyopencl in the same program

2010-09-09 Thread Bogdan Opanchuk
Hi all, I'm observing the following behavior with the latest (git-fetched today) pycuda and pyopencl versions on Snow Leopard 10.6.4: $ python import pycuda.driver import pyopencl Traceback (most recent call last): File "<stdin>", line 1, in <module> File

Re: [PyCUDA] Cannot import both pycuda and pyopencl in the same program

2010-09-09 Thread Bogdan Opanchuk
On Fri, Sep 10, 2010 at 12:18 AM, Andreas Kloeckner li...@informa.tiker.net wrote: Are you using the shipped version of Boost in both libraries? If so, that might present an issue. Yep, in both. Does it behave in the same way on your system? Best regards, Bogdan

Re: [PyCUDA] Cannot import both pycuda and pyopencl in the same program

2010-09-09 Thread Bogdan Opanchuk
Hi Andreas, On Fri, Sep 10, 2010 at 1:09 AM, Andreas Kloeckner li...@informa.tiker.net wrote: Nope, it seems fine on my machine. I guess that means if you'd like to work with both PyCUDA and PyOpenCL at the same time, you have to build with external (non-shipped) Boost. You were right, I

Re: [PyCUDA] Some help with contexts please

2010-10-12 Thread Bogdan Opanchuk
Hi Javier, It would probably help if you attach the source of the expon_them() function (since something is definitely happening there). I'll try to do some psychic debugging though. I find these lines suspicious: self.weights_lateral = gpuarray.to_gpu(self.weight_matrixLateral())

Re: [PyCUDA] Invalid value when calling to_gpu_async()

2010-10-18 Thread Bogdan Opanchuk
Kloeckner li...@informa.tiker.net wrote: On Sun, 3 Oct 2010 01:44:35 +1000, Bogdan Opanchuk manti...@gmail.com wrote: Hello all, I am getting LogicError from gpuarray.to_gpu_async() for some reason. Code: Any host memory involved in *_async() must be page-locked. FTFY: import pycuda.autoinit

[PyCUDA] Non-randomness of pycuda.curandom.rand()

2010-10-18 Thread Bogdan Opanchuk
Hi all, Consider the following program, which is supposed to check the randomness of pycuda random number generator: import pycuda.autoinit import pycuda.curandom as curandom import numpy def test(size, dtype): a = curandom.rand((size,), dtype=dtype).get() return numpy.sum(a) /
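
The follow-up in this thread suggests the anomaly is a floating point artifact rather than non-randomness: a float32 accumulator runs out of integer resolution at 2^24, so a naive running sum of many samples stops growing. A quick sketch of both the precision cliff and the usual fix (accumulate in float64):

```python
import numpy as np

# float32 has a 24-bit significand: adding 1 to 2**24 is a no-op.
big = np.float32(2 ** 24)
assert big + np.float32(1) == big

# Consequence: a naive float32 running sum of millions of samples loses
# accuracy, so compute the mean with a float64 accumulator instead.
samples = np.random.rand(10 ** 5).astype(np.float32)
mean64 = samples.astype(np.float64).mean()
assert 0.4 < mean64 < 0.6  # loose sanity check for uniform [0, 1) samples
```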

Re: [PyCUDA] Non-randomness of pycuda.curandom.rand()

2010-10-18 Thread Bogdan Opanchuk
Hi Vincent, On Tue, Oct 19, 2010 at 2:28 AM, Vincent Favre-Nicolin vincent.favre-nico...@cea.fr wrote:  I'm not sure what is happening exactly, but there is no indication that the random numbers are *repeating themselves*  However you seem to hit a floating point issue *when computing the

Re: [PyCUDA] pyfft on large 3d arrays

2011-01-17 Thread Bogdan Opanchuk
Hi Saigopal, What pyfft version do you use? Can you please post the full testing code which can be executed to reproduce the bug? Because the code I composed (basically, added comparison with CPU to your code) works normally on my desktop with Tesla C2050 (Ubuntu 10.04 x64, Cuda 3.2, PyCuda

Re: [PyCUDA] pyfft on large 3d arrays

2011-01-17 Thread Bogdan Opanchuk
Hi Saigopal, On Tue, Jan 18, 2011 at 5:14 PM, Saigopal Nelaturi saigo...@gmail.com wrote: Thanks for the quick response. My operating specs are exactly the same as yours, and when I run your test I get an error of ~3e-7. But I think that number may have to do with dividing by the norm of the

Re: [PyCUDA] pyfft on large 3d arrays

2011-01-17 Thread Bogdan Opanchuk
Hi Saigopal, Try adding the fast_math=False option when creating plans. It will give a small precision increase (at the cost of performance, of course), which may be enough for your purposes. This option only applies to single precision. Best regards, Bogdan

[PyCUDA] Undefined symbol in _curand.so

2011-06-01 Thread Bogdan Opanchuk
Hello, There is some problem with the current PyCuda version (most recent commit from the repo). On my Ubuntu 10.04 x64, Python 2.6, Cuda 4.0 after 'submodule update', compilation and installation, _curand cannot be imported: import pycuda._curand Traceback (most recent call last): File "<stdin>", line

Re: [PyCUDA] Undefined symbol in _curand.so

2011-06-05 Thread Bogdan Opanchuk
Hello Andreas, On Sun, Jun 5, 2011 at 5:43 PM, Andreas Kloeckner li...@informa.tiker.net wrote: If worst comes to worst, we'll just shove the _curand wrappers back into the main PyCUDA wrapper binary. I've done just that, for lack of better ideas. Scott, Bogdan--can you check whether this

Re: [PyCUDA] Kernel calls and types

2011-06-06 Thread Bogdan Opanchuk
Hello Irwin, On Mon, Jun 6, 2011 at 2:16 PM, Irwin Zaid irwin.z...@physics.ox.ac.uk wrote: Anyway, I was wondering if there is a better way to provide this functionality? In normal CUDA code, this could be done with templates, but that doesn't seem to be an option here. I know metaprogramming

Re: [PyCUDA] Undefined symbol in _curand.so

2011-06-06 Thread Bogdan Opanchuk
Hello, I created the pull request (https://github.com/inducer/pycuda/pull/5) which fixes this issue for me. People with Macs, could you please check it on your systems? Best regards, Bogdan On Sun, Jun 5, 2011 at 10:16 PM, Bogdan Opanchuk manti...@gmail.com wrote: Hello Andreas, On Sun, Jun

Re: [PyCUDA] PyCUDA logo?

2011-06-07 Thread Bogdan Opanchuk
Hello, How about this (very drafty draft, just to illustrate an idea)? 2011/6/7 Andreas Kloeckner li...@informa.tiker.net: On Tue, 7 Jun 2011 14:16:31 -0400, Frédéric Bastien no...@nouiz.org wrote: Hi, I'm preparing a Tutorial about Theano and PyCUDA. Is there any PyCUDA logo that I can put

Re: [PyCUDA] PyCUDA logo?

2011-06-07 Thread Bogdan Opanchuk
And, speaking of parallel snakes, was it something like this (attached)? 2011/6/7 Bogdan Opanchuk manti...@gmail.com: Hello, How about this (very drafty draft, just to illustrate an idea)? 2011/6/7 Andreas Kloeckner li...@informa.tiker.net: On Tue, 7 Jun 2011 14:16:31 -0400, Frédéric

Re: [PyCUDA] PyCUDA logo?

2011-06-17 Thread Bogdan Opanchuk
Hello, I shamelessly stole David's hollowness idea and Andreas' parallel snakes design and made the snakes look more like the ones from the Python logo - see variant1.pdf. In addition, there's variant2.pdf inspired by the Little Prince. These are drafts of course; neither the shapes nor the colors are final. On

[PyCUDA] compyte architecture

2011-06-18 Thread Bogdan Opanchuk
Hello, I finally have the time to contribute something to compyte, so I had a look at its sources. As far as I understand, at the moment it has: - sources for GPU platform-dependent memory operations (malloc()/free()/...) - sources for array class, which uses abstract API of these operations -

Re: [PyCUDA] compyte architecture

2011-06-20 Thread Bogdan Opanchuk
Hello Andreas, Frederic, 2011/6/21 Andreas Kloeckner li...@informa.tiker.net: On Mon, 20 Jun 2011 09:40:02 -0400, Frédéric Bastien no...@nouiz.org wrote: Currently there is not a good compilation system for this project as you saw. What I currently have in mind is that it should

[PyCUDA] Reshaping GPUArray

2011-07-01 Thread Bogdan Opanchuk
Hello Andreas, Is there some way to change the shape of a GPUArray object the same way it can be done with numpy.ndarray? The following naive code raises an exception on the last line: import pycuda.autoinit import pycuda.gpuarray as gpuarray import numpy arr = gpuarray.GPUArray((64, 64), numpy.float64)

[PyCUDA] Possible source of bugs in gpuarray.to_gpu()

2011-07-05 Thread Bogdan Opanchuk
Hello, I just bumped into a certain problem with copying numpy arrays to GPU. Consider the following code: --- import pycuda.autoinit import pycuda.gpuarray as gpuarray from pycuda.elementwise import ElementwiseKernel import numpy arr = numpy.random.randn(50, 50).astype(numpy.float32) arr_tr =
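
The pitfall discussed in this thread is that a transposed numpy array is just a strided view over the same buffer, while a raw buffer copy to the GPU sees the original layout. A numpy-only sketch of the check and the usual fix (the elementwise kernel itself is omitted):

```python
import numpy as np

arr = np.random.randn(50, 50).astype(np.float32)
arr_tr = arr.T  # a strided view: same buffer, no data movement

# The transposed view is not C-contiguous, so a raw buffer copy
# would transfer the data in the original (untransposed) order.
assert not arr_tr.flags["C_CONTIGUOUS"]

fixed = np.ascontiguousarray(arr_tr)  # materialize the transpose
assert fixed.flags["C_CONTIGUOUS"]
assert np.array_equal(fixed, arr.T)
```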

Re: [PyCUDA] Possible source of bugs in gpuarray.to_gpu()

2011-07-05 Thread Bogdan Opanchuk
Hello Andreas, On Wed, Jul 6, 2011 at 12:04 AM, Andreas Kloeckner li...@informa.tiker.net wrote: Ok, we should introduce a warning when to_gpu'ing arrays that are not in C order. And probably also add a function gpuarray.i_know_about_strides() to turn that warning off. Yep, that'll work too.

Re: [PyCUDA] Error: launch out of resources

2011-07-07 Thread Bogdan Opanchuk
Hello Mikhail, This program worked without any changes on a Tesla C2050. Such an error message usually points to an insufficient number of registers on the device, so try to choose block size <= MAX_REGISTERS_PER_BLOCK (device attribute) / func.num_regs. Best regards, Bogdan On Fri, Jul 8, 2011 at 1:46
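
The suggested bound is simple arithmetic over the device attribute and the kernel's register count; a sketch (in PyCUDA the two inputs would presumably come from `device.get_attributes()` and `func.num_regs` — the concrete numbers below are hypothetical):

```python
def max_block_size(regs_per_block, regs_per_thread, warp_size=32):
    """Largest block size (rounded down to a whole number of warps)
    whose threads fit into the per-block register file."""
    limit = regs_per_block // regs_per_thread
    return (limit // warp_size) * warp_size

# e.g. 16384 registers per block, kernel using 40 registers per thread:
# 16384 // 40 = 409 threads, rounded down to 12 warps = 384 threads.
assert max_block_size(16384, 40) == 384
```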

[PyCUDA] Using CURAND to fill complex array

2011-08-02 Thread Bogdan Opanchuk
Hello Andreas, Currently CURAND wrapper cannot fill_normal() or fill_uniform() the array of complex64 or complex128. I can add this functionality, but first I'd like to clarify some details: 1. Should I add this to XORWOW RNG only? In CURAND *2 functions were not implemented for Sobol

Re: [PyCUDA] CURAND 4 - next try

2011-08-13 Thread Bogdan Opanchuk
Hello Tomasz, Against which commit have you diffed your patch? I was going to run it on Tesla 2050 (test_gpuarray.py is enough, right?) but I am having problems trying to apply it. Best regards, Bogdan On Sat, Aug 13, 2011 at 8:41 PM, Tomasz Rybak bogom...@post.pl wrote: Hello. I have been

Re: [PyCUDA] pyCUDA parallel scan performance

2011-09-27 Thread Bogdan Opanchuk
Hello Алексей, As far as I can see, there are two things you may try. 1. ElementwiseKernel.__call__ calculates necessary grid and block sizes every time, along with doing some other stuff, which can be significant if the kernel execution time is of the order of tens of microseconds. So you can

Re: [PyCUDA] weird if branch in all tutorial example

2011-09-27 Thread Bogdan Opanchuk
Hello, In your example the condition is necessary: if N is some large prime number, you cannot create a grid/block pair which contains exactly N total threads, so you have to skip the excess ones somehow. Moreover, the if statement is not expensive by itself; it becomes expensive if it causes
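
The usual pattern behind this: round the grid up so that grid * block >= N, then guard the kernel body with `if (idx < N)`. The index arithmetic can be checked on the host with a pure Python sketch:

```python
def grid_size(n, block):
    # Ceiling division: smallest number of blocks covering n threads.
    return (n + block - 1) // block

n, block = 1009, 256  # 1009 is prime, so no exact grid/block split exists
grid = grid_size(n, block)
assert grid * block >= n        # every element gets a thread...
assert (grid - 1) * block < n   # ...with no entirely wasted block
# The threads with idx >= n are the ones the 'if' guard skips:
assert grid * block - n == 15
```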

Re: [PyCUDA] Why my matrix transpose function doesn't work?

2011-10-30 Thread Bogdan Opanchuk
Hello Apostolis, There are two errors: 1. You are trying to use a 32x32 block, but this size is only supported by compute capability 2.0 devices (Teslas and probably other new cards, look it up in the programming guide). Older cards (such as mine) only allow a maximum of 512 threads per block, so I

Re: [PyCUDA] PyCuda 3x slower than nvcc

2012-04-04 Thread Bogdan Opanchuk
On Wed, Apr 4, 2012 at 9:15 PM, Michiel Bruinink michiel.bruin...@mapperlithography.com wrote: First of all, I made a typo in my sample program. The value of 10 should be 169. That makes those array declarations less problematic, I think. Much less. This now amounts to ~6kb per thread,

Re: [PyCUDA] Thread Problem

2012-07-10 Thread Bogdan Opanchuk
Hi Andrea, On Tue, Jul 10, 2012 at 11:55 PM, Andrea Cesari andrea_ces...@hotmail.it wrote: But if I modify the kernel in this way: const int i = threadIdx.x + 2; dest[i] = i; the result is: [1 0 2 3 4 5 6 7 8 9] while, in my opinion, it should be [0,0,2,3,4,5,6,7,8,9] (confirmed by C code). why?

Re: [PyCUDA] Thread Problem

2012-07-10 Thread Bogdan Opanchuk
On Wed, Jul 11, 2012 at 12:15 AM, Andrea Cesari andrea_ces...@hotmail.it wrote: so, the first two elements of the vector are always garbage? can i solve it by allocating the memory manually? but it should be the same as drv.Out() i think.. or no? The first two elements are garbage because: 1) you have
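
With 10 threads and `i = threadIdx.x + 2`, indices 0 and 1 are simply never written (and the highest thread indices would run past the end of the array). A host-side simulation of the write pattern makes this visible:

```python
n = 10
dest = [None] * n  # None stands in for uninitialized GPU memory

for thread_idx in range(n):   # 10 threads, as in the example
    i = thread_idx + 2        # the shifted index from the kernel
    if i < n:                 # without this guard, threads 8 and 9
        dest[i] = i           # would write out of bounds

# Elements 0 and 1 were never written: on the GPU they hold garbage.
assert dest[:2] == [None, None]
assert dest[2:] == list(range(2, n))
```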

Re: [PyCUDA] Thread Problem

2012-07-11 Thread Bogdan Opanchuk
Hi Andrea, On Wed, Jul 11, 2012 at 10:25 PM, Andrea Cesari andrea_ces...@hotmail.it wrote: __global__ void gpu_kernel(int *corrGpu,int *aMod,int *b,int *kernelSize_h) { int j,step1=kernelSize_h[0]/2; // --- ... ) When I remove /2 where the arrow points, I get results identical with the

[PyCUDA] GPU algorithms library

2012-07-18 Thread Bogdan Opanchuk
Hi all, Some of you may remember compyte discussions last year when I made the suggestion of creating a library with a compilation of GPGPU algorithms, working both with PyOpenCL and PyCuda. Long story short, I have finally found some time and created a prototype. The preliminary tutorial can be

Re: [PyCUDA] GPU algorithms library

2012-07-18 Thread Bogdan Opanchuk
Hi Frédéric, On Thu, Jul 19, 2012 at 8:58 AM, Frédéric Bastien no...@nouiz.org wrote: How useful is it to abstract between PyCUDA and PyOpenCL? Personally, I probably won't use that part, but I want to abstract between CUDA and OpenCL. It was either that or to write almost identical

Re: [PyCUDA] Thread Problem

2012-07-18 Thread Bogdan Opanchuk
Hi Andrea, On Thu, Jul 19, 2012 at 2:39 AM, Andrea Cesari andrea_ces...@hotmail.it wrote: Hi, this is my code that takes a 3d array and, for each pixel of the matrix, finds the minimum and puts it in the corresponding pixel of a matrix b. Then it compares the result with the cpu. Obviously, with

Re: [PyCUDA] Thread Problem

2012-07-19 Thread Bogdan Opanchuk
Hi Andrea, On Thu, Jul 19, 2012 at 4:26 PM, Andrea Cesari andrea_ces...@hotmail.it wrote: The problem is that the results match the cpu only for dim_x and dim_y smaller than 32. For higher dimensions the cpu and gpu results are different. When you change dim_x and dim_y values, do you also

Re: [PyCUDA] Thread Problem

2012-07-19 Thread Bogdan Opanchuk
Hi Andrea, On Thu, Jul 19, 2012 at 4:37 PM, Andrea Cesari andrea_ces...@hotmail.it wrote: yes.. for example if i do: dim_x=33 dim_y=33 then change grid and block to this: (32,32,1) and (2,1), because i do (33*33 = 1089 threads, so grid = 1089/1024 = 1.063, rounded up to 2). When you do this, you read values

Re: [PyCUDA] Using pyCUDA with one main kernel that is requiring other object files

2012-08-17 Thread Bogdan Opanchuk
Hi Cédric On Fri, Aug 17, 2012 at 5:15 PM, Cédric LACZNY cedric.lac...@uni.lu wrote: Thanks for the suggestion but it's causing other errors all of the same notion, e.g. the following: kernel.cu(142): error: calling a host function(NVMatrix::eltWiseDivide) from a __device__/__global__

Re: [PyCUDA] Using pyCUDA with one main kernel that is requiring other object files

2012-08-17 Thread Bogdan Opanchuk
Hi Cédric, On Fri, Aug 17, 2012 at 4:49 PM, Cédric LACZNY cedric.lac...@uni.lu wrote: extern "C" { void main_kernel(float* inp_P, unsigned int N, float* mappedX, unsigned int no_dims) { // … Some code … } } You have to prefix your exported kernel definition with '__global__'. See the code

Re: [PyCUDA] Using pyCUDA with one main kernel that is requiring other object files

2012-08-18 Thread Bogdan Opanchuk
Hi Cédric, On Fri, Aug 17, 2012 at 6:04 PM, Cédric LACZNY cedric.lac...@uni.lu wrote: Executing the python script now, gives me the following error: pycuda.driver.CompileError: nvcc compilation of /tmp/tmpe1ZS7Z/kernel.cu failed [command: nvcc --cubin -arch sm_20

Re: [PyCUDA] Contributing to pycuda

2012-08-25 Thread Bogdan Opanchuk
Hi Eelco, On Sun, Aug 26, 2012 at 3:00 AM, Eelco Hoogendoorn e.hoogendo...@uva.nl wrote: I have some code that I would like to contribute to pycuda. What would the preferred way of doing so be? Create a branch in git? Yep. Perhaps the easiest way to do it is by forking PyCuda repo on github

Re: [PyCUDA] PyCUDA WARNING: a clean-up operation failed (dead context maybe?)

2012-09-05 Thread Bogdan Opanchuk
Hi Mohsen, On Wed, Sep 5, 2012 at 9:53 PM, mohsen jadidi mohsen.jad...@gmail.com wrote: pycuda._driver.LaunchError: cuMemcpyDtoH failed: launch failed PyCUDA WARNING: a clean-up operation failed (dead context maybe?) cuMemFree failed: launch failed what would be the reason ? It means that

Re: [PyCUDA] cuMemAlloc failed: out of memory

2012-09-05 Thread Bogdan Opanchuk
Hi Mohsen, On Thu, Sep 6, 2012 at 3:31 AM, mohsen jadidi mohsen.jad...@gmail.com wrote: File /usr/local/lib/python2.7/dist-packages/scikits.cuda-0.042-py2.7.egg/scikits/cuda/linalg.py, line 323, in dot c_gpu = gpuarray.empty((n, ldc), x_gpu.dtype) File

Re: [PyCUDA] Matrix allocation issue

2012-11-04 Thread Bogdan Opanchuk
Hi Rui, On Mon, Nov 5, 2012 at 8:51 AM, Rui Lopes rmlo...@dei.uc.pt wrote: I've written a kernel to perform a custom dot operation that would work perfectly if there was not an issue with the memory allocation. Maybe I am missing something in the mapping process? From what I understood

Re: [PyCUDA] Custom dot issues

2012-11-04 Thread Bogdan Opanchuk
Hi Rui, On Mon, Nov 5, 2012 at 2:36 PM, Rui Lopes rmlo...@dei.uc.pt wrote: I have built a benchmark for my custom dot kernel, pasted below. It only outperforms the cpu dot for big sizes, which is to be expected in my educated guess. Yes, it is to be expected for your kernel, especially on slow video cards.

Re: [PyCUDA] Offsetting DeviceAllocation instances

2013-02-09 Thread Bogdan Opanchuk
Hi Alex, Maybe I am misunderstanding, I am not so familiar with the buffer terminology (having not dealt much with opencl), It is not really OpenCL-specific; basically it's just a wrapper on top of pointer arithmetic. Would the following be sufficient? a =
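
The "wrapper on top of pointer arithmetic" mentioned above amounts to adding an element offset, scaled by the itemsize, to the base address. A sketch of that arithmetic (the integer below stands in for what `int(device_allocation)` would give in PyCUDA; the address is hypothetical):

```python
import numpy as np

def offset_ptr(base_ptr, elem_offset, dtype):
    """Address of element `elem_offset` in a buffer starting at base_ptr."""
    return base_ptr + elem_offset * np.dtype(dtype).itemsize

base = 0x10000  # hypothetical device address
# Viewing the middle of a float64 buffer, starting at element 4:
# 4 elements * 8 bytes = 32 bytes past the base.
assert offset_ptr(base, 4, np.float64) == base + 32
```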

Re: [PyCUDA] Offsetting DeviceAllocation instances

2013-02-09 Thread Bogdan Opanchuk
Moreover, I do not need the actual data to be copied. I just need a view to the middle of an existing array. On Sun, Feb 10, 2013 at 10:53 AM, Bogdan Opanchuk manti...@gmail.com wrote: Hi Alex, Maybe I am misunderstanding, I am not so familiar with the buffer terminology (having not dealt

Re: [PyCUDA] Offsetting DeviceAllocation instances

2013-02-09 Thread Bogdan Opanchuk
'pycuda.gpuarray.GPUArray' [ 5. 5.] [-4. -4.] [ 5. 5. 0. 0. 0. 0. -4. -4.] On Sat, Feb 9, 2013 at 7:01 PM, Bogdan Opanchuk manti...@gmail.com wrote: Moreover, I do not need the actual data to be copied. I just need a view to the middle of an existing array. On Sun, Feb 10, 2013 at 10:53 AM, Bogdan

Re: [PyCUDA] PyCUDA: non square matrix transpose

2013-02-22 Thread Bogdan Opanchuk
Hi Giuseppe, It seems that the problem is in these lines: w, h = src.shape result = gpuarray.empty((h, w), dtype=src.dtype, order='C') The order of numpy arrays is row-major, so you should write instead: h, w = src.shape result = gpuarray.empty((w, h), dtype=src.dtype,
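
The underlying point: numpy shapes are (rows, columns), i.e. (height, width) in row-major order, and the transpose swaps them. Checking the shape bookkeeping with plain numpy:

```python
import numpy as np

src = np.zeros((3, 5), dtype=np.float32)  # 3 rows, 5 columns
h, w = src.shape                          # rows come first in numpy
assert (h, w) == (3, 5)

result = src.T                            # the transpose swaps the axes
assert result.shape == (w, h) == (5, 3)
```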

Re: [PyCUDA] Problems installing PyCUDA on Mac with CUDA 5.5

2013-08-19 Thread Bogdan Opanchuk
Hi David, What libraries do you have in cuda_installation_dir/lib? (cuda_installation_dir is /usr/local/cuda by default). I have both libcuda.dylib and libcudart.dylib there. On Mon, Aug 19, 2013 at 1:39 PM, David P. Sanders dpsand...@ciencias.unam.mx wrote: Hi, I am trying to install PyCUDA

Re: [PyCUDA] FFT

2013-10-31 Thread Bogdan Opanchuk
Hi Isaac, You can try my package Reikna (http://reikna.publicfields.net). The FFT there is somewhat slower than the CUFFT one, but it works with Python 3. On Thu, Oct 31, 2013 at 11:54 PM, Isaac Gerg isaac.g...@gergltd.com wrote: They have no support for python 3.2 64 bit :( On Oct 31, 2013

Re: [PyCUDA] cuMemAlloc failed: out of memory

2013-12-05 Thread Bogdan Opanchuk
Hi Ahmed, On Fri, Dec 6, 2013 at 12:27 PM, Ahmed Fasih wuzzyv...@gmail.com wrote: I ran into a similar issue: http://stackoverflow.com/questions/13187443/nvidia-cufft-limit-on-sizes-and-batches-for-fft-with-scikits-cuda Batch 1 of 64x1024 complex64 arrays amounts to 5Gb of data, which

Re: [PyCUDA] cuMemAlloc failed: out of memory

2013-12-05 Thread Bogdan Opanchuk
Hi Jayanth, I can run a 8192x8192 transform on a Tesla C2050 without problems. I think you are limited by the available video memory, see my previous message in this thread --- a 8192x4096 buffer takes 250Mb, and you have to factor in the temporary buffers PyFFT creates. By the way, I would
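
The memory estimate above is just elements times itemsize: the 8192x4096 complex64 buffer comes to 256 MiB exactly (the ~250Mb in the message is the rounded figure), before counting the temporary buffers pyfft allocates. A sketch of the arithmetic:

```python
import numpy as np

def buffer_mib(shape, dtype):
    # Total size of an array in MiB: product of dimensions times itemsize.
    return int(np.prod(shape)) * np.dtype(dtype).itemsize / 1024 ** 2

assert buffer_mib((8192, 4096), np.complex64) == 256.0  # 8 bytes/element
assert buffer_mib((8192, 8192), np.complex64) == 512.0
```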

Re: [PyCUDA] why my code yields 'pycuda._driver.LaunchError'?

2013-12-07 Thread Bogdan Opanchuk
Hi oyster, I have fixed two things in order to make your program runnable: - replaced 'numPoint.x' and 'numPoint.y' with 'numPointX' and 'numPointY', - added 'startTime = time.time()' line before the kernel call There are the following problems with the code: - The shape of 'iter' is incorrect:

Re: [PyCUDA] C++ style comment yields bad code. Bug?

2014-03-15 Thread Bogdan Opanchuk
Hi 金陆, The \n\n in your code correspond to two actual newlines in the .cu file being compiled, not to a \n\n string, because they are resolved by the Python interpreter at the parsing stage. See the kernel.cu you quoted for the result: you have 'CUPRINTF(' commented out and an unmatched quote, and

[PyCUDA] Passing a custom struct to a kernel by value

2014-05-20 Thread Bogdan Opanchuk
Hello, Does PyCUDA support struct arguments to kernels? From the Python side it means an element of an array with a struct dtype (a numpy.void object), e.g. dtype = numpy.dtype([('first', numpy.int32), ('second', numpy.int32)]) pair = numpy.empty(1, dtype)[0] See
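
The construction described above, in plain numpy (mirroring a C struct of two 32-bit ints on the CUDA side):

```python
import numpy as np

# A numpy struct dtype matching: struct { int first; int second; };
dtype = np.dtype([("first", np.int32), ("second", np.int32)])
assert dtype.itemsize == 8  # two packed int32 fields

# A single element of such an array is a numpy.void scalar,
# which is what would be passed to the kernel by value.
pair = np.zeros(1, dtype)[0]
assert type(pair).__name__ == "void"
assert (pair["first"], pair["second"]) == (0, 0)
```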

Re: [PyCUDA] Passing a custom struct to a kernel by value

2014-05-26 Thread Bogdan Opanchuk
format On Tue, May 27, 2014 at 2:45 PM, Andreas Kloeckner li...@informa.tiker.net wrote: Hi Bogdan, Bogdan Opanchuk manti...@gmail.com writes: Thank you for the correction. Just curious, how come in PyOpenCL it works with rank-0 numpy arrays (which, in my opinion, is more intuitive than

Re: [PyCUDA] MatrixTranspose.py example has CompileError

2014-09-03 Thread Bogdan Opanchuk
at 9:39 PM, Bogdan Opanchuk manti...@gmail.com wrote: Hi Bruce, Seems to be a typo in the Wiki. If you look at http://wiki.tiker.net/PyCuda/Examples/MatrixTranspose (where MatrixTranspose.py originally comes from), you can see in line 24 two #defines in one line. Incidentally, if someone has

Re: [PyCUDA] Non-contiguous elementwise kernels

2015-10-27 Thread Bogdan Opanchuk
Hi Thomas, Does PyCUDA have any support for non-contiguous arrays at all? (I've tried implementing my own version, but was unable to figure out how to map the thread IDs to valid memory addresses in a general way. Any pointers?) I have support for custom strides in my Reikna library, and it
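
Mapping a flat thread ID to a valid address of an arbitrarily-strided array is a mixed-radix decomposition of the index; a host-side sketch of the arithmetic (inside a kernel the same computation would produce a byte offset to add to the base pointer):

```python
import numpy as np

def flat_to_offset(i, shape, strides):
    """Byte offset of the i-th element (in C order) of an array
    with the given shape and byte strides."""
    offset = 0
    for dim, stride in zip(reversed(shape), reversed(strides)):
        offset += (i % dim) * stride
        i //= dim
    return offset

# Check against a non-contiguous numpy view (a transpose).
base = np.arange(12, dtype=np.int32)
a = base.reshape(3, 4).T   # shape (4, 3), strided view of `base`
expected = a.ravel()       # C-order copy of the logical contents
for i in range(a.size):
    off = flat_to_offset(i, a.shape, a.strides)
    assert base[off // a.itemsize] == expected[i]
```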

Re: [PyCUDA] What do I need to do when mixing PyCUDA, Reikna and scikit-cuda?

2018-07-31 Thread Bogdan Opanchuk
ry shouldn't leave the GPU). On 31. Jul 2018, at 09:16, Bogdan Opanchuk wrote: First of all, are you using multiple contexts or a single one? If you only have one context, `Thread(pycuda.autoinit.context)` should be enough for Reikna (don't know about scikit-cuda, tho

Re: [PyCUDA] What do I need to do when mixing PyCUDA, Reikna and scikit-cuda?

2018-07-31 Thread Bogdan Opanchuk
First of all, are you using multiple contexts or a single one? If you only have one context, `Thread(pycuda.autoinit.context)` should be enough for Reikna (don't know about scikit-cuda, though). Now if you have several contexts, things become more complicated. CUDA maintains a global context