[PyCUDA] Re: FP16 header, extern C and preamble

2020-05-13 Thread Andreas Kloeckner
Vincent Favre-Nicolin  writes:
> 1) if there is a way to have an element-wise kernel with
> no_extern_c=True - but I don’t know how to resolve the name mangling
> issue to access the kernel function ?
>
> 2) add a ‘cpp_preamble’ option to SourceModule and ElementwiseKernel
> (and others) to add a preamble before the ‘extern “C”’

1) would be the preferred option from my perspective. Simply sticking an
"extern "C" in front of the kernel declaration would likely
suffice. There is a backward compatibility concern here for other
potentially mangled names in the preamble, but ElementwiseKernel doesn't
really expose the SourceModule, so it's IMO unlikely that someone tried
to get those symbols.
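For the archives, a minimal hand-written sketch of option 1) (this is
not the actual ElementwiseKernel change, and all names below are
invented): with no_extern_c=True, wrapping only the kernel itself in
extern "C" keeps its name unmangled while the FP16 preamble stays C++.

    import numpy as np
    import pycuda.autoinit  # noqa: F401
    import pycuda.gpuarray as gpuarray
    from pycuda.compiler import SourceModule

    mod = SourceModule("""
    #include <cuda_fp16.h>          // C++ preamble; may contain mangled names

    extern "C" __global__ void scale_half(__half *x, float a, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            x[i] = __float2half(a * __half2float(x[i]));
    }
    """, no_extern_c=True)

    scale_half = mod.get_function("scale_half")  # lookup works: name not mangled
    x = gpuarray.to_gpu(np.zeros(256, dtype=np.float16))
    scale_half(x.gpudata, np.float32(2.0), np.int32(x.size),
               block=(256, 1, 1), grid=(1, 1))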

Andreas


signature.asc
Description: PGP signature
___
PyCUDA mailing list -- pycuda@tiker.net
To unsubscribe send an email to pycuda-le...@tiker.net


[PyCUDA] Re: Issues with PyCUDA on Ubuntu 20.04

2020-05-02 Thread Andreas Kloeckner
Vernon Perry  writes:
> My CUDA install was just via apt; do you suggest doing it the
> old-fashioned way from Nvidia itself?

Please keep the list cc'd for archival.

Via apt from Ubuntu's package sources? Or from some other sources (check
your /etc/apt/sources.list*)? If it was from Ubuntu, then that's a bug
in their packages... Inconsistent software state is what apt
dependencies are designed to prevent. Maybe reboot to activate a new
display driver.

Andreas


signature.asc
Description: PGP signature
___
PyCUDA mailing list -- pycuda@tiker.net
To unsubscribe send an email to pycuda-le...@tiker.net


[PyCUDA] Re: Issues with PyCUDA on Ubuntu 20.04

2020-05-02 Thread Andreas Kloeckner
Vernon Perry  writes:

> Hello,
>
> I've installed PyCUDA using several different methods, including pip, apt, as 
> well as compiling from source, but there is still a conflict with the version 
> of CUDA that I am running it would appear:
>
> $ nvcc --version
> nvcc: NVIDIA (R) Cuda compiler driver
> Copyright (c) 2005-2019 NVIDIA Corporation
> Built on Sun_Jul_28_19:07:16_PDT_2019
> Cuda compilation tools, release 10.1, V10.1.243
>
> Python 3.8.2 (default, Apr 27 2020, 15:53:34)
> [GCC 9.3.0] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import pycuda.driver as cuda
> >>> import pycuda.autoinit
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/gp/.local/lib/python3.8/site-packages/pycuda/autoinit.py", line 
> 5, in <module>
> cuda.init()
> pycuda._driver.LogicError: cuInit failed: system has unsupported display 
> driver / cuda driver combination
> >>> pycuda.VERSION
> (2019, 1, 2)

This has nothing to do with PyCUDA; it just means that your CUDA
installation is broken. You probably installed your X11/Wayland/Mir GPU
driver from a different source than your CUDA driver (such as by running
the Nvidia installer on a system that already had the Nvidia driver via
the OS). This means that your computer is now in an inconsistent
state. Unless you know where to look, a reinstall and, subsequently,
sufficient care about what you trust to mess with your software install,
might be your best options.

Andreas


signature.asc
Description: PGP signature
___
PyCUDA mailing list -- pycuda@tiker.net
To unsubscribe send an email to pycuda-le...@tiker.net


[PyCUDA] Re: PyCUDA and cuSPARSE

2020-04-08 Thread Andreas Kloeckner
"Gutenkunst, Ryan N - (rgutenk)"  writes:

> Hello,
>
> I need to access the tridiagonal solving routines gtsv2StridedBatch
> and gtsvInterleavedBatch from the cuSPARSE library in a Python/C
> program. Is there a way to access/link to the cuSPARSE library using
> PyCUDA?
>
> For background, I’m hoping to port a Python (with inner loops in C)
> application to leverage GPU computing. The most intensive part of the
> computation is solving tridiagonal systems, so I was excited to see
> that the standard cuSPARSE library includes routines for this. But I’m
> struggling to see how to access cuSPARSE using any of the existing
> Python to CUDA interfaces. For PyCUDA, I couldn’t find a similar
> example in the documentation. Pyculib has bindings for an older
> version of cuSPARSE, but it’s not maintained and I couldn’t get it
> installed easily:
> https://pyculib.readthedocs.io/en/latest/cusparse.html. CuPy seems to
> support only a very small fraction of cuSPARSE:
> https://docs-cupy.chainer.org/en/stable/reference/sparse.html .

There's scikit-cuda, which offers access to cuSOLVER, which bills itself
as "the CUDA sparse matrix library."

https://scikit-cuda.readthedocs.io/en/latest/

There's also discussion going on about adapting PyCUDA to Nvidia's
"primary context" notion for easier interoperability with the runtime API.

Hope that helps,
Andreas



signature.asc
Description: PGP signature
___
PyCUDA mailing list -- pycuda@tiker.net
To unsubscribe send an email to pycuda-le...@tiker.net


[PyCUDA] Re: PyCUDA OpenGL Build for Windows 10.

2020-02-25 Thread Andreas Kloeckner
Fabio da Silva  writes:

> Good morning,
> I was wondering if there are any binaries for OpenGL enabled PyCUDA
> for Windows 10. My understanding (thanks, Andreas) is that I will
> probably need to build it on my own. Since I never did that, I went
> online and found some resources here
> (https://stackoverflow.com/questions/19634073/pip-install-pycuda-on-windows)
> with the source code from here
> (https://files.pythonhosted.org/packages/5e/3f/5658c38579b41866ba21ee1b5020b8225cec86fe717e4b1c5c972de0a33c/pycuda-2019.1.2.tar.gz).
> Specifically:
> 1. Downloaded the source code from pythonhosted and untarred it.
> 2. In the main folder I ran: python configure.py
> 3. Then I went to siteconf.py and enabled OpenGL on line 9: CUDA_ENABLE_GL = True
> 4. Finally I ran the commands: python setup.py build, then python setup.py install

> After that, I tried to import PyCUDA in an IPython window and got:
> ModuleNotFoundError: No module named 'pycuda._driver'
> And obviously no pycuda.gl either.

If your build from above completed, then you should have PyCUDA
installed *somewhere*. This somewhere may just not be the same Python
interpreter as what your Jupyter notebook uses. You can find out where
these interpreters live by examining sys.path (from both the notebook
and the Python prompt for the Python that you used to build/install).

HTH,
Andreas


signature.asc
Description: PGP signature
___
PyCUDA mailing list -- pycuda@tiker.net
To unsubscribe send an email to pycuda-le...@tiker.net


[PyCUDA] Re: Question about PyCUDA

2019-11-22 Thread Andreas Kloeckner
"thierry.moudiki"  writes:

> Hi Andreas,
>
> I'm interested in using your package PyCUDA, and I have one question about it 
> (just to make sure that I understand how it works). In the example presented 
> here: https://documen.tician.de/pycuda/index.html, when you call the sourced 
> function `multiply_them`:
>
> - The option `block` has 3 elements in case it's a 1D, 2D or 3D block,
> and each tuple element is the number of threads per block. E.g: if I
> use threadIdx.x and threadIdx.y with blocks of 400 threads each, will
> I have `block = (400, 400, 1)`?

Yup.

> - For the option `grid`, I'm not sure. Is it: we can have 1D or 2D
> grids? What about this tuple's elements? Tuple element == Number of
> blocks per grid?

Yup.
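To spell that out with a hedged, self-contained toy example (all names
invented; note that hardware caps the total threads per block, typically
at 1024, so a literal 400x400 block would be rejected and large extents
usually go into the grid instead):

    import numpy as np
    import pycuda.autoinit  # noqa: F401
    import pycuda.driver as cuda
    from pycuda.compiler import SourceModule

    mod = SourceModule("""
    __global__ void fill(float *a, int w)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        a[y * w + x] = 1.0f;
    }
    """)
    fill = mod.get_function("fill")

    w = 16 * 25  # 400 threads along each axis in total
    a = np.zeros((w, w), dtype=np.float32)
    a_gpu = cuda.mem_alloc(a.nbytes)
    # block = (threads_x, threads_y, threads_z); grid = (blocks_x, blocks_y):
    fill(a_gpu, np.int32(w), block=(16, 16, 1), grid=(25, 25))
    cuda.memcpy_dtoh(a, a_gpu)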

In the future, please address requests like this to the mailing
list. I've cc'd them for archival.

Andreas


signature.asc
Description: PGP signature
___
PyCUDA mailing list -- pycuda@tiker.net
To unsubscribe send an email to pycuda-le...@tiker.net


[PyCUDA] Re: gpuarray.zeros / to_gpu crash

2019-11-15 Thread Andreas Kloeckner
Dan,

Please make sure the list stays cc'd for archival.

"Guralnik,Dan"  writes:
> Andreas, I'm so sorry, should've done it myself so you have more info. Here 
> is what happens:
>
> -
> C:\Users\danguralnik\Documents\GitHub\kodlab-uma-sims\mice\smooth>python 
> cuda_test.py
> Traceback (most recent call last):
>   File "cuda_test.py", line 23, in 
> """)
>   File "C:\ProgramData\Anaconda3\lib\site-packages\pycuda\compiler.py", line 
> 291, in __init__
> arch, code, cache_dir, include_dirs)
>   File "C:\ProgramData\Anaconda3\lib\site-packages\pycuda\compiler.py", line 
> 254, in compile
> return compile_plain(source, options, keep, nvcc, cache_dir, target)
>   File "C:\ProgramData\Anaconda3\lib\site-packages\pycuda\compiler.py", line 
> 137, in compile_plain
> stderr=stderr.decode("utf-8", "replace"))
> pycuda.driver.CompileError: nvcc compilation of 
> C:\Users\DANGUR~1\AppData\Local\Temp\tmppoj82k3c\kernel.cu failed
> [command: nvcc --cubin -arch sm_75 -m64 
> -Ic:\programdata\anaconda3\lib\site-packages\pycuda\cuda kernel.cu]
> [stdout:
> nvcc fatal   : Cannot find compiler 'cl.exe' in PATH
> ]
> 
> Is this something about environment variables?

Yes. The Visual Studio compilers also need to be on your PATH
environment variable. VS installs a batch file vcvars.bat that should
arrange for that.
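As a quick sanity check from the same environment (a sketch; 'where' is
Windows' PATH-lookup command):

    import subprocess
    # Raises CalledProcessError if cl.exe is not on PATH for this process:
    subprocess.check_call(["where", "cl.exe"])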

Andreas



signature.asc
Description: PGP signature
___
PyCUDA mailing list -- pycuda@tiker.net
To unsubscribe send an email to pycuda-le...@tiker.net


[PyCUDA] Re: gpuarray.zeros / to_gpu crash

2019-11-15 Thread Andreas Kloeckner
Dan,

Do the PyCUDA example programs (e.g. [1]) work? What happens if you run
the command 'nvcc'?

Andreas

[1] https://github.com/inducer/pycuda/blob/master/examples/demo.py

"Guralnik,Dan"  writes:
> Hello,
>
>
> I have just installed pycuda on a new machine running anaconda3 and cuda 
> 10.1.243_426 for Windows 10, and tested the installation by running a program 
> that had been run successfully on another machine, but with anaconda2.
>
>
> My program breaks on a call for gpuarray.zeros, following a call to 
> gpuarray.to_gpu (no other calls to pycuda have been made before that). Both 
> calls are trying to establish 2D arrays of dtype=np.float32.
>
>
> The included headers are:
>
> import pycuda.driver as cuda
> import pycuda.tools
> import pycuda.autoinit
> import pycuda.gpuarray as gpuarray
> from pycuda.compiler import SourceModule
>
> I've attached the error trace from python below. I'll be grateful if you 
> could offer an explanation/fix.
>
> Many thanks,
> -Dan
>
>
>
>
> C:\Users\danguralnik\Documents\GitHub\kodlab-uma-sims\mice\smooth>python 
> mouse_base.py
> Traceback (most recent call last):
>   File "C:\ProgramData\Anaconda3\lib\site-packages\pycuda\tools.py", line 
> 428, in context_dependent_memoize
> return ctx_dict[cur_ctx][args]
> KeyError: 
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
>   File "mouse_base.py", line 832, in 
> main()
>   File "mouse_base.py", line 812, in main
> random_state=RS(),
>   File "mouse_base.py", line 587, in __init__
> self.addRepn('global_view',Repn_global_arena_wmouse(self))
>   File "mouse_base.py", line 641, in __init__
> gpuarray.zeros(shape=(self.ysize,self.xsize),dtype=np.float32)
>   File "C:\ProgramData\Anaconda3\lib\site-packages\pycuda\gpuarray.py", line 
> 1068, in zeros
> result.fill(zero)
>   File "C:\ProgramData\Anaconda3\lib\site-packages\pycuda\gpuarray.py", line 
> 549, in fill
> func = elementwise.get_fill_kernel(self.dtype)
>   File 
> "", 
> line 2, in get_fill_kernel
>   File "C:\ProgramData\Anaconda3\lib\site-packages\pycuda\tools.py", line 
> 432, in context_dependent_memoize
> result = func(*args)
>   File "C:\ProgramData\Anaconda3\lib\site-packages\pycuda\elementwise.py", 
> line 496, in get_fill_kernel
> "fill")
>   File "C:\ProgramData\Anaconda3\lib\site-packages\pycuda\elementwise.py", 
> line 161, in get_elwise_kernel
> arguments, operation, name, keep, options, **kwargs)
>   File "C:\ProgramData\Anaconda3\lib\site-packages\pycuda\elementwise.py", 
> line 147, in get_elwise_kernel_and_types
> keep, options, **kwargs)
>   File "C:\ProgramData\Anaconda3\lib\site-packages\pycuda\elementwise.py", 
> line 75, in get_elwise_module
> options=options, keep=keep)
>   File "C:\ProgramData\Anaconda3\lib\site-packages\pycuda\compiler.py", line 
> 291, in __init__
> arch, code, cache_dir, include_dirs)
>   File "C:\ProgramData\Anaconda3\lib\site-packages\pycuda\compiler.py", line 
> 254, in compile
> return compile_plain(source, options, keep, nvcc, cache_dir, target)
>   File "C:\ProgramData\Anaconda3\lib\site-packages\pycuda\compiler.py", line 
> 78, in compile_plain
> checksum.update(preprocess_source(source, options, nvcc).encode("utf-8"))
>   File "C:\ProgramData\Anaconda3\lib\site-packages\pycuda\compiler.py", line 
> 55, in preprocess_source
> cmdline, stderr=stderr)
> pycuda.driver.CompileError: nvcc preprocessing of 
> C:\Users\DANGUR~1\AppData\Local\Temp\tmphnd7bq1w.cu failed
> [command: nvcc --preprocess -arch sm_75 -m64 
> -Ic:\programdata\anaconda3\lib\site-packages\pycuda\cuda 
> C:\Users\DANGUR~1\AppData\Local\Temp\tmphnd7bq1w.cu --compiler-options -EP]
>
>
>
> ___
> PyCUDA mailing list -- pycuda@tiker.net
> To unsubscribe send an email to pycuda-le...@tiker.net



signature.asc
Description: PGP signature
___
PyCUDA mailing list -- pycuda@tiker.net
To unsubscribe send an email to pycuda-le...@tiker.net


[PyCUDA] Re: [Pycuda] LogicError: cuMemcpyDtoH failed: an illegal memory access was encountered

2019-11-12 Thread Andreas Kloeckner
Jie Liu  writes:

> Hallo,
>
> I have a Pycuda code, which deals with two kernels. Both kernels run well
> separately, but when I put them together, there is  a memory problem
> "LogicError: cuMemcpyDtoH failed: an illegal memory access was encountered".
> In the second kernel "DotKernel", I can't change the values of any shared
> array or global array. Could you please have a look at the code? Thank you
> very much!

If you look at your kernel log (dmesg), do you see lines starting with
"NVRM: Xid" (or so)? If so, that indicates one of your kernels
encounters a segfault (that you're being told about at the next
dependent operation---in your case maybe the memory transfer---in the
same stream).

CUDA-Memcheck might be of help in debugging this.
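For instance (script name made up), running "cuda-memcheck python
my_script.py" should report the offending out-of-bounds access directly.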

Andreas


signature.asc
Description: PGP signature
___
PyCUDA mailing list -- pycuda@tiker.net
To unsubscribe send an email to pycuda-le...@tiker.net


[PyCUDA] Re: pycuda works only from the terminal

2019-11-09 Thread Andreas Kloeckner
Artur,

Artur Makhmutov  writes:
> I am not sure if this is the right place to ask for tech support; 
> please ignore the message if it is not.
>
> My problem is described as follows: I am trying to run one of the 
> example scripts (I use ubuntu, pycharm with venv python virtual 
> environment):
>
>> import pycuda.autoinit
>> from pycuda.compiler import SourceModule
>> import numpy
>> a = numpy.random.randn(4,4)
>> a = a.astype(numpy.float32)
>> a_gpu = cuda.mem_alloc(a.size * a.dtype.itemsize)
>> cuda.memcpy_htod(a_gpu, a)
>> mod = SourceModule("""
>> __global__ void doublify(float *a)
>> {
>>   int idx = threadIdx.x + threadIdx.y*4;
>>   a[idx] *= 2;
>> }
>> """)
>> func = mod.get_function("doublify")
>> func(a_gpu, block=(4,4,1))
>> a_doubled = numpy.empty_like(a)
>> cuda.memcpy_dtoh(a_doubled, a_gpu)
>> print ("original array:")
>> print (a)
>> print ("doubled with kernel:")
>> print (a_doubled)
> And I get the following error:
>
> (snip)
> But if I run it with the 'python test.py' from the terminal itself, 
> then  it executes just fine:
>
>> (test380) art@HP:~/PycharmProjects/pycuda$ python test.py
>> original array:
>> [[ 0.63286567 -0.5732655   0.0824481   0.17147891]
>>  [-0.24867015 -2.4119377  -0.41027954 -0.67181575]
>>  [-1.2339077  -1.23354     1.0630324   0.3807849 ]
>>  [-1.5976559  -1.5595584  -0.03161036 -0.50650793]]
>> doubled with kernel:
>> [[ 1.2657313  -1.146531    0.1648962   0.34295782]
>>  [-0.4973403  -4.8238754  -0.8205591  -1.3436315 ]
>>  [-2.4678154  -2.46708     2.1260648   0.7615698 ]
>>  [-3.1953118  -3.1191168  -0.06322072 -1.0130159 ]]
> I wonder if it supposed to work like that or am I missing something? I 
> have checked the .bashrc file, nvcc --version, gcc --version and other 
> hints from the manual, everything seems to be correct. Would much 
> appreciate your help.

See if you can run

os.system("nvcc --help")

from within PyCharm. (I suspect not.) If not, then PyCharm is changing
your $PATH so that nvcc is no longer visible. If so, then you should ask
the PyCharm folks how to fix that.

HTH,
Andreas


signature.asc
Description: PGP signature
___
PyCUDA mailing list -- pycuda@tiker.net
To unsubscribe send an email to pycuda-le...@tiker.net


[PyCUDA] Re: GPUArray class gives negative "s" with large size gpuarray

2019-10-09 Thread Andreas Kloeckner
takayanagi.tets...@jp.panasonic.com writes:

> Hi, All. 
> I have developed Lattice Boltzmann Method Code with PyCUDA in our company for 
> simulating Air flow.
> Then, I need to handle large gpuarray such like arr[velocity][Z][Y][X] for 
> 3-dimensional fluid flow.
> My code run correctly relatively small size gpuarray such as (27, 300, 300, 
> 300).
>
> But Changing gpuarray size from (27, 300, 300, 300) to (27, 450, 450, 450) 
> gives following error.
>
> Error message
> OverflowError : can't convert negative int to unsigned 
>
> For debugging it, I'm testing following simple code, which also arise error 
> if I designate large size numpy array such like (27, 450, 450, 450).
>
> //
> // sample code start
> //
>
> import math
> import numpy as np
> import pycuda.gpuarray as gpuarray
> from pycuda.compiler import SourceModule
> import pycuda.autoinit
>
> module = SourceModule("""
> __global__ void plus_one_3d(int nx, int ny, int nz, int nv, float *arr){
> const int x = threadIdx.x + blockDim.x * blockIdx.x;
> const int y = threadIdx.y + blockDim.y * blockIdx.y;
> const int z = threadIdx.z + blockDim.z * blockIdx.z;
> const int nxyz = nx * ny * z + nx * y + x;
> int ijk = nx * ny * z + nx * y + x;
> if (x < nx && y < ny && z < nz){
> for (int c = 0; c < nv; c++){
> arr[nxyz * c + ijk] += 1.0;
> }
> }
> }
> """)
>
> plus_one = module.get_function("plus_one_3d")
>
> num_x, num_y, num_z = np.int32(450), np.int32(450), np.int32(450)

Your shape can't consist of int32's. Convert them to (Python) int before
using them in the array shape.
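For the archives: at (27, 450, 450, 450) the element count is
27*450**3, about 2.46e9, which exceeds the int32 maximum of about
2.15e9, so a size computed in that type wraps negative. A minimal sketch
of the fix (note that an array this size needs roughly 10 GB of GPU
memory):

    import numpy as np
    import pycuda.autoinit  # noqa: F401
    import pycuda.gpuarray as gpuarray

    num_x, num_y, num_z = np.int32(450), np.int32(450), np.int32(450)
    # Plain Python ints are unbounded, so the shape product cannot overflow:
    shape = (27, int(num_x), int(num_y), int(num_z))
    arr = gpuarray.zeros(shape, dtype=np.float32)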

Andreas


signature.asc
Description: PGP signature
___
PyCUDA mailing list -- pycuda@tiker.net
To unsubscribe send an email to pycuda-le...@tiker.net


[PyCUDA] Re: How to free page-locked memory?

2019-10-03 Thread Andreas Kloeckner
Rengan Xu  writes:
> In PyCUDA, what is the API to free the allocated page-locked memory? In
> CUDA, we have cudaFreeHost(void* ptr) to free the page-locked memory, but I
> didn't find the corresponding API in PyCUDA. Any help would be appreciated.

https://documen.tician.de/pycuda/driver.html#pycuda.driver.PagelockedHostAllocation.free
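In other words, a short sketch (relying on the pagelocked array's .base
attribute holding the PagelockedHostAllocation):

    import numpy as np
    import pycuda.driver as cuda
    import pycuda.autoinit  # noqa: F401

    x = cuda.pagelocked_empty(1024, dtype=np.float32)  # page-locked host array
    x.base.free()  # releases the pinned memory now (also happens at GC time)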

HTH,
Andreas


signature.asc
Description: PGP signature
___
PyCUDA mailing list -- pycuda@tiker.net
To unsubscribe send an email to pycuda-le...@tiker.net


[PyCUDA] Re: LogicError: cuMemHostAlloc failed: OS call failed or operation not supported on this OS

2019-08-13 Thread Andreas Kloeckner
Dear Ali,

Ali Punjani  writes:
> We develop scientific software for molecular biology applications using
> pyCUDA. A user is having a very strange issue where the same code works
> perfectly fine on the same machine (with CUDA 10.1) with GTX 1060 cards,
> but not with Titan XP cards. The error message is
>
> *LogicError: cuMemHostAlloc failed: OS call failed or operation not
> supported on this OS*
>
> Is there any explanation of what this could mean? Initial googling does not
> return many reports of anything similar.
>
> A more detailed report of the issue is here:
> https://discuss.cryosparc.com/t/logicerror-cumemhostalloc-failed-os-call-failed-or-operation-not-supported-on-this-os/3278

I'm sorry to say that I've never seen or heard of this error
message. One thing that comes to mind is that this might be an issue of
PCIe versioning. The 1060 might be PCIe3, while the XP might be PCIe2
(guessing, might be better to check), and driver support might
differ. Is this on a Windows or Linux machine?

Andreas


signature.asc
Description: PGP signature
___
PyCUDA mailing list -- pycuda@tiker.net
To unsubscribe send an email to pycuda-le...@tiker.net


[PyCUDA] Re: Bind-less Texture Patch for PyCUDA

2019-08-12 Thread Andreas Kloeckner
Binu,

Binu Mathew  writes:
> First up, as a long time user of PyOpenCL and lately PyCUDA, thank you very 
> much for all the effort you have dedicated to these two extremely useful 
> projects.
> Last year, I implemented support in PyCUDA for bind-less textures that make 
> programming texture based kernels much easier.
> Things got busy and I did not get the chance to generate a patch till now.
>
> I have attached a patch for this new functionality that I generated against 
> August 9, commit 66a8ad7c28c06045bfd8eb60c2a5d6710f596cbc 
>
> I have also included a simple test script that should run without errors if 
> the new functions are built correctly.
> I have implemented a decent amount of private code that uses these features. 
> So I am reasonably confident that most things work, but the simple test 
> script is all I am able to open source at present.
> The new code does not significantly change any of the core functions in 
> PyCUDA, so it should not have any side effects on the existing code.
>  
> I am not very familiar with the internals of the PyCUDA project or with 
> boost. So I’d appreciate it if you could review and merge the contribution.

Thanks for your contribution! I've created a merge request with your
code, here:

https://gitlab.tiker.net/inducer/pycuda/merge_requests/19

I've reviewed and slightly modified your code, it passes CI, and it's
basically ready to go in. One thing it is missing before I can merge it
is documentation. Could you take a look? Emailing me a patch against
doc/source/driver.rst is fine.

I've also cc'd the list in case anyone else is interested in this.

Thanks again,
Andreas


signature.asc
Description: PGP signature
___
PyCUDA mailing list -- pycuda@tiker.net
To unsubscribe send an email to pycuda-le...@tiker.net


[PyCUDA] Re: returned a result with an error set

2019-07-17 Thread Andreas Kloeckner
Hi Ziqiao,

Do you have numpy installed on your machine? That's the only thing I can
think of as getting imported during import of the PyCUDA module. I've
not seen that error before---have you tried building from source? Or
using Christoph Gohlke's binaries?

Andreas

z...@asagi.waseda.jp writes:
> I am Ziqiao, trying to use PyCUDA to boost my calculation.
> There were several attempts and I finally succeed to install PyCUDA.
> However, when I try the sample code on the Wiki for a test, it shows this 
> import error.
>
> 
> Traceback (most recent call last):
>
>   File "", line 1, in 
> runfile('D:/NICT/Python Scripts/PyCUDA_test.py', wdir='D:/NICT/Python 
> Scripts')
>
>   File "C:\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", 
> line 880, in runfile
> execfile(filename, namespace)
>
>   File "C:\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", 
> line 102, in execfile
> exec(compile(f.read(), filename, 'exec'), namespace)
>
>   File "D:/NICT/Python Scripts/PyCUDA_test.py", line 9, in 
> import pycuda.driver as cuda
>
>   File "C:\Anaconda3\lib\site-packages\pycuda\driver.py", line 5, in 
> from pycuda._driver import *  # noqa
>
> SystemError:  returned a result with an error set
> 
>
> My current environment is:
> Win 10 64 bit
> NVIDIA Quadro P1000 (with the latest driver)
> Python 3.6.1 (with Anaconda 3 4.4.0)
> Cuda 10.1.168_425.25 (for win 10)
> Microsoft Visual Studio 14.0 (for the compiler)
>
> I used:
> pycuda-2019.1+cuda101-cp36-cp36m-win_amd64.whl
> pytools-2019.1.1-py2.py3-none-any.whl
> to install PyCUDA and PyTools and they ended successfully.
>
> If anyone knows how to fix this, please let me know.
> Thank you in advance!
> ___
> PyCUDA mailing list -- pycuda@tiker.net
> To unsubscribe send an email to pycuda-le...@tiker.net
>



signature.asc
Description: PGP signature
___
PyCUDA mailing list -- pycuda@tiker.net
To unsubscribe send an email to pycuda-le...@tiker.net


[PyCUDA] Re: PyCuda installation - Colab

2019-06-21 Thread Andreas Kloeckner
Dear Chris,

Chris Fourie  writes:
> Thanks for the wiki on the PyCuda installation =)
>
> I have just installed it on Colab following a mix of the instructions.
> From these two pages...
> https://wiki.tiker.net/PyCuda/Installation/Linux
> https://wiki.tiker.net/PyCuda/Installation/Linux/Ubuntu
>
> Thought it might be useful to include an example on your wiki but didn't
> want to go about changing anything.
>
> If you are interested here is the link
> https://colab.research.google.com/drive/1_EdLm4jXblZ1epKRwD-gWyOjv8ETy5Br

Thanks for investigating how to use PyCUDA on Google's
Colab/Colaboratory. I've cc'd the mailing list, as someone on there may
also find this useful. In the meantime, please feel free to add some
text linking to your work to the Wiki---that's what it's for!

Andreas


signature.asc
Description: PGP signature
___
PyCUDA mailing list -- pycuda@tiker.net
To unsubscribe send an email to pycuda-le...@tiker.net


[PyCUDA] Re: Pycuda error in windows 10 Python 3.7.3 Cuda9.0

2019-06-07 Thread Andreas Kloeckner
olay...@gmail.com writes:

> I installed pycuda by downloading .whl file.
>
> When I run the command :
>
> import pycuda.gpuarray as gpuarray
>
> I get the below error:
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "C:\Anaconda\lib\site-packages\pycuda\gpuarray.py", line 4, in <module>
> import pycuda.elementwise as elementwise
>   File "C:\Anaconda\lib\site-packages\pycuda\elementwise.py", line 35, in 
> <module>
> from pycuda.tools import context_dependent_memoize
>   File "C:\Anaconda\lib\site-packages\pycuda\tools.py", line 34, in <module>
> import pycuda.driver as cuda
>   File "C:\Anaconda\lib\site-packages\pycuda\driver.py", line 6, in <module>
> from pycuda._driver import *  # noqa
> ModuleNotFoundError: No module named 'pycuda._driver'
>
> The folder has "_driver.cp37-win_amd64.pyd" file but not pycuda._driver file.
> ___
> PyCUDA mailing list -- pycuda@tiker.net
> To unsubscribe send an email to pycuda-le...@tiker.net
>

Use the tool "Dependency Walker" on _driver.cp37-win_amd64.pyd to see
what DLLs you're missing. Do you have the CUDA drivers installed?

Andreas


signature.asc
Description: PGP signature
___
PyCUDA mailing list -- pycuda@tiker.net
To unsubscribe send an email to pycuda-le...@tiker.net


[PyCUDA] Re: Import Error: No module named compyte.dtypes

2019-05-24 Thread Andreas Kloeckner
Dan Guralnik  writes:

> Thanks!
>
> Actually, I originally got the wheel from 
> (https://www.lfd.uci.edu/~gohlke/pythonlibs/#pycuda).
>
> Then I tried again after reinstalling using "pip install pycuda".
>
> Finally, I've just now tried building and installing the version posted 
> at link you've provided, yielding the same results.
>
> Could this have anything to do with my doing this on a Windows 10 machine?

Super weird.  This file:

https://files.pythonhosted.org/packages/4d/29/5a3eb66c2f1a4adc681f6c8131e9ed677af31b0c8a78726d540bd44b3403/pycuda-2019.1.tar.gz

(the download linked from https://pypi.org/project/pycuda/#files)
contains pycuda/compyte/dtypes.py for me. After installing that, could
you check your site-packages to see whether it got installed? Maybe
you've got a stale copy of PyCUDA floating around somewhere else on your
Python path?

It does look like Christoph's binaries [1] (I checked 
pycuda-2019.1+cuda101-cp37-cp37m-win_amd64.whl) are missing the compyte
files. I've cc'd Christoph to see if that's intentional.

[1] https://www.lfd.uci.edu/~gohlke/pythonlibs/#pycuda

Andreas


signature.asc
Description: PGP signature
___
PyCUDA mailing list -- pycuda@tiker.net
To unsubscribe send an email to pycuda-le...@tiker.net


[PyCUDA] Re: Import Error: No module named compyte.dtypes

2019-05-23 Thread Andreas Kloeckner
Dan Guralnik  writes:

> Hello,
>
> When attempting to run pycuda (the latest 2019 version), I encounter the 
> following error message from python (2.7):
>
> --
>   File "test_cuda.py", line 9, in 
>  import pycuda.tools
>File "C:\Users\kotmasha\Anaconda2\lib\site-packages\pycuda\tools.py", line 
> 44, in 
>  from pycuda.compyte.dtypes import (
> ImportError: No module named compyte.dtypes
> --
>
> Do you think my installation is missing something? If so, what is and
> why? (I've refreshed pip, setuptools, pytools ahead of installing
> pycuda)

I'm guessing you downloaded the release from Github. If so, don't do
that. Grab it from https://pypi.org. Github insists on making available
broken tarballs that I can't even turn off.

Andreas


signature.asc
Description: PGP signature
___
PyCUDA mailing list -- pycuda@tiker.net
To unsubscribe send an email to pycuda-le...@tiker.net


[PyCUDA] Re: cl.exe

2019-04-25 Thread Andreas Kloeckner
Hi Ajit,

"Ajit Limaye"  writes:
> I'm just getting started with PyCUDA. I installed it with pip (on a Win-10 +
> Anaconda machine using "pip install") and then tried to run the tutorial
> example here: https://documen.tician.de/pycuda/tutorial.html. When I do, I
> get the following error messages:
>
>  
>
> Traceback (most recent call last):
>
>   File "", line 7, in 
>
>   File "C:\ProgramData\Anaconda3\lib\site-packages\pycuda\compiler.py", line
> 291, in __init__
>
> arch, code, cache_dir, include_dirs)
>
>   File "C:\ProgramData\Anaconda3\lib\site-packages\pycuda\compiler.py", line
> 254, in compile
>
> return compile_plain(source, options, keep, nvcc, cache_dir, target)
>
>   File "C:\ProgramData\Anaconda3\lib\site-packages\pycuda\compiler.py", line
> 137, in compile_plain
>
> stderr=stderr.decode("utf-8", "replace"))
>
> pycuda.driver.CompileError: nvcc compilation of
> C:\Users\Ajit\AppData\Local\Temp\tmp4z9j5802\kernel.cu failed
>
> [command: nvcc --cubin -arch sm_61 -m64
> -Ic:\programdata\anaconda3\lib\site-packages\pycuda\cuda kernel.cu]
>
> [stdout:
>
> nvcc fatal   : Cannot find compiler 'cl.exe' in PATH
>
> I have Visual Studio Community 2017 installed, but there is no directory for
> it under "Program Files(x86)" - I only see 2014. And in that directory, I
> can't find cl.exe. So my question is: where is cl.exe?

Visual Studio should ship with a batch file 'vcvars.bat' that you can
run. After running that on the command line, the VS C compiler (cl.exe)
should be on your path and runnable on the command line within that same
shell.
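For reference, on a VS 2017 Community install that batch file typically
lives at the following path (a guess based on the default layout; your
edition and version may differ):

    "C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Auxiliary\Build\vcvars64.bat"

Run it in a cmd window, then start Python (or nvcc) from that same window.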

HTH,
Andreas


signature.asc
Description: PGP signature
___
PyCUDA mailing list -- pycuda@tiker.net
To unsubscribe send an email to pycuda-le...@tiker.net


[PyCUDA] Re: error about index type when running sparse cg example

2019-03-20 Thread Andreas Kloeckner
bren...@u.northwestern.edu writes:
> I'm trying to run the cg example posted at 
> https://andreask.cs.illinois.edu/PyCuda/Examples/SparseSolve
>
> where I have copied the file into one called: py_cuda_cg_test.py
>
> I'm testing using a 5 by 5 sparse symmetric .mm file I found here
> https://people.sc.fsu.edu/~jburkardt/data/mm/m_05_05_crs.mm
>
> though after some messing around I don't think that the file matters much
>
> when I run the command
> $ python py_cuda_cg_test.py m_05_05_crs.mm --is-symmetric
>
> I get the error:
>
> Traceback (most recent call last):
>   File "py_cuda_cg_test.py", line 88, in 
> main_cg()
>   File "py_cuda_cg_test.py", line 35, in main_cg
> spmv = PacketedSpMV(csr_mat, options.is_symmetric, csr_mat.dtype)
>   File 
> "/home/bs162/.local/lib/python2.7/site-packages/pycuda/sparse/packeted.py", 
> line 185, in __init__
> local_row_costs)
>   File "pycuda/sparse/pkt_build_cython.pyx", line 22, in 
> pycuda.sparse.pkt_build_cython.build_pkt_data_structure 
> (/home/bs162/.pyxbld/temp.linux-x86_64-2.7/pyrex/pycuda/sparse/pkt_build_cython.c:1219)
> TypeError: 'numpy.float64' object cannot be interpreted as an index
>
> which seems to indicate that the index information is somehow being cast from 
> an int to a float during the call to packeted.py
>
> I get this error with both python 2.7 and 3.6 and I am using numpy 1.13.3. 
> Any help would be greatly appreciated! 

While I'd be happy to merge patches, I do not currently have time to
support the (undocumented/experimental) sparse matrix/linear solver
functionality in PyCUDA.

Sorry,
Andreas


signature.asc
Description: PGP signature
___
PyCUDA mailing list -- pycuda@tiker.net
To unsubscribe send an email to pycuda-le...@tiker.net


[PyCUDA] Re: releasing the GIL during init and make_context

2019-01-08 Thread Andreas Kloeckner
Antoine Martin  writes:
> We use pycuda to access NVENC and our application (xpra) if very
> sensitive to latency, unfortunately it seems that pycuda will hold the
> GIL during driver.init() and driver.make_context() and those calls can
> take hundreds of milliseconds to complete.
> Is there any reason why this is the case?
> Would you consider a change to pycuda to release the GIL?

No particular reason from my end. I'd be happy to consider a PR with
such a change. Should be pretty straightforward.

To help me run CI, it would be lovely if you could submit the PR here:

https://gitlab.tiker.net/inducer/pycuda

I've created an account for you.

Best,
Andreas


signature.asc
Description: PGP signature
___
PyCUDA mailing list -- pycuda@tiker.net
To unsubscribe send an email to pycuda-le...@tiker.net


Re: [PyCUDA] simple question about PyCuda

2018-11-29 Thread Andreas Kloeckner
Davide Bassano  writes:

> Dear Mr/Mrs
>
>
>
> I have just started working with PyCuda and I have a simple question: how
> can I parallelize Python code if PyCuda wants a kernel written in C?
>
>
>
> Let me clarify: I have a Python code (with classes and other things all
> suitable with Python and unsuitable with C). I have 256 independent for
> loops that I want to parallelize. These loops contain Python code that
> can’t be translated to C. So I tried using PyCuda package but it turned out
> that the kernel must be written in C.
>
> How can I parallelize an actual Python code with PyCuda package without
> translating my code to C?

You could try using the gpuarray functionality built into PyCUDA. Some
numpy-based codes can be effectively made GPU-aware through it.
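A toy sketch of that route (shapes and values invented):

    import numpy as np
    import pycuda.autoinit  # noqa: F401
    import pycuda.gpuarray as gpuarray

    a = gpuarray.to_gpu(np.random.randn(256, 1000).astype(np.float32))
    b = (2 * a + 1).get()  # numpy-style expression, evaluated on the GPU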

Andreas


signature.asc
Description: PGP signature
___
PyCUDA mailing list
PyCUDA@tiker.net
https://lists.tiker.net/listinfo/pycuda


Re: [PyCUDA] Tricks to avoid device2device data copy when slicing the gpuarray?

2018-11-28 Thread Andreas Kloeckner
黄 瓒  writes:

> Hi All,
>
> @inducer THANK YOU for providing PyCUDA.
>
> As cudaMalloc could be time-consuming, and it seems even slicing would 
> involve such an operation in PyCUDA, are there any tricks to avoid 
> frequent GPU memory operations in PyCUDA?

Slicing a GPUArray involves no allocations. PyCUDA includes a memory
pool which can help avoid redundant allocation.

Andreas


signature.asc
Description: PGP signature
___
PyCUDA mailing list
PyCUDA@tiker.net
https://lists.tiker.net/listinfo/pycuda


Re: [PyCUDA] Context being sporadically destroyed when using multiple threads and contexts

2018-11-08 Thread Andreas Kloeckner
Noah Young  writes:
> I'm trying to run jobs on several GPUs at the same time using multiple
> threads, each with its own context. Sometimes this works flawlessly, but
> ~75% of the time I get a cuModuleLoadDataEx error telling me the context
> has been destroyed. What's frustrating is that nothing changes between
> failed and successful runs of the code. From what I can tell it's down to
> luck whether or not the error comes up:

"Context destroyed" is akin to a segmentation fault on the CPU. You
should find evidence that your code performed an illegal access, e.g.,
using 'dmesg' in the kernel log. (If you see a message "NVRM Xid ...",
that points to the problem) My first suspicion would be a bug in your
code.

Andreas


signature.asc
Description: PGP signature
___
PyCUDA mailing list
PyCUDA@tiker.net
https://lists.tiker.net/listinfo/pycuda


Re: [PyCUDA] Fwd: pycude error about importError: No module named _driver

2018-10-04 Thread Andreas Kloeckner
Hilary,

Hilary L  writes:
> I am a newcomer trying to install pycuda on windows 10, 64bit, with python 2.7,
> and GPU is NVIDIA MX150.
> Firstly, I have installed the vcforpython27 to compile c++ (vs2008), and
> then I installed CUDA 3.2. After that, I installed the boost (boost_1_67_0).
> Then, I have tried to installed pycuda-0.94.2, but there are several
> errors, shown below.
> [image: image.png]
>
> Secondly, I installed pycuda-2018.1.1, and it seemed installed
> successfully, shown below.
> [image: image.png]
> But when I test it, it comes to be some errors, shown below.
> [image: image.png]
>
>
> Next, I have tried installing
> (pycuda-2014.1+cuda6514-cp27-none-win_amd64.whl) and
> (pycuda-2018.1+cuda92148-cp27-cp27m-win_amd64.whl). These two versions can
> be installed successfully, shown below.
> [image: image.png]
>
> And also, when I test it, it comes to the similar error like the previous
> one.
> [image: image.png]
>
> I have search google for these errors and someone suggests to send email
> for help.
> I am looking forward to your kindly help.

The likely issue is that PyCUDA can't find the CUDA driver
interface. You may use "Dependency Walker" on "_driver.pyd" to see what
it's trying to find. Installing CUDA from Nvidia's download page should
get you on your way.

HTH,
Andreas


signature.asc
Description: PGP signature
___
PyCUDA mailing list
PyCUDA@tiker.net
https://lists.tiker.net/listinfo/pycuda


Re: [PyCUDA] Quick question about GPU-CUDA

2018-09-12 Thread Andreas Kloeckner
Peter, 

Szu-Pei Fu  writes:
> I followed your wiki page to run pycuda/test/python test_driver.py on porter
>
> and got the following error message.
>
> E   CompileError: nvcc compilation of /tmp/tmpImcdf6/kernel.cu
> failed
> E   [command: nvcc --cubin -arch sm_52
> -I/home/sf47/miniconda3/envs/inteq/lib/python3.5/site-packages/pycuda/cuda
> kernel.cu]
> E   [stderr:
> E   /usr/include/c++/8/type_traits(1049): error: type name is not
> allowed
> E
> E   /usr/include/c++/8/type_traits(1049): error: type name is not
> allowed
> E
> E   /usr/include/c++/8/type_traits(1049): error: identifier
> "__is_assignable" is undefined
> E
> E   3 errors detected in the compilation of
> "/tmp/tmpxft_560f_-6_kernel.cpp1.ii".
> E
> ]../../../.local/lib/python2.7/site-packages/pycuda/compiler.py:137:
> CompileError
>
> I'm wondering if I'm missing anything. Thank you for answering my question.

Something is very strange here. Your traceback references both Python 2.7
and 3.5. Any clue why that might be the case? At any given moment, only
one version of Python should be active.

Next, the failure you're observing is nvcc refusing to compile some
source code. Could you try your nvcc in isolation? (e.g. with code from
here [1]) If that still fails, you should fix your CUDA installation and
then retry with PyCUDA.

HTH,
Andreas

[1] https://devblogs.nvidia.com/easy-introduction-cuda-c-and-c/


CC'd the PyCUDA list for archival.


signature.asc
Description: PGP signature
___
PyCUDA mailing list
PyCUDA@tiker.net
https://lists.tiker.net/listinfo/pycuda


Re: [PyCUDA] Can I compute the sum over only 1 dimension of a matrix?

2018-07-30 Thread Andreas Kloeckner
Rasmus Diederichsen  writes:
> Is it possible to use Reduction operations to reduce a 2-d array to a
> 1-d one, by e.g. computing the rowwise sum or some other operations?
> So far I haven't been successful.

No--ReductionKernel is not meant for that. Its role is  to do global
reductions when there is *no* other source of concurrency available. In
your situation, you can still parallelize over the non-summed axis,
which will lead to vastly more efficient code. As a downside, there
isn't really canned code to do that. But check out

https://documen.tician.de/loopy/

It can help you write that kernel.
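For completeness, a hand-rolled row-sum sketch (one thread per row;
simple rather than optimal, and all names are invented):

    import numpy as np
    import pycuda.autoinit  # noqa: F401
    import pycuda.gpuarray as gpuarray
    from pycuda.compiler import SourceModule

    mod = SourceModule("""
    __global__ void row_sum(const float *a, float *out, int ncols, int nrows)
    {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row >= nrows) return;
        float s = 0.f;
        for (int j = 0; j < ncols; ++j)
            s += a[row * ncols + j];  // parallel over rows, serial over columns
        out[row] = s;
    }
    """)
    row_sum = mod.get_function("row_sum")

    a = gpuarray.to_gpu(np.random.rand(4096, 512).astype(np.float32))
    out = gpuarray.empty(4096, np.float32)
    row_sum(a.gpudata, out.gpudata, np.int32(512), np.int32(4096),
            block=(256, 1, 1), grid=((4096 + 255) // 256, 1))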

Andreas

___
PyCUDA mailing list
PyCUDA@tiker.net
https://lists.tiker.net/listinfo/pycuda


Re: [PyCUDA] PyCUDA and arch flags

2018-07-20 Thread Andreas Kloeckner
Aleksandar Donev  writes:
> We have a machine now that has both a Titan X (Maxwell) and a Titan V 
> (Volta) card, which have different architectures. My students/postdocs 
> have been running PyCUDA codes but I am not sure if we need to do 
> anything different in this case -- does PyCUDA determine the 
> architecture automatically?
>
> Based on this:
>
> https://docs.nvidia.com/cuda/volta-compatibility-guide/index.html
>
> the right thing to do is to compile with nvcc and the flags
>
> -gencode=arch=compute_61,code=sm_61
> -gencode=arch=compute_70,code=sm_70
>
> Do we need to do anything or will it just work?

Cc'ing the mailing list for archival.

You will only have one context per GPU. PyCUDA's SourceModule (assuming
that that's what you're using) will automatically pass the right arch
flags for the GPU in that context.
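A sketch of what that looks like with both cards (device numbering
assumed; the compiled arch follows whichever context is current):

    import pycuda.driver as cuda
    from pycuda.compiler import SourceModule

    cuda.init()
    for i in range(cuda.Device.count()):  # e.g. the Titan X and the Titan V
        ctx = cuda.Device(i).make_context()
        try:
            mod = SourceModule("__global__ void k() {}")  # arch chosen per device
            mod.get_function("k")(block=(1, 1, 1), grid=(1, 1))
        finally:
            ctx.pop()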

Andreas

___
PyCUDA mailing list
PyCUDA@tiker.net
https://lists.tiker.net/listinfo/pycuda


Re: [PyCUDA] import pycuda.driver fails

2018-07-16 Thread Andreas Kloeckner
Harshit,

Harshit Suri  writes:
> I had a working installation of pycuda. However, after running updates on
> my Ubuntu machine;
> import pycuda.driver as cuda fails.
> ( I had also updated my anaconda install and updated all packages that
> anaconda found that required updates )
>
> When I try running " import pycuda.driver " through jupyter notebook
> I get the following error message. Followed by a kernel crash
> --
> [I 10:02:37.765 NotebookApp] Adapting to protocol v5.1 for kernel
> 5fee93d7-62cf-468a-bcb0-b48e6f1d0987
> terminate called after throwing an instance of 'std::runtime_error'
>   what():  numpy failed to initialize
> [I 10:02:43.117 NotebookApp] KernelRestarter: restarting kernel (1/5), keep
> random ports
> kernel 5fee93d7-62cf-468a-bcb0-b48e6f1d0987 restarted
> --
> It seems numpy fails to initialize.
> However if I try to import numpy independently, it passes successfully.
>
> My guess is that when I ran the updates it updated something that caused
> this. The installation for pycuda was done by "pip install pycuda"

My guess would be Nvidia kernel/driver mismatch. Check the end of dmesg,
there might be a message there.

Andreas

___
PyCUDA mailing list
PyCUDA@tiker.net
https://lists.tiker.net/listinfo/pycuda


[PyCUDA] [Grace Law] An invitation to talk about PyCUDA to 500+ devs at PyBay

2018-07-16 Thread Andreas Kloeckner
Hi all,

see below for a message from PyBay. If you're near the bay area and
would like to chat about PyCUDA, this might be a good opportunity--and
please also reply to Grace and me directly in case you're planning on
going.

Andreas


--- Begin Message ---
Hi Andreas

Congrats on the success and momentum with PyCUDA!  I am Grace, the chair
for PyBay, a community run Python Conference in San Francisco this August
16-19 with 500+ developers attending.  It's a bit like PyCon back in 2008
with 3 simultaneous talks but the majority of the attendees live in the Bay
Area.

Some of the folks in the community are expressing interests in hearing more
about your project.  While the CFP is long closed, we still have 20
lightning talk slots and Open Spaces/BOFs within the main conference days.
We know you live far and this is a short notice, but would you be so kind
as to encourage some of your contributors and users who might live near the
Bay Area to attend and share your project?  Perhaps there are folks at
NVidia in California who might like to attend and talk about PyCUDA?

Here is a quick blurb on PyBay -

PyBay is a regional Python Conference in SF.  2018's program features an
amazing line up of deeply technical talks on Python, data, infrastructure,
and performance.  The 50+ Speakers and Workshop Leaders include:

   - 10 core Python devs such as Raymond Hettinger, Travis Oliphant, Yury
   Selivanov, Carol Willing, Brandon Rhodes, Simon Wilison;
   - CTOs from up and coming startups; and
   - Senior devs from Google, Linkedin, Facebook, Yelp, Twitter, Microsoft,
   Amazon...

There is still a chance to present your ideas via lightning talks and BOFs
at this 500+ developer get-together on August 16-19,   See pybay.com for
more details!

I look forward to hearing back and possibly getting something more official
in for next year!

Cheers,


Grace Law

PyBay Conference Chair and SF Python Organizer

415-323-0388 / gr...@pybay.com


SF Python is a volunteer-run organization aiming to
foster the Python Community in the Bay Area. We produce ~20 educational
events a year including PyBay, the Regional Python
Conference in SF this August. Learn more at pybay.com.
--- End Message ---
___
PyCUDA mailing list
PyCUDA@tiker.net
https://lists.tiker.net/listinfo/pycuda


Re: [PyCUDA] Create gpuarrays on different GPUs

2018-05-29 Thread Andreas Kloeckner
Zhangsheng Lai  writes:
> My 'can access' simply means that I'm able to access the values in the
> variable in python by typing x1 or x2. My understanding is that if the
> variables are stored on different GPUs, then I should be able to type x1
> and get its values when ctx1 is active and similarly, I can type x2 and get
> the x2 values when ctx2 is active, not when ctx1 is active.

You could measure bandwidths between host/presumed gpu1/presumed gpu2
to ascertain where the data actually resides if you have doubts about
that/don't trust the API.

Andreas

___
PyCUDA mailing list
PyCUDA@tiker.net
https://lists.tiker.net/listinfo/pycuda


Re: [PyCUDA] Create gpuarrays on different GPUs

2018-05-25 Thread Andreas Kloeckner
Zhangsheng Lai  writes:
> with the setup above, I tried to check by poping ctx2 and pushing ctx1, can
> I access x1 and not x2 and vice versa, popping ctx1 and pushing ctx2, I can
> access x2 and not x1. However, I realise that I can access x1 and x2 in
> both contexts.

Can you clarify what you mean by 'can access'? I'm guessing 'submit
kernel launches with that pointer as an argument'?

Andreas

___
PyCUDA mailing list
PyCUDA@tiker.net
https://lists.tiker.net/listinfo/pycuda


Re: [PyCUDA] I have been through the introduction, checked the faq, where to get help on pycuda -

2018-05-17 Thread Andreas Kloeckner
Dear Anthony,

"Anthony Pleticos"  writes:
> I would like to know where people can go for 'assistance' with difficulties in
> applying PyCUDA.
>
> I could not find it in the
> https://wiki.tiker.net/PyCuda/FrequentlyAskedQuestions  and StackExchange
> does not answer my specific issue, especially when running either the
> tutorial and/or examples such as demo.py and hello_gpu in
> C:\Python36\pycuda-2017.1.1\examples.
>
>  
>
> I tried the follow step-by-step the tutorial at
> https://documen.tician.de/pycuda/tutorial.html#where-to-go-from-here .
>
> The problem comes under the heading "Executing a Kernel" where I have a c++
> like module in the py file.
>
> mod = SourceModule("""
>
> __global__ void method(args)
>
> {
>
> C++ like code
>
> }
>
> """)
>
> It happens under the sample code in your tutorial or the
> pycuda-2017.1.1\examples I get the error message
>
> nvcc fatal   : Value 'sm_21' is not defined for option 'gpu-architecture'

Generally, the mailing list (cc'd, needs subscription to post) is a good
place for requests like this. In your case, you seem to have a fairly old
GPU ("sm_21") that's no longer supported by your compiler
(nvcc). Downgrading the CUDA toolkit may help.

Andreas

___
PyCUDA mailing list
PyCUDA@tiker.net
https://lists.tiker.net/listinfo/pycuda


Re: [PyCUDA] Example SparseSolve

2018-05-12 Thread Andreas Kloeckner
This is really tech support for PyMetis (i.e. wrong list), but oh well.

Just install pyublas.

Andreas


MarbHarmsen  writes:
> I'm currently trying to build a simple FEA solver in python using an
> incomplete Cholesky decomposition preconditioned conjugate gradient method.
> I have exported an example stiffness matrix from my old (and slow) code into
> a symmetric .mtx file. This matrix is imported into the example code at
> https://wiki.tiker.net/PyCuda/Examples/SparseSolve.
>
> Running the code resulted in errors. The first part was solved by pymetis.
> However now a new problem appeared. I get a type error: "TypeError: No
> registered converter was able to produce a C++ rvalue of type int from this
> Python object of type numpy.int32".
>
> The full traceback:
> In [1]:
> runfile('/home/bram/Documenten/TUDelft/Thesis/Phyton/topopt/src/cgCUDATest.py',
> wdir='/home/bram/Documenten/TUDelft/Thesis/Phyton/topopt/src')
> starting...
> building...
> Traceback (most recent call last):
>
>   File "", line 1, in 
>
> runfile('/home/bram/Documenten/TUDelft/Thesis/Phyton/topopt/src/cgCUDATest.py',
> wdir='/home/bram/Documenten/TUDelft/Thesis/Phyton/topopt/src')
>
>   File
> "/home/bram/.anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py",
> line 705, in runfile
> execfile(filename, namespace)
>
>   File
> "/home/bram/.anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py",
> line 102, in execfile
> exec(compile(f.read(), filename, 'exec'), namespace)
>
>   File
> "/home/bram/Documenten/TUDelft/Thesis/Phyton/topopt/src/cgCUDATest.py", line
> 71, in 
> main_cg()
>
>   File
> "/home/bram/Documenten/TUDelft/Thesis/Phyton/topopt/src/cgCUDATest.py", line
> 21, in main_cg
> spmv = PacketedSpMV(csr_mat, 'symmetric', csr_mat.dtype)
>
>   File
> "/home/bram/.anaconda3/lib/python3.6/site-packages/pycuda-2017.1-py3.6-linux-x86_64.egg/pycuda/sparse/packeted.py",
> line 127, in __init__
> xadj=adj_mat.indptr, adjncy=adj_mat.indices)
>
>   File
> "/home/bram/.anaconda3/lib/python3.6/site-packages/pymetis/__init__.py",
> line 120, in part_graph
> return part_graph(nparts, xadj, adjncy, vweights, eweights, recursive)
>
> TypeError: No registered converter was able to produce a C++ rvalue of type
> int from this Python object of type numpy.int32
>
> Do I need to change the python files in pymetis?
>
>
>
> --
> Sent from: http://pycuda.2962900.n2.nabble.com/
>
> ___
> PyCUDA mailing list
> PyCUDA@tiker.net
> https://lists.tiker.net/listinfo/pycuda


___
PyCUDA mailing list
PyCUDA@tiker.net
https://lists.tiker.net/listinfo/pycuda


Re: [PyCUDA] Invalid resource handle error

2018-05-11 Thread Andreas Kloeckner
Zhangsheng Lai  writes:

> Hi,
>
> I'm trying to do some updates to a state which is a binary array. gputid is
> a GPU thread class (https://wiki.tiker.net/PyCuda/Examples/MultipleThreads)
> and it stores the state and the index of the array to be updated in another
> class which can be accessed with gputid.mp.x_gpu and gputid.mp.neuron_gpu
> respectively. Below is my kernel that takes in the gputid and performs the
> update of the state. However, it the output of the code is not consistent
> as it runs into errors and executes perfectly when i run it multiple times.
> The error msg makes no sense to me:
>
> File "/root/anaconda3/lib/python3.6/site-packages/pycuda/driver.py", line
> 447, in function_prepared_call
> func._set_block_shape(*block)
> pycuda._driver.LogicError: cuFuncSetBlockShape failed: invalid resource
> handle

I think the right way to interpret this is that if you cause an
on-device segfault, the GPU context dies, and all the handles of objects
contained in it (including the function) become invalid.

HTH,
Andreas

___
PyCUDA mailing list
PyCUDA@tiker.net
https://lists.tiker.net/listinfo/pycuda


Re: [PyCUDA] PyCUDA ImportError

2018-05-11 Thread Andreas Kloeckner
MarbHarmsen  writes:

> My goal is to speed up my python FEA (finite elements analysis) with my
> quadro GPU. I however have issues when I import pycuda.autoinit or
> pycuda.driver into my python code. See the example from my Console:
>
> **code
> In [6] import pycuda.autoinit
> Traceback (most recent call last):
>
>   File "", line 1, in 
> import pycuda.autoinit
>
>   File
> "/home/bram/.anaconda3/lib/python3.6/site-packages/pycuda-2017.1-py3.6-linux-x86_64.egg/pycuda/autoinit.py",
> line 2, in <module>
> import pycuda.driver as cuda
>
>   File
> "/home/bram/.anaconda3/lib/python3.6/site-packages/pycuda-2017.1-py3.6-linux-x86_64.egg/pycuda/driver.py",
> line 5, in <module>
> from pycuda._driver import *  # noqa
>
> ImportError: libcurand.so.8.0: cannot open shared object file: No such file
> or directory
> /***code
>
> Some details of my setup:
> - HP Zbook Studio G3 (Quadro M1000M) Ubuntu 18.04
> - Cuda 9.1 (.run installer) (I added the path variables to ~/.bashrc)
> - nvidia-driver-390 as driver
> pycuda 2017.1 (from anaconda)
>
> I've tried the solutions proposed by people encountering similar issues when
> using tensorflow-gpu: It was proposed to make a softlink from
> libcurand.se.8.0 to the libcurand.se.9.1 using the terminal: 
> "user@device:~$ sudo ln -s libcublas.so.9.1 libcublas.so.8.0" This did not
> help however.
>
> I've checked the installation of CUDA by running a simple vectorAdd example
> in Exlips. That worked without any issues and when profiling it showed that
> the gpu was working as expected.
>
> I probably made a mistake somewhere and tell me if you need more information

It appears that your PyCUDA was built against CUDA 8 (it refers to
curand version 8). Since you're using CUDA 9, it cannot find that component.
Either rebuild pycuda yourself, or ask whoever provided your version of
PyCUDA (neither upstream Anaconda nor conda-forge ship it AFAIK) to
supply an updated version built against CUDA 9.
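(With pip, something like "pip install --no-binary :all: pycuda" should
force a from-source build against the CUDA installation visible in your
environment; an untested suggestion for this particular setup.)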

HTH,
Andreas

___
PyCUDA mailing list
PyCUDA@tiker.net
https://lists.tiker.net/listinfo/pycuda


Re: [PyCUDA] cuModuleLoadDataEx failed: device kernel image is invalid

2018-04-20 Thread Andreas Kloeckner
Zhangsheng Lai  writes:

> Hi Andreas,
>
> Thanks! It worked! Can I ask if you think cuda.memcpy_peer can be
> used with threads for GPUs (
> https://wiki.tiker.net/PyCuda/Examples/MultipleThreads)? I think this is
> more of a threading question than a PyCUDA question but would like your
> insights on this.

Please make sure to keep the list cc'd for archival.

As for your question, I don't see why not.

Andreas



___
PyCUDA mailing list
PyCUDA@tiker.net
https://lists.tiker.net/listinfo/pycuda


Re: [PyCUDA] cuModuleLoadDataEx failed: device kernel image is invalid

2018-04-19 Thread Andreas Kloeckner
You're prescribing the GPU architecture (arch='...'). If this doesn't
match your GPU, this could easily cause this issue. Just deleting that
kwarg should be fine.
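A sketch of both options (kernel body is a placeholder):

    import pycuda.autoinit  # noqa: F401
    import pycuda.driver as cuda
    from pycuda.compiler import SourceModule

    mod = SourceModule("__global__ void k() {}")  # no arch=...: inferred from context

    # If you do want to pin it, derive it from the current device instead:
    arch = "sm_%d%d" % cuda.Context.get_device().compute_capability()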

Andreas


Zhangsheng Lai  writes:
> I'm encountering this error as I run my code on the same docker environment
> but on different workstations.
>
> ```
> Traceback (most recent call last):
>   File "simple_peer.py", line 76, in 
> tslr_gpu, lr_gpu = mp.initialise()
>   File "/root/distributed-mpp/naive/mccullochpitts.py", line 102, in
> initialise
> """, arch='sm_60')
>   File "/root/anaconda3/lib/python3.6/site-packages/pycuda/compiler.py",
> line 294, in __init__
> self.module = module_from_buffer(cubin)
> pycuda._driver.LogicError: cuModuleLoadDataEx failed: device kernel image
> is invalid -
>
> ```
> I did a quick search and only found this :
> https://github.com/inducer/pycuda/issues/45 , but it doesn't seem to be
> relevant to my problem as it runs on my initial workstation. Can anyone see
> what is the issue?
>
> Below is my code that I'm trying to run:
> ```
> def initialise(self):
> """
> Documentation here
> """
>
> mod = SourceModule("""
> #include 
> __global__ void initial(float *tslr_out, float *lr_out, float
> *W_gpu,\
> float *b_gpu, int *x_gpu, int d, float temp)
> {
> int tx = threadIdx.x;
>
> // Wx stores the W_ji x_i product value
> float Wx = 0;
>
> // Matrix multiplication of W and x
> for (int k=0; k<d; k++) {
> float W_element = W_gpu[tx * d + k];
> float x_element = x_gpu[k];
> Wx += W_element * x_element;
> }
>
> // Computing the linear response, signed linear response with
> temp
> lr_out[tx] = Wx + b_gpu[tx];
> tslr_out[tx] = (0.5/temp) * (1 - 2*x_gpu[tx])* (Wx + b_gpu[tx]);
> }
> """, arch='sm_60')
>
> func = mod.get_function("initial")
>
> # format for prepare defined at
> https://docs.python.org/2/library/struct.html
> func.prepare("Pif")
>
> dsize_nparray = np.zeros((self.d,), dtype = np.float32)
>
> lr_gpu = cuda.mem_alloc(dsize_nparray.nbytes)
> slr_gpu = cuda.mem_alloc(dsize_nparray.nbytes)
> tslr_gpu = cuda.mem_alloc(dsize_nparray.nbytes)
>
> grid=(1,1)
> block=(self.d,1,1)
> # block=(self.d,self.d,1)
>
> func.prepared_call(grid, block, tslr_gpu, lr_gpu, self.W_gpu, \
> self.b_gpu, self.x_gpu, self.d, self.temp)
>
> return tslr_gpu, lr_gpu
> ```
> ___
> PyCUDA mailing list
> PyCUDA@tiker.net
> https://lists.tiker.net/listinfo/pycuda


___
PyCUDA mailing list
PyCUDA@tiker.net
https://lists.tiker.net/listinfo/pycuda


Re: [PyCUDA] Multithreading with a single context per process

2018-03-05 Thread Andreas Kloeckner
Emanuel Rietveld  writes:

> If I understand correctly, the current PyCUDA multithreading examples
> assume you create a separate context for each thread.
>
> If I want to use CUDA 4.0+'s one-context-per-process model instead,
> how would I do that in PyCUDA?
>
> I think you'd call cudaSetDevice instead of cuCtxCreate? Does the
> equivalent exist in PyCUDA? If it does not, can I add it?

Yes, in fact that would be very welcome. PyCUDA has some complicated and
brittle logic in place to manage CUDA's context stacks that I've been
meaning to rip out. Here's an example:

https://github.com/inducer/pycuda/blob/master/src/cpp/cuda.hpp#L525

Patches that get rid of all that code and simplify it would be very
welcome.
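
In the meantime, recent PyCUDA versions expose the primary context
(which is what the runtime API's cudaSetDevice uses under the hood), so
a sketch along these lines may already give you the
one-context-per-process model (untested):

    import pycuda.driver as cuda

    cuda.init()
    ctx = cuda.Device(0).retain_primary_context()
    ctx.push()
    # launch work here; any other thread must push this same
    # context before touching the GPU, and pop it when done
    ctx.pop()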

Andreas



Re: [PyCUDA] PyCUDA PyPy compatibility

2018-02-26 Thread Andreas Kloeckner
Emanuel Rietveld  writes:
> I'm trying to use PyCUDA with PyPy. With these two patches it seems to work...

Thanks! Merged.

> Is there anything else I'd need to be mindful of? On this PyPy page
> https://bitbucket.org/pypy/compatibility/wiki/Home PyCUDA is
> explicitly listed as incompatible... However with the following patch
> to the tests, I can run those too and they all pass.

Not that I'm aware of. Thanks for working on this!

Andreas



Re: [PyCUDA] non-contiguous array support

2018-02-23 Thread Andreas Kloeckner
Syam Gadde  writes:

> Sorry if this comes through multiple times, I've been having problems posting 
> from an email alias.
>
>
> Andreas suggested I mail the mailing list and solicit comments here.
>
> I submitted a pull request that adapts the element-wise kernels to support 
> non-contiguous arrays (including negative-strided arrays):
>
> https://github.com/inducer/pycuda/pull/171
>
> There have been a number of requests for this kind of functionality, so I'm 
> hoping this is useful (at least as a proof-of-concept).  It passes all 
> current PyCUDA tests, but I've got some local code that fails for some more 
> complicated cases. (Unfortunately, I can't reduce it to a unit test yet.) But 
> most things work.
>
> In some ways it's an elaborate monkey-patch meant to disturb existing code as 
> little as possible, but if you create your own element-wise kernels, to get 
> the new functionality you have to make a few changes.
>
>
> More details at the above PR.
>
>
> Anyway, if anyone is interested in trying it out, I'd be interested to hear 
> how it works for you, or whether you have suggestions for fixes.  The code is 
> in my fork:
>
> https://github.com/SyamGadde/pycuda.git
>
> in the 'noncontig' branch.

Please take a look and help review the code. If there's one thing I'm
super short of these days, it's code review bandwidth. And so it helps a
lot if potential issues get highlighted and discussed. Thanks!

Here's that link again:

https://github.com/inducer/pycuda/pull/171

Also, thanks again Syam for working on and submitting this!

Andreas



Re: [PyCUDA] PyCuda in QThread using moveToThread

2018-02-16 Thread Andreas Kloeckner
David G Grier  writes:

> I am using pycuda to compute holograms for an optical trapping
> application that uses PyQt4 for a GUI front end.  I would like to move
> the pycuda computation into a QThread to keep the GUI responsive.
>
> Is there an up-to-date working example of "the right way" to move a
> pycuda computational object into a QThread using moveToThread?
> Despite much searching, I have not turned up sample code.  The
> pycuda FAQ addresses subclassing threads, but not moveToThread
> method of QThread.
>
> The following minimal example of a "do nothing" background
> object appears to work correctly.  It creates an object,
> moves it into a thread, creates a pycuda context for the object in
> its thread, and then stops the object by quitting the thread.
> Before I dig deeper, can anyone confirm that this is the right approach?
> Or am I missing something

As long as you make a new context for each thread (and destroy it at the
end) as you do, you should be OK.
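
For reference, the per-thread part might look like this minimal sketch
inside the worker object (slot names and device number are made up;
untested):

    import pycuda.driver as cuda

    def started(self):           # slot invoked once the thread runs
        cuda.init()
        self.ctx = cuda.Device(0).make_context()

    def stopped(self):           # slot invoked before the thread quits
        self.ctx.pop()
        self.ctx.detach()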

Andreas



Re: [PyCUDA] pyCUDA kill app with using pyQt5

2018-01-06 Thread Andreas Kloeckner
안태우 writes:
> Hello, all.
>
> I'm using pycuda for making a simple project with pyQt5. But when I
> programmed like this (in Windows 10, Python 3.6.2), the app is aborted,
> printing the logs below.
>
> -------------------------------------------------------------------
> PyCUDA ERROR: The context stack was not empty upon module cleanup.
> -------------------------------------------------------------------
> A context was still active when the context stack was being
> cleaned up. At this point in our execution, CUDA may already
> have been deinitialized, so there is no way we can finish
> cleanly. The program will be aborted now.
> Use Context.pop() to avoid this problem.
> -------------------------------------------------------------------
>
> My code is like this. Could you give me some help?
>
> main.py
>
> import sys
> from PyQt5 import uic
> from PyQt5.QtGui import *
> from PyQt5.QtWidgets import *
> from PyQt5.QtCore import *
> from PIL.ImageQt import ImageQt
> import fractal
>
> form_class = uic.loadUiType("main.ui")[0]
>
> class Form(QMainWindow, form_class):
>     DRAW_MANDELBROT = 1
>
>     def __init__(self):
>         super().__init__()
>         self.setupUi(self)
>         self.setWindowFlags(Qt.MSWindowsFixedSizeDialogHint)
>         self.setFixedSize(self.size())
>         self.drawfractal(self.DRAW_MANDELBROT)
>
>     def wheelEvent(self, event: QWheelEvent):
>         pos = QWidget.mapFromGlobal(QCursor.pos())  # when this event happens, error occurs.
>         print(pos)
>
>     def drawfractal(self, sort):
>         global img
>         if sort is self.DRAW_MANDELBROT:
>             img = fractal.mandelbrot(-2, -2, self.size().width() * 2, 400)
>
>         qimg = QPixmap.fromImage(ImageQt(img))
>         qimg = qimg.scaled(self.size(), Qt.KeepAspectRatio, Qt.SmoothTransformation)
>         self.label.setPixmap(qimg)
>
> if __name__ == "__main__":
>     app = QApplication(sys.argv)
>     w = Form()
>     w.show()
>     sys.exit(app.exec())
>
> fractal.py
>
> import pycuda.autoinit
> import pycuda.driver as cuda
> from pycuda.compiler import SourceModule
> import numpy as np
> import matplotlib.cm as cm
> from matplotlib.colors import Normalize
> from PIL import Image
>
> ...
>
> def mandelbrot(startx, starty, size, precision):
>     matrix = np.array([startx, starty, size, precision], np.float32)
>     result = np.empty((size, size), np.int32)
>     matrix_gpu = cuda.mem_alloc(matrix.nbytes)
>     result_gpu = cuda.mem_alloc(result.nbytes)
>
>     cuda.memcpy_htod(matrix_gpu, matrix)
>
>     func = cu.get_function("mandelbrot")
>     func(matrix_gpu, result_gpu, block=(1, 1, 1), grid=(size, size))
>
>     cuda.memcpy_dtoh(result, result_gpu)
>
>     return array2imgarray(result, 'nipy_spectral')
>
> Thanks

Can you supply a stack trace of the crash? Could you confirm that
PyCUDA's atexit function runs?

Andreas



Re: [PyCUDA] Pycuda and boost with Python 3

2017-12-30 Thread Andreas Kloeckner
Chris  writes:

> When running Pycuda code on Python 3.6 I get this error when executing
> "import pycuda.gl as cuda_gl"
>
> ImportError: /usr/lib/x86_64-linux-gnu/libboost_python-py27.so.1.58.0:
> undefined symbol: PyClass_Type
>
> It looks like it is using the python 2.7 boost files and not
> "libboost_python-py35.so.1.58.0". To use this do I need to configure
> something during Pycuda install or in the install directory?

./configure.py --boost-python-libname=boost-python-py36
rm -Rf build
pip install .

should do what you need.

Andreas



Re: [PyCUDA] Please help - can't find any answer on pycuda._driver cuModuleLoadDataEx error

2017-12-26 Thread Andreas Kloeckner
"Hezy, Sharon"  writes:

> Hello,
>
> I’m pretty familiar with CUDA (writing code since CUDA3.0), but PyCUDA is 
> quite new for me.
>
> I’ve been asked to configure our code that runs on CUDA 6.5, to run on 
> GeForce GTX 1080 (compute capability 6.1).
>
> OS is Windows 64bit, CUDA version – 6.5, GPU devices that should be 
> supported: GeForce GTX 980Ti (compute capability 5.2) and GeForce GTX 1080 
> (comp.cap. 6.1), Python version – 2.7.
>
> NVidia driver was updated (much after the CUDA), to support both the new 1080 
> and the old 980Ti cards.
>
>
>
> Just to answer  the question before it’s asked – there are some technical 
> reasons that prevent us from moving now to CUDA 8.0 or 9.0 (it will be done 
> some time later, but the new cards have to be supported today, with the old 
> CUDA and all the existing code…).
>
> The usual C/C++ code, compiled with nvcc with sm 52 (sm 50 is ok too), with 
> CUDA6.5 – runs fine on both 980Ti and 1080.
>
>
>
> The same “trick” (nvcc with sm 52 or 50), and compilation to .cubin files 
> from python – gives the following error when trying to run:
>
>
>
> pycuda._driver.LogicError: cuModuleLoadDataEx failed: invalid source –
>
>
>
> The only guess I have is that the pycuda package (which was installed from 
> binary distribution) - was linked with too old version of NVIDIA binaries 
> (such as driver rt), and I need to recompile the pycuda sources with the 
> current packages installed on my system.
>
>
>
> Am I right? Or there is another explanation for this?
>
> I’ve been looking for an answer in many blogs, but nobody describes my 
> problem…

You could try using this branch, which adds support for using the NVRTC
API by way of a separate JIT class, DynamicSourceModule:

https://gitlab.tiker.net/inducer/pycuda/merge_requests/3

That might help you get around the compiler/driver mismatch you're
seeing.
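
With that branch, usage should stay close to SourceModule. A sketch
(kernel name made up; untested):

    from pycuda.compiler import DynamicSourceModule

    mod = DynamicSourceModule(kernel_source)
    func = mod.get_function("my_kernel")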

Andreas



Re: [PyCUDA] installation pycuda2017.1.1 cuda9 debian9

2017-12-15 Thread Andreas Kloeckner
Hi Christoph,

christoph  writes:
> I am new to pycuda and would love to  to install pycuda, but I fail to use
> it because of the following message.
>
>
> ExecError: error invoking 'nvcc --version': [Errno 13] Permission denied

This points to an issue with your CUDA installation. See if you can
compile and run a CUDA C example like the following:

https://www.quantstart.com/articles/Vector-Addition-Hello-World-Example-with-CUDA-on-Mac-OSX

using CUDA C. If you can, then PyCUDA should work for you. If you can't,
then something is wrong with your installation that is independent of PyCUDA.

> and when I try to run the test script i get the following error although
> pytest is installed.
>
> Traceback (most recent call last):
>File "test_driver.py", line 964, in 
>  from py.test.cmdline import main
> ImportError: No module named 'py.test.cmdline'; 'py.test' is not a package

This is known and will need to get fixed eventually. In the meantime,
use

python -m pytest test_driver.py

instead.

Andreas



Re: [PyCUDA] MemoryError: cuCtxCreate failed: out of memory

2017-11-02 Thread Andreas Kloeckner
Arnold Tunick  writes:
> Hi Andreas,
>
> my CNN training program has the following pycuda set up:
>
>     import pycuda.driver as drv
>     ...
>     # pycuda set up
>     drv.init()
>     dev = drv.Device(int(config['gpu'][-1]))
>     ctx = dev.make_context()
>
> When I run the program I get the following error:
>
>     File "C:\Users\atunick\theano_alexnet\train.py", line 31, in train_net
>       ctx = dev.make_context()
>     MemoryError: cuCtxCreate failed: out of memory
>
> Note that I am using a new win 10 notebook and have everything re-installed
> and working, e.g., I have run without error two pycuda test programs,
> hello_gpu.py and simplespeedtest.py. I have not seen this problem
> before, and when I search the web for solutions, the ones that you recommended
> earlier don't appear to work, i.e.,
>
>     import pycuda
>     pycuda.tools.clear_context_caches()
>
> or
>
>     import gc
>
> and between calls add gc.collect().
>
> Please advise. Thank you.
>
> Arnold

Is there other code in that program that might also create a context?
Contexts are quite memory-intensive, so your device may not have enough
memory to fit more than one.
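To see how much headroom you actually have, you could query free memory
from a freshly created context (a sketch, untested):

    import pycuda.driver as drv

    drv.init()
    ctx = drv.Device(0).make_context()
    free, total = drv.mem_get_info()
    print("%.0f MB free of %.0f MB" % (free / 1e6, total / 1e6))
    ctx.pop()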

Andreas




Re: [PyCUDA] test program hello_gpu.py and simplespeedtest.py

2017-10-28 Thread Andreas Kloeckner
ephi5757  writes:

> Hi Andreas
> I found that the pycuda .whl was a combination install for pycuda 2017.1.1 + 
> Cuda 8.0.6.1 for win64 and python 3.6.
>
> I suspect that while the install was successful my test programs are failing 
> because I have Cuda 9.0.
>
> I am home now for the Sabbath but on Sunday I will see if I can install a 
> compatible whl + cuda version.
>
> I am sorry that I did not see this earlier. 

Happy to hear you were able to figure this out.

Andreas



Re: [PyCUDA] test program hello_gpu.py and simplespeedtest.py

2017-10-26 Thread Andreas Kloeckner
Arnold Tunick  writes:

> Hi Andreas,
>
> I tried to reinstall pycuda from within my Miniconda3 command
> window, i.e., using the command "pip install pycuda". Everything goes
> well in the build until it tries to execute the following:
>     C:\ProgramData\Miniconda3\Library\mingw-w64\bin\gcc.exe -mdll -O -Wall 
> -DMS_WIN64 -DBOOST_ALL_NO_LIB=1 -DBOOST_THREAD_BUILD_DLL=1 
> -DBOOST_MULTI_INDEX_DISABLE_SERIALIZATION=1 -DBOOST_PYTHON_SOURCE=1 
> -Dboost=pycudaboost -DBOOST_THREAD_DONT_USE_CHRONO=1 -DPYGPU_PACKAGE=pycuda 
> -DPYGPU_PYCUDA=1 -DHAVE_CURAND=1 -Isrc/cpp -Ibpl-subset/bpl_subset 
> "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\include" 
> -IC:\ProgramData\Miniconda3\lib\site-packages\numpy\core\include 
> -IC:\ProgramData\Miniconda3\lib\site-packages\numpy\core\include 
> -IC:\ProgramData\Miniconda3\include -IC:\ProgramData\Miniconda3\include -c 
> src/cpp/cuda.cpp -o build\temp.win-amd64-3.6\Release\src\cpp\cuda.o /EHsc
> Then I get this error message:
>
>     gcc: error: /EHsc: No such file or directory
>     error: command 'C:\\ProgramData\\Miniconda3\\Library\\mingw-w64\\bin\\gcc.exe' failed with exit status 1.
>
> So I thought that the gcc.exe folder needed to be added to the
> path, so I added it. Nevertheless, I got the same gcc: error: /EHsc: No such
> file or directory.
>   
> PATH=C:\ProgramData\Miniconda3;C:\ProgramData\Miniconda3\Library\mingw-w64\bin;C:\ProgramData\Miniconda3\Library\usr\bin;C:\ProgramData\Miniconda3\Library\;C:\ProgramData\Miniconda3\Scripts;C:\ProgramData\Miniconda3\Library\bin;C:\Program
>  Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\bin;C:\Program Files\NVIDIA GPU 
> Computing 
> Toolkit\CUDA\v9.0\libnvvp;C:\SciSoft\Python36\Scripts\;C:\SciSoft\Python36\; 
> etc...
> At the same time, the pip install provides the following message:
>     *
>     *** I have detected that you have not run configure.py.
>     *
>     *** Additionally, no global config files were found.
>     *** I will go ahead with the default configuration.
>     *** In all likelihood, this will not work out.
>     ***
>     *** See README_SETUP.txt for more information.
>     ***
>     *** If the build does fail, just re-run configure.py with the
>     *** correct arguments, and then retry. Good luck!
>     *
>
> However, there are no files named configure.py or README_SETUP.txt on my
> computer. Please advise. Thank you.
>
> Arnold

I am not sure PyCUDA likes being built with Mingw; I'd recommend the
MS compiler (or Christoph Gohlke's binaries) on Windows. At the very
least, you'd have to adjust the CXXFLAGS config variable to not include
flags that only the MS compiler will recognize. Alternatively, PyOpenCL
does much the same thing as PyCUDA and is included in conda forge.

HTH,
Andreas



Re: [PyCUDA] test program hello_gpu.py

2017-10-25 Thread Andreas Kloeckner
Use dependency walker on _driver.pyd/dll to find the DLL you're missing.

Andreas

Arnold Tunick  writes:
>  FYI, I used pip install 
> pycuda-2017.1.1+cuda8061-cp36-cp36m-win_amd64.whl..
>
> On ‎Wednesday‎, ‎October‎ ‎25‎, ‎2017‎ ‎04‎:‎44‎:‎33‎ ‎PM‎ ‎EDT, Arnold 
> Tunick  wrote:  
>  
> I am testing the following python script in a miniconda shell on a win 10
> notebook with cuda 9.0 and python 3.6:
>
> import pycuda.autoinit
> import pycuda.driver as drv
> import numpy
> from pycuda.compiler import SourceModule
>
> mod = SourceModule("""
> __global__ void multiply_them(float *dest, float *a, float *b)
> {
>   const int i = threadIdx.x;
>   dest[i] = a[i] * b[i];
> }
> """)
>
> multiply_them = mod.get_function("multiply_them")
>
> a = numpy.random.randn(400).astype(numpy.float32)
> b = numpy.random.randn(400).astype(numpy.float32)
> dest = numpy.zeros_like(a)
>
> multiply_them(
>     drv.Out(dest), drv.In(a), drv.In(b),
>     block=(400,1,1), grid=(1,1))
>
> c = dest - a*b
> print(c)
> Unfortunately, I get the following error:
> (C:\ProgramData\Miniconda3) C:\SciSoft>python hello_gpu.py
> Traceback (most recent call last):
>   File "hello_gpu.py", line 1, in 
>     import pycuda.autoinit
>   File "C:\ProgramData\Miniconda3\lib\site-packages\pycuda\autoinit.py", line 
> 2, in 
>     import pycuda.driver as cuda
>   File "C:\ProgramData\Miniconda3\lib\site-packages\pycuda\driver.py", line 
> 5, in 
>     from pycuda._driver import *  # noqa
> ImportError: DLL load failed: The specified module could not be found.
>
> Please advise.Arnold
>




Re: [PyCUDA] 3D rotation on PyCuda

2017-08-25 Thread Andreas Kloeckner
ghum  writes:
> copy(aligned=True)
>
> return ary
>
> Is there another way to generate a 3D pycuda texture? Or maybe I am close to
> fix the issue, for now I am getting the following error:
>
> Boost.Python.ArgumentError: Python argument types in
> Memcpy3D.__call__(Memcpy3D)
> did not match C++ signature:
> __call__(struct pycuda::memcpy_3d {lvalue}, class pycuda::stream)
> __call__(struct pycuda::memcpy_3d {lvalue})

I think your main issue might be that Memcpy3D doesn't support the
'aligned' kwarg.
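
For comparison, a bare-bones 3D copy usually follows this pattern
(array names and shape (d, h, w) assumed; untested):

    copy = cuda.Memcpy3D()
    copy.set_src_host(ary)            # C-contiguous numpy array
    copy.set_dst_array(cuda_array)    # a driver Array
    copy.width_in_bytes = copy.src_pitch = ary.strides[1]
    copy.src_height = copy.height = ary.shape[1]
    copy.depth = ary.shape[0]
    copy()                            # note: no kwargs

For a float32 array of shape (d, h, w), strides[1] is the row pitch in
bytes.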

Andreas



Re: [PyCUDA] Pycuda installation with cuda-9.0

2017-08-09 Thread Andreas Kloeckner
Baskaran,

Baskaran Sankaran  writes:
> I am having hard time installing PyCUDA with cuda-9.0 on rhel7.

Thanks for the report. CUDA 9 support was added to git a while back but
was not yet part of a release. I've just released 2017.1.1, which should
address (at least some of) the issues you ran into.

Best,
Andreas




Re: [PyCUDA] PyCUDA installation troubles

2017-07-14 Thread Andreas Kloeckner
"Burdge, Kevin B."  writes:

> Hi everyone,
>
>
> I've been struggling to get PyCUDA up and running on my new machine (Ubuntu 
> 16.04, using latest python 3 anaconda distro as python). I can get the 
> configure and install to run without any hiccups, and have sorted out the 
> path to the CUDA directory, but I inevitably keep arriving at the following 
> error when I try and run the test script:
>
> python test_driver.pyTraceback (most recent call last):
>   File "test_driver.py", line 6, in 
> from pycuda.tools import mark_cuda_test, dtype_to_ctype
>   File "/home/kburdge/anaconda3/lib/python3.6/site-packages/pycuda/tools.py", 
> line 34, in 
> import pycuda.driver as cuda
>   File 
> "/home/kburdge/anaconda3/lib/python3.6/site-packages/pycuda/driver.py", line 
> 5, in 
> from pycuda._driver import *  # noqa
> ImportError: 
> /home/kburdge/anaconda3/lib/python3.6/site-packages/pycuda/_driver.cpython-36m-x86_64-linux-gnu.so:
>  undefined symbol: _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEED1Ev
>
> Any pointers on how to resolve this would be immensely appreciated.

https://lists.tiker.net/pipermail/pycuda/2017-January/004993.html

Andreas



Re: [PyCUDA] Windows <--> Linux interchangeable code?

2017-06-21 Thread Andreas Kloeckner
Benedikt,

Benedikt Kopp  writes:
> I'm having a few problems with pycuda in combination with Ubuntu 16.04 and
> cuda 8.0.
>
> I've tried to make a minimal working error-example that I run on both
> Windows and Linux:

You may notice that you're passing a numpy.int32 for the block
shape. It's quite possible that that's permissible in Linux and not
permissible in Windows, based on the versions of Boost Python and Numpy
that are being used. Not passing a numpy integer (just a plain Python
int) should work.
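
Concretely, a sketch (kernel and sizes made up):

    bx = numpy.int32(256)                 # however this arose...
    func(a_gpu, block=(int(bx), 1, 1), grid=(1, 1))

i.e. wrap any numpy integer in int() before it goes into the block or
grid tuples.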

Regarding the list, there's still a (non-searchable, unfortunately)
archive of the list here:

https://lists.tiker.net/listinfo/pycuda

Andreas



Re: [PyCUDA] Handle Error in CUDA Kernel

2017-06-20 Thread Andreas Kloeckner
"Rana, Sanjay"  writes:
> Are there any examples out there on ways to catch and handle errors in the 
> CUDA Kernel code ?
> I have seen examples for CUDA programming in C/C++ but not so many for pycuda.

Could you point us towards those examples? That would make it easier to
understand what you mean by "catch and handle errors". In addition,
please describe what happens when you do the equivalent thing in PyCUDA.

Andreas



Re: [PyCUDA] c1xx : fatal error C1083: Cannot open source file: 'kernel.cu': No such file or directory

2017-06-07 Thread Andreas Kloeckner
Hi Sanjay,

"Rana, Sanjay"  writes:
> Yeah. Also note that the AutoRun registry setting (i.e. the default
> folder that opens when cmd is run) also causes problems even when building
> the C++(?) CUDA samples in the Microsoft Visual Studio IDE.
>
> I think it's quite reasonable to have a non-default setting for cmd
> startup folder so perhaps you could let NVIDIA devs know of this minor
> issue.

I unfortunately don't have a more direct line to nvidia than anyone
else--just post to their forums.

https://devtalk.nvidia.com/

Andreas




Re: [PyCUDA] c1xx : fatal error C1083: Cannot open source file: 'kernel.cu': No such file or directory

2017-06-07 Thread Andreas Kloeckner
Hi Sanjay,

"Rana, Sanjay"  writes:
> After several days of frustration, I managed to get it working. The
> reason, which I had suspected while reading through the verbose log but
> discarded as too simple, turns out to be the non-default value I had set
> in the registry for the "AutoRun" value of the "Command Processor" entry.
>
> Apparently, nvcc only works with the default windows value for AutoRun
> i.e. "c:\users\.." ! Surely, a potential item for NVIDIA to
> enhance/eliminate?

Good to hear that you solved your problem, and thanks for following up
and letting the group know what the issue was.

Andreas



Re: [PyCUDA] c1xx : fatal error C1083: Cannot open source file: 'kernel.cu': No such file or directory

2017-06-05 Thread Andreas Kloeckner
"Rana, Sanjay"  writes:
> Thanks Andreas for the response. How could I check whether nvcc is work the 
> same way on both computers?  

Compile some sample cuda code.

> On the problematic installation, I did try running nvcc from a command
> prompt. I only get the text "kernel.cu" as a response when I try to run
> the following command:
>
> nvcc --cubin -v -arch sm_52 -m64 
> -Ic:\python2713\lib\site-packages\pycuda\cuda 
> c:\users\crome\appdata\local\temp\tmp4gztsd\kernel.cu
>
> There was no error message reported but perhaps one should have been produced?

That sounds odd. Can you determine which nvcc is being picked up on the
PATH, or if you have multiple nvcc versions installed?

Andreas




Re: [PyCUDA] c1xx : fatal error C1083: Cannot open source file: 'kernel.cu': No such file or directory

2017-06-05 Thread Andreas Kloeckner
"Rana, Sanjay"  writes:

> Hi Everyone,
>
> I have an identical set up of pycuda on two computers as follow  :
>
> Windows 10 64bit
> CUDA 8.0
> Pycuda 2017.1+cuda8061-cp27-cp27m-amd64.whl
> Visual Studio 2013 Community
>
> This works perfectly on one of the computers but on the other computer, 
> pycuda fails and reports the issue "c1xx : fatal error C1083: Cannot open 
> source file: 'kernel.cu': No such file or directory"
>
> I have attached the script, script run log, and nvcc.profile.
>
> Is there something I am missing in the problematic installation? Or is the 
> issue somewhere else e.g. windows config, some other interfering application 
> e.g. antivirus etc.
>
> I have run NVIDIA's CUDA sample applications and they work just fine.

Does nvcc work the same way on either computer? You could also try
building this branch:

https://gitlab.tiker.net/inducer/pycuda/merge_requests/3

That gets by without needing to call nvcc, which may well solve your
problem.

Andreas



Re: [PyCUDA] Pycuda install: gcc: error: /EHsc: No such file or directory

2017-03-27 Thread Andreas Kloeckner
张鲁宁  writes:

> Hello!
>
> Excuse me, I have met an error when installing pycuda. I tried
> many times but it didn't work. I have already compiled boost (with Visual
> Studio 2010) and installed cuda 8.0. Here is the error traceback and my
> siteconf.py.
>
>
>
> C:\Users\Administrator\Desktop\work\pycuda-master>python setup.py install
> ---
> The shipped Boost library was not found, but USE_SHIPPED_BOOST is True.
> (The files should be under bpl-subset/bpl_subset/.)
> ---
> If you got this package from git, you probably want to do
>
>
>  $ git submodule update --init
>
>
> to fetch what you are presently missing. If you got this from
> a distributed package on the net, that package is broken and
> should be fixed. For now, I will turn off 'USE_SHIPPED_BOOST'
> to try and see if the build succeeds that way, but in the long
> run you might want to either get the missing bits or turn
> 'USE_SHIPPED_BOOST' off.
> ---
> Continuing in 1 seconds...
> running install
> running bdist_egg
> running egg_info
> writing requirements to pycuda.egg-info\requires.txt
> writing pycuda.egg-info\PKG-INFO
> writing top-level names to pycuda.egg-info\top_level.txt
> writing dependency_links to pycuda.egg-info\dependency_links.txt
> package init file 'pycuda\compyte\__init__.py' not found (or not a regular file)
>
>
> reading manifest file 'pycuda.egg-info\SOURCES.txt'
> reading manifest template 'MANIFEST.in'
> warning: no files found matching 'doc\source\_static\*.css'
> warning: no files found matching 'doc\source\_templates\*.html'
> warning: no files found matching '*.h' under directory 'bpl-subset\bpl_subset\boost'
> warning: no files found matching '*.hpp' under directory 'bpl-subset\bpl_subset\boost'
> warning: no files found matching '*.cpp' under directory 'bpl-subset\bpl_subset\boost'
> warning: no files found matching '*.html' under directory 'bpl-subset\bpl_subset\boost'
> warning: no files found matching '*.inl' under directory 'bpl-subset\bpl_subset\boost'
> warning: no files found matching '*.ipp' under directory 'bpl-subset\bpl_subset\boost'
> warning: no files found matching '*.pl' under directory 'bpl-subset\bpl_subset\boost'
> warning: no files found matching '*.txt' under directory 'bpl-subset\bpl_subset\boost'
> warning: no files found matching '*.h' under directory 'bpl-subset\bpl_subset\libs'
> warning: no files found matching '*.hpp' under directory 'bpl-subset\bpl_subset\libs'
> warning: no files found matching '*.cpp' under directory 'bpl-subset\bpl_subset\libs'
> warning: no files found matching '*.html' under directory 'bpl-subset\bpl_subset\libs'
> warning: no files found matching '*.inl' under directory 'bpl-subset\bpl_subset\libs'
> warning: no files found matching '*.ipp' under directory 'bpl-subset\bpl_subset\libs'
> warning: no files found matching '*.pl' under directory 'bpl-subset\bpl_subset\libs'
> warning: no files found matching '*.txt' under directory 'bpl-subset\bpl_subset\libs'
> writing manifest file 'pycuda.egg-info\SOURCES.txt'
> installing library code to build\bdist.win-amd64\egg
> running install_lib
> running build_py
> running build_ext
> building '_driver' extension
> C:\ProgramData\Anaconda2\Scripts\gcc.bat -DMS_WIN64 -mdll -O -Wall
> -DPYGPU_PYCUDA=1 -DPYGPU_PACKAGE=pycuda -DHAVE_CURAND=1 -Isrc/cpp
> -IE:\boost_1_63_0 "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include"
> -IC:\ProgramData\Anaconda2\lib\site-packages\numpy\core\include
> -IC:\ProgramData\Anaconda2\include -IC:\ProgramData\Anaconda2\PC
> -c src/cpp/cuda.cpp -o build\temp.win-amd64-2.7\Release\src\cpp\cuda.o /EHsc
> gcc.exe: error: /EHsc: No such file or directory
> error: command 'C:\\ProgramData\\Anaconda2\\Scripts\\gcc.bat' failed with exit status 1
>
>
> BOOST_INC_DIR = [r'E:\boost_1_63_0']
> BOOST_LIB_DIR = [r'E:\boost_1_63_0\stage\lib']
> BOOST_COMPILER = 'msvc'
> BOOST_PYTHON_LIBNAME = ['libboost_python-vc100-mt-1_63']
> BOOST_THREAD_LIBNAME = ['libboost_thread-vc100-mt-1_63']
> CUDA_TRACE = False
> CUDA_ENABLE_GL = False
> CUDADRV_LIB_DIR = ['C:\Program Files\NVIDIA GPU Computing 
> Toolkit\CUDA\v8.0\lib\x64']
> CUDADRV_LIBNAME = ['cuda']
> CXXFLAGS = ['/EHsc']
> LDFLAGS = ['/FORCE']
>
> Could you offer me some help? I am a rookie in this area, Thank you advance.


/EHsc is an option for Visual Studio. You're using gcc to compile
PyCUDA. Not likely to work.
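
If you must stay with gcc, the MSVC-only flags would have to come out of
siteconf.py, roughly (a sketch; adjust to your setup):

    CXXFLAGS = []   # was ['/EHsc']
    LDFLAGS = []    # was ['/FORCE']

Building with the MSVC toolchain that matches your Boost build is the
more reliable route, though.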

Andreas



Re: [PyCUDA] Installation on Ubuntu 14.04

2017-02-28 Thread Andreas Kloeckner
Guillaume Androz  writes:

> Hi,
> I'm trying to install pyCuda 2016.1.2 on Mint 17.3 (same as Ubuntu 14.04),
> I follow the procedure found at
> https://wiki.tiker.net/PyCuda/Installation/Linux/Ubuntu, but I keep having
> the same error
>
> In file included from src/cpp/cuda.cpp:1:0:
> src/cpp/cuda.hpp:14:18: fatal error: cuda.h: No such file or directory
>  #include <cuda.h>
>   ^
> compilation terminated.
> error: command 'gcc' failed with exit status 1
> make: *** [all] Error 1

Can you find cuda.h anywhere on your system?
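
If it is there but in a nonstandard location, pointing the build at it
may be all that is needed (a sketch, assuming /usr/local/cuda):

    ./configure.py --cuda-root=/usr/local/cuda
    make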

Andreas



Re: [PyCUDA] CUDA driver compute mode

2017-02-24 Thread Andreas Kloeckner
Kambiz Tavabi  writes:

> Following TB from test_driver.py with 2016.1.2 build on OS X 10.11 with
> CUDA 8  V8.0.61; was able to workaround by changing
>
> if drv.Context.get_device().compute_mode == drv.compute_mode.*EXCLUSIVE*:
>
> to
>
> if drv.Context.get_device().compute_mode == drv.compute_mode.
> *EXCLUSIVE_PROCESS*:
>
> in test_dirver.py
>
> Is this expected?

Fixed in git, but not yet released:

https://github.com/inducer/pycuda/commit/255644ad802a20191e31bc15f4fd46e6c9d6e38a

Thanks for the report,
Andreas



Re: [PyCUDA] [PyOpenCL] New architectures for PyOpenCL in Debian

2017-02-11 Thread Andreas Kloeckner
Tomasz Rybak <tomasz.ry...@post.pl> writes:

> On Mon, 2016-11-21 at 22:46 -0600, Andreas Kloeckner wrote:
> [ cut ]
>> At the same time I have a question for you Andreas. There will be
>> > freeze of Stretch on 2017-01-05. I’d like to upload PyOpenCL and
>> > PyCUDA
>> > in mid-December, and those versions will be versions for Stretch.
>> > Do
>> > you plan new releases, or should I just take snapshots from git? 
>> 
>> I'll do a 2016.2.1 mid-December.
>> 
>
> Quick update.
>
> PyOpenCL from 2016-11-30 (commit
> 19015994653dffe2ee407271e19a46e1d6a62796) is in Debian testing,
> for architectures amd64, arm64, armhf, i386, and amd64 and i386
> for kFreeBSD. There is no release-critical errors against it
> which means that this version of PyOpenCL should be in Debian Stretch.
>
> As for PyCUDA, the version in Debian is from 2016-10-24,
> commit 50457813cfe3eb359a230c4e1e546ccdef9947f8.
> It's only available for amd64. I tried to build package
> for ppc64el, but wasn't able to properly configure cross-compiler.
>
> As Debian is in deep freeze now, I won't upload new versions
> of packages Stretch release. Using this time I'll try to
> make ppc64el PyCUDA - but it's not something I'll focus on entirely.
>
> If you have questions or suggestions regarding packages,
> please let me know.

That sounds great. Thank you for continuing to do an excellent job
maintaining PyOpenCL and PyCUDA in Debian!

Andreas




Re: [PyCUDA] undefined symbol error

2017-01-29 Thread Andreas Kloeckner
Chris  writes:

> Hey Andreas,
> I am having a similar issue that Kambiz Tavabi was having. Here is the error
> (I have pycuda on anaconda2 just like Kambiz)
>
>  File "main_class.py", line 17, in 
> import pycuda.gl as cuda_gl
>  File
> "/home/uchytilc/anaconda2/lib/python2.7/site-packages/pycuda-2016.1.2-py2.7-linux-x86_64.egg/pycuda/gl/__init__.py",
> line 2, in 
> import pycuda._driver as _drv
> ImportError:
> /home/uchytilc/anaconda2/lib/python2.7/site-packages/pycuda-2016.1.2-py2.7-linux-x86_64.egg/pycuda/_driver.so:
> undefined symbol: _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEED1Ev
>
> The weird thing is that I have some files that are having this problem and
> some that are not, with identical import statements.
>
> import sys
>
> from OpenGL.GL import *
> from OpenGL.GLUT import *
> from OpenGL.GLU import *
> from OpenGL.GL.ARB.vertex_buffer_object import *
> from OpenGL.GL.ARB.pixel_buffer_object import *
>
> import pycuda.gl as cuda_gl
> import pycuda.driver as cuda_driver
>
> The files that are experiencing this issue are all files that run this
> import statement right at the top in the main.py file, the one I initialize
> from. The one that doesn't hit this error has the import statements in a
> second file that is being imported with an __init__.py file. Not sure if
> this is pure coincidence or not but I figured it was worth bringing up.

Sorry for the delay in responding. GCC switched its C++ ABI with version
5.1. I suspect some of the binary packages you have installed are using
the old ABI (Anaconda is built on CentOS 6 I believe and would be using
the old ABI), while C++ software you built on your machine (PyCUDA) will
use the new ABI. See here for more info:

https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_dual_abi.html

Try adding (in PyCUDA's siteconf.py)

-D_GLIBCXX_USE_CXX11_ABI=0

to CXXFLAGS.
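
i.e., roughly (append to, rather than replace, any flags you already
have there):

    CXXFLAGS = ['-D_GLIBCXX_USE_CXX11_ABI=0']

and then rebuild PyCUDA from a clean build directory.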

Hope that helps,
Andreas



Re: [PyCUDA] pipwin error with runpy.py

2017-01-29 Thread Andreas Kloeckner
"Slein, Ryan"  writes:
> I've spent a few days digging through forums without any luck so I figured 
> I'd post to the mailing list as per request of the forums. Any advice would 
> be greatly appreciated. I am new to python and cuda, and entirely 
> self-taught, so sorry in advanced for any gaps of knowledge on my part.
>
> I am trying to install pycuda but am getting hung up on one of the last 
> install steps, pipwin install pycuda. I keep getting an error in runpy.py 
> when trying to install the software through anaconda. BeautifulSoup tells me 
> it is an lxml file error (see attached cmd) in anaconda\lib\runpy.py for both 
> Python/Anaconda 2 and Python/Anaconda 3 (see attached code: Line 184 for 
> anaconda3 and Line 174 for anaconda2). I've tried to do what BeautifulSoup 
> stated but haven't succeeded, hopefully the solution is obvious to someone 
> with much more experience than myself.
>
> I've also tried building from means other than conda with no luck. I
> am open to try other build packages if you don't know of any conda
> work arounds. Some computer specs: Windows 10 Pro V1607 OS Build
> 14393.693 & Command Prompt Version 10.0.14393. Please let me know if
> you need any further information of my system.

The message you show appears to be a non-fatal warning and unrelated to
PyCUDA, which does not use BeautifulSoup. If you show a relevant error
message (bonus points for text format), we may be in a better position
to help.

Andreas



Re: [PyCUDA] Autoinit failing after driver update

2017-01-25 Thread Andreas Kloeckner
Josh Willis  writes:

> Hi,
>
> After updating the NVIDIA driver from 367.48 to 375.26, I can no longer get 
> PyCUDA to run.  I have tried a fresh build of PyCUDA-2016.1.2, and the 
> configure/make/make install steps seem to proceed fine. However if I do:
>
> $ python
> Python 2.7.5 (default, Nov  3 2016, 22:05:29) 
> [GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import pycuda
> >>> import pycuda.autoinit
> Traceback (most recent call last):
>   File "", line 1, in 
>   File 
> "/home/jwillis/envs/er10/lib/python2.7/site-packages/pycuda-2016.1.2-py2.7-linux-x86_64.egg/pycuda/autoinit.py",
>  line 5, in 
> cuda.init()
> pycuda._driver.Error: cuInit failed: unknown error
 
>
> If I look to make sure that kernel modules are loaded, I see the following 
> (though I’m not sure what I *should* see, this just seemed to be a common 
> source of this kind of problem after an upgrade):
>
> $ lsmod  | grep nvi
> nvidia  11944366  0 
> i2c_core   40756  7 
> ast,drm,igb,i2c_i801,drm_kms_helper,i2c_algo_bit,nvidia
>
> Does anyone have any suggestions on what to try next in debugging the source 
> of this error?  I can compile a “hello world” kernel directly with nvcc and 
> run it with no problem.

Check the output of 'dmesg'. You may need to reboot.

Andreas



Re: [PyCUDA] Elementwise operations on noncontiguous arrays

2016-12-04 Thread Andreas Kloeckner
Keegan Owsley  writes:
> Something that I don't think I made clear before: the kernels generated by
> get_elwise_module_noncontig are modified using regular expressions, so that
> you don't need to change your code downstream to get strided array support.
> I'm not convinced yet that this is the best approach, but it works okay so
> far.

To make it easier to see your changes and comment on them (such as
exactly how robust the Regex stuff is), could you put this up as a pull
request here?

https://gitlab.tiker.net/inducer/pycuda

This will automatically run tests, so it's easier for me to handle.

Thanks!
Andreas



Re: [PyCUDA] Unable to free shared memory array after pagelocking with register_host_memory

2016-12-01 Thread Andreas Kloeckner
Jaroslaw,

Jaroslaw Blusewicz  writes:
> I'm using numpy-sharedmem 
> to allocate a shared memory array across multiple CPU processes. However,
> after page locking it with register_host_memory, the shared memory is never
> cleared at exit. Below is a minimal example of this behavior on Ubuntu
> 16.04, python 2.7.12  and pycuda 2016.1.2:
>
> import sharedmem
> import numpy as np
> from pycuda import autoinit
> import pycuda.driver as driver
>
> arr = sharedmem.zeros(10 ** 8, dtype=np.float32)
> arr = driver.register_host_memory(arr,
> flags=driver.mem_host_register_flags.DEVICEMAP)
>
> At exit, this shared memory array is not cleared. Unregistering the
> pagelocked memory beforehand doesn't work either.
>
> Also, I noticed that the RegisteredHostMemory instance in arr.base, which
> according to the documentation
> 
> should have a base attribute containing the original array, doesn't actually
> have it.
>
> Is there a manual way of clearing this shared memory in pycuda that I'm
> missing?

I'm honestly not sure that pagelocked and SysV shared memory have a
defined interaction, i.e. I don't even know what's supposed to
happen. And at any rate, for what you're doing, you're just getting the
behavior of the CUDA API--I'm not sure PyCUDA could help or hurt in your
case.

tl;dr: Ask someone at Nvidia if this supposed to work, and if it is, and
if PyCUDA breaks it, I'm happy to try and help fix it.

Andreas



Re: [PyCUDA] Elementwise operations on noncontiguous arrays

2016-11-30 Thread Andreas Kloeckner
Keegan,

Keegan Owsley  writes:
> I've just slapped together a patch to pycuda that makes most elementwise
> operations work with noncontiguous arrays. There are a bunch of hacks in
> there, and the code needs some reorg before it's ready to be considered for
> upstream (I made these changes while learning the pycuda codebase, so
> there's a bunch of crud that can be cleaned out), but I figure I might as
> well put it out there in its current state and see what you guys think.
> It's also not extremely well-tested (I have no idea if it interferes with
> skcuda, for example), but all of the main functions appear to work.
>
> You can check out the code at https://bitbucket.org/owsleyk_omega/pycuda.
>
> Briefly, this works by adding new parameters into elementwise kernels that
> describe the stride and shape of your arrays, then using a function that
> computes the location in memory from the stride, shape, and index.
> Elementwise kernel ops are modified so that they use the proper indexing.
> See an example of a kernel that's generated below:

Thanks for putting this together and sharing it! I have one main
question about this, regarding performance:

Modulo (especially variable-denominator modulo) has a habit of being
fantastically slow on GPUs. Could you time contiguous
vs. noncontiguous for various levels of "gappiness" and number of
axes? I'm asking this because I'd be OK with a 50% slowdown, but not
necessarily a factor of 5 slowdown on actual GPU hardware.
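
A minimal timing harness might look like this (a sketch; the strided
view assumes your patched branch, untested):

    import numpy as np
    import pycuda.autoinit
    import pycuda.driver as cuda
    import pycuda.gpuarray as gpuarray

    a = gpuarray.to_gpu(np.random.randn(4096, 4096).astype(np.float32))
    b = a[:, ::2]                       # noncontiguous view

    start, end = cuda.Event(), cuda.Event()
    start.record()
    for _ in range(100):
        b + 1                           # elementwise op on the view
    end.record()
    end.synchronize()
    print(start.time_till(end) / 100, "ms per op")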

Thanks!
Andreas



Re: [PyCUDA] Tests fail with ImportError _driver.so: undefined symbol

2016-10-30 Thread Andreas Kloeckner
Kambiz Tavabi  writes:

> I am at a complete loss. I did a fresh reinstall of the OS (ubuntu 16.04)
> and the first thing I did was:
>
>
>- apt update; sudo apt upgrade
>- apt-get install nvidia-cuda-toolkit nvidia-361 nvidia-modprobe
>- Build install pycuda as before; nosetests failed with same error
>- pip uninstall pycuda
>- pip install pycuda
>- python -c "from pycuda import gpuarray, driver" > Same import error
 >
> I wonder if I am hitting this issue
>  on the installation?

If you're seeing an error message that says

unsupported GNU version! gcc versions later than 4.9 are not supported

then yes, that's your problem. In general, if you don't quote (or
mis-quote, as you did) your error message, don't be surprised if nobody
can or wants to help you.

Andreas



Re: [PyCUDA] Tests fail with ImportError _driver.so: undefined symbol

2016-10-28 Thread Andreas Kloeckner
Kambiz Tavabi  writes:

> Hi
>
> I am Trying to get packages including pycuda-2016.1.2 working in a python
> 2.7 (Anaconda) environment. I am Running Ubuntu 1604 with working nvidia
> driver and CUDA 8.
> I installed pycuda via
>
>  $ git clone http://git.tiker.net/trees/pycuda.git
>  $ cd pycuda
>  $ ./configure.py --cuda-enable-gl
>  $ git submodule update --init
>  $ make -j 4
>  $ python setup.py install
>
> nosetests fail with ImportErrors referencing
> ...anaconda2/lib/python2.7/site-packages/pycuda-2016.1.2-py2.7-linux-x86_64.egg/pycuda/_driver.so:
> undefined symbol: gibbirish

Did it literally say 'gibbirish'? Because that symbol is the one thing
that might help me figure out what happened.

Andreas



Re: [PyCUDA] Stopping Criterion in for loops

2016-10-17 Thread Andreas Kloeckner
slegrand  writes:

> Hello everybody,
>
> I'm currently using pycuda and scikit-cuda to parallelize a simple code. 
> Basically I repeat this structure inside a for loop:
>
> 1-matrix/vector product (cublas.cublasDgemv)
>
> 2-elementwise division(cumisc.divide)
>
> 3-matrix/vector product
>
> 4-elementwise division
>
> 5-Error calculation
>
> and I leave the loop when the error is small enough (You can see the 
> code at the end of the mail). I want to calculate the error on the GPU 
> and check with an if condition whether it's small enough before breaking the 
> loop. error_dev and error_min_dev are both (1,) arrays, but when I try to 
> compare them in the if condition, I get the following error:
>
> File "./lib/Solvers/IPFP_GPU/functionsGPU.py", line 109, in 
> solve_IPFP_simple_gpu
>  if(error_dev < error_min_dev):
> TypeError: an integer is required
>
> and if I try to access to the only element of these arrays:
>
> File "./lib/Solvers/IPFP_GPU/functionsGPU.py", line 129, in 
> solve_IPFP_simple_gpu
>  if(error_dev[0] < error_min_dev[0]):
>File 
> "/home/slegrand/miniconda/lib/python2.7/site-packages/pycuda/gpuarray.py", 
> line 838, in __getitem__
>  array_shape = self.shape[array_axis]
> IndexError: tuple index out of range
>
> The only solution I found was to use the get_async() and to compare both 
> arrays on the CPU but I guess this is not the best solution... I 
> wondered if there is a way to compare these arrays without sending them 
> back to the CPU.
>
> On the other hand, I wondered how is controlled the for loop. How are 
> the iterations synchronized with the GPU calculations?

This code does something similar:

https://github.com/inducer/pycuda/blob/master/pycuda/sparse/cg.py

Looking through that may be helpful.
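
For what it's worth, a (1,)-shaped gpuarray cannot take part in a Python
comparison directly; the usual pattern is to move only the scalar across
(a sketch, assuming the tolerance lives on the host):

    if error_dev.get()[0] < error_min:
        break

The .get() synchronizes with the GPU, which is also what keeps the loop
iterations ordered relative to the queued-up kernels; without some such
synchronization, the CPU simply keeps enqueueing work asynchronously.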

Andreas



Re: [PyCUDA] Pixel Buffer Object Mapping Pointer

2016-10-15 Thread Andreas Kloeckner
Chris  writes:

> I know that PyCUDA runs on the driver API, so it might be a little different
> but I am looking for the equivalent of something like this
> cudaGraphicsResourceGetMappedPointer(). I am manipulating an array in CUDA
> and need to point the PBO to it so that it can be used in Interop between
> OpenGL and PyCUDA. It looked like pycuda.gl.RegisteredMapping might be what
> I need but I am unsure.

This may help:
https://wiki.tiker.net/PyCuda/Examples/GlInterop

Andreas



Re: [PyCUDA] pycuda ImportError: DLL load failed: The specified module could not be found.

2016-10-13 Thread Andreas Kloeckner
Daniel Gebreiter  writes:
> Andreas, thanks for the quick response. Here's the link to the gist, as
> per your request:
>
> https://gist.github.com/anonymous/204d33ca84a211b2323fa9d8886d0371
>
> I hope this works and helps resolve the issue!
>
> Thanks,
> Daniel

This linker errors scream "compiler bug" to me, because it's an
undefined reference to a function that's actually defined. Are you able
to try a different version of the compiler? On the other hand, these
errors may be a red herring.

Could you try Dependency Walker [1] on the pycuda module (_driver.pyd or
so) to see what's actually going on, and what module (if any) is
actually missing?

Andreas

[1] https://en.wikipedia.org/wiki/Dependency_Walker



Re: [PyCUDA] PyCuda - Bindless Textures

2016-10-01 Thread Andreas Kloeckner
Chris  writes:
> I figure this will take a long time to implement, is there anywhere that
> displays the tentative additions to PyCUDA so I can keep myself updated on
> when this might be implemented?

It's really not like I've got a schedule of these things. "When I need
it" or "when somebody submits a patch" are really the best answers I can
give. :)

Andreas



Re: [PyCUDA] PyCuda - Bindless Textures

2016-09-21 Thread Andreas Kloeckner
Chris  writes:
> I saw that there was a post about this in 2014 and I can't find anything
> about whether bindless textures were supported yet.

IIRC this hasn't happened yet. I'd be happy to take a patch though.

Andreas



Re: [PyCUDA] PyCUDA rng question

2016-09-21 Thread Andreas Kloeckner
Hi Peter,

Please send messages like this to the mailing list in the
future. There's some drama going on with Gmane, but it should be back up
at some point. In the meantime, Mailman and the conventional archives
are still available.

Peter Walsh  writes:
> I have a question regarding cuda random number generation. This isn't
> appropriate for github I think. Also, FYI I tried searching the mailing
> list for details, but the searchable archive seems to be down:
> http://news.gmane.org/gmane.comp.python.cuda
>
> My question is thus:
> How do I generate random numbers in my own kernel (called from pycuda)? I
> am trying to build a montecarlo simulation - and passing in an "entropy
> pool" from the host is not acceptable.

Random123 is available in PyOpenCL. To my mind, that's the right
approach to parallel RNG. Documentation here:

https://documen.tician.de/pyopencl/array.html#module-pyopencl.clrandom
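
On the PyCUDA side, pycuda.curandom can at least fill device arrays from
the host-invoked generator API (a sketch; note this is not in-kernel
generation, untested):

    import numpy as np
    import pycuda.autoinit
    from pycuda.curandom import XORWOWRandomNumberGenerator

    gen = XORWOWRandomNumberGenerator()
    samples = gen.gen_uniform((1000000,), np.float32)  # stays on the GPU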

Hope that helps,
Andreas



Re: [PyCUDA] Questions about PyCUDA from a former CS450 student

2016-08-29 Thread Andreas Kloeckner
Yiming Peng  writes:

> Hi Andreas​,
>
> I am a former student of your CS 450 and now I am a incoming PhD student in
> operations research at Northwestern.
>
> Since I am interested in applying parallel computing, preferably using
> Python, to my future research, I have been looking for software which
> combines Python with CUDA. Then I found PyCUDA on your website. And I found
> NumbaPro. It seems that these two are the most popular choices for people
> with needs like mine.
>
> So my question is: which one do I begin to learn and use first? Could you
> give some comments on pros and cons about the two?

Cc'ing the PyCUDA list for archival/searchability.

- PyCUDA lets you/forces you to write CUDA C for your kernels.

- Numba lets you write (a narrow subset of) Python for your kernels,
  including arrays I believe.

- The code you write for both will be roughly equivalent modulo
  spelling, since you'll have to 

- PyCUDA exposes (nearly) the entire CUDA runtime, including streams,
  profiling, textures, ... Numba is more restricted.

- PyCUDA comes with an on-device array type. I'm not sure if Numba's
  arrays stay on-device after the computation finishes--i.e. you may
  have some implicit copying.

- PyCUDA comes with some pre-made parallel algorithms such as scans
  and reductions.

- You may also want to take a look at

  - https://documen.tician.de/pyopencl/
  - https://documen.tician.de/loopy/

Hope that helps,
Andreas



Re: [PyCUDA] Pycuda multiple gpus

2016-08-24 Thread Andreas Kloeckner
Irving Enrique Reyna Nolasco 
writes:

>   I am a student in physics. I am pretty new
>   to pycuda. Currently I am interested in finite volume methods running on
>   multiple GPUs in a single node. I have not found relevant documentation
>   related to this issue, specifically how to communicate different contexts
>   or how to run the same kernel on different devices at the same time.
> Would you   suggest me some literature/documentation about that?

I think the common approach is to have multiple (CPU) threads and have
each thread manage one GPU. Less common (but also possible, if
cumbersome) is to only use one thread and switch contexts. (FWIW,
(Py)OpenCL makes it much easier to talk to multiple devices from a
single thread.)

Lastly, if you're thinking of scaling up, you could just have one MPI
rank per device.
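
A minimal thread-per-GPU sketch (untested; error handling omitted):

    import threading
    import pycuda.driver as cuda

    cuda.init()

    def worker(dev_id):
        ctx = cuda.Device(dev_id).make_context()
        try:
            pass  # allocate arrays, launch kernels as usual
        finally:
            ctx.pop()

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(cuda.Device.count())]
    for t in threads:
        t.start()
    for t in threads:
        t.join()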

Hope that helps,
Andreas



Re: [PyCUDA] New to OpenGL Interoperability

2016-08-15 Thread Andreas Kloeckner
Hi Chris,

Not sure what you're asking. The code you show doesn't apply--it uses
the 'runtime API' (cudaXyz...), PyCUDA uses the 'driver API'
(cuXyz...). And the piece of Peter's example that worries about
exchanging data with PyCUDA (lines 162-192) is about the same in
complexity as what you're showing.

Andreas

Chris Uchytil  writes:
> I am brand new to CUDA and OpenGL, and I have found that tutorials and
> resources on a lot of this material are rather scarce or not
> straightforward, so I am hoping I can get some assistance here. I am working
> on a project attempting to convert some CUDA and OpenGL C++ code over to
> Python. The code is a basic kernel that computes distance from a point (to
> emulate light on a wall from a flashlight) and sends the calculated array to
> OpenGL to display the light intensity. You can move your mouse/"flashlight"
> around to move the light around on the screen. I have been successful in
> converting the kernel code over to Python using the Numba python package.
> What I am having trouble with is the OpenGL interoperability stuff. I
> can't really find any info that describes the process of interop in a simple
> fashion, so I'm not really even sure what the setup process is. It sounds
> like you need to create something called a pixel buffer and send that to
> the kernel. From what I can tell, the C++ code uses this simple function to
> do this:
>
> // texture and pixel objects
> GLuint pbo = 0;
> GLuint tex = 0;
> struct cudaGraphicsResource *cuda_pbo_resource;
>
> void render() {
>   uchar4 *d_out = 0;
>   cudaGraphicsMapResources(1, &cuda_pbo_resource, 0);
>   cudaGraphicsResourceGetMappedPointer((void **)&d_out, NULL,
>                                        cuda_pbo_resource);
>   kernelLauncher(d_out, W, H, loc);
>   cudaGraphicsUnmapResources(1, &cuda_pbo_resource, 0);
> }
>
>
> I can't find any info that describes the python equivalent of
> cudaGraphicsMapResources, cudaGraphicsResourceGetMappedPointer, and
> cudaGraphicsUnmapResources. I've found a GL interop example by Peter
> Berrington (https://wiki.tiker.net/PyCuda/Examples/GlInterop) but it seems
> to me to be overly complicated in how it creates PBO's and textures and
> such when compared to the C++ code.




Re: [PyCUDA] Dynamic Parallelism

2016-07-27 Thread Andreas Kloeckner
Hi Eric,

Sorry about the long delay in getting back to you.

Eric Scheffel  writes:
> I am trying to use Pycuda with a device kernel which recursively calls
> itself via dynamic parallelism. I do this with a 750Ti so it should be
> supported. I have also done some research on how to alter the command
> option list in the kernel source pycuda compile command. But I am
> still getting the error:
>
>
> "cuModuleDataEx  failed: named symbol not found - ".

Discussions regarding dynamic parallelism have been going on here:

https://github.com/inducer/pycuda/issues/45

The basic upshot appears to be that we need to switch to a different
binary format (ELF) and use cuLink{Create,AddData,Complete} to load the
module. I won't have time to work on this myself, but I'd be happy to
review patches.

Andreas



Re: [PyCUDA] Problems building PyCUDA documentation

2016-07-01 Thread Andreas Kloeckner
Tomasz Rybak  writes:
> Would you apply this patch in the repository?
> Debian has a policy that python means Python 2, and python3 is for the
> Python 3 interpreter.
> If not, that's not a problem; for now it is automatically applied during
> the building of Debian packages.

Done.

Andreas



Re: [PyCUDA] Problems building PyCUDA documentation

2016-07-01 Thread Andreas Kloeckner
Tomasz Rybak  writes:
> I was trying to rebuild PyCUDA 2016.1.1 (as tagged on GitHub) package using 
> CUDA 7.5 and Sphinx 1.4.4.
> There was warning regarding http to https redirect on documen.tician.de, 
> patch below:
> Index: pycuda-2016.1.1/doc/source/conf.py
> ===
> --- pycuda-2016.1.1.orig/doc/source/conf.py
> +++ pycuda-2016.1.1/doc/source/conf.py
> @@ -187,5 +187,5 @@ latex_documents = [
>  intersphinx_mapping = {
>  'http://docs.python.org/dev': None,
>  'http://docs.scipy.org/doc/numpy/': None,
> -'http://documen.tician.de/codepy/': None,
> +'https://documen.tician.de/codepy/': None,
>  }

Thanks for the patch, I've applied that.

> This is caused by doc/source/conf.py, html_theme_options, which contains 
> Unicode characters for floppy and rocket. I did not have any problems with 
> PyOpenCL, probably because 2016.1 does not contain those glyphs.
>
> Andreas - how have you built documentation for your web page? Is there some 
> special option for Sphinx I should use?

Regarding the Sphinx build problem: I can reproduce this on Python 2,
but I've been using Python 3, where the issue does not occur. I'm not
entirely sure what the issue is--the traceback is somewhere deep in the
weeds.

Andreas

___
PyCUDA mailing list
PyCUDA@tiker.net
https://lists.tiker.net/listinfo/pycuda


Re: [PyCUDA] [Support Request] How to pass a list of lists to a Py CUDA Kernel?

2016-05-11 Thread Andreas Kloeckner
Hi Frank,

Frank Ihle  writes:
> I try to speed up my Python program with a not-so-trivial algorithm, so
> I need to know: what is the correct way of transferring a list of lists
> of floats to the (Py)CUDA kernel?

Nested, variable-sized structures are generally tricky to map onto
array-shape hardware. You'll likely want to store your data in a
CSR-like data structure:

https://en.wikipedia.org/wiki/Sparse_matrix

Scans (such as the one in PyCUDA) can help significantly with the
resulting index computations.
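
For instance, a minimal sketch of flattening a list of lists into a
CSR-style pair of arrays before upload (names here are illustrative):

import numpy as np
import pycuda.autoinit
import pycuda.gpuarray as gpuarray

lists = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]

# Flatten all values; record where each row starts (CSR-style).
values = np.array([v for row in lists for v in row], dtype=np.float32)
row_starts = np.zeros(len(lists) + 1, dtype=np.int32)
row_starts[1:] = np.cumsum([len(row) for row in lists])

values_gpu = gpuarray.to_gpu(values)
row_starts_gpu = gpuarray.to_gpu(row_starts)
# In a kernel, row i spans values[row_starts[i] : row_starts[i+1]].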

Hope that helps,
Andreas

___
PyCUDA mailing list
PyCUDA@tiker.net
https://lists.tiker.net/listinfo/pycuda


Re: [PyCUDA] Pycuda Device Detection

2016-03-21 Thread Andreas Kloeckner
AlexG  writes:

> Hello,
>
> I have a question on the cuda capable device detection using pycuda.
> Does the driver.Device.count() function detect each gpu on dual gpu cards
> such as the nvidia K2 or K80 cards as separate? That is, if I have one K2
> card installed, does the count() function return 2?

Yes.

> Also is there a device attribute that matches it with the other gpu(s) on
> the same card?

You could probably reverse engineer that from the Bus ID. Although the
enumeration logic does not have documented behavior in this regard, I'd
expect them to enumerate at consecutive IDs.
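
For instance, a small sketch (assuming Device.pci_bus_id() is available
in your PyCUDA build):

import pycuda.driver as drv

drv.init()
for i in range(drv.Device.count()):
    dev = drv.Device(i)
    # GPUs on the same dual-GPU board typically show up at
    # consecutive ordinals with adjacent PCI bus IDs.
    print(i, dev.name(), dev.pci_bus_id())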

Andreas

___
PyCUDA mailing list
PyCUDA@tiker.net
https://lists.tiker.net/listinfo/pycuda


[PyCUDA] ARRAY'16: Workshop on Libraries, Languages and Compilers for Array Programming

2016-03-07 Thread Andreas Kloeckner
Hi all,

I imagine that one of the main things that PyOpenCL and PyCUDA get used
for are computations with large arrays. As such, I can imagine that many
of you are sympathetic to the cause of trying to come up with simpler
abstractions that nonetheless yield high-performance code for such
computations.

ARRAY'16 is a workshop that concerns itself with programming language
aspects of computing with arrays, including language design,
compilation, libraries, and performance optimization.  More information
on the workshop (including the full call for papers) may be found here:

http://conf.researchr.org/track/pldi-2016/ARRAY-2016

The workshop will be held June 14 in Santa Barbara, and the deadline for
submissions is April 1.

I would be delighted to see some submissions from the PyCUDA/PyOpenCL
crowd! (Disclaimer: I am on the organizing committee.)

Andreas


___
PyCUDA mailing list
PyCUDA@tiker.net
https://lists.tiker.net/listinfo/pycuda


Re: [PyCUDA] pycuda 2015.1.3 test_driver.py fails once

2016-02-26 Thread Andreas Kloeckner
Dorin Niculescu  writes:

> Hi all,
> I have a fresh installed ubuntu 14.04.3 distribution, with cuda 7.0 and 
> pycuda 2015.1.3. I'm using an NVIDIA GTX 960 card and the latest driver 
> 361.28. All the installation went well, but when I run test_driver.py I get:
> = test session starts 
> ==
> platform linux2 -- Python 2.7.6, pytest-2.8.7, py-1.4.31, pluggy-0.3.1
> rootdir: /opt/pycuda-2015.1.3, inifile: 
> collected 23 items 
>
> test_driver.py x..
>
> = 22 passed, 1 xfailed in 5.91 seconds 
> =
>
> Can you please help me to understand why the test fails once? 

That's an "xfail", an expected failure.

python -m pytest test_driver.py -v

will tell you what that is.

Andreas

___
PyCUDA mailing list
PyCUDA@tiker.net
https://lists.tiker.net/listinfo/pycuda


Re: [PyCUDA] Installing pycuda on Win 10 notebook

2016-02-11 Thread Andreas Kloeckner
Arnold Tunick  writes:

> 1. Updated to Visual Studio 2013
> 2. Updated to CUDA 7.5.18
> 3. Downloaded pip and pipwin and
>    pycuda-2015.1.3+cuda7518-cp27-none-win_amd64.whl
> 4. Easily installed pycuda by running ... >>install
>    C:\SciSoft\WinPython-64bit-2.7.9.4\settings\pipwin\pycuda-2015.1.3+cuda7518-cp27-none-win_amd64.whl
> 5. Ran the pycuda test hello_gpu.py and got the following error:
>    nvcc fatal: nvcc cannot find a supported version of Microsoft Visual
>    Studio. Only the versions 2010, 2012, and 2013 are supported.
> 6. While I upgraded to VS 2013, I see that I still/only have "C:\Program
>    Files (x86)\Common Files\Microsoft\Visual C++ for Python\9.0", which is
>    the C++ compiler supported for Python 2.7.
> 7. What do I need to do to get pycuda test examples to work?
>
> Best, Arnold Tunick

Are you running the correct vcvars script?

Andreas

___
PyCUDA mailing list
PyCUDA@tiker.net
https://lists.tiker.net/listinfo/pycuda


Re: [PyCUDA] BOOST dependency needed or not?

2016-02-08 Thread Andreas Kloeckner
ra...@blue.alter.pl writes:
>>> Here is error message during pycuda attempt:
>>> 
>>> radek@black:~/pycuda-2015.1.3$ python setup.py build
>>> running build
>>> running build_py
>>> running build_ext
>>> building '_driver' extension
>>> x86_64-linux-gnu-gcc -pthread -fwrapv -Wall -O3 -DNDEBUG
>>> -fno-strict-aliasing -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong
>>> -Wformat -Werror=format-security -fPIC -DPYGPU_PYCUDA=1
>>> -DPYGPU_PACKAGE=pycuda -DHAVE_CURAND=1 -Isrc/cpp
>>> -I/usr/local/cuda-7.5/include
>>> -I/usr/lib/python2.7/dist-packages/numpy/core/include
>>> -I/usr/include/python2.7 -c src/cpp/cuda.cpp -o
>>> build/temp.linux-x86_64-2.7/src/cpp/cuda.o
>>> In file included from src/cpp/cuda.cpp:1:0:
>>> src/cpp/cuda.hpp:30:32: fatal error: boost/shared_ptr.hpp: No such file
>>> or directory
>>> compilation terminated.
>>> error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

Well, if you look in bpl-subset/bpl_subset, all the pieces of boost that
you need to build should be there. If you're missing those files, then
that's a problem (and could cause the issue you're seeing). If you got
your source using git, did you

git submodule update --init

?

Andreas

___
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda


Re: [PyCUDA] Not working after update

2016-01-02 Thread Andreas Kloeckner
It's possible that PyCUDA didn't get entirely rebuilt. In the source
directory, do

  rm -Rf build

and then repeat the build. A workaround for the time being is to simply
use the shipped version of boost.

HTH,
Andreas

Robert  writes:
> Oh I posted the wrong error messages, this is the right one:
>
> Traceback (most recent call last):
>   File "numpytest.py", line 3, in <module>
>     import pycuda.autoinit
>   File "/usr/lib/python2.7/site-packages/pycuda/autoinit.py", line 9, in <module>
>     context = make_default_context()
>   File "/usr/lib/python2.7/site-packages/pycuda/tools.py", line 199, in make_default_context
>     return ctx_maker(dev)
>   File "/usr/lib/python2.7/site-packages/pycuda/tools.py", line 162, in ctx_maker
>     return dev.make_context()
> TypeError: No to_python (by-value) converter found for C++ type:
> boost::shared_ptr<pycuda::context>
> ---
> PyCUDA ERROR: The context stack was not empty upon module cleanup.
> ---
> A context was still active when the context stack was being
> cleaned up. At this point in our execution, CUDA may already
> have been deinitialized, so there is no way we can finish
> cleanly. The program will be aborted now.
> Use Context.pop() to avoid this problem.
> ---
> Abgebrochen (Speicherabzug geschrieben)
>
>
> On 02.01.2016 19:15, Robert wrote:
>> Hi,
>> after upgrading to the newest version I encountered a problem.
>>
>> With python2-pycuda 2015.1.3-6 and boost-libs 1.60.0-1 the following 
>> error gets thrown:
>>
>> Traceback (most recent call last):
>>   File "test.py", line 1, in <module>
>>     import pycuda.autoinit
>>   File "/usr/lib/python2.7/site-packages/pycuda/autoinit.py", line 2, in <module>
>>     import pycuda.driver as cuda
>>   File "/usr/lib/python2.7/site-packages/pycuda/driver.py", line 5, in <module>
>>     from pycuda._driver import *  # noqa
>> ImportError: libboost_python.so.1.59.0: cannot open shared object
>> file: No such file or directory
>> [soaked@Archibald Documents]$ python2 test.py
>> [soaked@Archibald Documents]$ python2 test.py
>> Traceback (most recent call last):
>>   File "test.py", line 1, in <module>
>>     import pycuda.autoinit
>>   File "/usr/lib/python2.7/site-packages/pycuda/autoinit.py", line 9, in <module>
>>     context = make_default_context()
>>   File "/usr/lib/python2.7/site-packages/pycuda/tools.py", line 199, in make_default_context
>>     return ctx_maker(dev)
>>   File "/usr/lib/python2.7/site-packages/pycuda/tools.py", line 162, in ctx_maker
>>     return dev.make_context()
>> TypeError: No to_python (by-value) converter found for C++ type:
>> boost::shared_ptr<pycuda::context>
>> ---
>> PyCUDA ERROR: The context stack was not empty upon module cleanup.
>> ---
>> A context was still active when the context stack was being
>> cleaned up. At this point in our execution, CUDA may already
>> have been deinitialized, so there is no way we can finish
>> cleanly. The program will be aborted now.
>> Use Context.pop() to avoid this problem.
>> ---
>> Abgebrochen (Speicherabzug geschrieben)
>>
>>
>> the source code only contains:
>>
>> import pycuda.autoinit
>>
>>
>> After downgrading to a previous version of pycuda and downgrading 
>> boost-libs to 1.59 everything works fine again.
>>
>> I'm on Arch Linux
>>
>> Good Evening
>> Robert Zimmermann
>>
>>
>> ___
>> PyCUDA mailing list
>> PyCUDA@tiker.net
>> http://lists.tiker.net/listinfo/pycuda
>
>
> ___
> PyCUDA mailing list
> PyCUDA@tiker.net
> http://lists.tiker.net/listinfo/pycuda


___
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda


Re: [PyCUDA] GPU selection (round robin)

2015-12-03 Thread Andreas Kloeckner
Keith Brown  writes:

> I have several GPUs and I want to distribute my tasks to each GPU. I
> would like to use multiprocessing.Pool() to accomplish it
>
> import random
> import pycuda.gpuarray as gpuarray
> import atexit
> import pycuda.driver as cuda
> import pycuda.autoinit as autoinit
> import time
> import numpy as np
> import skcuda.linalg as linalg
> import skcuda
> import multiprocessing as mp
> import pycuda.driver as drv
>
> def my_proc(n):
> drv.init()
> dev = drv.Device(n)
> ctx = dev.make_context()
> atexit.register(ctx.pop)
> linalg.init()
> a=np.ones((185,185)).astype(np.float32)
> a_gpu = gpuarray.to_gpu(a)
> c_gpu = linalg.dot(a_gpu,a_gpu)
> return c_gpu.get()
>
> r=[]
> Pool=mp.Pool(10)
> for i in range(1000):
>   Pool.apply_async(my_proc,(random.randint(0,1),))
>
>
> I keep getting
> pycuda._driver.LogicError: cuCtxPopCurrent failed: invalid device context
>
> Is there something I should be doing?

If you have the GPUs set to exclusive mode, then PyCUDA should take care
of round-robin device selection on its own.
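
Failing that, a per-worker sketch (illustrative, not the poster's code):
do all CUDA setup inside the worker process and pop the context
deterministically instead of via atexit:

import numpy as np
import pycuda.driver as drv

def worker(dev_id, a):
    drv.init()
    ctx = drv.Device(dev_id).make_context()
    try:
        # Import gpuarray only once a context exists in this process.
        import pycuda.gpuarray as gpuarray
        return (gpuarray.to_gpu(a.astype(np.float32)) * 2).get()
    finally:
        ctx.pop()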

Andreas


___
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda


Re: [PyCUDA] Pycuda and RDMA transfer

2015-10-30 Thread Andreas Kloeckner
Baskaran Sankaran  writes:
> Apologies for emailing you directly. I did subscribe to the PyCuda mailing
> list, but my request is not approved yet.

There is no approvals process. It's likely that the subscription request
went to your spam folder. I've CC'd the list. Maybe someone on there
knows.

> I have been using PyCuda in recent times to parallelize Theano across two
> GPUs and I should say that it has been really useful. For example, I was
> able to achieve 1.85x speedup of our Neural MT with Pycuda over the single
> GPU version.

I'm happy to hear you're finding the software useful.

> I am now trying to see if I could parallelize it across more gpus. However,
> the gpus in this case are connected through socket-level links  and not
> through PCI-e switches. Here is the topology of a typical node in the gpu
> cluster.
>
> [xc181] ~ $  nvidia-smi topo -m
>         GPU0    GPU1    GPU2    GPU3    mlx4_0  CPU Affinity
> GPU0     X      PIX     SOC     SOC     PHB     0-7
> GPU1    PIX      X      SOC     SOC     PHB     0-7
> GPU2    SOC     SOC      X      PIX     SOC     8-15
> GPU3    SOC     SOC     PIX      X      SOC     8-15
> mlx4_0  PHB     PHB     SOC     SOC      X
>
> Legend:
>
>   X   = Self
>   SOC = Path traverses a socket-level link (e.g. QPI)
>   PHB = Path traverses a PCIe host bridge
>   PXB = Path traverses multiple PCIe internal switches
>   PIX = Path traverses a PCIe internal switch
>
> So, I wonder whether the PyCuda peer-to-peer copy (memcpy_peer) will work
> for these socket-level links. I am unable to test this in the cluster here,
> because the GPUDirect is enabled only between pairs of gpus (0-1 and 2-3).
> However, from the nvidia website, it seems the GPUDirect v3 supports RDMA
> that allows these kinds of transfers (across two nodes or across
> socket-linked nodes).
>
> https://developer.nvidia.com/gpudirect
> http://devblogs.nvidia.com/parallelforall/benchmarking-gpudirect-rdma-on-modern-server-platforms/
>
> I must admit that I am not very familiar with the differences in the
> technologies and so my understanding could be incorrect.
> So, my question here is whether PyCuda memcpy_peer will support the RDMA
> style GPUDirect transfers? Any info will be greatly appreciated.

Sorry, I haven't used this technology myself, so I simply don't
know. What I can say is that if any amount of control over this is
available through the CUDA API, that same level of control should also
be achievable through PyCUDA.

Maybe someone on the list has an idea.
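
For the mechanics, the peer copy itself looks like this (a sketch with
placeholder allocations dest/src and contexts ctx0/ctx1; whether the
transfer crosses PCIe or QPI is the driver's decision, not PyCUDA's):

import pycuda.driver as drv

# dest and src are device allocations owned by ctx0 and ctx1,
# created beforehand with Device(n).make_context() and mem_alloc().
drv.memcpy_peer(dest, src, nbytes,
                dest_context=ctx0, src_context=ctx1)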

Hope that helps at least a bit,
Andreas

___
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda


Re: [PyCUDA] General question about CUDA compiler and early returns

2015-10-26 Thread Andreas Kloeckner
Walter White  writes:

> Hello,
>
> I have a question and hope that you can help me.
> I am trying to find the bottleneck in my code but I can't get a
> grip at the moment.
>
> For a while I thought it was the writes to global memory.
> At the moment I am using an early "return" statement in my
> code to skip parts of the code, e.g. a for-loop.
>
> Now I am wondering if this is working at all.
> Could it be that the code exits even way before
> the "return" statement when the compiler recognizes that
> calculations done in a for-loop are not written to
> global memory or used anywhere else?

The real way to tell is to look at the PTX. But, generally, yes, if you
don't write results to global, I think the Nv compiler will get rid of
your entire kernel.
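
As a sketch, one way to get at the PTX from PyCUDA is to compile with
keep=True, which leaves the compiler's working directory (including the
generated .ptx) on disk and prints its location:

import pycuda.autoinit
from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void loop_no_store(float *out)
{
    float acc = 0.f;
    for (int i = 0; i < 1000; ++i)
        acc += i * 0.5f;
    // no store to *out: expect the loop to vanish from the PTX
}
""", keep=True)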

Andreas

___
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda


Re: [PyCUDA] device pointer to cudaArray?

2015-10-22 Thread Andreas Kloeckner
Luke Pfister  writes:

> I'm trying to allocate a 3D cudaArray in PyCUDA, then pass the pointer to
> this array through Cython.
>
> This is easy to do with 'regular' memory on the device via the GPUArray
> class; I can just cast GPUArray.gpudata to long and then pass it through
> Cython.
>
> Is there a way to do something similar with a pycuda.driver.Array?  I don't
> see a way to get to the device pointer.

There isn't currently, but it's not hard to patch in.

Add a handle_int (or some such) function here:
https://github.com/inducer/pycuda/blob/0797e9390c8c85034cf71ccc46f54fa158da92c4/src/cpp/cuda.hpp#L1060

that returns the handle pointer cast to an integer. Realize that you
just assumed part of the responsibility for management of the lifetime
of the handle.

Wrap your new handle_int here:

https://github.com/inducer/pycuda/blob/master/src/wrapper/wrap_cudadrv.cpp#L1422

Leave handle_int undocumented. Submit a pull request.

HTH,
Andreas

___
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda


Re: [PyCUDA] Sharing a common index within a block

2015-10-18 Thread Andreas Kloeckner
Joe  writes:
> in the meantime I added a scan function to find out how many
> indices will be written by a specific thread.
> These results are written to shared memory and it works
> fine.
>
> However, the final writing of the results to global memory
> is very slow and takes up nearly all the time. (18 seconds out of 20).
>
> Is there something I am missing in the following code?
>
> int cnt;
> // holds the index where this thread starts to write to the global array;
> // this is computed by each thread earlier
>
> thrd_chk_start and thrd_chk_end delimit the data that
> each thread processes.
> Typically (thrd_chk_end - thrd_chk_start) is between 25 and 100.
>
>
> for (int i = thrd_chk_start; i < thrd_chk_end; i++)
> {
>  if(condition)
>  {
>  out[(hIdx * nearNeigh_n) + cnt ] = i;
>  cnt += 1;
>  }
> }
>
> The line with out[...] is very slow; does anyone know if there is
> a reason for that? Indices not known to compiler beforehand or whatever?
> All other writes to global memory are way way faster than this.

Depending on how scattered these writes are, it might be helpful to turn
off caching for them. See the CUDA docs for how.
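
As a sketch, the compiler-level knob here is ptxas's default cache
modifier; note it affects loads, and per-store streaming hints need
inline PTX (e.g. st.global.cs), so treat this as a starting point:

from pycuda.compiler import SourceModule

# Assumes kernel_source holds the kernel text from above.
# -dlcm=cg makes global loads bypass L1 and go through L2 only.
mod = SourceModule(kernel_source, options=["-Xptxas", "-dlcm=cg"])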

Andreas

___
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda


Re: [PyCUDA] Questions on pinned memory

2015-10-05 Thread Andreas Kloeckner
Walter White  writes:

> Hello,
>
> I have a question about pinned memory and hope that you can help me.
>
> I found out that copying data from device to host takes
> a very big part of my runtime, so I read about the issue
> and came across "pinned memory".
>
> There are several examples on the mailing list but I am not
> sure if I am doing this the right way.
>
> Do I need to initialize with drv.ctx_flags.MAP_HOST
> or is this automatically activated if one of the
> functions below is used?
>
> drv.init()
> dev = drv.Device(0)
> ctx = dev.make_context(drv.ctx_flags.SCHED_AUTO | drv.ctx_flags.MAP_HOST)

No, it is not activated automatically; passing MAP_HOST at context
creation is necessary.

> Is drv.mem_host_register_flags.DEVICEMAP also needed if
> the context is initialized with drv.ctx_flags.MAP_HOST?
>
> I found several methods that should do this
> but none of them seems to work.
> Are they all equivalent?
>
> --
> x = drv.register_host_memory(x, flags=drv.mem_host_register_flags.DEVICEMAP)
> x_gpu_ptr = np.intp(x.base.get_device_pointer())
>
> --
> x = drv.pagelocked_empty(shape=x.shape, dtype=np.float32,
> mem_flags=drv.mem_host_register_flags.DEVICEMAP)
> --
>
> from pycuda.tools import PageLockedMemoryPool
> pool = PageLockedMemoryPool()
> x_ptr = pool.allocate(dest.shape , np.float32)
> --

The former two are equivalent. The latter just uses 'page-locked' memory
(which *can* be pinned, but normally isn't).

> If I use
> np.intp(x.base.get_device_pointer())
> and
> drv.memcpy_dtoh(a_gpu, x_ptr)
>
> there is an error message
>
> "BufferError: Object is not writable."

This is a sign that it worked--the memory is no longer writable host-side.
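
Putting the pieces together, a minimal sketch of the mapped (zero-copy)
path, assuming a context created with MAP_HOST as above:

import numpy as np
import pycuda.driver as drv

x = drv.pagelocked_empty((1024,), np.float32,
                         mem_flags=drv.host_alloc_flags.DEVICEMAP)
x[:] = 42.0
# Device-side address of the same memory; pass this to kernels directly.
x_gpu_ptr = np.intp(x.base.get_device_pointer())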

Andreas

___
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda


Re: [PyCUDA] updating pycuda pypi package

2015-06-30 Thread Andreas Kloeckner
Tomasz Rybak tomasz.ry...@post.pl writes:

> On Tue, 2015-06-30 at 14:50 -0500, Andreas Kloeckner wrote:
> [ cut ]
>>> configure.py is incorrect. It contains:
>>> from __future__ import
>>> #! /usr/bin/env python
>>>
>>> and because the shebang is not in the first line I have trouble
>>> building the Debian package.
>>
>> Uh-oh, thanks for the report.
>
> Thanks, but ... uhm - configure.py still has the shebang line in an
> incorrect place.

Ouch, sorry. I fixed setup.py, but missed configure.py. 2015.1.2 is out. :)

Andreas


signature.asc
Description: PGP signature
___
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda


Re: [PyCUDA] updating pycuda pypi package

2015-06-16 Thread Andreas Kloeckner
Scott,

Scott Gray sg...@nervanasys.com writes:
> I'm curious if you noticed this little project:
>
> https://github.com/NVIDIA/pynvrtc
>
> With cuda 7, it seems like that could be leveraged to replace forking off
> an instance of nvcc. I think compiling cuda-c in this way should be much
> faster. I'm too busy to play with it right now, but I was wondering if you
> had any plans to integrate this?

I wasn't aware of this, thanks for pointing it out! It'd be great to
integrate this. I personally can't spare the time right now, but I'd
love to take a patch.
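
In outline, the integration would compile to PTX in-process and load it
with module_from_buffer; a sketch against pynvrtc's documented Program
API (the names here are assumptions, check its README):

import pycuda.autoinit
import pycuda.driver as drv
from pynvrtc.compiler import Program

src = 'extern "C" __global__ void twice(float *x) { x[threadIdx.x] *= 2.f; }'
# NVRTC emits PTX as a string; PyCUDA can load it as a module buffer.
ptx = Program(src, "twice.cu").compile()
twice = drv.module_from_buffer(ptx.encode()).get_function("twice")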


> Btw, thanks for all the work you've put into pycuda. It's truly a pleasure
> to program in. I only wish I'd written my assembler in python to start out
> with so I could integrate it with pycuda and get dynamically generated
> assembly at run time. One of these days I'll port it over. I think my
> perl days are over.

Glad to hear you're finding it useful!

Andreas



signature.asc
Description: PGP signature
___
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda


Re: [PyCUDA] updating pycuda pypi package

2015-06-16 Thread Andreas Kloeckner
Alex Park a...@nervanasys.com writes:
> Was wondering if it would be possible for you to submit the more recent
> pycuda version up to pypi to serve as the default version. We've made
> nervanagpu dependent on some of the async features and are concerned that
> some users might have trouble figuring out how to grab the github or tiker
> versions.

Done, 2015.1 is out.

Andreas


signature.asc
Description: PGP signature
___
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda

